<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[The Cloud Engineers]]></title><description><![CDATA[Level up your Cloud engineering knowledge with this weekly newsletter. Perfect for Cloud Engineers, DevOps, Developers, Architects, and other tech enthusiasts looking to advance their careers while mastering the latest Cloud technologies.]]></description><link>https://blog.thecloudengineers.com</link><image><url>https://substackcdn.com/image/fetch/$s_!Ka2m!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c6f36c3-e29e-40f9-ab14-7fc0e81d82d9_759x759.png</url><title>The Cloud Engineers</title><link>https://blog.thecloudengineers.com</link></image><generator>Substack</generator><lastBuildDate>Fri, 01 May 2026 16:26:07 GMT</lastBuildDate><atom:link href="https://blog.thecloudengineers.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Lefteris Karageorgiou]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[thecloudengineers@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[thecloudengineers@substack.com]]></itunes:email><itunes:name><![CDATA[Lefteris Karageorgiou]]></itunes:name></itunes:owner><itunes:author><![CDATA[Lefteris Karageorgiou]]></itunes:author><googleplay:owner><![CDATA[thecloudengineers@substack.com]]></googleplay:owner><googleplay:email><![CDATA[thecloudengineers@substack.com]]></googleplay:email><googleplay:author><![CDATA[Lefteris Karageorgiou]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[5 Hands-On AWS Projects That Will Prepare You for the SAA-C03]]></title><description><![CDATA[Most people 
study for the AWS Solutions Architect Associate by watching 40 hours of video and memorizing answers.]]></description><link>https://blog.thecloudengineers.com/p/5-hands-on-aws-projects-that-will</link><guid isPermaLink="false">https://blog.thecloudengineers.com/p/5-hands-on-aws-projects-that-will</guid><dc:creator><![CDATA[Lefteris Karageorgiou]]></dc:creator><pubDate>Wed, 29 Apr 2026 09:30:41 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/4172ed57-b685-4720-b287-bc0ef32c47f9_1731x909.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Most people study for the AWS Solutions Architect Associate by watching 40 hours of video and memorizing answers. Then they get to a real job and freeze.</p><p>I took a different approach. I built projects that forced me to understand the services deeply, not just what they do, but <em>why</em> you&#8217;d choose them, what breaks under load, and how the pieces connect. I passed the SAA-C03 and could talk confidently in interviews about real implementations. Here are the 5 projects I recommend.</p>
<h2>Project 1: Three-Tier Web Application</h2><p><strong>What you build:</strong> A custom VPC with public and private subnets, an Application Load Balancer, an Auto Scaling Group, and an RDS database.</p><p><strong>Why it matters:</strong> This is the foundational architecture pattern behind almost every production web workload on AWS. The exam tests it constantly, and so does every technical interview.</p><p>The critical insight here is <em>traffic flow</em>. Your ALB lives in the public subnet and accepts inbound traffic on port 443. Your EC2 instances sit in private subnets and only accept traffic from the ALB&#8217;s security group, not from the internet. Your RDS instance sits in a separate private subnet and only accepts traffic from the EC2 security group. Nothing in the data tier is ever directly reachable.</p><p>When you build this yourself, you stop memorizing &#8220;databases should be private&#8221; and start understanding <em>why</em>: defense in depth, blast radius reduction, and compliance requirements that mandate network isolation. You also learn where Auto Scaling actually helps, horizontal scaling behind the ALB, and where it doesn&#8217;t, like a single-AZ RDS instance that becomes your bottleneck at 500 concurrent connections.</p><p><strong>What the exam will ask:</strong> Multi-AZ RDS failover behavior, ALB vs. NLB selection criteria, and how security group rules differ from NACLs.
Build this project and those questions answer themselves.</p><h2>Project 2: Serverless Image Processing Pipeline</h2><p><strong>What you build:</strong> An S3 upload triggers Lambda, which resizes the image, stores metadata in DynamoDB, and sends an SNS notification.</p><p><strong>Why it matters:</strong> Event-driven architecture is a dominant pattern in modern cloud systems. This project teaches you how AWS services communicate asynchronously, and what happens when they don&#8217;t.</p><p>The flow looks simple: S3 event &#8594; Lambda &#8594; DynamoDB write + SNS publish. But building it forces you to confront real decisions. What&#8217;s your Lambda timeout? If image processing takes 8 seconds and you set the timeout to 5, every invocation times out. Because S3 invokes Lambda asynchronously, those failures stay silent unless you configure a dead-letter queue or an on-failure destination. What&#8217;s your DynamoDB write capacity? If you&#8217;re processing 200 images per second and your table is provisioned for 50 WCU (one WCU covers one write per second for an item up to 1 KB), you&#8217;ll hit throttling. What happens to the SNS notification if the DynamoDB write fails? You&#8217;ll need to think about idempotency and partial failure handling.</p><p><strong>What the exam will ask:</strong> S3 event notification targets, Lambda concurrency limits, DynamoDB capacity modes, and SNS delivery guarantees. This project covers all of them.</p><h2>Project 3: Disaster Recovery Solution</h2><p><strong>What you build:</strong> Cross-region S3 replication, automated RDS backups, and Route 53 failover routing.</p><p><strong>Why it matters:</strong> DR is one of the most heavily tested topics on the SAA-C03, and it&#8217;s also one of the most misunderstood in practice. Most candidates can recite the four DR strategies. Few can explain the trade-offs that determine which one you&#8217;d actually choose.</p><p>Building this project forces you to internalize the RTO/RPO trade-off with real numbers. Cross-region S3 replication gives you near-zero RPO for object storage: replication typically completes in under 15 minutes for objects under 5GB.
But RDS is different: automated backups support point-in-time recovery to within roughly the last five minutes in the same Region, while a DR copy that relies on daily snapshots has an RPO of up to 24 hours unless you&#8217;re using cross-Region read replicas or Aurora Global Database. Route 53 health checks with failover routing can redirect traffic in under 60 seconds, depending on the health check interval, failure threshold, and record TTL, but only if your secondary environment is already running (warm standby) or fully active (multi-site active-active).</p><p>The exam distinguishes between backup/restore (hours of RTO, lowest cost), pilot light (tens of minutes of RTO, minimal running infrastructure), warm standby (minutes of RTO, scaled-down but live), and multi-site active-active (near-zero RTO, highest cost). Build this project and you&#8217;ll understand why a financial services company chooses multi-site active-active at $40,000/month while a content platform chooses backup/restore at $200/month.</p><p><strong>What the exam will ask:</strong> S3 replication configuration, RDS backup retention, Route 53 routing policies, and how to calculate RTO/RPO for each DR tier.</p><h2>Project 4: Hybrid Storage Solution</h2><p><strong>What you build:</strong> AWS Storage Gateway connected to on-premises systems, S3 lifecycle policies moving data through storage tiers, and IAM policies controlling access.</p><p><strong>Why it matters:</strong> Most cloud engineers underestimate how much enterprise workload still runs on-premises. Storage Gateway is the bridge, and understanding it makes you sound experienced in interviews because it signals you&#8217;ve thought about migration, not just greenfield architecture.</p><p>Storage Gateway has three modes: File Gateway (NFS/SMB access to S3), Volume Gateway (iSCSI block storage backed by S3), and Tape Gateway (virtual tape library for backup software). The exam tests which mode fits which scenario. Build a File Gateway and you&#8217;ll understand why a media company with 200TB of on-premises video assets uses it to extend their NAS to S3 without rewriting their editing workflows.</p><p>S3 lifecycle policies complete the picture.
Moving objects from S3 Standard to S3 Standard-IA after 30 days saves roughly 46% on storage costs. Moving to S3 Glacier Instant Retrieval after 90 days cuts the per-GB storage price by roughly a further 68% relative to Standard-IA. For a workload storing 10TB of infrequently accessed data, that&#8217;s the difference between roughly $230/month and $40/month in storage charges, before the retrieval fees and minimum storage durations that the colder classes add. The exam will ask you to design the right lifecycle policy for a given access pattern. Build this project and you&#8217;ll answer from experience, not memorization.</p><p><strong>What the exam will ask:</strong> Storage Gateway modes, S3 storage class trade-offs, lifecycle policy configuration, and IAM policy structure for cross-account S3 access.</p><h2>Project 5: Containerized Microservices Application</h2><p><strong>What you build:</strong> Two services deployed on ECS with Fargate, task definitions with resource limits, ALB path-based routing to each service, and CloudWatch Container Insights for logging and metrics.</p><p><strong>Why it matters:</strong> Containers are now a core SAA-C03 topic, and ECS with Fargate is the AWS-native answer to &#8220;I want containers without managing servers.&#8221; Building this project teaches you the ECS mental model (clusters, services, task definitions, and tasks) and how those pieces map to the infrastructure underneath.</p><p>The ALB routing piece is where most candidates get confused. Path-based routing lets you send <code>/api/orders/*</code> to your orders service and <code>/api/inventory/*</code> to your inventory service, both running as separate ECS services behind a single ALB. Each service has its own target group, its own task definition with CPU and memory limits, and its own auto-scaling policy based on CPU utilization or request count.
When you build this, you understand why a task definition with 256 CPU units and 512MB memory will throttle under load before it scales, and how to set the right CloudWatch alarm threshold to trigger scaling before users notice.</p><p>CloudWatch Container Insights gives you container-level CPU, memory, network, and disk metrics without any instrumentation. You&#8217;ll see exactly which task is consuming resources and correlate it with application logs in the same console. That operational visibility is what separates a working prototype from a production-ready deployment.</p><p><strong>What the exam will ask:</strong> ECS vs. EKS selection criteria, Fargate vs. EC2 launch type trade-offs, ALB target group configuration, and CloudWatch metrics for container workloads.</p><h2>Conclusion</h2><p>Watching videos teaches you what AWS services do. Building projects teaches you what they cost, where they fail, and why you&#8217;d choose one over another. Every question on the SAA-C03 is ultimately asking: <em>given these constraints, what&#8217;s the right architecture decision?</em></p><p>These five projects give you the intuition to answer that question, not just on the exam, but in the room when a customer asks why their database is the bottleneck, why their DR plan won&#8217;t meet their RTO, or why their serverless pipeline is costing more than expected.</p><p>Build the projects. Pass the exam. Show up to the interview ready to talk about real systems.</p>]]></content:encoded></item><item><title><![CDATA[Building Production-Grade API Security]]></title><description><![CDATA[We built a customer-facing API processing thousands of transactions per minute.]]></description><link>https://blog.thecloudengineers.com/p/building-production-grade-api-security</link><guid isPermaLink="false">https://blog.thecloudengineers.com/p/building-production-grade-api-security</guid><dc:creator><![CDATA[Lefteris Karageorgiou]]></dc:creator><pubDate>Wed, 22 Apr 2026 09:31:00 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/70ba8566-be57-4213-9d70-2324df9b85d3_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>We built a customer-facing API processing thousands of transactions per minute. The challenge wasn&#8217;t just handling the volume, it was protecting against attacks while maintaining performance. Every blocked malicious request <strong>saves money</strong> and <strong>prevents potential breaches</strong>, but every millisecond of latency added by security layers impacts user experience.</p>
<h2>The Requirements</h2><p>We needed security that addressed four critical concerns:</p><p><strong>Application-layer protection</strong> against SQL injection, XSS, and other OWASP Top 10 attacks that target business logic rather than infrastructure.</p><p><strong>DDoS mitigation</strong> at both network and application layers without manual intervention. When attacks happen at 3 AM, automated response is non-negotiable.</p><p><strong>Rate limiting and throttling</strong> that works across a global user base.
Simple IP-based limiting breaks with legitimate users behind corporate NATs or mobile carriers.</p><p><strong>Performance constraints</strong> of sub-200ms API response times even with all security layers active. Security cannot become the bottleneck.</p><p>We also needed complete observability into every attack attempt with real-time alerting when patterns emerge. The solution had to scale automatically, security couldn&#8217;t become the constraint during traffic spikes.</p><h2>The AWS Services</h2><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!SDHH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9fa540c-0aa5-4a6e-a01c-37ccf6f78927_1652x214.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!SDHH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9fa540c-0aa5-4a6e-a01c-37ccf6f78927_1652x214.png 424w, https://substackcdn.com/image/fetch/$s_!SDHH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9fa540c-0aa5-4a6e-a01c-37ccf6f78927_1652x214.png 848w, https://substackcdn.com/image/fetch/$s_!SDHH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9fa540c-0aa5-4a6e-a01c-37ccf6f78927_1652x214.png 1272w, https://substackcdn.com/image/fetch/$s_!SDHH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9fa540c-0aa5-4a6e-a01c-37ccf6f78927_1652x214.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!SDHH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9fa540c-0aa5-4a6e-a01c-37ccf6f78927_1652x214.png" width="1456" height="189" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a9fa540c-0aa5-4a6e-a01c-37ccf6f78927_1652x214.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:189,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:53419,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.thecloudengineers.com/i/193558909?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9fa540c-0aa5-4a6e-a01c-37ccf6f78927_1652x214.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!SDHH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9fa540c-0aa5-4a6e-a01c-37ccf6f78927_1652x214.png 424w, https://substackcdn.com/image/fetch/$s_!SDHH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9fa540c-0aa5-4a6e-a01c-37ccf6f78927_1652x214.png 848w, https://substackcdn.com/image/fetch/$s_!SDHH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9fa540c-0aa5-4a6e-a01c-37ccf6f78927_1652x214.png 1272w, 
https://substackcdn.com/image/fetch/$s_!SDHH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9fa540c-0aa5-4a6e-a01c-37ccf6f78927_1652x214.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p><strong>API Gateway</strong> serves as the entry point, handling authentication, request validation, and throttling before requests reach backend functions. It&#8217;s the layer that rejects malformed requests before any backend code runs.</p><p><strong>AWS WAF</strong> sits in front of API Gateway, inspecting every HTTP request against managed rule sets and custom rules. It blocks attacks before they consume API Gateway capacity or cost Lambda invocations.</p><p><strong>AWS Shield Standard</strong> is enabled automatically for every AWS customer at no extra cost, so API Gateway gets DDoS protection against common network and transport layer attacks without any setup. Shield Advanced is available for scenarios requiring dedicated DDoS response team support.</p><h2>The Flow</h2><p>When a request hits our API, it first passes through the WAF web ACL attached to our regional API Gateway stage. If the request matches a block rule (suspicious patterns, known bad IPs, rate limit exceeded), it&#8217;s rejected with a 403 before reaching API Gateway.</p><p>For legitimate requests that pass WAF inspection, API Gateway applies its own throttling limits, validates the request structure, checks API keys or JWT tokens, and only then invokes backend functions. If anything fails validation, the request is rejected without consuming compute capacity.</p><p>This layered approach means attacks get stopped at the cheapest point possible. WAF blocks cost nothing beyond the WAF inspection fee. API Gateway rejections cost nothing beyond the API call.
Only legitimate requests that pass all checks consume Lambda invocations.</p><h2>Deep Dive: WAF Rule Strategy</h2><p>We started with AWS Managed Rules; they catch 90% of common attacks immediately. But we hit false positives on legitimate API calls that included JSON payloads with special characters.</p><p>Our custom rules operate at three priority levels:</p><p><strong>IP reputation list</strong> blocks known malicious IPs. We integrated threat intelligence feeds that update hourly. This stops repeat offenders before they even attempt an attack.</p><p><strong>Rate-based rules</strong> block IPs making more than 2,000 requests in five minutes. This catches credential stuffing attempts targeting our login endpoints. We observed attackers cycling through stolen credentials at exactly this rate: fast enough to test thousands of accounts but slow enough to avoid obvious detection.</p><p><strong>Geo-blocking rules</strong> target countries with no legitimate users but high attack traffic. This was controversial internally, but the data was clear: 95% of attacks came from regions with zero paying customers.</p><p>The critical decision: we set WAF to count mode initially, not block. We observed traffic patterns for two weeks, identified false positives, tuned rules, then switched to block mode. This kept us from accidentally blocking legitimate users during the learning phase.</p><h2>Deep Dive: API Gateway Security Features</h2><p>API Gateway provides multiple security layers beyond basic routing.</p><p><strong>Request validation</strong> happens before Lambda invocation, rejecting requests with malformed JSON, missing required parameters, or invalid data types. This prevents malicious payloads from reaching backend code. We defined JSON schemas for every endpoint. They are verbose to maintain, but they stop entire classes of attacks.</p><p><strong>Authentication</strong> integrates with multiple mechanisms. 
We use API keys for simple use cases, Lambda authorizers for custom logic, and Cognito user pools for OAuth 2.0 flows. Each request is validated before consuming any compute resources.</p><p><strong>Resource policies</strong> add another control layer. We restricted our internal APIs to specific VPCs, preventing public internet access entirely while maintaining the benefits of API Gateway&#8217;s managed service.</p><h2>Deep Dive: Throttling Strategy</h2><p>API Gateway&#8217;s throttling operates at multiple levels, and understanding each is critical.</p><p><strong>Account-level limits</strong> serve as a safety net, but real control comes from usage plans. Each usage plan has both a rate limit (steady state) and a burst capacity. Burst capacity matters during traffic spikes when clients temporarily exceed their rate limit using burst tokens. We set burst to 2x the rate limit, which handles legitimate spikes without triggering false positives.</p><p><strong>Custom throttling in WAF</strong> for specific endpoints adds another layer. Our login endpoints get rate-limited to 10 requests per minute per IP, far more restrictive than general API limits. This stops brute force attacks without impacting normal API usage.</p><p><strong>Method-level throttling</strong> provides even finer control. Read operations (GET) can have higher limits than write operations (POST, PUT, DELETE), reflecting their different resource consumption and risk profiles. We observed that 80% of attacks targeted write endpoints, so we throttled them more aggressively.</p><h2>Deep Dive: Shield Protection</h2><p>Shield Standard provides automatic protection against common DDoS attacks at the network and transport layers. It detects and mitigates SYN floods, UDP reflection attacks, and other volumetric attacks without any configuration.</p><p>The protection operates inline with minimal latency impact. 
Shield&#8217;s detection algorithms analyze traffic patterns in real time, distinguishing legitimate traffic spikes from attack patterns. When attacks are detected, mitigation happens automatically within seconds.</p><p>For our API, Shield Advanced wasn&#8217;t initially necessary. The decision point: Shield Advanced makes sense when API downtime costs exceed $3,000/month or when regulatory requirements mandate dedicated security response. We added it after a competitor suffered a multi-day DDoS attack that made headlines.</p><h2>Monitoring and Observability</h2><p>WAF logs every blocked request to CloudWatch Logs. We stream these to S3 for long-term analysis. Analyzing these logs reveals attack patterns, identifies false positives, and guides rule tuning. We set up CloudWatch alarms for spikes in blocked requests: when blocks exceed 1,000 per minute, we get paged.</p><p>API Gateway metrics expose throttling events, 4xx/5xx error rates, and latency distributions. We correlate WAF blocks with API Gateway metrics to see the full security picture: which attacks reached API Gateway and which were blocked at WAF.</p><p>X-Ray tracing adds end-to-end visibility, showing exactly where requests spend time and where failures occur. This proved essential when debugging whether performance issues stemmed from security layers, backend code, or downstream services.</p><h2>Cost Breakdown</h2><p><strong>WAF costs</strong> run approximately $5/month base fee plus $1 per million requests plus $1 per rule per month. With 10 custom rules and 100 million requests monthly, that&#8217;s $5 + $100 + $10 = $115/month.</p><p><strong>API Gateway costs</strong> are $3.50 per million requests for REST APIs. At 100 million requests monthly, that&#8217;s $350/month. Caching can reduce backend invocations significantly.</p><p><strong>Shield Standard</strong> is included at no additional cost. 
Shield Advanced costs $3,000/month plus data transfer fees, only justified for high-value APIs where downtime is extremely costly.</p><p><strong>Lambda invocations</strong> saved by blocking malicious requests represent the hidden cost savings. We block approximately 5 million malicious requests monthly that would have cost $1 in Lambda invocations plus potential data breach costs.</p><p>Total monthly cost for our security stack: approximately $465 for WAF and API Gateway, blocking attacks that would cost far more in compute resources and potential breaches.</p><h2>Key Takeaways</h2><p>This architecture stops millions of malicious requests monthly, requests that would cost money and potentially compromise systems. The layered approach proves essential: WAF catches obvious attacks, API Gateway enforces business logic throttling and authentication, and Shield protects against network-layer DDoS. No single layer would be sufficient.</p>]]></content:encoded></item><item><title><![CDATA[Tracing Serverless Applications with AWS X-Ray]]></title><description><![CDATA[In the world of serverless architectures, understanding how requests flow through your distributed system can feel like navigating a maze blindfolded.]]></description><link>https://blog.thecloudengineers.com/p/tracing-serverless-applications-with</link><guid isPermaLink="false">https://blog.thecloudengineers.com/p/tracing-serverless-applications-with</guid><dc:creator><![CDATA[Lefteris Karageorgiou]]></dc:creator><pubDate>Tue, 14 Apr 2026 09:29:34 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/74727264-8899-43c5-bb01-593e9b107220_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the world of serverless architectures, understanding how requests flow through your distributed system can feel like navigating a maze blindfolded. 
AWS X-Ray emerges as your guiding light, providing the visibility needed to trace requests as they traverse through Lambda functions, API Gateway endpoints, SQS queues, and other AWS services.</p><div><hr></div><h1>Sponsored by Salesforce</h1><p>The shift toward Agentic AI isn&#8217;t just another buzzword&#8212;it&#8217;s redefining how we design, architect, and build modern systems.</p><p>That&#8217;s exactly why I&#8217;ll be tuning into <a href="https://www.salesforce.com/plus/experience/tdx_2026?d=701ed00000iqbXAAAY&amp;nc=7013y0000022u2tAAA&amp;utm_source=loomify&amp;utm_medium=tp_email&amp;utm_campaign=emea_is_cross-cloud_cross-industry&amp;utm_content=all-segments_pg-mtp_701ed00000iqbXAAAY_english_tdx-2026">TDX on Salesforce+</a> to explore how Agentforce is shaping the next generation of the developer roadmap.</p><p>If you&#8217;re serious about staying ahead of the curve, I highly recommend grabbing a free spot and joining the sessions.</p><p>Plus, when you register, you&#8217;ll be automatically entered for a chance to win one of 20 AI exam vouchers (valued at $200)&#8212;available to legal residents of the U.S., Canada, New Zealand, and the U.K.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.salesforce.com/plus/experience/tdx_2026?d=701ed00000iqbXAAAY&amp;nc=7013y0000022u2tAAA&amp;utm_source=loomify&amp;utm_medium=tp_email&amp;utm_campaign=emea_is_cross-cloud_cross-industry&amp;utm_content=all-segments_pg-mtp_701ed00000iqbXAAAY_english_tdx-2026" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bzDZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5af9c7-a343-42b7-b65d-3d7223257119_2400x1256.png 424w, 
https://substackcdn.com/image/fetch/$s_!bzDZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5af9c7-a343-42b7-b65d-3d7223257119_2400x1256.png 848w, https://substackcdn.com/image/fetch/$s_!bzDZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5af9c7-a343-42b7-b65d-3d7223257119_2400x1256.png 1272w, https://substackcdn.com/image/fetch/$s_!bzDZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5af9c7-a343-42b7-b65d-3d7223257119_2400x1256.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bzDZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5af9c7-a343-42b7-b65d-3d7223257119_2400x1256.png" width="1456" height="762" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fe5af9c7-a343-42b7-b65d-3d7223257119_2400x1256.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:762,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2383485,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://www.salesforce.com/plus/experience/tdx_2026?d=701ed00000iqbXAAAY&amp;nc=7013y0000022u2tAAA&amp;utm_source=loomify&amp;utm_medium=tp_email&amp;utm_campaign=emea_is_cross-cloud_cross-industry&amp;utm_content=all-segments_pg-mtp_701ed00000iqbXAAAY_english_tdx-2026&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.thecloudengineers.com/i/191560663?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5af9c7-a343-42b7-b65d-3d7223257119_2400x1256.png&quot;,&quot;isProcessing&quot;:false,&quot;a
lign&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!bzDZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5af9c7-a343-42b7-b65d-3d7223257119_2400x1256.png 424w, https://substackcdn.com/image/fetch/$s_!bzDZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5af9c7-a343-42b7-b65d-3d7223257119_2400x1256.png 848w, https://substackcdn.com/image/fetch/$s_!bzDZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5af9c7-a343-42b7-b65d-3d7223257119_2400x1256.png 1272w, https://substackcdn.com/image/fetch/$s_!bzDZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5af9c7-a343-42b7-b65d-3d7223257119_2400x1256.png 1456w" sizes="100vw" fetchpriority="high"></picture>
</div></a></figure></div><h3><a href="https://www.salesforce.com/plus/experience/tdx_2026?d=701ed00000iqbXAAAY&amp;nc=7013y0000022u2tAAA&amp;utm_source=loomify&amp;utm_medium=tp_email&amp;utm_campaign=emea_is_cross-cloud_cross-industry&amp;utm_content=all-segments_pg-mtp_701ed00000iqbXAAAY_english_tdx-2026">Stream the developer conference of the year. Live on Salesforce+.</a></h3><p>Agentic AI is changing the game and Agentforce is leading the way. Stream TDX to join dozens of sessions and virtual hands-on trainings that explore the latest innovations across Agentforce, Data 360, the core platform, vibe coding, Slack, and more. All free on Salesforce+.</p><p><strong>Tune in to TDX on Salesforce+ to:</strong></p><ul><li><p>Build hands-on skills with virtual trainings and live demos</p></li><li><p>Get roadmap insights from the leaders shaping what&#8217;s next</p></li><li><p>Access broadcast-only moments and exclusive interviews</p></li></ul><p>It all kicks off with the main keynote, where you&#8217;ll experience the future of software and learn how to build it. Add it to your calendar so you don&#8217;t miss a moment.</p><h3><strong>&#128073; <a href="https://www.salesforce.com/plus/experience/tdx_2026?d=701ed00000iqbXAAAY&amp;nc=7013y0000022u2tAAA&amp;utm_source=loomify&amp;utm_medium=tp_email&amp;utm_campaign=emea_is_cross-cloud_cross-industry&amp;utm_content=all-segments_pg-mtp_701ed00000iqbXAAAY_english_tdx-2026">Register for free</a></strong></h3><div><hr></div><h2>Why Tracing Matters in Serverless</h2><p>Serverless applications are inherently distributed. 
A single user request might trigger an API Gateway endpoint, invoke multiple Lambda functions, write to DynamoDB, publish messages to SNS, and queue tasks in SQS. When something goes wrong, or even when you&#8217;re optimizing performance, pinpointing the bottleneck becomes challenging without proper tracing.</p><p>X-Ray solves this by creating a complete map of your request journey, showing you exactly where time is spent, where errors occur, and how services interact with each other. It&#8217;s the difference between guessing and knowing.</p><h2>Understanding the Service Map</h2><p>The service map is X-Ray&#8217;s visual representation of your application architecture. It automatically discovers and displays all the services your application uses, showing the relationships between them. Each node represents a service, and the connections show how requests flow between them.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!scXp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07425509-6d62-43cb-841e-fa9ba083f774_2716x1364.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!scXp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07425509-6d62-43cb-841e-fa9ba083f774_2716x1364.png 424w, https://substackcdn.com/image/fetch/$s_!scXp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07425509-6d62-43cb-841e-fa9ba083f774_2716x1364.png 848w, https://substackcdn.com/image/fetch/$s_!scXp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07425509-6d62-43cb-841e-fa9ba083f774_2716x1364.png 
1272w, https://substackcdn.com/image/fetch/$s_!scXp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07425509-6d62-43cb-841e-fa9ba083f774_2716x1364.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!scXp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07425509-6d62-43cb-841e-fa9ba083f774_2716x1364.png" width="1456" height="731" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/07425509-6d62-43cb-841e-fa9ba083f774_2716x1364.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:731,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:287398,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.thecloudengineers.com/i/191561839?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07425509-6d62-43cb-841e-fa9ba083f774_2716x1364.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!scXp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07425509-6d62-43cb-841e-fa9ba083f774_2716x1364.png 424w, https://substackcdn.com/image/fetch/$s_!scXp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07425509-6d62-43cb-841e-fa9ba083f774_2716x1364.png 848w, 
https://substackcdn.com/image/fetch/$s_!scXp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07425509-6d62-43cb-841e-fa9ba083f774_2716x1364.png 1272w, https://substackcdn.com/image/fetch/$s_!scXp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07425509-6d62-43cb-841e-fa9ba083f774_2716x1364.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>What makes this powerful is the real-time health indicators. 
You can immediately spot services with high error rates or elevated latency. The color coding provides instant visual feedback: green for successful calls, yellow for client errors (4xx), red for server faults (5xx), and purple for throttling. This bird&#8217;s-eye view helps you understand your system&#8217;s behavior at a glance.</p><h2>Traces and Segments: The Building Blocks</h2><p>Every sampled request that flows through your application creates a trace. Within each trace, individual services create segments that represent the work they perform. For Lambda functions, a segment captures the entire function execution. For downstream calls to DynamoDB or other services, subsegments provide granular detail.</p><p>The trace timeline shows you exactly how long each segment took, helping you identify slow operations. You can drill down into specific traces to see the complete request path, including all the metadata, annotations, and errors associated with each segment.</p><h2>Practical Tips for Effective Tracing</h2><p><strong>Choose sampling strategies wisely.</strong> X-Ray uses sampling to balance cost and visibility. The default sampling rule captures the first request each second and 5% of additional requests. For production environments, this is usually sufficient. However, for critical workflows or during troubleshooting, consider creating custom sampling rules to capture 100% of requests to specific API paths or services.</p><p><strong>Use annotations for filtering and grouping.</strong> Annotations are indexed key-value pairs that you can use to filter traces in the X-Ray console. Add annotations for user IDs, transaction types, or feature flags. This makes it easy to analyze specific user journeys or compare performance across different code paths.</p><p><strong>Leverage metadata for context.</strong> Unlike annotations, metadata isn&#8217;t indexed but provides rich contextual information when viewing individual traces. 
Include relevant business context, configuration values, or debugging information that helps you understand what was happening during the request.</p><p><strong>Monitor cold starts separately.</strong> Lambda cold starts can significantly impact performance. Use X-Ray annotations to tag cold start invocations, allowing you to analyze their frequency and impact separately from warm starts. This helps you make informed decisions about provisioned concurrency.</p><p><strong>Set up alarms on key metrics.</strong> X-Ray integrates with CloudWatch, allowing you to create alarms based on error rates, latency percentiles, or fault rates. Don&#8217;t wait to discover problems; let these alarms notify you when thresholds are breached.</p><p><strong>Trace asynchronous workflows.</strong> For event-driven architectures using SNS, SQS, or EventBridge, ensure you&#8217;re propagating trace context through message attributes. This maintains trace continuity across asynchronous boundaries, giving you end-to-end visibility even in complex event chains.</p><p><strong>Use trace groups for organization.</strong> As your application grows, create trace groups to organize and filter traces by environment, application component, or team ownership. This keeps your X-Ray console manageable and helps teams focus on their specific services.</p><h2>The Bottom Line</h2><p>AWS X-Ray transforms serverless observability from reactive debugging to proactive optimization. By providing clear visibility into request flows, performance characteristics, and error patterns, it empowers you to build more reliable and efficient serverless applications.</p><p>The key is to integrate X-Ray early in your development process, not as an afterthought when problems arise. 
With proper instrumentation and thoughtful use of annotations and sampling, X-Ray becomes an invaluable tool for understanding and improving your serverless architecture.</p>]]></content:encoded></item><item><title><![CDATA[API Gateway Canary Deployments: A Strategic Approach to Safe Releases]]></title><description><![CDATA[Canary deployments are one of the most powerful risk mitigation strategies available to cloud engineers working with AWS API Gateway and Lambda.]]></description><link>https://blog.thecloudengineers.com/p/api-gateway-canary-deployments-a</link><guid isPermaLink="false">https://blog.thecloudengineers.com/p/api-gateway-canary-deployments-a</guid><dc:creator><![CDATA[Lefteris Karageorgiou]]></dc:creator><pubDate>Wed, 08 Apr 2026 09:30:51 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/6ce77640-8a70-4628-aef8-53a009a8caa5_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Canary deployments are one of the most powerful risk mitigation strategies available to cloud engineers working with AWS API Gateway and Lambda. 
They allow you to test new versions of your APIs in production with real traffic while minimizing the blast radius of potential issues. Understanding how they work and when to use them can be the difference between a smooth rollout and a production incident.</p><div><hr></div><h1>Sponsored by Salesforce</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.salesforce.com/plus/experience/tdx_2026?d=701ed00000iqbXAAAY&amp;nc=7013y0000022u2tAAA&amp;utm_source=loomify&amp;utm_medium=tp_email&amp;utm_campaign=emea_is_cross-cloud_cross-industry&amp;utm_content=all-segments_pg-mtp_701ed00000iqbXAAAY_english_tdx-2026" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bzDZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5af9c7-a343-42b7-b65d-3d7223257119_2400x1256.png 424w, https://substackcdn.com/image/fetch/$s_!bzDZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5af9c7-a343-42b7-b65d-3d7223257119_2400x1256.png 848w, https://substackcdn.com/image/fetch/$s_!bzDZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5af9c7-a343-42b7-b65d-3d7223257119_2400x1256.png 1272w, https://substackcdn.com/image/fetch/$s_!bzDZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5af9c7-a343-42b7-b65d-3d7223257119_2400x1256.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bzDZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5af9c7-a343-42b7-b65d-3d7223257119_2400x1256.png" width="1456" height="762" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fe5af9c7-a343-42b7-b65d-3d7223257119_2400x1256.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:762,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2383485,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://www.salesforce.com/plus/experience/tdx_2026?d=701ed00000iqbXAAAY&amp;nc=7013y0000022u2tAAA&amp;utm_source=loomify&amp;utm_medium=tp_email&amp;utm_campaign=emea_is_cross-cloud_cross-industry&amp;utm_content=all-segments_pg-mtp_701ed00000iqbXAAAY_english_tdx-2026&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.thecloudengineers.com/i/191560663?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5af9c7-a343-42b7-b65d-3d7223257119_2400x1256.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bzDZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5af9c7-a343-42b7-b65d-3d7223257119_2400x1256.png 424w, https://substackcdn.com/image/fetch/$s_!bzDZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5af9c7-a343-42b7-b65d-3d7223257119_2400x1256.png 848w, https://substackcdn.com/image/fetch/$s_!bzDZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5af9c7-a343-42b7-b65d-3d7223257119_2400x1256.png 1272w, 
https://substackcdn.com/image/fetch/$s_!bzDZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5af9c7-a343-42b7-b65d-3d7223257119_2400x1256.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h3><a href="https://www.salesforce.com/plus/experience/tdx_2026?d=701ed00000iqbXAAAY&amp;nc=7013y0000022u2tAAA&amp;utm_source=loomify&amp;utm_medium=tp_email&amp;utm_campaign=emea_is_cross-cloud_cross-industry&amp;utm_content=all-segments_pg-mtp_701ed00000iqbXAAAY_english_tdx-2026">Stream the developer conference of the year. 
Live on Salesforce+.</a></h3><p>Agentic AI is changing the game and Agentforce is leading the way. Stream TDX to join dozens of sessions and virtual hands-on trainings that explore the latest innovations across Agentforce, Data 360, the core platform, vibe coding, Slack, and more. All free on Salesforce+.</p><p><strong>Tune in to TDX on Salesforce+ to:</strong></p><ul><li><p>Build hands-on skills with virtual trainings and live demos</p></li><li><p>Get roadmap insights from the leaders shaping what&#8217;s next</p></li><li><p>Access broadcast-only moments and exclusive interviews</p></li></ul><p>It all kicks off with the main keynote, where you&#8217;ll experience the future of software and learn how to build it. Add it to your calendar so you don&#8217;t miss a moment.</p><h3><strong>&#128073; <a href="https://www.salesforce.com/plus/experience/tdx_2026?d=701ed00000iqbXAAAY&amp;nc=7013y0000022u2tAAA&amp;utm_source=loomify&amp;utm_medium=tp_email&amp;utm_campaign=emea_is_cross-cloud_cross-industry&amp;utm_content=all-segments_pg-mtp_701ed00000iqbXAAAY_english_tdx-2026">Register for free</a></strong></h3><div><hr></div><h2>What Are Canary Deployments?</h2><p>A canary deployment is a progressive rollout strategy in which you route a small percentage of production traffic to a new version of your application while the majority continues using the stable version. The term comes from the historical practice of using canaries in coal mines as early warning systems: if something goes wrong with the new version, only a small subset of users is affected.</p><p>In API Gateway, canary deployments work at the stage level. When you create a canary, you&#8217;re essentially splitting traffic between two versions of your API: the base deployment and the canary deployment. 
Each can point to different Lambda function versions or aliases, allowing you to test new code with real production traffic.</p><h2>How Canary Deployments Work with Lambda</h2><p>The integration between API Gateway canary deployments and Lambda is straightforward but powerful. When you deploy a canary in API Gateway, you specify a percentage of traffic to route to the canary version, typically starting with 5-10%. The remaining traffic continues flowing to your stable base deployment.</p><p>API Gateway uses a weighted random distribution to split traffic. This means if you set a 10% canary weight, approximately 10% of requests will hit your canary Lambda function version, while 90% go to the base version. The distribution happens at the request level, not the user level, so individual users might experience both versions during a canary deployment.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7zdz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46d21c3f-30ec-456f-acec-bdab37f4c268_882x632.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7zdz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46d21c3f-30ec-456f-acec-bdab37f4c268_882x632.png 424w, https://substackcdn.com/image/fetch/$s_!7zdz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46d21c3f-30ec-456f-acec-bdab37f4c268_882x632.png 848w, https://substackcdn.com/image/fetch/$s_!7zdz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46d21c3f-30ec-456f-acec-bdab37f4c268_882x632.png 1272w, 
https://substackcdn.com/image/fetch/$s_!7zdz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46d21c3f-30ec-456f-acec-bdab37f4c268_882x632.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7zdz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46d21c3f-30ec-456f-acec-bdab37f4c268_882x632.png" width="470" height="336.7800453514739" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/46d21c3f-30ec-456f-acec-bdab37f4c268_882x632.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:632,&quot;width&quot;:882,&quot;resizeWidth&quot;:470,&quot;bytes&quot;:57477,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.thecloudengineers.com/i/191560663?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46d21c3f-30ec-456f-acec-bdab37f4c268_882x632.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7zdz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46d21c3f-30ec-456f-acec-bdab37f4c268_882x632.png 424w, https://substackcdn.com/image/fetch/$s_!7zdz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46d21c3f-30ec-456f-acec-bdab37f4c268_882x632.png 848w, 
https://substackcdn.com/image/fetch/$s_!7zdz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46d21c3f-30ec-456f-acec-bdab37f4c268_882x632.png 1272w, https://substackcdn.com/image/fetch/$s_!7zdz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46d21c3f-30ec-456f-acec-bdab37f4c268_882x632.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The key is that both deployments exist simultaneously in the same stage. 
Your base deployment might point to Lambda alias &#8220;production&#8221; running version 5, while your canary points to alias &#8220;canary&#8221; running version 6. This allows you to validate new functionality with real traffic patterns before committing to a full rollout.</p><h2>Practical Tips for Success</h2><p><strong>Start Small and Increase Gradually</strong>: Begin with 5-10% of traffic to your canary. Monitor metrics closely for 15-30 minutes before increasing. A common progression is 10% &#8594; 25% &#8594; 50% &#8594; 100%, with a monitoring period between increases.</p><p><strong>Use Lambda Aliases, Not Versions Directly</strong>: Always point your API Gateway integrations to Lambda aliases rather than specific versions. This gives you flexibility to update what version an alias points to without redeploying your API. Use one alias for production traffic and another for canary testing.</p><p><strong>Monitor the Right Metrics</strong>: Focus on error rates, latency percentiles (especially p99), and business-specific metrics. Don&#8217;t just watch averages: a 1% error rate on your canary means 1 in 100 requests routed to the canary is failing. Set up CloudWatch alarms that compare canary metrics against baseline metrics.</p><p><strong>Implement Proper Logging</strong>: Ensure your Lambda functions log which version they are running. Include version information in structured logs so you can quickly filter and compare behavior between canary and base deployments during troubleshooting.</p><p><strong>Have a Rollback Plan</strong>: Know how to quickly promote your canary to 100% if it&#8217;s successful, or delete it entirely if problems arise. Practice these operations in non-production environments. The API Gateway console and CLI both support these operations, but automation through CI/CD pipelines is ideal.</p><p><strong>Consider Stage Variables</strong>: Use API Gateway stage variables to pass configuration to your Lambda functions. 
This allows the same Lambda code to behave differently based on whether it&#8217;s handling base or canary traffic, which is useful for feature flags or environment-specific settings.</p><p><strong>Test with Realistic Load</strong>: Canary deployments are most valuable when they experience real production traffic patterns. Avoid deploying canaries during low-traffic periods if possible; you want enough requests to generate statistically significant results.</p><p><strong>Don&#8217;t Rush the Process</strong>: The entire point of canary deployments is risk reduction. If you promote a canary to 100% within minutes, you&#8217;re not giving yourself time to detect issues. Plan for canary periods of at least several hours for critical APIs.</p><h2>When to Use Canaries</h2><p>Canary deployments shine when rolling out significant changes: new business logic, performance optimizations, dependency updates, or architectural changes. They&#8217;re less critical for minor bug fixes or configuration changes, though the safety they provide is always valuable.</p><p>The overhead of managing canaries is minimal compared to the protection they offer. For production APIs serving real customers, canary deployments should be your default deployment strategy, not an exception.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.thecloudengineers.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading The Cloud Engineers! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p>]]></content:encoded></item><item><title><![CDATA[Multi-Region Serverless Architectures: Building Global Resilience with AWS]]></title><description><![CDATA[Building applications that serve users across the globe requires more than just deploying to a single region.]]></description><link>https://blog.thecloudengineers.com/p/multi-region-serverless-architectures</link><guid isPermaLink="false">https://blog.thecloudengineers.com/p/multi-region-serverless-architectures</guid><dc:creator><![CDATA[Lefteris Karageorgiou]]></dc:creator><pubDate>Wed, 01 Apr 2026 09:30:52 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/e78fa5ef-5ca1-4dda-9bbc-88d4166d3b1e_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Building applications that serve users across the globe requires more than just deploying to a single region. Multi-region architectures provide low latency, high availability, and disaster recovery capabilities that are essential for modern cloud applications. 
In this article, we&#8217;ll explore how to architect a truly global serverless application using AWS services.</p><div><hr></div><h2>Architecture Overview</h2><p>Our multi-region architecture consists of four key components working together:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!OPox!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac90e99c-dd7c-4e89-9cc2-70dd99b93b99_2000x958.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!OPox!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac90e99c-dd7c-4e89-9cc2-70dd99b93b99_2000x958.png 424w, 
https://substackcdn.com/image/fetch/$s_!OPox!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac90e99c-dd7c-4e89-9cc2-70dd99b93b99_2000x958.png 848w, https://substackcdn.com/image/fetch/$s_!OPox!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac90e99c-dd7c-4e89-9cc2-70dd99b93b99_2000x958.png 1272w, https://substackcdn.com/image/fetch/$s_!OPox!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac90e99c-dd7c-4e89-9cc2-70dd99b93b99_2000x958.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!OPox!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac90e99c-dd7c-4e89-9cc2-70dd99b93b99_2000x958.png" width="1456" height="697" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ac90e99c-dd7c-4e89-9cc2-70dd99b93b99_2000x958.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:697,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:166704,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.thecloudengineers.com/i/191559295?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac90e99c-dd7c-4e89-9cc2-70dd99b93b99_2000x958.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!OPox!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac90e99c-dd7c-4e89-9cc2-70dd99b93b99_2000x958.png 424w, https://substackcdn.com/image/fetch/$s_!OPox!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac90e99c-dd7c-4e89-9cc2-70dd99b93b99_2000x958.png 848w, https://substackcdn.com/image/fetch/$s_!OPox!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac90e99c-dd7c-4e89-9cc2-70dd99b93b99_2000x958.png 1272w, https://substackcdn.com/image/fetch/$s_!OPox!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac90e99c-dd7c-4e89-9cc2-70dd99b93b99_2000x958.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><ul><li><p><strong>Route 53</strong>: Provides intelligent DNS routing to direct users to the nearest regional endpoint</p></li><li><p><strong>API Gateway</strong>: Regional REST or HTTP APIs that serve as entry points in each region</p></li><li><p><strong>Lambda</strong>: Compute layer that processes requests with minimal latency</p></li><li><p><strong>DynamoDB Global Tables</strong>: Multi-region, fully replicated database with automatic conflict resolution</p></li></ul><p>This architecture provides active-active deployment across multiple AWS regions, ensuring that if one region experiences issues, traffic automatically fails over to healthy regions.</p><h2>The Architecture Flow</h2><p>At the heart of a multi-region serverless architecture lies a carefully orchestrated chain of AWS services working together to deliver a seamless global experience.</p><p><strong>Route 53: The Global Traffic Director</strong></p><p>Amazon Route 53 serves as the entry point for all user requests. Using health checks and routing policies like latency-based or geoproximity routing, Route 53 intelligently directs users to the nearest healthy regional endpoint. This ensures that a user in Tokyo connects to the Asia Pacific region while a user in Frankfurt connects to Europe, minimizing latency and improving user experience.</p><p><strong>API Gateway: Regional Entry Points</strong></p><p>Each region hosts its own API Gateway endpoint, acting as the front door for that region&#8217;s serverless infrastructure. API Gateway handles request validation, throttling, and authentication before passing requests to the compute layer. 
With features like request/response transformation and AWS WAF integration (available for REST APIs), it provides a robust security perimeter for your application.</p><p><strong>Lambda: Distributed Compute</strong></p><p>Behind each regional API Gateway sit the AWS Lambda functions that execute your business logic. These functions are deployed identically across all regions, ensuring consistent behavior regardless of where the request originates. Lambda&#8217;s automatic scaling handles traffic spikes in each region independently, while its pay-per-use model keeps costs optimized even with a multi-region footprint.</p><p><strong>DynamoDB Global Tables: The Data Layer</strong></p><p>The foundation of this architecture is DynamoDB Global Tables, which provides fully managed, multi-region, multi-active database replication. When a Lambda function writes data in one region, DynamoDB automatically replicates that data to all other configured regions, typically within seconds. This means users can read and write data in their nearest region, and all replicas eventually converge to the same state.</p><h2>Key Architectural Considerations</h2><p><strong>Conflict Resolution</strong>: DynamoDB Global Tables uses a last-writer-wins reconciliation strategy. Your application design should account for this, potentially using timestamps or version numbers to handle concurrent updates across regions.</p><p><strong>Replication Lag</strong>: While DynamoDB replication is fast, there&#8217;s still a brief window where data might not be consistent across all regions. Design your application to handle eventual consistency gracefully.</p><p><strong>Regional Failover</strong>: Route 53 health checks continuously monitor your regional endpoints. 
If a region becomes unhealthy, traffic automatically routes to the next best region, providing transparent failover without user intervention.</p><p><strong>Cost Implications</strong>: Running infrastructure in multiple regions increases costs through data transfer charges and duplicated resources. Balance the number of regions against your actual user distribution and availability requirements.</p><h2>The Benefits</h2><p>This architecture delivers several compelling advantages. Users experience lower latency by connecting to nearby infrastructure. Your application remains available even if an entire AWS region experiences an outage. Data is protected through geographic redundancy. And perhaps most importantly, you can scale globally without redesigning your architecture.</p><h2>When to Use This Pattern</h2><p>Multi-region architectures make sense for applications with a global user base, strict availability requirements, or regulatory needs for data residency. However, they add complexity and cost, so evaluate whether your application truly needs global distribution or if a single-region deployment with good disaster recovery would suffice.</p><h2>Conclusion</h2><p>Building a multi-region serverless architecture with Route 53, API Gateway, Lambda, and DynamoDB Global Tables provides a robust foundation for globally distributed applications. This architecture delivers low-latency responses to users worldwide while maintaining high availability and disaster recovery capabilities.</p><p>The key to success is embracing eventual consistency, implementing comprehensive monitoring, and regularly testing your failover mechanisms. 
While the initial setup requires careful planning, the operational benefits of automatic scaling, reduced latency, and improved reliability make it worthwhile for applications serving a global user base.</p>]]></content:encoded></item><item><title><![CDATA[A Beginner’s Guide to Testing Serverless Applications]]></title><description><![CDATA[Testing Serverless Applications: A Comprehensive Guide]]></description><link>https://blog.thecloudengineers.com/p/testing-serverless-applications-a</link><guid isPermaLink="false">https://blog.thecloudengineers.com/p/testing-serverless-applications-a</guid><dc:creator><![CDATA[Lefteris Karageorgiou]]></dc:creator><pubDate>Wed, 25 Mar 2026 10:30:58 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/9d769570-0526-4121-bcc3-0712f708d23d_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Testing serverless applications presents unique challenges that differ significantly from traditional application testing. 
The ephemeral nature of Lambda functions, the distributed architecture of serverless systems, and the tight integration with managed AWS services require a thoughtful approach to testing strategy. This article explores comprehensive testing methodologies for serverless applications with practical tools and frameworks.</p><h2>The Serverless Testing Challenge</h2><p>Serverless architectures introduce complexity that makes testing more nuanced than it is for traditional applications. Lambda functions don&#8217;t run in isolation; they interact with API Gateway, DynamoDB, S3, SQS, SNS, EventBridge, and numerous other AWS services. The pay-per-invocation model means you can&#8217;t simply spin up a test environment and leave it running. Additionally, the distributed nature of serverless systems means that failures can cascade across multiple services in ways that are difficult to predict and reproduce.</p><p>The testing pyramid for serverless applications looks different from that of traditional applications. 
While you still want a solid foundation of unit tests, the integration layer becomes significantly more important because so much of your application&#8217;s behavior depends on how services interact with each other.</p><h2>Unit Testing Lambda Functions</h2><p>Unit testing Lambda functions focuses on testing your business logic in isolation from AWS services. The key principle is to separate your core logic from the Lambda handler and AWS SDK calls. This separation allows you to test your business logic without needing to mock AWS services or make actual API calls.</p><p><strong>Testing Frameworks</strong>: Use <strong>Jest</strong> or <strong>Mocha</strong> for Node.js Lambda functions, <strong>pytest</strong> for Python, <strong>JUnit</strong> for Java, or <strong>Go&#8217;s built-in testing package</strong> for Go-based functions. These frameworks provide the foundation for writing and running your unit tests with features like test discovery, assertions, and test reporting.</p><p>Your Lambda handler should be thin, primarily responsible for parsing events, calling your business logic, and formatting responses. The actual business logic should live in separate modules that can be tested independently. This architectural pattern makes your code more testable and maintainable.</p><p>When unit testing, focus on testing edge cases, error conditions, and business rule validation. Test how your code handles malformed input, missing required fields, and unexpected data types. These tests should run quickly and require no external dependencies, making them ideal for rapid feedback during development.</p><p><strong>Mocking Tools</strong>: Mock external dependencies at the boundaries of your application using tools like <strong>aws-sdk-mock</strong> for JavaScript, or <strong>moto</strong> for Python. Rather than mocking the entire AWS SDK, create abstraction layers or interfaces that represent the operations your code needs to perform. 
This approach makes your tests more resilient to changes in the AWS SDK and keeps your business logic decoupled from infrastructure concerns.</p><h2>Integration Testing Strategies</h2><p>Integration testing for serverless applications verifies that your Lambda functions work correctly with actual AWS services. This is where you test the contracts between your code and services like DynamoDB, S3, SQS, and API Gateway. Integration tests are more expensive and slower than unit tests, but they catch issues that unit tests cannot.</p><p><strong>Integration Testing Tools</strong>: Leverage <strong>AWS SAM CLI</strong> for local testing and deployment, <strong>Serverless Framework&#8217;s invoke local</strong> command, or <strong>LocalStack</strong> for comprehensive AWS service emulation. For end-to-end testing, consider <strong>Postman</strong> or <strong>Newman</strong> for API testing, and <strong>aws-testing-library</strong> for programmatic integration tests.</p><p>There are several approaches to integration testing in serverless environments. One strategy involves deploying to a dedicated testing environment in AWS and running tests against real services using <strong>AWS CDK</strong> or <strong>Terraform</strong> for infrastructure provisioning. This approach provides the highest confidence that your application will work in production, but it&#8217;s also the slowest and most expensive option.</p><p>Another approach uses local emulation tools like <strong>LocalStack</strong>, <strong>SAM Local</strong>, or <strong>Serverless Offline</strong> that simulate AWS services on your development machine. These tools provide a faster feedback loop than deploying to AWS, but they may not perfectly replicate the behavior of actual AWS services. 
They&#8217;re best used for rapid iteration during development, with periodic validation against real AWS services.</p><p><strong>Contract Testing</strong>: Consider implementing contract testing using <strong>Pact</strong> or <strong>Spring Cloud Contract</strong> for your Lambda functions. Contract tests verify that your functions correctly handle the event formats they receive from triggers like API Gateway, EventBridge, or SQS. They also verify that your functions produce output in the format expected by downstream services. This approach helps catch breaking changes early.</p><p>Integration tests should cover the critical paths through your application, the workflows that represent your core business value. Test how your functions handle retries, how they behave under throttling conditions, and how they recover from transient failures. Tools like <strong>Chaos Toolkit</strong> or <strong>AWS Fault Injection Service</strong> (formerly AWS Fault Injection Simulator) let you inject these conditions as chaos engineering experiments. These scenarios are difficult to test with unit tests alone.</p><h2>Local Development Strategies</h2><p>Effective local development for serverless applications requires tools and workflows that provide rapid feedback without requiring constant deployment to AWS. The goal is to enable developers to iterate quickly while maintaining confidence that their code will work when deployed.</p><p><strong>Local Development Tools</strong>: Use <strong>AWS SAM CLI</strong> with its <code>sam local start-api</code> and <code>sam local invoke</code> commands, <strong>Serverless Framework</strong> with the <code>serverless-offline</code> plugin, or <strong>LocalStack</strong> for comprehensive local AWS service emulation. 
<strong>Docker</strong> is essential for containerizing your development environment to match Lambda&#8217;s execution environment.</p><p>Local development environments should replicate the Lambda execution environment as closely as possible using <strong>Docker images based on AWS Lambda base images</strong>. This includes matching the runtime version, environment variables, memory configuration, and timeout settings. Discrepancies between local and deployed environments are a common source of bugs that only appear in production.</p><p>Consider using <strong>Docker Compose</strong> to orchestrate containerized development environments that mirror the Lambda runtime. This approach ensures consistency across your development team and reduces &#8220;works on my machine&#8221; issues. Containers can include all necessary dependencies, tools, and configurations, making onboarding new developers faster and more reliable.</p><p><strong>Debugging Tools</strong>: Local development should support debugging with breakpoints and step-through execution using <strong>VS Code&#8217;s built-in debugger</strong>, <strong>AWS Toolkit for VS Code</strong>, <strong>IntelliJ IDEA with AWS Toolkit</strong>, or <strong>PyCharm&#8217;s remote debugging</strong>. The ability to pause execution, inspect variables, and step through code line by line is invaluable for understanding complex issues. This capability is especially important when working with asynchronous operations and event-driven workflows.</p><h2>Conclusion</h2><p>Testing serverless applications requires a comprehensive strategy that spans unit testing, integration testing, local development, and production monitoring. 
The distributed nature of serverless architectures and the tight integration with managed services make testing more complex than it is for traditional applications, but the right tools and approach can provide confidence in your application&#8217;s reliability and correctness.</p>]]></content:encoded></item><item><title><![CDATA[AWS Lambda Durable Functions: What They Are and When to Use Them]]></title><description><![CDATA[AWS Lambda Durable Functions represent a significant evolution in serverless application development, introduced at AWS re:Invent 2025.]]></description><link>https://blog.thecloudengineers.com/p/aws-lambda-durable-functions-what</link><guid isPermaLink="false">https://blog.thecloudengineers.com/p/aws-lambda-durable-functions-what</guid><dc:creator><![CDATA[Lefteris Karageorgiou]]></dc:creator><pubDate>Wed, 18 Mar 2026 10:30:37 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/74bda787-af1c-4ac2-ae46-475bb18c3b27_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>AWS Lambda Durable Functions represent a significant evolution in serverless application development, introduced at AWS re:Invent 2025. 
This new capability enables developers to build reliable, multi-step applications that can run for extended periods without paying for idle compute time while waiting for external events or human decisions.</p><p>In this article, we&#8217;ll cover what they are, when you should use them, and how they compare with AWS Step Functions to help you choose the right orchestration tool for your serverless applications.</p><h2>What Are Lambda Durable Functions?</h2><p>Lambda Durable Functions are regular Lambda functions enhanced with stateful execution capabilities. They allow you to write complex, long-running workflows directly in your application code using familiar programming languages, rather than defining workflows in a separate orchestration language.</p><p>The key innovation lies in <strong>automatic checkpointing</strong>. As your function executes, AWS Lambda automatically saves progress at strategic points. If your workflow needs to wait for an external event, such as a webhook callback, a manual approval, or a scheduled delay, the function pauses without consuming compute resources. 
When the awaited event occurs, execution resumes from the last checkpoint with full context preserved.</p><p>This approach embeds orchestration logic directly into your Lambda code, eliminating the need to manage state manually or architect separate state management systems. The result is a serverless-first approach that reduces operational overhead while maintaining the scalability and pay-per-use economics that make serverless attractive.</p><h2>When to Use Durable Functions</h2><p>Lambda Durable Functions excel in specific scenarios where application logic naturally involves multiple steps with waiting periods:</p><p><strong>AI and Machine Learning Workflows</strong> - Orchestrating multi-stage AI pipelines where each step might involve model inference, data transformation, or human review. The ability to pause between stages without incurring costs makes this particularly economical for batch processing scenarios.</p><p><strong>Order Processing and E-commerce</strong> - Managing order fulfillment workflows that span inventory checks, payment processing, shipping coordination, and customer notifications. These workflows often involve waiting for external system responses or scheduled delivery windows.</p><p><strong>Approval and Human-in-the-Loop Processes</strong> - Building loan applications, expense approvals, or content moderation systems where workflows must pause for human decisions before proceeding. 
Durable executions can run for up to one year, which accommodates even the most extended approval cycles.</p><p><strong>Event-Driven Application Logic</strong> - Coordinating complex business processes that respond to multiple events over time, such as customer onboarding journeys, subscription lifecycle management, or multi-step data processing pipelines.</p><p><strong>Application-Centric Orchestration</strong> - When your workflow logic is tightly coupled with your application code and benefits from being expressed in the same programming language as the rest of your business logic.</p><h2>How Durable Functions Compare with Step Functions</h2><p>Both Lambda Durable Functions and AWS Step Functions solve the problem of orchestrating multi-step workflows, but they approach it from fundamentally different philosophical and architectural standpoints.</p><p><strong>Architectural Philosophy</strong></p><p>Step Functions is built for <strong>infrastructure orchestration</strong>. It excels at coordinating disparate AWS services, providing a visual workflow designer, and offering a declarative approach using Amazon States Language (ASL). The workflow definition lives separately from your application code, making it ideal for workflows that need to be understood and modified by operations teams or non-technical stakeholders.</p><p>Durable Functions, by contrast, are optimized for <strong>application orchestration</strong>. The workflow logic lives within your Lambda function code, written in your preferred programming language. This code-first approach makes it natural for developers who want to keep orchestration logic alongside business logic, use familiar debugging tools, and leverage existing programming constructs like loops, conditionals, and error handling.</p><p><strong>Developer Experience</strong></p><p>With Step Functions, you define workflows in ASL&#8212;a JSON-based state machine definition language. 
While powerful and expressive, this requires learning a new syntax and switching contexts between your application code and workflow definitions. Local testing typically involves additional tooling or mocking frameworks.</p><p>Durable Functions let you write workflows as regular code. If you know how to write a Lambda function, you already know most of what you need to write a durable function. The workflow logic uses standard programming constructs, making it more intuitive for developers and easier to test locally using familiar development tools.</p><p><strong>Visibility and Monitoring</strong></p><p>Step Functions provides a superior visual representation of workflows. The built-in graphical interface shows execution flow, making it easy for stakeholders to understand complex processes at a glance. This visualization is particularly valuable for compliance, auditing, and operational monitoring.</p><p>Durable Functions trade some of this visual clarity for code simplicity. While you can still monitor executions through CloudWatch and Lambda Insights, the workflow structure isn&#8217;t as immediately apparent to non-technical observers.</p><p><strong>Use Case Alignment</strong></p><p>Choose <strong>Step Functions</strong> when you need to:</p><ul><li><p>Orchestrate multiple AWS services (Lambda, ECS, Glue, etc.)</p></li><li><p>Provide workflow visibility to non-technical stakeholders</p></li><li><p>Build workflows that benefit from visual design and monitoring</p></li><li><p>Implement complex branching, parallel execution, or error handling patterns that benefit from declarative definition</p></li><li><p>Maintain clear separation between orchestration and business logic</p></li></ul><p>Choose <strong>Durable Functions</strong> when you need to:</p><ul><li><p>Keep workflow logic tightly integrated with application code</p></li><li><p>Leverage existing programming language features and libraries</p></li><li><p>Simplify development by avoiding context switching between code and 
ASL</p></li><li><p>Build workflows where the orchestration is primarily application logic rather than service coordination</p></li><li><p>Optimize for developer velocity and code maintainability within Lambda</p></li></ul><p><strong>Cost Considerations</strong></p><p>Neither service bills you for idle time spent waiting, but the pricing models differ. Step Functions Standard workflows charge per state transition, while Durable Functions follow Lambda&#8217;s standard pricing model with additional charges for state persistence. For workflows with many state transitions, Durable Functions may offer cost advantages. For workflows coordinating many AWS services, Step Functions&#8217; integration capabilities may provide better value.</p><h2>Conclusion</h2><p>Lambda Durable Functions and Step Functions aren&#8217;t competitors&#8212;they&#8217;re complementary tools in your serverless toolkit. Durable Functions shine when you want to write workflow logic as code within your Lambda functions, keeping orchestration close to your application logic. Step Functions excel when you need to coordinate multiple AWS services with clear visual representation and operational oversight.</p><p>The choice ultimately depends on your team&#8217;s preferences, the nature of your workflows, and whether you value code-centric development or service-centric orchestration. Many organizations will find themselves using both, selecting the right tool for each specific use case.</p>]]></content:encoded></item><item><title><![CDATA[Accelerating CloudFormation and CDK Development with Kiro]]></title><description><![CDATA[If you&#8217;ve spent any meaningful time building on AWS, you know the drill: you open your IDE, stare at a blank CDK stack or an empty CloudFormation template, and then spend the next 20 minutes hunting through documentation tabs, Stack Overflow threads, and GitHub samples just to get started.]]></description><link>https://blog.thecloudengineers.com/p/accelerating-cloudformation-and-cdk</link><guid isPermaLink="false">https://blog.thecloudengineers.com/p/accelerating-cloudformation-and-cdk</guid><dc:creator><![CDATA[Lefteris Karageorgiou]]></dc:creator><pubDate>Wed, 11 Mar 2026 10:30:55 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/61662d6c-d61a-43a2-a5a1-fd19913966b0_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>If you&#8217;ve spent any meaningful time building on AWS, you know the drill: you open your IDE, stare at a blank CDK stack or an empty CloudFormation template, and then spend the next 20 minutes hunting through documentation tabs, Stack Overflow threads, and GitHub samples just to get started. It&#8217;s not that the tools are bad; CDK and CloudFormation are genuinely powerful. 
It&#8217;s that the cognitive overhead of <em>using</em> them well is enormous.</p><p>That&#8217;s exactly the problem Kiro is designed to solve.</p><h2>What Is Kiro?</h2><p><a href="https://kiro.dev">Kiro</a> is an AI-powered IDE built for developers who work with cloud infrastructure. It&#8217;s not just a code editor with a chatbot bolted on; it&#8217;s a development environment that understands your infrastructure context and actively helps you build, validate, and troubleshoot it.</p><p>One of Kiro&#8217;s most compelling features is its <strong>Powers</strong> system. Powers are purpose-built AI capabilities that you can install directly into your IDE, each one tailored to a specific domain or technology. Think of them as expert co-pilots for different parts of your stack.</p><p>And for AWS engineers, there&#8217;s one Power that deserves your full attention.</p><h2>The &#8220;Build AWS Infrastructure with CDK and CloudFormation&#8221; Power</h2><p>Available at <a href="https://kiro.dev/powers/">kiro.dev/powers</a>, this Power is built specifically for AWS infrastructure development. 
Its description is straightforward but the implications are significant:</p><blockquote><p><em>&#8220;Build well-architected AWS infrastructure with CDK using latest documentation, best practices, and code samples. Validate CloudFormation templates, check resource configuration security compliance, and troubleshoot deployments.&#8221;</em></p></blockquote><p>This isn&#8217;t a generic AI assistant that happens to know some AWS. This is a focused, infrastructure-aware capability that integrates directly into your development workflow.</p><h2>What This Power Actually Does</h2><p>Under the hood, this Kiro Power is backed by the <strong>AWS Infrastructure as Code (IaC) MCP Server</strong>, a tool built on the Model Context Protocol (MCP) that connects AI assistants to your local AWS development environment.</p><p>The Power gives Kiro access to a set of specialized tools organized into two categories:</p><p><strong>Documentation and Knowledge Tools</strong></p><ul><li><p>Search CDK documentation and best practices</p></li><li><p>Read specific CDK documentation pages</p></li><li><p>Search CDK samples and constructs</p></li><li><p>Search CloudFormation documentation</p></li></ul><p><strong>Validation and Troubleshooting Tools</strong></p><ul><li><p>Validate CloudFormation templates before deployment</p></li><li><p>Check resource configuration for security compliance</p></li><li><p>Get pre-deployment validation instructions</p></li><li><p>Troubleshoot CloudFormation deployment failures</p></li></ul><p>Together, these capabilities cover the full lifecycle of infrastructure development, from the moment you start designing a stack to the moment you&#8217;re debugging a failed deployment.</p><h2>Why This Changes Your CDK and CloudFormation Workflow</h2><h3><strong>You Stop Guessing at Best Practices</strong></h3><p>One of the most common pain points with CDK is knowing <em>how</em> to use it correctly, not just <em>that</em> you can use it. 
Should you use an L1 construct (a raw <code>Cfn*</code> resource) or a higher-level L2 construct? What&#8217;s the right pattern for a serverless API? What are the security defaults you should always set?</p><p>With this Power active, Kiro can answer these questions in context, pulling from the latest AWS documentation and surfacing the right patterns for your specific use case. You&#8217;re not getting generic advice; you&#8217;re getting guidance grounded in current AWS best practices.</p><h3><strong>You Catch Errors Before They Reach AWS</strong></h3><p>CloudFormation template errors are notoriously painful to debug. You deploy, wait several minutes, and then get a cryptic rollback message. With Kiro&#8217;s validation tools, you can catch configuration issues and security compliance gaps <em>before</em> you ever run <code>cdk deploy</code> or submit a stack.</p><p>This alone can save significant time in any infrastructure development cycle.</p><h3><strong>You Troubleshoot Deployments Faster</strong></h3><p>When deployments do fail, Kiro can help you understand why. Rather than manually cross-referencing CloudTrail logs and CloudFormation events, you can ask Kiro directly, and it will surface the relevant error context and suggest a fix. For example, if CloudTrail shows an <code>AccessDenied</code> error for a specific API call, Kiro can identify the missing permission and tell you exactly what needs to change in your deployment role.</p><h3><strong>You Learn While You Build</strong></h3><p>If you&#8217;re newer to CDK or expanding into unfamiliar AWS services, this Power doubles as a learning accelerator. 
You can ask Kiro to show you how to build a specific architecture pattern, and it will search CDK constructs and samples to give you multiple implementation approaches, not just one answer, but a range of options with context.</p><h2>Security and Local Execution</h2><p>A reasonable concern with any AI-assisted development tool is: <em>where does my code go?</em></p><p>The IaC MCP Server that powers this Kiro capability runs <strong>entirely on your local machine</strong>. Your CloudFormation templates and CDK code are not sent to external services. The only external calls made are for documentation searches. AWS credentials are used from your existing local configuration, the same way the AWS CLI works, and communication between the server and Kiro happens over standard input/output with no network ports opened.</p><p>For teams working in regulated environments or with sensitive infrastructure, this local execution model is a meaningful architectural choice.</p><h2>Getting Started</h2><p>Installing the Power is straightforward from within the Kiro IDE. Navigate to the <strong>Powers</strong> panel in the sidebar, find &#8220;Build AWS infrastructure with CDK and CloudFormation&#8221; under the Recommended section, and install it. 
Once active, Kiro gains access to all the CDK and CloudFormation tools described above.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gKuA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7a5ced4-9829-4d3d-aece-93364b9a2e18_2026x1390.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gKuA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7a5ced4-9829-4d3d-aece-93364b9a2e18_2026x1390.png 424w, https://substackcdn.com/image/fetch/$s_!gKuA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7a5ced4-9829-4d3d-aece-93364b9a2e18_2026x1390.png 848w, https://substackcdn.com/image/fetch/$s_!gKuA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7a5ced4-9829-4d3d-aece-93364b9a2e18_2026x1390.png 1272w, https://substackcdn.com/image/fetch/$s_!gKuA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7a5ced4-9829-4d3d-aece-93364b9a2e18_2026x1390.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!gKuA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7a5ced4-9829-4d3d-aece-93364b9a2e18_2026x1390.png" width="1456" height="999" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e7a5ced4-9829-4d3d-aece-93364b9a2e18_2026x1390.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:999,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:250935,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.thecloudengineers.com/i/189967684?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7a5ced4-9829-4d3d-aece-93364b9a2e18_2026x1390.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!gKuA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7a5ced4-9829-4d3d-aece-93364b9a2e18_2026x1390.png 424w, https://substackcdn.com/image/fetch/$s_!gKuA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7a5ced4-9829-4d3d-aece-93364b9a2e18_2026x1390.png 848w, https://substackcdn.com/image/fetch/$s_!gKuA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7a5ced4-9829-4d3d-aece-93364b9a2e18_2026x1390.png 1272w, https://substackcdn.com/image/fetch/$s_!gKuA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7a5ced4-9829-4d3d-aece-93364b9a2e18_2026x1390.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2>The Bigger Picture</h2><p>CDK and CloudFormation are not going away. If anything, infrastructure as code is becoming more central to how teams build and operate on AWS, not less. The question isn&#8217;t whether you&#8217;ll use these tools; it&#8217;s how efficiently you&#8217;ll use them.</p><p>Kiro&#8217;s CDK and CloudFormation Power represents a meaningful shift in that equation. 
It brings documentation, validation, compliance checking, and troubleshooting directly into your development environment, reducing the context-switching that slows down infrastructure work and helping you build well-architected systems faster.</p><p>If you&#8217;re serious about AWS infrastructure development, this is worth adding to your workflow.</p>]]></content:encoded></item><item><title><![CDATA[When NOT to Go Serverless: 5 Use Cases Where It’s the Wrong Choice]]></title><description><![CDATA[Serverless is one of the most transformative paradigms in cloud computing.]]></description><link>https://blog.thecloudengineers.com/p/when-not-to-go-serverless-5-use-cases</link><guid isPermaLink="false">https://blog.thecloudengineers.com/p/when-not-to-go-serverless-5-use-cases</guid><dc:creator><![CDATA[Lefteris Karageorgiou]]></dc:creator><pubDate>Wed, 04 Mar 2026 10:30:18 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/255b3b3c-480f-4606-b519-65b7ef22ad5e_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Serverless is one of the most transformative paradigms in cloud computing. 
AWS Lambda, API Gateway, DynamoDB, and the broader serverless ecosystem let you build and scale applications without ever thinking about servers. It&#8217;s fast to develop, often cheap to run, and incredibly powerful.</p><p>But here&#8217;s the thing: <strong>serverless is not a silver bullet.</strong></p><p>After years of building on AWS, I&#8217;ve seen teams adopt serverless for everything &#8212; and sometimes pay the price in performance, cost, complexity, or all three. The best architects don&#8217;t just know <em>when</em> to use a tool. They know <em>when not to</em>.</p><p>In this article, I&#8217;ll walk you through <strong>five real-world use cases where going serverless is the wrong choice</strong>, and what you should consider instead.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.thecloudengineers.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading The Cloud Engineers! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><h2>1. Long-Running, Compute-Heavy Workloads</h2><h3>The Problem</h3><p>AWS Lambda has a <strong>hard timeout of 15 minutes</strong>. If your workload takes longer than that, Lambda simply isn&#8217;t an option. 
But even workloads that <em>fit</em> within 15 minutes can be a poor match.</p><p>Think about:</p><ul><li><p><strong>Video transcoding or rendering</strong></p></li><li><p><strong>Large-scale data transformations (ETL)</strong></p></li><li><p><strong>Machine learning model training</strong></p></li><li><p><strong>Complex PDF generation from massive datasets</strong></p></li></ul><p>These tasks are CPU-intensive and often run for extended periods. Shoehorning them into Lambda means you&#8217;ll spend your time fighting the platform instead of leveraging it.</p><h3>Why Serverless Fails Here</h3><ul><li><p><strong>15-minute max execution time</strong> is a hard ceiling you can&#8217;t negotiate.</p></li><li><p><strong>CPU is allocated proportionally to memory</strong> in Lambda. To get meaningful compute power, you might need to allocate 4&#8211;10 GB of memory &#8212; even if you only need 512 MB of RAM. You&#8217;re paying for memory you don&#8217;t use just to get CPU.</p></li><li><p><strong>No GPU access.</strong> If your workload benefits from GPU acceleration, Lambda is out of the question.</p></li><li><p>Orchestrating many short Lambda functions to chunk the work adds <strong>significant complexity</strong> and potential failure points.</p></li></ul><h3>What to Use Instead</h3><ul><li><p><strong>AWS Fargate (ECS)</strong> &#8212; Containerized long-running tasks without managing servers</p></li><li><p><strong>EC2 Spot Instances</strong> &#8212; Cost-effective heavy compute (batch jobs, rendering)</p></li><li><p><strong>AWS Batch</strong> &#8212; Managed batch processing at scale</p></li><li><p><strong>SageMaker</strong> &#8212; ML model training specifically</p></li><li><p><strong>Step Functions + Fargate</strong> &#8212; Orchestrated pipelines with long-running steps</p></li></ul><h3>A Practical Example</h3><p>Let&#8217;s say you&#8217;re processing uploaded video files. 
Instead of Lambda:</p><p><code>S3 Upload Event &#8594; EventBridge &#8594; Fargate Task (ECS)</code></p><p>Fargate gives you configurable vCPU and memory, no time limits, and you only pay for the duration of the task. You still get the &#8220;no servers to manage&#8221; benefit without Lambda&#8217;s constraints.</p><h2>2. High-Throughput, Latency-Sensitive Applications</h2><h3>The Problem</h3><p>You&#8217;re building an application that needs to handle <strong>thousands of requests per second</strong> with <strong>consistent, single-digit millisecond response times</strong>. Think:</p><ul><li><p><strong>Real-time bidding platforms</strong></p></li><li><p><strong>High-frequency trading APIs</strong></p></li><li><p><strong>Multiplayer game backends</strong></p></li><li><p><strong>Real-time fraud detection at the edge</strong></p></li></ul><h3>Why Serverless Fails Here</h3><p>The biggest villain here is the <strong>cold start</strong>.</p><p>When Lambda spins up a new execution environment, there&#8217;s an initialization delay.</p><ul><li><p>Python, Node.js: 100&#8211;300 ms</p></li><li><p>Java, .NET: 500 ms &#8211; 2+ seconds</p></li></ul><p>For most applications, cold starts are a minor nuisance. For latency-critical systems, they&#8217;re a dealbreaker.</p><p>Yes, <strong>Provisioned Concurrency</strong> exists, and <strong>SnapStart</strong> helps with Java. But:</p><ul><li><p>Provisioned Concurrency is essentially <strong>pre-warming containers</strong> &#8212; which defeats the core serverless value proposition. 
You&#8217;re paying for idle compute.</p></li><li><p>You still can&#8217;t guarantee <strong>tail latency</strong> (p99/p99.9) the way you can with a warm, long-running process.</p></li><li><p>At high throughput, the <strong>API Gateway overhead</strong> (typically 10&#8211;30 ms) adds up and is hard to eliminate.</p></li></ul><h3>What to Use Instead</h3><ul><li><p><strong>ECS/EKS on Fargate or EC2</strong> &#8212; Long-running containers with persistent connections and predictable latency.</p></li><li><p><strong>EC2 with Auto Scaling</strong> &#8212; Maximum control over the runtime, networking, and performance tuning.</p></li><li><p><strong>ElastiCache (Redis)</strong> &#8212; Pair with containers for ultra-low-latency data access.</p></li></ul><h3>The Key Takeaway</h3><p>If your SLA says <strong>&#8220;p99 latency under 10 ms&#8221;</strong>, serverless will make that extremely difficult and expensive to achieve. Use always-on compute with connection pooling and warm caches.</p><h2>3. Applications with Complex or Stateful Workflows (That Need Persistent Connections)</h2><h3>The Problem</h3><p>Serverless functions are <strong>stateless and ephemeral</strong> by design. Each invocation is independent. There&#8217;s no shared memory, no local disk that persists, and no guarantee that the same instance handles consecutive requests.</p><p>This becomes painful when you need:</p><ul><li><p><strong>WebSocket connections with complex server-side state</strong></p></li><li><p><strong>Long-lived database connection pools</strong></p></li><li><p><strong>In-memory caching across requests</strong></p></li><li><p><strong>Sticky sessions</strong></p></li></ul><h3>Why Serverless Fails Here</h3><p><strong>Database connections are the classic pain point.</strong> Every Lambda invocation may open a new connection to your relational database. 
Under load, you can easily exhaust your database&#8217;s connection limit.</p><pre><code><code>100 concurrent Lambda invocations = 100 database connections
1,000 concurrent invocations = &#128165; database connection limit exceeded</code></code></pre><p><strong>RDS Proxy</strong> helps by pooling connections, but it:</p><ul><li><p>Adds cost</p></li><li><p>Adds latency (~5&#8211;15 ms per query overhead)</p></li><li><p>Is another component to configure and monitor</p></li><li><p>Doesn&#8217;t eliminate the fundamental mismatch &#8212; it just papers over it</p></li></ul><p><strong>In-memory state is another problem.</strong> If your application logic depends on keeping data in memory across requests (e.g., a game server tracking player positions, a collaborative editing backend), Lambda can&#8217;t help you. You&#8217;d have to externalize all state to DynamoDB, ElastiCache, or S3 &#8212; adding latency and complexity for every operation.</p><h3>What to Use Instead</h3><ul><li><p><strong>ECS/Fargate</strong> with long-running containers that maintain connection pools and in-memory state.</p></li><li><p><strong>EC2</strong> for maximum control over memory, connections, and runtime behavior.</p></li><li><p><strong>ElastiCache</strong> if you need shared in-memory state across instances.</p></li><li><p><strong>AppSync</strong> for managed GraphQL with real-time subscriptions (a serverless option that <em>does</em> handle WebSockets well, but for a specific use case).</p></li></ul><h3>When It&#8217;s a Judgment Call</h3><p>If you&#8217;re building a standard REST API with a relational database, serverless <em>can</em> work &#8212; especially with RDS Proxy and moderate traffic. But if your application is <strong>connection-heavy or state-heavy</strong>, you&#8217;ll spend more time fighting the architecture than building features.</p><h2>4. Predictable, Always-On, Steady-State Workloads</h2><h3>The Problem</h3><p>Serverless pricing is brilliant for <strong>spiky, unpredictable, or low-traffic workloads</strong>. You pay per invocation, per millisecond of compute. 
When traffic is zero, your bill is zero.</p><p>But what about workloads that run <strong>24/7 at a consistent, high volume</strong>?</p><ul><li><p>A backend API serving <strong>millions of requests per day, evenly distributed</strong></p></li><li><p>A stream processor consuming from Kinesis <strong>around the clock</strong></p></li><li><p>A WebSocket server with <strong>thousands of persistent connections</strong></p></li></ul><h3>Why Serverless Fails Here</h3><p>Let&#8217;s do some back-of-the-envelope math.</p><p><strong>Scenario:</strong> An API handling 50 million requests/month, average execution time of 200 ms, with 512 MB memory allocated.</p><p><strong>Lambda Cost:</strong></p><pre><code><code>Compute: 50M &#215; 0.2s &#215; 0.5 GB = 5,000,000 GB-seconds
Cost:    5,000,000 &#215; $0.0000166667 = ~$83.33
Requests: 50M &#215; $0.20/million     = ~$10.00
API Gateway: 50M &#215; $1.00/million  = ~$50.00
                                    --------
Total:                              ~$143/month</code></code></pre><p><strong>Fargate Equivalent (2 tasks, 0.5 vCPU, 1 GB each):</strong></p><pre><code><code>vCPU:   2 &#215; 0.5 &#215; $0.04048/hr &#215; 730 hrs = ~$29.55
Memory: 2 &#215; 1 GB &#215; $0.004445/hr &#215; 730 hrs = ~$6.49
ALB:    ~$16.43 + LCU charges (~$5-10)     = ~$25.00
                                              --------
Total:                                       ~$61/month</code></code></pre><p><strong>That&#8217;s a ~57% cost reduction with Fargate</strong> &#8212; and the gap widens as traffic increases.</p><h3>The Rule of Thumb</h3><p>Ask yourself:</p><blockquote><p><em><strong>&#8220;Will my compute utilization be consistently above 20-30%?&#8221;</strong></em></p></blockquote><p>If yes, reserved/always-on compute (Fargate, EC2 Reserved Instances, or Savings Plans) will almost always be cheaper.</p><h3>What to Use Instead</h3><ul><li><p><strong>Fargate with Auto Scaling </strong>&#8212; Steady containerized workloads with occasional spikes</p></li><li><p><strong>EC2 Reserved Instances / Savings Plans </strong>&#8212; Predictable, long-term workloads at the best price</p></li><li><p><strong>EKS </strong>&#8212; When you need Kubernetes orchestration at scale</p></li></ul><h3>The Hybrid Approach</h3><p>The best architectures often <strong>mix serverless and containers</strong>:</p><ul><li><p>Use <strong>Lambda</strong> for event-driven, asynchronous tasks (S3 triggers, SNS handlers, cron jobs).</p></li><li><p>Use <strong>Fargate/EC2</strong> for the high-throughput, steady-state API layer.</p></li></ul><p>You get the best of both worlds.</p><h2>5. Complex Local Development and Debugging Requirements</h2><h3>The Problem</h3><p>This one is less about production and more about <strong>developer experience</strong> &#8212; which matters more than many teams admit.</p><p>Serverless applications are inherently <strong>distributed</strong>. 
A single user action might involve:</p><p><code>API Gateway &#8594; Lambda &#8594; DynamoDB &#8594; DynamoDB Stream &#8594; Lambda &#8594; SNS &#8594; Lambda &#8594; SQS &#8594; Lambda</code></p><p>Try debugging that locally.</p><h3>Why Serverless Fails Here</h3><ul><li><p><strong>Local emulation is imperfect.</strong> Tools like SAM CLI and LocalStack are helpful but don&#8217;t perfectly replicate IAM permissions, event source mappings, service integrations, or edge cases.</p></li><li><p><strong>&#8220;Console log&#8221; debugging</strong> is still the primary debugging strategy for many Lambda developers. Attaching a real debugger with breakpoints is possible but cumbersome.</p></li><li><p><strong>Integration testing is hard.</strong> You often need a deployed AWS environment to truly test your application, which slows down the feedback loop.</p></li><li><p><strong>Distributed tracing adds overhead.</strong> You&#8217;ll need X-Ray or third-party tools to understand request flows across services.</p></li><li><p><strong>Each developer needs their own cloud sandbox.</strong> Unlike a containerized app where you can <code>docker-compose up</code> and have a full environment locally, serverless often requires deploying to AWS &#8212; leading to shared-environment conflicts or the cost of per-developer AWS accounts.</p></li></ul><h3>When This Matters Most</h3><ul><li><p><strong>Large teams</strong> where developer productivity at scale is critical</p></li><li><p><strong>Complex, multi-service architectures</strong> with many interconnected functions</p></li><li><p><strong>Regulated industries</strong> where you need to reproduce and debug production issues quickly</p></li><li><p><strong>Teams new to serverless</strong> with existing expertise in containers or traditional architectures</p></li></ul><h3>What to Use Instead</h3><p>If developer experience is a priority and your team is more productive with containers:</p><ul><li><p><strong>Docker Compose + 
ECS/Fargate</strong> &#8212; Develop locally with Docker, deploy to Fargate. Nearly identical environments.</p></li><li><p><strong>EKS with Skaffold or Telepresence</strong> &#8212; Hot-reload Kubernetes development.</p></li></ul><h3>A Balanced View</h3><p>To be fair, the serverless developer experience is <strong>improving rapidly</strong>. Tools like <strong>SST (Serverless Stack)</strong>, <strong>AWS SAM Accelerate</strong>, and <strong>Lambda Powertools</strong> have made development significantly better. If you&#8217;re starting a greenfield project with a small team, serverless DX is often <em>good enough</em>.</p><p>But for large, complex systems &#8212; especially if your team has deep container expertise &#8212; forcing a serverless-first approach can meaningfully slow development velocity.</p><h2>So... When SHOULD You Go Serverless?</h2><p>After all that, let&#8217;s end on a positive note. Serverless is the <strong>right choice</strong> when:</p><ol><li><p>Your traffic is <strong>spiky, unpredictable, or low-volume</strong></p></li><li><p>You want to <strong>minimize operational overhead</strong> (no patching, no capacity planning)</p></li><li><p>You&#8217;re building <strong>event-driven architectures</strong> (S3 triggers, message processing, webhooks)</p></li><li><p>You need to <strong>ship fast</strong> with a small team</p></li><li><p>Your workloads are <strong>short-lived</strong> (seconds, not minutes)</p></li><li><p>You want <strong>automatic scaling</strong> from zero to massive without configuration</p></li><li><p>You&#8217;re building <strong>microservices, APIs, or background jobs</strong> with moderate latency requirements</p></li></ol><h2>The Bottom Line</h2><p>The best cloud architects aren&#8217;t loyal to any single paradigm. They pick the right tool for the job.</p><p>Serverless is incredible &#8212; I use it extensively and recommend it often. 
But blindly applying it to every problem leads to over-engineered, expensive, and frustrating architectures.</p><p><strong>Before going serverless, ask yourself:</strong></p><ol><li><p>Does my workload fit within Lambda&#8217;s execution limits?</p></li><li><p>Can I tolerate cold start latency?</p></li><li><p>Is my application fundamentally stateless?</p></li><li><p>Is my traffic pattern spiky or variable?</p></li><li><p>Can my team debug and develop effectively in a serverless model?</p></li></ol><p>If the answer to any of these is <strong>&#8220;no&#8221;</strong>, consider containers, EC2, or a hybrid approach. There&#8217;s no shame in using a server. After all, serverless still runs on them.</p>]]></content:encoded></item><item><title><![CDATA[Securing Your AWS Account in the First 24 Hours (A Checklist)]]></title><description><![CDATA[You just created a new AWS account.]]></description><link>https://blog.thecloudengineers.com/p/securing-your-aws-account-in-the</link><guid isPermaLink="false">https://blog.thecloudengineers.com/p/securing-your-aws-account-in-the</guid><dc:creator><![CDATA[Lefteris Karageorgiou]]></dc:creator><pubDate>Wed, 25 Feb 2026 10:31:32 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/28b5c701-ea1a-49ec-a505-6f3b387f2420_2912x1632.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>You just created a new AWS account. Maybe it&#8217;s for a personal project, a startup, or a new workload at your company. Whatever the reason, the clock is ticking. An unsecured AWS account is a magnet for attackers, and the damage can happen fast, from cryptomining charges to full data breaches.</p><p>The good news? You can dramatically reduce your risk by following a clear set of steps in the first 24 hours. This article is that checklist.</p><p>Let&#8217;s get started.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.thecloudengineers.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading The Cloud Engineers! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><h2>Why the First 24 Hours Matter</h2><p>New AWS accounts are often in their most vulnerable state right after creation. Default settings are permissive, there&#8217;s no monitoring in place, and credentials are frequently mishandled. Automated bots constantly scan for exposed keys and misconfigured resources. Stories of developers waking up to five-figure AWS bills after accidentally committing credentials to GitHub are not myths, they happen regularly.</p><p>Treating security as a &#8220;day one&#8221; priority is not paranoia. It&#8217;s engineering discipline.</p><h2>Phase 1: Protect the Root Account</h2><p>The root account is the single most powerful identity in your AWS environment. It has unrestricted access to everything, and it cannot be limited by IAM policies. Your first actions should be to lock it down.</p><h3>1. Set a Strong, Unique Password for the Root User</h3><p>The password you chose during sign-up might have been rushed. Go to the account settings and make sure the root password is long, random, and stored in a reputable password manager. Do not reuse a password from another service.</p><h3>2. Enable Multi-Factor Authentication (MFA) on the Root Account</h3><p>This is the single most important step on this entire list. Go to the IAM console, select the root user, and enable MFA. A hardware security key (such as a YubiKey) is the strongest option. A virtual MFA app like Google Authenticator or Authy is a solid alternative. Avoid SMS-based MFA if possible, as it is vulnerable to SIM-swapping attacks.</p><h3>3. 
Do Not Generate Access Keys for the Root Account</h3><p>The root user should never have programmatic access keys. If keys already exist for the root account, delete them immediately. There is no legitimate operational reason to use root access keys.</p><h3>4. Store Root Credentials Securely and Rarely Use Them</h3><p>After setting up the root account, you should almost never log into it again. Store the credentials and MFA recovery codes in a secure vault (a physical safe or an encrypted password manager with restricted access). The root account should only be used for the handful of tasks that specifically require it, such as changing the account&#8217;s support plan or closing the account.</p><h2>Phase 2: Set Up Proper IAM Identity Management</h2><p>Now that root is locked away, you need a secure way for humans and services to interact with the account.</p><h3>5. Create an IAM Admin User or Use AWS IAM Identity Center</h3><p>Create a dedicated IAM user (or, even better, use IAM Identity Center if you&#8217;re working within AWS Organizations) with administrative permissions. This is the identity you&#8217;ll use for day-to-day management &#8212; not root.</p><p>If you&#8217;re managing a single account, an IAM user in the &#8220;Administrators&#8221; group is sufficient. If you&#8217;re managing multiple accounts, invest the time to set up IAM Identity Center (formerly AWS SSO) from the start. It will save you enormous headaches later.</p><h3>6. Enable MFA for All Human IAM Users</h3><p>Every person who can log into the AWS console should have MFA enabled. No exceptions. Consider creating an IAM policy that denies all actions until MFA is activated, which encourages users to set it up immediately upon first login.</p><h3>7. Apply the Principle of Least Privilege</h3><p>Resist the temptation to give every user or role full admin access. Start with the minimum permissions required for each person or service, and expand only when there is a justified need. 
Use AWS managed policies as a starting point, but review what they allow. The &#8220;PowerUserAccess&#8221; policy, for example, is far more permissive than most people realize.</p><h3>8. Create an IAM Password Policy</h3><p>Configure a strong password policy for IAM users in your account. Enforce minimum length (at least 14 characters), require a mix of character types, and enable password expiration if your organization requires it. This is found under IAM &gt; Account settings.</p><h2>Phase 3: Enable Logging and Monitoring</h2><p>You can&#8217;t protect what you can&#8217;t see. Visibility into account activity is non-negotiable.</p><h3>9. Enable AWS CloudTrail in All Regions</h3><p>CloudTrail records every API call made in your account. It&#8217;s your audit trail, your forensic evidence, and your early warning system. Create a trail that applies to all regions and logs to a dedicated S3 bucket. Enable log file validation so you can verify that logs haven&#8217;t been tampered with.</p><p>This is one of those services that costs very little but provides enormous value. Do not skip it.</p><h3>10. Enable CloudTrail Log File Integrity Validation</h3><p>When you set up CloudTrail, enable digest file delivery. This allows you to later verify that no log files were modified or deleted after delivery. In the event of a security incident, you need to trust your logs.</p><h3>11. Turn On AWS Config</h3><p>AWS Config continuously monitors and records your resource configurations and evaluates them against rules. Enable it in all regions. It answers questions like: &#8220;When did this security group change?&#8221; and &#8220;Is this S3 bucket publicly accessible?&#8221;</p><p>You can start with the AWS Config conformance packs, which provide pre-built rule sets for common compliance frameworks.</p><h3>12. Enable Amazon GuardDuty</h3><p>GuardDuty is a threat detection service that analyzes CloudTrail logs, VPC flow logs, and DNS logs to identify malicious activity. 
It can detect things like cryptocurrency mining, compromised credentials, and unusual API calls. Enabling it takes about 30 seconds, and it starts working immediately.</p><p>There is a 30-day free trial, and the ongoing cost is typically very reasonable relative to the protection it provides.</p><h3>13. Set Up Billing Alerts and Anomaly Detection</h3><p>Security incidents often show up as billing anomalies first. Go to the Billing and Cost Management console and enable the following:</p><ul><li><p><strong>AWS Budgets</strong>: Create a monthly budget with a threshold alert. Even a simple &#8220;notify me if my spend exceeds $50&#8221; can save you from a surprise bill.</p></li><li><p><strong>Cost Anomaly Detection</strong>: This service uses machine learning to detect unusual spending patterns and alert you automatically.</p></li></ul><p>An unexpected spike in EC2 or Lambda charges could be the first sign that someone is using your account for unauthorized purposes.</p><h2>Phase 4: Secure Your Network Defaults</h2><p>Every new AWS account comes with default networking resources that are more permissive than you might want.</p><h3>14. Review and Restrict Default Security Groups</h3><p>Every VPC comes with a default security group that allows all outbound traffic and allows all inbound traffic from resources assigned to the same security group. While this isn&#8217;t publicly open, it&#8217;s more permissive than most workloads need. Adopt the practice of creating custom, purpose-specific security groups and never attaching the default security group to resources.</p><h3>15. Delete or Ignore the Default VPC (Use Custom VPCs Instead)</h3><p>The default VPC in each region comes with public subnets and an internet gateway already attached. It&#8217;s designed for convenience, not security. 
For any real workload, create a custom VPC with a deliberate network design &#8212; private subnets for backend services, public subnets only where necessary, and proper route table configuration.</p><p>You don&#8217;t necessarily need to delete the default VPC (some AWS services expect it to exist), but make a rule to never deploy production workloads into it.</p><h3>16. Block Public Access to S3 at the Account Level</h3><p>One of the most common AWS security headlines involves publicly exposed S3 buckets. AWS now provides an account-level setting called &#8220;S3 Block Public Access.&#8221; Enable all four options at the account level. If a specific bucket later needs public access (for static website hosting, for example), you can override it deliberately for that one bucket. The default should be locked down.</p><h2>Phase 5: Establish Account-Level Protections</h2><p>These steps create safety nets that protect you across the entire account.</p><h3>17. Enable EBS Default Encryption</h3><p>Navigate to the EC2 console and enable default EBS encryption for each region you plan to use. This ensures that every new EBS volume is encrypted automatically, without requiring the person launching the instance to remember to check a box. You can use the AWS-managed key or specify a customer-managed KMS key.</p><h3>18. Configure Account-Level Contacts</h3><p>Under Account Settings, make sure the security, billing, and operations contact fields are filled in with valid email addresses. AWS uses these contacts to reach you in the event of abuse notifications or security issues. If these emails go to an unmonitored inbox, you might miss a critical warning.</p><h3>19. Enable AWS Security Hub (Optional but Recommended)</h3><p>Security Hub aggregates findings from GuardDuty, Config, IAM Access Analyzer, and other services into a single dashboard. It also runs automated security checks against industry benchmarks like CIS AWS Foundations. 
If you&#8217;ve already enabled the services listed above, Security Hub ties everything together and gives you a centralized security posture score.</p><h3>20. Enable IAM Access Analyzer</h3><p>IAM Access Analyzer helps you identify resources in your account that are shared with an external entity &#8212; for example, an S3 bucket policy that grants access to another AWS account, or an IAM role that can be assumed externally. Enable it in each region. It runs continuously and alerts you when it finds unintended external access.</p><h2>Phase 6: Plan for the Worst</h2><p>Security isn&#8217;t just about prevention. You also need to prepare for the possibility that something goes wrong.</p><h3>21. Document Your Incident Response Plan</h3><p>Even a simple one-page document is better than nothing. Answer these questions in advance:</p><ul><li><p>Who is responsible for responding to a security alert?</p></li><li><p>How do we revoke compromised credentials?</p></li><li><p>Where are our logs stored and how do we access them?</p></li><li><p>Who do we contact at AWS for support?</p></li></ul><p>The middle of an incident, when everyone is under pressure, is the worst time to figure out these answers.</p><h3>22. Know How to Contact AWS Support</h3><p>If your account is compromised, you may need to contact AWS quickly. Make sure you know how to open a support case, and consider whether your current support plan (Basic, Developer, Business, or Enterprise) provides the response time you&#8217;d need in an emergency. Business-level support or above gives you 24/7 access to Cloud Support Engineers.</p><h3>23. Create an Emergency &#8220;Break Glass&#8221; Procedure for Root Access</h3><p>Since you&#8217;ve locked down root access (as described in Phase 1), document the procedure for accessing it in an emergency. Who has access to the password vault? Where is the MFA device stored? What approvals are needed? 
This should be written down, tested, and reviewed periodically.</p><h2>Final Thoughts</h2><p>None of these steps are difficult. Most take less than five minutes. But together, they form a security foundation that will protect you from the vast majority of common AWS threats.</p><p>The most expensive security incident is the one you could have prevented with ten minutes of configuration on day one. Don&#8217;t learn that lesson the hard way.</p><p>Bookmark this checklist. Share it with your team. Run through it every time a new account is created. Your future self will thank you.</p>]]></content:encoded></item><item><title><![CDATA[SQL or NoSQL on AWS? 
A Practical Decision Framework for Cloud Engineers]]></title><description><![CDATA[Choosing the right database is one of the most impactful architectural decisions you&#8217;ll make on AWS.]]></description><link>https://blog.thecloudengineers.com/p/sql-or-nosql-on-aws-a-practical-decision</link><guid isPermaLink="false">https://blog.thecloudengineers.com/p/sql-or-nosql-on-aws-a-practical-decision</guid><dc:creator><![CDATA[Lefteris Karageorgiou]]></dc:creator><pubDate>Wed, 18 Feb 2026 10:30:12 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/426be2dc-4473-4a56-a06e-50eb87344778_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Choosing the right database is one of the most impactful architectural decisions you&#8217;ll make on AWS. Pick the wrong one, and you&#8217;ll face performance bottlenecks, skyrocketing costs, and painful migrations down the road.</p><p>In this article, we&#8217;ll break down the differences between SQL and NoSQL databases on AWS, explore the main services available, and give you a practical decision framework so you can choose with confidence.</p>
<h2>Understanding the Fundamentals</h2><p>Before diving into AWS services, let&#8217;s make sure we&#8217;re on the same page about what sets these two paradigms apart.</p><h3>SQL (Relational Databases)</h3><p>Relational databases store data in <strong>tables</strong> with predefined <strong>schemas</strong> made up of rows and columns. They use <strong>Structured Query Language (SQL)</strong> to query and manipulate data and enforce relationships between tables through foreign keys and joins.</p><p><strong>Key characteristics:</strong></p><ul><li><p><strong>Structured schema:</strong> You define your tables, columns, and data types upfront.</p></li><li><p><strong>ACID compliance:</strong> Transactions are Atomic, Consistent, Isolated, and Durable.</p></li><li><p><strong>Strong consistency:</strong> Reads always return the most recent write.</p></li><li><p><strong>Vertical scaling:</strong> Traditionally scaled by adding more power (CPU, RAM) to a single server.</p></li><li><p><strong>Complex queries:</strong> Excellent support for JOINs, aggregations, and ad-hoc queries.</p></li></ul><h3>NoSQL (Non-Relational Databases)</h3><p>NoSQL databases break away from the rigid tabular model. 
They come in several flavors (document, key-value, wide-column, and graph), each optimized for different access patterns.</p><p><strong>Key characteristics:</strong></p><ul><li><p><strong>Flexible schema:</strong> Fields can vary from record to record.</p></li><li><p><strong>BASE model:</strong> Basically Available, Soft state, Eventually consistent.</p></li><li><p><strong>Horizontal scaling:</strong> Designed to scale out across many nodes.</p></li><li><p><strong>High throughput:</strong> Optimized for massive read/write workloads.</p></li><li><p><strong>Simple queries:</strong> Best suited for known access patterns rather than ad-hoc queries.</p></li></ul><h2>SQL Databases on AWS</h2><h3>Amazon RDS (Relational Database Service)</h3><p>Amazon RDS is a <strong>managed service</strong> that supports several popular relational engines, including:</p><ul><li><p>MySQL</p></li><li><p>PostgreSQL</p></li><li><p>MariaDB</p></li><li><p>Oracle</p></li><li><p>SQL Server</p></li><li><p>Amazon Aurora (MySQL &amp; PostgreSQL compatible)</p></li></ul><p><strong>When to use RDS:</strong></p><ul><li><p>Your data is highly relational, with many relationships between entities.</p></li><li><p>You need ACID-compliant transactions (e.g., financial systems).</p></li><li><p>Your team is experienced with SQL.</p></li><li><p>You want a managed service with a traditional relational model.</p></li></ul><p><strong>Example use cases:</strong> E-commerce order management, HR systems, CMS platforms, ERP systems.</p><h3>Amazon Aurora</h3><p>Aurora deserves special attention. 
It&#8217;s <strong>AWS&#8217;s cloud-native relational database</strong>, compatible with MySQL and PostgreSQL but re-engineered for the cloud.</p><p><strong>Why Aurora stands out:</strong></p><ul><li><p>Up to <strong>5x the throughput</strong> of standard MySQL and up to <strong>3x</strong> that of standard PostgreSQL, according to AWS.</p></li><li><p>Storage <strong>auto-scales</strong> up to 128 TB.</p></li><li><p><strong>6-way replication</strong> of storage across 3 Availability Zones.</p></li><li><p>Supports up to <strong>15 read replicas</strong> with minimal lag.</p></li><li><p><strong>Aurora Serverless v2</strong> scales capacity automatically based on demand.</p></li></ul><p><strong>When to use Aurora over standard RDS:</strong></p><ul><li><p>You need higher performance and availability.</p></li><li><p>Your workload has unpredictable traffic (Aurora Serverless).</p></li><li><p>You want automatic storage scaling.</p></li></ul><h2>NoSQL Databases on AWS</h2><h3>Amazon DynamoDB</h3><p>DynamoDB is AWS&#8217;s flagship <strong>fully managed key-value and document database</strong>. 
It&#8217;s designed for applications that need consistent, single-digit millisecond latency at any scale.</p><p><strong>Key features:</strong></p><ul><li><p><strong>Serverless:</strong> No servers to manage, patch, or provision.</p></li><li><p><strong>Auto-scaling:</strong> Throughput adjusts to traffic automatically (on-demand mode).</p></li><li><p><strong>Global Tables:</strong> Multi-region, multi-active replication.</p></li><li><p><strong>DAX (DynamoDB Accelerator):</strong> In-memory caching for microsecond reads.</p></li><li><p><strong>Streams:</strong> Capture item-level changes for event-driven architectures.</p></li></ul><p><strong>When to use DynamoDB:</strong></p><ul><li><p>You need extreme scale and low latency.</p></li><li><p>Your access patterns are well-defined and predictable.</p></li><li><p>You&#8217;re building serverless applications (Lambda + API Gateway + DynamoDB).</p></li><li><p>You need a simple key-value or document store.</p></li></ul><p><strong>Example use cases:</strong> Gaming leaderboards, IoT telemetry, session management, shopping carts, real-time bidding.</p><h3>Amazon DocumentDB (MongoDB Compatible)</h3><p>If your team already uses <strong>MongoDB</strong>, DocumentDB provides a managed, MongoDB-compatible document database.</p><p><strong>When to use DocumentDB:</strong></p><ul><li><p>You&#8217;re migrating from MongoDB.</p></li><li><p>You need flexible JSON document storage with rich query capabilities.</p></li><li><p>You want managed infrastructure without running your own MongoDB clusters.</p></li></ul><h3>Amazon ElastiCache (Redis &amp; Memcached)</h3><p>ElastiCache provides <strong>in-memory key-value stores</strong> for caching and real-time workloads.</p><p><strong>When to use ElastiCache:</strong></p><ul><li><p>You need <strong>sub-millisecond</strong> response times.</p></li><li><p>You&#8217;re caching frequently accessed data from another database.</p></li><li><p>You need real-time features like session stores, pub/sub 
messaging, or leaderboards.</p></li></ul><h3>Amazon Neptune</h3><p>Neptune is a <strong>graph database</strong> for highly connected datasets.</p><p><strong>When to use Neptune:</strong></p><ul><li><p>Social networks, recommendation engines, fraud detection.</p></li><li><p>Your queries involve traversing complex relationships between entities.</p></li></ul><h2><strong>Head-to-Head Comparison</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!MPXz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febd580c1-7881-4c15-bc6f-fc530431f447_1634x882.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!MPXz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febd580c1-7881-4c15-bc6f-fc530431f447_1634x882.png 424w, https://substackcdn.com/image/fetch/$s_!MPXz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febd580c1-7881-4c15-bc6f-fc530431f447_1634x882.png 848w, https://substackcdn.com/image/fetch/$s_!MPXz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febd580c1-7881-4c15-bc6f-fc530431f447_1634x882.png 1272w, https://substackcdn.com/image/fetch/$s_!MPXz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febd580c1-7881-4c15-bc6f-fc530431f447_1634x882.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!MPXz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febd580c1-7881-4c15-bc6f-fc530431f447_1634x882.png" 
width="1456" height="786" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ebd580c1-7881-4c15-bc6f-fc530431f447_1634x882.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:786,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:226962,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.thecloudengineers.com/i/188352806?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febd580c1-7881-4c15-bc6f-fc530431f447_1634x882.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!MPXz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febd580c1-7881-4c15-bc6f-fc530431f447_1634x882.png 424w, https://substackcdn.com/image/fetch/$s_!MPXz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febd580c1-7881-4c15-bc6f-fc530431f447_1634x882.png 848w, https://substackcdn.com/image/fetch/$s_!MPXz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febd580c1-7881-4c15-bc6f-fc530431f447_1634x882.png 1272w, https://substackcdn.com/image/fetch/$s_!MPXz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febd580c1-7881-4c15-bc6f-fc530431f447_1634x882.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><h2>Decision Framework: 7 Questions to Ask Yourself</h2><p>Use these questions to guide your choice:</p><h3>1. How relational is your data?</h3><p>Highly relational (many joins, foreign keys, normalized tables) &#8594; SQL<br>Loosely structured or denormalized &#8594; NoSQL</p><h3>2. How complex are your queries?</h3><p>Need ad-hoc queries, complex reporting, aggregations &#8594; SQL<br>Known, simple access patterns (get by key, query by index) &#8594; NoSQL</p><h3>3. What&#8217;s your scale?</h3><p>Moderate, predictable traffic &#8594; SQL (RDS/Aurora) handles this well<br>Massive, spiky, or unpredictable traffic &#8594; NoSQL (DynamoDB) scales seamlessly</p><h3>4. What are your latency requirements?</h3><p>Single-digit millisecond &#8594; Both work<br>Microsecond &#8594; DynamoDB + DAX or ElastiCache</p><h3>5. 
Do you need ACID transactions?</h3><p>Complex multi-table transactions &#8594; SQL<br>Simple transactions &#8594; DynamoDB Transactions may be sufficient</p><h3>6. Are you building serverless?</h3><p>DynamoDB integrates natively with Lambda, API Gateway, and EventBridge.<br>RDS in serverless architectures can be problematic (connection limits), though Aurora Serverless v2 and RDS Proxy help.</p><h3>7. What&#8217;s your budget model preference?</h3><p>Predictable monthly cost &#8594; RDS (reserved instances)<br>Pay-per-use &#8594; DynamoDB on-demand mode</p><h2>Real-World Scenarios</h2><h3>Scenario 1: E-Commerce Platform</h3><p><strong>Requirements:</strong> Product catalog, orders, inventory, payments.</p><p><strong>Recommendation: Aurora PostgreSQL + DynamoDB (hybrid)</strong></p><ul><li><p>Use <strong>Aurora</strong> for orders and payments (ACID transactions, complex queries).</p></li><li><p>Use <strong>DynamoDB</strong> for the product catalog and shopping cart (high read throughput, flexible attributes).</p></li><li><p>Use <strong>ElastiCache Redis</strong> for session management and caching.</p></li></ul><h3>Scenario 2: Real-Time Gaming Backend</h3><p><strong>Requirements:</strong> Leaderboards, player profiles, matchmaking, millions of concurrent users.</p><p><strong>Recommendation: DynamoDB</strong></p><ul><li><p>Scales horizontally to handle millions of players.</p></li><li><p>Single-digit millisecond latency for leaderboard reads/writes.</p></li><li><p>Global Tables for multi-region play.</p></li></ul><h3>Scenario 3: Financial Reporting System</h3><p><strong>Requirements:</strong> Complex queries, regulatory compliance, audit trails, historical analysis.</p><p><strong>Recommendation: Aurora PostgreSQL</strong></p><ul><li><p>Full ACID compliance for transaction integrity.</p></li><li><p>Complex SQL queries for financial reporting.</p></li><li><p>Point-in-time recovery for audit requirements.</p></li></ul><h3>Scenario 4: IoT Data 
Platform</h3><p><strong>Requirements:</strong> Millions of devices sending telemetry every second.</p><p><strong>Recommendation: DynamoDB + Amazon Timestream</strong></p><ul><li><p><strong>DynamoDB</strong> for device state and metadata.</p></li><li><p><strong>Timestream</strong> (purpose-built time-series DB) for telemetry data.</p></li></ul><h2>The Hybrid Approach: Why Not Both?</h2><p>In practice, many production systems on AWS use <strong>both SQL and NoSQL</strong> databases. This is called <strong>polyglot persistence</strong>, using the best database for each specific job.</p><pre><code><code>&#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;     &#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;     &#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;
&#9474;   Frontend   &#9474;&#9472;&#9472;&#9472;&#9472;&#9654;&#9474;   API Gateway +  &#9474;&#9472;&#9472;&#9472;&#9472;&#9654;&#9474;  DynamoDB    &#9474;
&#9474;   (React)    &#9474;     &#9474;   Lambda         &#9474;     &#9474;  (Catalog,   &#9474;
&#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;     &#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9516;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;     &#9474;   Sessions)  &#9474;
                            &#9474;                 &#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;
                            &#9474;
                            &#9660;
                     &#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;     &#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;
                     &#9474;   ECS / EKS      &#9474;&#9472;&#9472;&#9472;&#9472;&#9654;&#9474;  Aurora      &#9474;
                     &#9474;   (Order Service)&#9474;     &#9474;  (Orders,    &#9474;
                     &#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;     &#9474;   Payments)  &#9474;
                                              &#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;</code></code></pre><p><strong>Tips for a hybrid approach:</strong></p><ul><li><p>Use <strong>DynamoDB</strong> for high-throughput, simple-access workloads.</p></li><li><p>Use <strong>Aurora</strong> for transactional and reporting workloads.</p></li><li><p>Use <strong>ElastiCache</strong> as a caching layer in front of either.</p></li><li><p>Use <strong>Amazon EventBridge</strong> or <strong>DynamoDB Streams</strong> to sync data between databases when needed.</p></li></ul><h2>Final Thoughts</h2><p>There&#8217;s no universal &#8220;best&#8221; database, only the <strong>best database for your specific use case</strong>. The most important thing is to <strong>start with your access patterns and requirements</strong>, not with the technology.</p><p>Ask yourself:</p><ul><li><p><em>What does my data look like?</em></p></li><li><p><em>How will I query it?</em></p></li><li><p><em>How much will it need to scale?</em></p></li><li><p><em>What consistency guarantees do I need?</em></p></li></ul><p>Let the answers guide you to the right choice. And remember, on AWS, you&#8217;re never locked into one paradigm. Embrace polyglot persistence and use the right tool for each job.</p>
<p></p>]]></content:encoded></item><item><title><![CDATA[AWS Networking 101: Everything You Need to Know About VPCs]]></title><description><![CDATA[If you&#8217;re starting your AWS journey, you&#8217;ve probably heard about VPC (Virtual Private Cloud) and wondered what all the fuss is about.]]></description><link>https://blog.thecloudengineers.com/p/aws-networking-101-everything-you</link><guid isPermaLink="false">https://blog.thecloudengineers.com/p/aws-networking-101-everything-you</guid><dc:creator><![CDATA[Lefteris Karageorgiou]]></dc:creator><pubDate>Wed, 11 Feb 2026 10:30:40 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/850dc490-8acc-4099-8af5-ee55c615f423_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>If you&#8217;re starting your AWS journey, you&#8217;ve probably heard about VPC (Virtual Private Cloud) and wondered what all the fuss is about. Maybe you&#8217;ve seen network diagrams that look like spaghetti, or you&#8217;ve tried to follow documentation that assumes you already know what a subnet is. Perhaps you&#8217;ve even accidentally launched resources into the default VPC without fully understanding what was happening behind the scenes, and that&#8217;s completely normal.</p><p>Here&#8217;s the truth: VPCs are simpler than they seem. Once you understand the core concepts and how they fit together, everything else clicks into place. The terminology (CIDR blocks, route tables, gateways) can feel overwhelming at first, but each of these is just a small, logical piece of a bigger puzzle. 
This guide will walk you through VPCs step by step, with visual explanations that make sense. By the end, you won&#8217;t just understand VPCs; you&#8217;ll understand <em>why</em> they&#8217;re designed the way they are.</p><h2><strong>What Is a VPC, Really?</strong></h2><p>Think of a VPC as your own private section of the AWS cloud. It&#8217;s like having your own data center, but virtual. Inside this space, you can launch AWS resources (like EC2 instances, databases, and Lambda functions) and control exactly how they communicate with each other and the outside world.</p><p>To make this even more concrete, imagine you&#8217;re renting an entire floor in a massive office building. The building is AWS&#8217;s cloud infrastructure, shared among millions of tenants. But your floor? That&#8217;s entirely yours. You decide how to partition the rooms, who gets access to what, which doors lead outside, and which rooms are locked away from public view. 
Other tenants can&#8217;t wander into your space, and you can&#8217;t wander into theirs, unless you both explicitly agree to connect your floors together.</p><p>Every AWS account comes with a <strong>default VPC</strong> in each region, pre-configured so you can start launching resources immediately. However, for production workloads, you&#8217;ll almost always want to create a <strong>custom VPC</strong> where you have full control over the network design. The default VPC is convenient for experimentation, but it makes assumptions about your architecture that may not align with your security or organizational needs.</p><p><strong>The key benefit:</strong> Complete control over your network environment: IP addresses, subnets, route tables, and network gateways. You decide what&#8217;s exposed, what&#8217;s hidden, and how every piece communicates.</p><h2><strong>The Building Blocks</strong></h2><p>Let&#8217;s break down the essential components you need to understand:</p><h3><strong>1. VPC (Virtual Private Cloud)</strong></h3><p><strong>What it is:</strong> The container for your entire network. It&#8217;s the foundational layer upon which everything else is built.</p><p><strong>Key details:</strong></p><ul><li><p>You define an IP address range using <strong>CIDR notation</strong> (e.g., <code>10.0.0.0/16</code>). CIDR stands for Classless Inter-Domain Routing, and it&#8217;s simply a compact way of expressing a range of IP addresses. The <code>/16</code> part is called the prefix length and determines how many IP addresses are available within the range. A smaller number after the slash means a larger network: <code>/16</code> leaves 16 of the 32 address bits free, which gives you 65,536 addresses, while <code>/24</code> leaves only 8 bits and gives you 256.</p></li><li><p>Choosing your CIDR block is one of the most important early decisions because <strong>you cannot change the primary block after creation</strong> (though you can add secondary CIDR blocks later). 
If you choose a range that&#8217;s too small, you&#8217;ll run out of IP addresses as your infrastructure grows. If you choose one that overlaps with another VPC or your on-premises network, you&#8217;ll run into conflicts when you try to connect them later.</p></li><li><p>Each VPC is <strong>isolated from other VPCs by default</strong>. This is a fundamental security property. Two VPCs can even use the exact same CIDR block without interfering with each other, and they cannot communicate unless you explicitly set up a connection between them (using VPC Peering or a Transit Gateway). Note that VPC Peering requires non-overlapping CIDR blocks, which is another reason to plan your ranges carefully. This isolation means a misconfiguration in one VPC won&#8217;t accidentally leak traffic into another.</p></li><li><p>A VPC exists within a single <strong>AWS Region</strong> but spans all of that region&#8217;s Availability Zones. This is important: your VPC isn&#8217;t confined to a single data center. It&#8217;s a regional construct that gives you the ability to distribute resources across multiple physically separated facilities.</p></li></ul><p><strong>Visual concept:</strong></p><pre><code><code>&#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;
&#9474;         VPC (10.0.0.0/16)         &#9474;
&#9474;                                   &#9474;
&#9474;  [Your AWS resources live here]   &#9474;
&#9474;                                   &#9474;
&#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;</code></code></pre><h3><strong>2. Subnets</strong></h3><p><strong>What they are:</strong> Subdivisions of your VPC&#8217;s IP address range. If the VPC is your floor in the office building, subnets are the individual rooms you create within it.</p><p><strong>Why you need them:</strong> To organize resources and control access to the internet. Subnets give you the ability to apply different network rules and routing behaviors to different groups of resources. Without subnets, every resource in your VPC would have to follow the same networking rules, and that&#8217;s not practical when some resources need public exposure and others need to stay hidden.</p><p><strong>Two types:</strong></p><ul><li><p><strong>Public subnets:</strong> Resources here can communicate directly with the internet. These are for things that need to be reachable by users or external systems, such as web servers, load balancers, bastion hosts, and similar front-facing infrastructure.</p></li><li><p><strong>Private subnets:</strong> Resources here cannot directly access the internet and, crucially, cannot be directly reached from the internet. This makes them far more secure. These are ideal for databases, application servers, internal microservices, caches, and anything that doesn&#8217;t need a public-facing presence.</p></li></ul><p>It&#8217;s important to understand that <strong>there&#8217;s nothing inherently &#8220;public&#8221; or &#8220;private&#8221; about a subnet itself</strong>. What makes a subnet public or private is its <strong>route table configuration</strong>: specifically, whether its route table includes a route to an Internet Gateway. 
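</p><p>As a minimal sketch of that rule, the Python snippet below classifies a subnet purely from its route table. The route-table dictionaries and the target IDs in them are invented for illustration; they are not output from any AWS API.</p><pre><code><code># Hypothetical, simplified route tables: destination CIDR mapped to a target ID.
# The IDs below are invented for illustration only.
PUBLIC_ROUTES = {"10.0.0.0/16": "local", "0.0.0.0/0": "igw-0abc1234"}
PRIVATE_ROUTES = {"10.0.0.0/16": "local"}


def is_public_subnet(routes):
    """Return True if any route in the table targets an Internet Gateway."""
    return any(target.startswith("igw-") for target in routes.values())


print(is_public_subnet(PUBLIC_ROUTES))   # prints True
print(is_public_subnet(PRIVATE_ROUTES))  # prints False</code></code></pre><p>Notice that the subnet itself never appears in the check; only its routes do.</p><p>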
This is a common source of confusion for beginners who think the distinction is set at creation time.</p><p>Each subnet lives within a <strong>single Availability Zone</strong> and cannot span multiple AZs. This is by design, it allows AWS to provide strong isolation guarantees. If one AZ experiences issues, subnets in other AZs remain unaffected. This is why best practice dictates creating matching subnets across multiple Availability Zones, so your application can survive the loss of an entire data center.</p><p>Each subnet also has a small number of <strong>reserved IP addresses</strong> that AWS uses for internal purposes. In every subnet, the first four IP addresses and the last IP address are reserved. For a <code>/24</code> subnet (256 addresses), that means 251 are actually usable. This is a small detail, but it matters when you&#8217;re planning tightly-sized subnets.</p><p><strong>Visual concept:</strong></p><pre><code><code>&#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;
&#9474;              VPC (10.0.0.0/16)               &#9474;
&#9474;                                              &#9474;
&#9474;  &#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;  &#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;  &#9474;
&#9474;  &#9474;  Public Subnet   &#9474;  &#9474;  Private Subnet  &#9474;  &#9474;
&#9474;  &#9474;  10.0.1.0/24     &#9474;  &#9474;  10.0.2.0/24     &#9474;  &#9474;
&#9474;  &#9474;                  &#9474;  &#9474;                  &#9474;  &#9474;
&#9474;  &#9474;  [Web Servers]   &#9474;  &#9474;  [Databases]     &#9474;  &#9474;
&#9474;  &#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;  &#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;  &#9474;
&#9474;                                              &#9474;
&#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;</code></code></pre><p><strong>Best practice:</strong> Create subnets in multiple Availability Zones for high availability. A common pattern is to have at least two public subnets and two private subnets, each pair spread across two different AZs. Some organizations go further and use three AZs for even greater resilience.</p><h3><strong>3. Internet Gateway (IGW)</strong></h3><p><strong>What it is:</strong> The door between your VPC and the public internet. It&#8217;s a horizontally scaled, redundant, and highly available component managed entirely by AWS.</p><p><strong>What it does:</strong> Allows resources in public subnets to communicate with the internet, both sending requests out and receiving requests in. Without an Internet Gateway, your VPC is a completely sealed-off network with no path to the outside world.</p><p>The Internet Gateway serves two critical functions. First, it acts as a <strong>target in your route table</strong> for internet-bound traffic, giving your resources a path to follow when they need to reach an external address. Second, it performs <strong>Network Address Translation (NAT)</strong> for instances that have been assigned a public IPv4 address. When traffic leaves your instance, the IGW replaces the private IP with the public IP; when responses come back, it reverses the translation. This is why simply attaching an IGW isn&#8217;t enough, your instances also need public IP addresses (or Elastic IPs) and the correct route table entries to actually use it.</p><p><strong>Visual concept:</strong></p><pre><code><code>                    Internet
                       &#8597;
              &#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;
              &#9474; Internet       &#9474;
              &#9474; Gateway (IGW)  &#9474;
              &#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;
                       &#8597;
&#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;
&#9474;           VPC (10.0.0.0/16)        &#9474;
&#9474;                                    &#9474;
&#9474;  &#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;                &#9474;
&#9474;  &#9474; Public Subnet  &#9474;                &#9474;
&#9474;  &#9474; 10.0.1.0/24    &#9474;                &#9474;
&#9474;  &#9474;                &#9474;                &#9474;
&#9474;  &#9474; [Web Server]   &#9474;                &#9474;
&#9474;  &#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;                &#9474;
&#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;</code></code></pre><p><strong>Key point:</strong> You attach <strong>one IGW per VPC</strong>. You cannot attach multiple Internet Gateways to the same VPC, and you cannot share an IGW across VPCs. It&#8217;s a one-to-one relationship. Without it, nothing in your VPC can reach the internet, and nothing on the internet can reach your VPC. It&#8217;s also worth noting that the IGW itself doesn&#8217;t impose bandwidth limits or become a bottleneck; AWS manages its scaling transparently.</p><h3><strong>4. Route Tables</strong></h3><p><strong>What they are:</strong> Sets of rules (called routes) that determine where network traffic is directed. Every single packet of data that moves through your VPC is guided by a route table.</p><p><strong>How they work:</strong> Each subnet is associated with exactly one route table, and that route table tells traffic where to go based on its destination. When a resource in your subnet sends a packet, the VPC examines the destination IP address, looks it up in the route table, and forwards the packet to the matching target. If there are multiple matching routes, the most specific route (the one with the longest prefix) wins.</p><p>Every VPC comes with a <strong>main route table</strong> automatically. Any subnet that isn&#8217;t explicitly associated with a custom route table will use the main route table by default. This is a subtle but important point: it means newly created subnets silently inherit the main route table&#8217;s behavior. For this reason, many practitioners recommend keeping the main route table restrictive (private) and explicitly assigning custom route tables with internet access only to the subnets that need it. 
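</p><p>As a rough sketch of the two behaviors described above (longest-prefix matching and the fallback to the main route table), the Python model below treats a route table as a plain dictionary. The subnet names, the <code>8.8.8.8</code> probe address, and the helper functions are illustrative assumptions, not AWS APIs.</p><pre><code><code>
```python
import ipaddress

# Hypothetical, simplified model: a route table maps a CIDR block to a target.
MAIN_ROUTE_TABLE = {"10.0.0.0/16": "local"}  # kept private: no IGW route
PUBLIC_ROUTE_TABLE = {"10.0.0.0/16": "local", "0.0.0.0/0": "igw-xxxxx"}

# Only explicitly associated subnets appear here (names are made up).
ASSOCIATIONS = {"subnet-public-a": PUBLIC_ROUTE_TABLE}


def effective_route_table(subnet_id):
    """Return the explicitly associated table, else fall back to the main one."""
    return ASSOCIATIONS.get(subnet_id, MAIN_ROUTE_TABLE)


def lookup(route_table, destination_ip):
    """Pick the matching route with the longest prefix, or None if none match."""
    address = ipaddress.ip_address(destination_ip)
    matches = [
        ipaddress.ip_network(cidr)
        for cidr in route_table
        if address in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None
    best = max(matches, key=lambda network: network.prefixlen)
    return route_table[str(best)]


def is_public(subnet_id):
    """Treat a subnet as public when its default route targets an IGW."""
    target = lookup(effective_route_table(subnet_id), "8.8.8.8")
    return target is not None and target.startswith("igw-")


if __name__ == "__main__":
    print(lookup(PUBLIC_ROUTE_TABLE, "10.0.2.15"))  # local (the /16 beats the /0)
    print(lookup(PUBLIC_ROUTE_TABLE, "8.8.8.8"))    # igw-xxxxx
    print(is_public("subnet-public-a"))             # True
    print(is_public("subnet-forgotten"))            # False: main table, no IGW route
```
</code></code></pre><p>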
This way, if someone creates a new subnet and forgets to assign a route table, it defaults to private rather than accidentally becoming public.</p><p><strong>Example routes:</strong></p><ul><li><p><code>10.0.0.0/16 &#8594; local</code>: Traffic destined for any IP within the VPC stays inside the VPC. This route is automatically present in every route table and <strong>cannot be removed</strong>. It&#8217;s what allows your resources to communicate with each other across subnets.</p></li><li><p><code>0.0.0.0/0 &#8594; igw-xxxxx</code>: All other traffic (anything not matching a more specific route) goes to the Internet Gateway. This is the route that makes a subnet &#8220;public.&#8221; The <code>0.0.0.0/0</code> destination is a catch-all, often called the <strong>default route</strong>.</p></li></ul><p><strong>Visual concept:</strong></p><pre><code><code>Public Subnet Route Table:
&#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9516;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;
&#9474; Destination     &#9474; Target       &#9474;
&#9500;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9532;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9508;
&#9474; 10.0.0.0/16     &#9474; local        &#9474;
&#9474; 0.0.0.0/0       &#9474; igw-xxxxx    &#9474;
&#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9524;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;

Private Subnet Route Table:
&#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9516;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;
&#9474; Destination     &#9474; Target       &#9474;
&#9500;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9532;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9508;
&#9474; 10.0.0.0/16     &#9474; local        &#9474;
&#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9524;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;</code></code></pre><p><strong>The difference:</strong> Public subnets have a route to the IGW; private subnets don&#8217;t. That single route entry is the only thing that separates a &#8220;public&#8221; subnet from a &#8220;private&#8221; one. Route tables can also contain routes pointing to other targets like NAT Gateways, VPC Peering connections, Transit Gateways, VPN connections, and VPC endpoints, which is how more complex architectures are built.</p><h3><strong>5. NAT Gateway</strong></h3><p><strong>What it is:</strong> A managed service that allows resources in private subnets to initiate outbound connections to the internet (for software updates, API calls, downloading patches, etc.) <strong>without</strong> exposing them to inbound internet traffic. It performs Network Address Translation, replacing the private IP address of the originating resource with its own public IP address before forwarding the request to the internet.</p><p><strong>Why you need it:</strong> Your database in a private subnet needs to download security patches, your application server needs to call a third-party API, or your Lambda function in a private subnet needs to reach an external webhook. But you don&#8217;t want any of these resources to be directly reachable from the internet. The NAT Gateway solves this by acting as a one-way intermediary: it allows traffic <strong>out</strong> but blocks unsolicited traffic <strong>in</strong>.</p><p><strong>Visual concept:</strong></p><pre><code><code>                    Internet
                       &#8597;
              &#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;
              &#9474; Internet       &#9474;
              &#9474; Gateway (IGW)  &#9474;
              &#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;
                       &#8597;
&#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;
&#9474;              VPC (10.0.0.0/16)               &#9474;
&#9474;                                              &#9474;
&#9474;  &#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;    &#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;   &#9474;
&#9474;  &#9474; Public Subnet  &#9474;    &#9474; Private Subnet  &#9474;   &#9474;
&#9474;  &#9474; 10.0.1.0/24    &#9474;    &#9474; 10.0.2.0/24     &#9474;   &#9474;
&#9474;  &#9474;                &#9474;    &#9474;                 &#9474;   &#9474;
&#9474;  &#9474; [NAT Gateway]  &#9474;&#8592;&#9472;&#9472;&#9472;&#9474; [Database]      &#9474;   &#9474;
&#9474;  &#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;    &#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;   &#9474;
&#9474;                                              &#9474;
&#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;</code></code></pre><p><strong>How it works:</strong></p><ol><li><p>The database initiates an outbound connection (e.g., to download a security patch).</p></li><li><p>The private subnet&#8217;s route table sends internet-bound traffic (<code>0.0.0.0/0</code>) to the NAT Gateway in the public subnet.</p></li><li><p>The NAT Gateway replaces the database&#8217;s private IP address with its own public Elastic IP address and forwards the traffic to the Internet Gateway.</p></li><li><p>The IGW sends the request out to the internet.</p></li><li><p>The response comes back through the same path in reverse, IGW to NAT Gateway, which translates the address back and forwards the response to the database.</p></li></ol><p>The key security property here is that while the database can <strong>reach out</strong> to the internet, no one on the internet can <strong>initiate</strong> a connection to the database. The NAT Gateway will simply drop any unsolicited inbound traffic.</p><p><strong>Important architectural consideration:</strong> A NAT Gateway is deployed into a <strong>specific subnet</strong> within a <strong>specific Availability Zone</strong>. If that AZ goes down, resources in private subnets in other AZs that depend on that NAT Gateway will lose their internet access. This is why production architectures deploy a NAT Gateway in each AZ and configure each private subnet to route through the NAT Gateway in its own AZ. 
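</p><p>To make that failure mode concrete, here is a small Python model that compares a single shared NAT Gateway with one NAT Gateway per AZ. The subnet, NAT Gateway, and AZ names are made up for illustration, and the model deliberately ignores everything except AZ placement.</p><pre><code><code>
```python
# Hypothetical topology: the AZ that hosts each private subnet and NAT Gateway.
SUBNET_AZ = {"private-a": "az-a", "private-b": "az-b"}
NAT_AZ = {"nat-a": "az-a", "nat-b": "az-b"}

# Option 1: every private subnet sends 0.0.0.0/0 to one shared NAT Gateway.
SHARED_NAT_ROUTES = {"private-a": "nat-a", "private-b": "nat-a"}


def per_az_routes(subnet_az, nat_az):
    """Option 2: point each private subnet at the NAT Gateway in its own AZ."""
    nat_by_az = {az: nat for nat, az in nat_az.items()}
    return {subnet: nat_by_az[az] for subnet, az in subnet_az.items()}


def subnets_with_outbound_access(default_routes, subnet_az, nat_az, failed_az):
    """Return the subnets that still reach the internet when failed_az is down."""
    return sorted(
        subnet
        for subnet, nat in default_routes.items()
        if subnet_az[subnet] != failed_az and nat_az[nat] != failed_az
    )


if __name__ == "__main__":
    per_az = per_az_routes(SUBNET_AZ, NAT_AZ)
    print(per_az)  # {'private-a': 'nat-a', 'private-b': 'nat-b'}
    # With a shared NAT Gateway in az-a, losing az-a cuts off both subnets.
    print(subnets_with_outbound_access(SHARED_NAT_ROUTES, SUBNET_AZ, NAT_AZ, "az-a"))  # []
    # With one NAT Gateway per AZ, the subnet in az-b keeps its outbound path.
    print(subnets_with_outbound_access(per_az, SUBNET_AZ, NAT_AZ, "az-a"))  # ['private-b']
```
</code></code></pre><p>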
This adds cost but ensures that an AZ failure doesn&#8217;t cascade into a networking failure for your private resources.</p><p><strong>Cost note:</strong> NAT Gateways cost approximately $0.045/hour (about $32/month) plus data processing charges of $0.045 per GB; exact rates vary by Region. In a multi-AZ setup with two NAT Gateways, that&#8217;s roughly $64/month before any data processing charges. For learning environments or development workloads, you can use a <strong>NAT Instance</strong> (a regular EC2 instance configured to perform NAT) to save money, though it won&#8217;t offer the same availability, bandwidth, or managed maintenance that a NAT Gateway provides. Some teams also explore <strong>VPC endpoints</strong> as a way to avoid NAT Gateway costs entirely for traffic destined to supported AWS services like S3 or DynamoDB.</p><h3><strong>6. Security Groups</strong></h3><p><strong>What they are:</strong> Virtual firewalls that control inbound and outbound traffic at the <strong>instance level</strong> (more precisely, at the network interface level). Every EC2 instance, RDS database, Lambda function in a VPC, and many other AWS resources are associated with one or more Security Groups.</p><p><strong>How they work:</strong> You define rules that <strong>allow</strong> specific traffic. This is an important distinction: Security Groups are <strong>allow-only</strong>. You cannot write a rule that explicitly denies traffic. If traffic doesn&#8217;t match any allow rule, it is implicitly denied. This makes Security Groups relatively straightforward to reason about: if it&#8217;s not explicitly permitted, it&#8217;s blocked.</p><p><strong>Example rules:</strong></p><pre><code><code>Web Server Security Group:
Inbound:
- Allow HTTP (port 80) from 0.0.0.0/0 (anywhere)
- Allow HTTPS (port 443) from 0.0.0.0/0
- Allow SSH (port 22) from your IP only

Outbound:
- Allow all traffic (default)</code></code></pre><pre><code><code>Database Security Group:
Inbound:
- Allow MySQL (port 3306) from Web Server Security Group only

Outbound:
- Allow all traffic</code></code></pre><p>One of the most powerful features of Security Groups is the ability to <strong>reference other Security Groups</strong> as sources or destinations in your rules. In the example above, the database Security Group doesn&#8217;t allow MySQL traffic from a specific IP address; instead, it allows it from any resource associated with the Web Server Security Group. This means if you add a new web server and attach the same Security Group, it automatically gains database access without any rule changes. This creates a dynamic, self-maintaining security model that scales with your infrastructure.</p><p><strong>Key concept:</strong> Security Groups are <strong>stateful</strong>: if you allow inbound traffic on a port, the corresponding response traffic is automatically allowed outbound, regardless of the outbound rules. Likewise, if an instance initiates an outbound connection, the response is automatically allowed inbound. This means you don&#8217;t need to worry about creating matching rules for both directions of a conversation, which significantly reduces the complexity of your firewall configuration.</p><p>Another important behavior: Security Groups <strong>evaluate all rules before making a decision</strong>. Unlike NACLs (which process rules in order and stop at the first match), Security Groups aggregate all rules and allow traffic if <strong>any</strong> rule permits it. This means you never have to worry about rule ordering within a Security Group.</p><h3><strong>7. Network ACLs (NACLs)</strong></h3><p><strong>What they are:</strong> An additional layer of security that operates at the <strong>subnet level</strong>. 
While Security Groups wrap around individual resources, NACLs wrap around entire subnets, filtering traffic as it enters and exits the subnet boundary.</p><p><strong>Difference from Security Groups:</strong></p><ul><li><p><strong>NACLs are stateless</strong>: you must explicitly write rules for both inbound <strong>and</strong> outbound traffic. If you allow inbound HTTP traffic on port 80, you must also create an outbound rule allowing the response traffic (which typically uses an ephemeral port in the range 1024&#8211;65535). Forgetting the outbound rule is one of the most common NACL mistakes, and it results in traffic appearing to be silently dropped.</p></li><li><p><strong>NACLs support both allow and deny rules</strong>, unlike Security Groups which only support allow rules. This makes NACLs useful for explicitly blocking specific IP addresses or ranges; for example, blocking a known malicious IP that&#8217;s attempting to probe your infrastructure.</p></li><li><p><strong>NACLs process rules in order</strong>, starting from the lowest numbered rule and stopping at the first match. This means rule ordering matters significantly. A common pattern is to use spaced-out rule numbers (like 100, 200, 300) for your specific allow/deny rules, leaving gaps so you can insert new rules later without renumbering everything.</p></li><li><p><strong>NACLs apply to all resources within a subnet</strong> automatically. You don&#8217;t attach a NACL to an individual instance; every resource in the subnet is subject to the NACL&#8217;s rules, with no exceptions.</p></li><li><p><strong>Each subnet must be associated with exactly one NACL.</strong> If you don&#8217;t explicitly assign one, the subnet uses the VPC&#8217;s <strong>default NACL</strong>, which allows all inbound and outbound traffic. 
Custom NACLs, by contrast, <strong>deny all traffic by default</strong> until you add rules, which is another common gotcha for beginners.</p></li></ul><p>Think of NACLs and Security Groups as two concentric walls of defense. Traffic entering a subnet first passes through the NACL. If the NACL allows it, the traffic then reaches the Security Group attached to the target resource. Both must permit the traffic for it to get through. This is an example of <strong>defense in depth</strong>: even if one layer is misconfigured, the other layer can still provide protection.</p><p><strong>When to use them:</strong> Most beginners can stick with Security Groups for day-to-day access control. NACLs are most useful when you need subnet-wide rules, when you need to explicitly <strong>deny</strong> traffic from specific sources, or when your organization&#8217;s compliance requirements mandate multiple layers of network filtering. In many production environments, NACLs are kept relatively permissive while Security Groups do the heavy lifting of fine-grained access control.</p><h2><strong>Putting It All Together: A Complete VPC Architecture</strong></h2><p>Here&#8217;s a production-ready VPC setup:</p><pre><code><code>                         Internet
                            &#8597;
                   &#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;
                   &#9474; Internet       &#9474;
                   &#9474; Gateway (IGW)  &#9474;
                   &#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;
                            &#8597;
&#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;
&#9474;                  VPC (10.0.0.0/16)                  &#9474;
&#9474;                                                     &#9474;
&#9474;  Availability Zone A          Availability Zone B   &#9474;
&#9474;  &#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;        &#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;   &#9474;
&#9474;  &#9474; Public Subnet    &#9474;        &#9474; Public Subnet    &#9474;   &#9474;
&#9474;  &#9474; 10.0.1.0/24      &#9474;        &#9474; 10.0.3.0/24      &#9474;   &#9474;
&#9474;  &#9474;                  &#9474;        &#9474;                  &#9474;   &#9474;
&#9474;  &#9474; [Web Server A]   &#9474;        &#9474; [Web Server B]   &#9474;   &#9474;
&#9474;  &#9474; [NAT Gateway A]  &#9474;        &#9474; [NAT Gateway B]  &#9474;   &#9474;
&#9474;  &#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;        &#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;   &#9474;
&#9474;          &#8597;                           &#8597;              &#9474;
&#9474;  &#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;        &#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;   &#9474;
&#9474;  &#9474; Private Subnet   &#9474;        &#9474; Private Subnet   &#9474;   &#9474;
&#9474;  &#9474; 10.0.2.0/24      &#9474;        &#9474; 10.0.4.0/24      &#9474;   &#9474;
&#9474;  &#9474;                  &#9474;        &#9474;                  &#9474;   &#9474;
&#9474;  &#9474; [Database A]     &#9474;        &#9474; [Database B]     &#9474;   &#9474;
&#9474;  &#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;        &#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;   &#9474;
&#9474;                                                     &#9474;
&#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;</code></code></pre><p><strong>Why this architecture works:</strong></p><ul><li><p><strong>High availability:</strong> Resources are distributed across multiple Availability Zones. If AZ-A experiences a hardware failure, a power outage, or a network issue, AZ-B continues to serve traffic. Your application stays online because it doesn&#8217;t depend on any single physical location.</p></li><li><p><strong>Security:</strong> Databases sit in private subnets with no direct internet access. The only way to reach them is through the web servers (controlled by Security Groups) or via internal networking. This dramatically reduces the attack surface for your most sensitive data.</p></li><li><p><strong>Internet access:</strong> NAT Gateways in each AZ ensure that private resources can still reach the internet for necessary outbound tasks, operating system patches, third-party API calls, package downloads, without compromising their isolation from inbound internet traffic. Each private subnet routes through the NAT Gateway in its own AZ, so an AZ failure doesn&#8217;t cut off internet access for the other AZ&#8217;s private resources.</p></li><li><p><strong>Scalability:</strong> An Application Load Balancer (which would sit in the public subnets) distributes incoming user traffic across web servers in both AZs. As demand grows, you can add more instances behind the load balancer, or use Auto Scaling Groups to add and remove them automatically based on metrics like CPU utilization or request count.</p></li></ul><p>This architecture is often extended with additional subnet tiers. 
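</p><p>A quick way to see how much room the address plan leaves for extra tiers is to enumerate the unused /24 blocks with Python&#8217;s standard <code>ipaddress</code> module. In this sketch, the subnet labels and the two candidate blocks for a further tier (<code>10.0.5.0/24</code> and <code>10.0.6.0/24</code>) are assumptions chosen for illustration; only the VPC range and the four existing subnets come from the diagram.</p><pre><code><code>
```python
import ipaddress

VPC = ipaddress.ip_network("10.0.0.0/16")

# The four subnets from the diagram above (the labels are made up).
IN_USE = {
    "public-a": "10.0.1.0/24",
    "private-a": "10.0.2.0/24",
    "public-b": "10.0.3.0/24",
    "private-b": "10.0.4.0/24",
}


def free_blocks(vpc, in_use, new_prefix=24):
    """List blocks of the given size inside the VPC that overlap no existing subnet."""
    used = [ipaddress.ip_network(cidr) for cidr in in_use.values()]
    return [
        block
        for block in vpc.subnets(new_prefix=new_prefix)
        if not any(block.overlaps(existing) for existing in used)
    ]


if __name__ == "__main__":
    free = free_blocks(VPC, IN_USE)
    print(len(free))  # 252 of the 256 possible /24 blocks are still unallocated

    # Hypothetical candidates for an additional tier, one block per AZ.
    proposed = ["10.0.5.0/24", "10.0.6.0/24"]
    print(all(ipaddress.ip_network(cidr) in free for cidr in proposed))  # True
```
</code></code></pre><p>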
Many organizations add a <strong>third tier</strong>, sometimes called an &#8220;isolated&#8221; or &#8220;data&#8221; tier, for resources that should have no internet access at all, not even outbound. You might also see dedicated subnets for things like ElastiCache clusters, internal load balancers, or container orchestration infrastructure. The pattern stays the same; you&#8217;re just adding more rooms to the floor plan.</p><h2><strong>Conclusion</strong></h2><p>VPCs don&#8217;t have to be intimidating. At their core, they&#8217;re built from a handful of straightforward concepts: a <strong>VPC</strong> gives you an isolated network, <strong>subnets</strong> help you organize and secure your resources, an <strong>Internet Gateway</strong> connects you to the outside world, <strong>route tables</strong> direct traffic, <strong>NAT Gateways</strong> let private resources reach the internet safely, and <strong>Security Groups</strong> and <strong>NACLs</strong> act as your gatekeepers.</p><p>Once you understand how these pieces snap together, you&#8217;re not just following tutorials; you&#8217;re making informed decisions about how your infrastructure should work. You&#8217;ll understand <em>why</em> a database should live in a private subnet, <em>why</em> a route table needs a specific entry, and <em>why</em> Security Groups reference each other instead of hard-coded IP addresses.</p><p><strong>Here&#8217;s what I&#8217;d recommend as your next steps:</strong></p><ul><li><p><strong>Get hands-on.</strong> Create a VPC from scratch in the AWS Console (not the default one). Set up a public subnet, a private subnet, an Internet Gateway, and a route table. Launch an EC2 instance in the public subnet and verify you can SSH into it. Then launch one in the private subnet and observe that you can&#8217;t. Break things on purpose: remove the route to the IGW and see what happens. Remove the Security Group rule for SSH and watch the connection fail. 
These small experiments will teach you more than reading ever could.</p></li><li><p><strong>Start simple.</strong> You don&#8217;t need multi-AZ NAT Gateways on day one. Build the basic architecture first: one public subnet, one private subnet, one AZ. Get comfortable with how traffic flows. Then layer in high availability by duplicating the setup across a second AZ. Add a load balancer. Add a NAT Gateway. Each addition will make more sense because you understand the foundation it&#8217;s building on.</p></li><li><p><strong>Explore Infrastructure as Code.</strong> Once you&#8217;re comfortable building VPCs manually, try recreating the same setup with <strong>CloudFormation</strong> or <strong>Terraform</strong>. Defining your VPC in code forces you to understand every dependency and every relationship between components; you can&#8217;t click &#8220;next&#8221; and let the console fill in defaults. It&#8217;ll solidify your understanding and make your infrastructure repeatable, version-controlled, and shareable with your team.</p></li><li><p><strong>Learn about VPC connectivity options.</strong> Once your single-VPC architecture is solid, start exploring how VPCs connect to each other and to the outside world. <strong>VPC Peering</strong> lets two VPCs communicate directly. <strong>Transit Gateway</strong> simplifies networking when you have many VPCs. <strong>VPN connections</strong> and <strong>AWS Direct Connect</strong> link your VPC to on-premises data centers. <strong>VPC Endpoints</strong> let your private resources access AWS services like S3 and DynamoDB without traversing the internet at all. Each of these builds directly on the concepts you&#8217;ve learned here.</p></li></ul><p>The production-ready architecture we walked through above isn&#8217;t just a diagram; it&#8217;s the foundation that most real-world AWS applications are built on. 
Every load balancer, container cluster, and serverless application you&#8217;ll work with in the future sits inside a VPC. By understanding these fundamentals now, you&#8217;re setting yourself up to build cloud infrastructure that&#8217;s secure, scalable, and resilient from the start.</p>]]></content:encoded></item><item><title><![CDATA[Building a Containerized Application with ECS and Fargate]]></title><description><![CDATA[Running containers in production can feel overwhelming when you first approach AWS.]]></description><link>https://blog.thecloudengineers.com/p/building-a-containerized-application</link><guid isPermaLink="false">https://blog.thecloudengineers.com/p/building-a-containerized-application</guid><dc:creator><![CDATA[Lefteris Karageorgiou]]></dc:creator><pubDate>Wed, 04 Feb 2026 10:30:21 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/41e07d5c-9502-4385-8cd6-ae90c9240aa2_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Running containers in production can feel overwhelming when you first approach AWS. You hear terms like ECS, EKS, Fargate, EC2 launch types, task definitions, and services. 
It gets confusing fast.</p><p>This article walks you through the complete flow of deploying a containerized application using Amazon ECS with Fargate. No servers to manage. No cluster capacity to worry about. Just your application running in containers.</p><p>By the end, you will understand each component and how they connect together.</p><h2>What Are ECS and Fargate?</h2><h3>Amazon ECS</h3><p>Elastic Container Service (ECS) is a container orchestration platform from AWS. It handles the scheduling, deployment, and management of your containers. Think of it as the brain that decides where and how your containers run.</p><h3>AWS Fargate</h3><p>Fargate is a serverless compute engine for containers. Instead of provisioning EC2 instances and managing their capacity, you simply tell AWS how much CPU and memory your container needs. 
Fargate handles the underlying infrastructure.</p><p>When you combine ECS with Fargate, you get container orchestration without server management.</p><h2>Architecture Overview</h2><p>Before diving into the steps, let us look at the components involved.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!SXZ9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f3de7c6-4eee-4988-a6eb-45908092f7d8_1750x920.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!SXZ9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f3de7c6-4eee-4988-a6eb-45908092f7d8_1750x920.png 424w, https://substackcdn.com/image/fetch/$s_!SXZ9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f3de7c6-4eee-4988-a6eb-45908092f7d8_1750x920.png 848w, https://substackcdn.com/image/fetch/$s_!SXZ9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f3de7c6-4eee-4988-a6eb-45908092f7d8_1750x920.png 1272w, https://substackcdn.com/image/fetch/$s_!SXZ9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f3de7c6-4eee-4988-a6eb-45908092f7d8_1750x920.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!SXZ9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f3de7c6-4eee-4988-a6eb-45908092f7d8_1750x920.png" width="1456" height="765" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5f3de7c6-4eee-4988-a6eb-45908092f7d8_1750x920.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:765,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:136151,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.thecloudengineers.com/i/186718635?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f3de7c6-4eee-4988-a6eb-45908092f7d8_1750x920.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!SXZ9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f3de7c6-4eee-4988-a6eb-45908092f7d8_1750x920.png 424w, https://substackcdn.com/image/fetch/$s_!SXZ9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f3de7c6-4eee-4988-a6eb-45908092f7d8_1750x920.png 848w, https://substackcdn.com/image/fetch/$s_!SXZ9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f3de7c6-4eee-4988-a6eb-45908092f7d8_1750x920.png 1272w, https://substackcdn.com/image/fetch/$s_!SXZ9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f3de7c6-4eee-4988-a6eb-45908092f7d8_1750x920.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<h2>The Complete Flow</h2><h3>Step 1: Prepare Your Application for Containerization</h3><p>Start with your application. Whether it is a web API, a backend service, or a worker process, you need to package it into a container image.</p><p>Create a Dockerfile in your project root. This file contains instructions for building your container image. Define the base image, copy your application files, install dependencies, and specify the command to run your application.</p><p>Keep images small. Use multi-stage builds if needed. Smaller images mean faster deployments and lower costs.</p><h3>Step 2: Create an ECR Repository</h3><p>Amazon Elastic Container Registry stores your container images. Before pushing images, you need a repository.</p><p>Navigate to the ECR console and create a new private repository. 
Choose a meaningful name that reflects your application. Enable scan on push so images are checked for vulnerabilities automatically when they are pushed.</p><p>Configure lifecycle policies to automatically clean up old images. This prevents storage costs from growing indefinitely.</p><h3>Step 3: Build and Push Your Image</h3><p>Build your container image locally using Docker. Tag the image with your ECR repository URI. The tag should include a version identifier such as a Git commit hash or semantic version number.</p><p>Authenticate Docker with ECR using the AWS CLI. This generates a temporary authorization token that allows Docker to push images.</p><p>Push the tagged image to your ECR repository. Verify the push succeeded by checking the repository in the console.</p><h3>Step 4: Set Up the Network Infrastructure</h3><p>Your containers need a network to run in. Create a VPC or use an existing one.</p><p>Design your subnets carefully. Place your Fargate tasks in private subnets. They do not need to be reachable directly from the internet. Place the Application Load Balancer in public subnets so it can receive traffic from the internet.</p><p>Create a NAT Gateway in a public subnet. Your tasks in private subnets need this to pull images from ECR and access other AWS services, unless you provide VPC endpoints for those services instead.</p><p>Configure security groups. Create one for the load balancer that allows inbound traffic on its listener ports (80 and 443). Create another for the Fargate tasks that allows inbound traffic on the container port only from the load balancer security group.</p><h3>Step 5: Create IAM Roles</h3><p>An ECS task uses two IAM roles.</p><p>The task execution role allows ECS to pull images from ECR, send logs to CloudWatch, and retrieve secrets from Secrets Manager or Parameter Store. AWS provides a managed policy called <code>AmazonECSTaskExecutionRolePolicy</code> that covers pulling images and writing logs; retrieving secrets requires additional permissions on the role.</p><p>The task role is what your application uses at runtime. Attach policies based on what AWS services your application needs to access. 
If your app reads from S3, add S3 read permissions. If it writes to DynamoDB, add those permissions. Follow the principle of least privilege.</p><h3>Step 6: Create an ECS Cluster</h3><p>The cluster is a logical grouping for your services and tasks. Navigate to the ECS console and create a new cluster.</p><p>Give it a descriptive name. Enable Container Insights if you want detailed metrics and logs. This adds cost but provides valuable observability.</p><p>With Fargate, the cluster is essentially just a namespace. There are no EC2 instances to manage.</p><h3>Step 7: Define Your Task Definition</h3><p>The task definition is a blueprint for your container. It describes how your container should run.</p><p>Specify the following:</p><p><strong>Family name</strong>: A unique identifier for your task definition. New revisions will share this name.</p><p><strong>Launch type</strong>: Select Fargate.</p><p><strong>CPU and memory</strong>: Choose appropriate values based on your application requirements. Fargate has specific combinations available.</p><p><strong>Container definition</strong>: Provide the ECR image URI, container port mappings, environment variables, and log configuration.</p><p><strong>Networking mode</strong>: Use awsvpc mode. This is required for Fargate and gives each task its own elastic network interface.</p><p><strong>Log configuration</strong>: Send logs to CloudWatch Logs. Create a log group beforehand or let ECS create one automatically.</p><h3>Step 8: Create an Application Load Balancer</h3><p>The load balancer distributes traffic across your tasks and provides a stable endpoint.</p><p>Create an Application Load Balancer in your public subnets. Configure listeners for HTTP on port 80 or HTTPS on port 443. For production, always use HTTPS with an ACM certificate.</p><p>Create a target group with the target type set to IP. This is required for Fargate. 
Configure health checks to verify your application is responding correctly.</p><p>Set the health check path to an endpoint your application exposes. Configure appropriate thresholds and intervals based on how quickly your application starts.</p><h3>Step 9: Create the ECS Service</h3><p>The service maintains the desired number of tasks and integrates with the load balancer.</p><p>Navigate to your cluster and create a new service. Select the task definition you created earlier.</p><p>Configure the following:</p><p><strong>Launch type</strong>: Fargate.</p><p><strong>Desired count</strong>: The number of tasks you want running. Start with at least two, spread across Availability Zones, for high availability.</p><p><strong>Deployment configuration</strong>: Use rolling updates. Set minimum healthy percent to 100 and maximum percent to 200. This keeps the full desired count running while replacement tasks start, so a deployment does not reduce capacity.</p><p><strong>Network configuration</strong>: Select your VPC, private subnets, and the security group for tasks. Enable auto-assign public IP only if tasks are in public subnets.</p><p><strong>Load balancing</strong>: Select the Application Load Balancer and target group you created.</p><p><strong>Service discovery</strong>: Optionally enable Cloud Map integration for internal service-to-service communication.</p><h3>Step 10: Configure Auto Scaling</h3><p>Your application should scale based on demand. ECS Service Auto Scaling adjusts the desired count automatically.</p><p>Configure target tracking scaling policies. Common metrics include:</p><p><strong>CPU utilization</strong>: Scale when average CPU across tasks exceeds a threshold.</p><p><strong>Memory utilization</strong>: Scale when average memory across tasks exceeds a threshold.</p><p><strong>Request count per target</strong>: Scale based on how many requests each task receives from the load balancer.</p><p>Set minimum and maximum task counts. The minimum ensures you always have capacity. 
The maximum prevents runaway scaling and unexpected costs.</p><h3>Step 11: Set Up Logging and Monitoring</h3><p>Observability is critical for production workloads.</p><p>Your tasks should already send logs to CloudWatch Logs based on the task definition configuration. Create CloudWatch alarms for key metrics such as CPU utilization, memory utilization, and running task count.</p><p>If you did not enable Container Insights when creating the cluster, enable it now for enhanced metrics. This provides task-level and container-level visibility.</p><p>Set up CloudWatch dashboards to visualize your application health at a glance.</p><h3>Step 12: Verify the Deployment</h3><p>Once the service is running, verify everything works correctly.</p><p>Check the ECS console. Your service should show the desired number of running tasks. If tasks are failing, check the stopped reason on the stopped tasks for error messages.</p><p>Check the target group health. All registered targets should be healthy. If not, verify security group rules and health check configuration.</p><p>Access your application through the load balancer DNS name. Verify it responds as expected.</p><h2>Deployment Flow for Updates</h2><p>Once your initial deployment is complete, updating your application follows this flow:</p><ol><li><p>Make changes to your application code</p></li><li><p>Build a new container image with a new tag</p></li><li><p>Push the image to ECR</p></li><li><p>Create a new task definition revision pointing to the new image</p></li><li><p>Update the ECS service to use the new task definition</p></li><li><p>ECS performs a rolling deployment automatically</p></li><li><p>New tasks start, register with the load balancer, and pass health checks</p></li><li><p>Old tasks drain connections and stop</p></li></ol><p>This process can be automated with CI/CD pipelines using services like CodePipeline, GitHub Actions, or GitLab CI.</p><h2>Best Practices</h2><p><strong>Keep Images Lean</strong></p><p>Smaller images deploy faster. Use minimal base images. 
Remove unnecessary files and dependencies.</p><p><strong>Use Specific Image Tags</strong></p><p>Never use the latest tag in production. Always use specific versions so deployments are predictable and rollbacks are possible.</p><p><strong>Implement Health Checks</strong></p><p>Define health checks at both the container level and load balancer level. This ensures traffic only routes to healthy tasks.</p><p><strong>Store Secrets Securely</strong></p><p>Never bake secrets into container images. Use Secrets Manager or Parameter Store. Reference them in your task definition and inject them as environment variables at runtime.</p><p><strong>Plan for Failure</strong></p><p>Run tasks across multiple availability zones. Set appropriate desired counts. Configure auto scaling to handle load spikes.</p><p><strong>Monitor Costs</strong></p><p>Fargate pricing is based on CPU and memory. Right-size your task definitions. Use Spot capacity for fault-tolerant workloads to reduce costs significantly.</p><h2>Common Pitfalls to Avoid</h2><p><strong>Incorrect security group rules</strong>: Tasks cannot pull images or register with load balancers if security groups block traffic.</p><p><strong>Missing NAT Gateway</strong>: Tasks in private subnets cannot reach ECR or other AWS services without a NAT Gateway or VPC endpoints.</p><p><strong>Health check misconfiguration</strong>: If the health check path returns errors or times out, tasks will be marked unhealthy and constantly replaced.</p><p><strong>Insufficient task role permissions</strong>: Your application will fail to access AWS services if the task role lacks required permissions.</p><p><strong>Forgetting to update the service</strong>: Creating a new task definition revision does not automatically deploy it. You must update the service.</p><h2>Conclusion</h2><p>Deploying containers on ECS with Fargate removes the burden of managing servers while giving you full control over your application. 
The initial setup involves several components, but each serves a clear purpose.</p><p>Start with a simple deployment. Get it working end to end. Then iterate by adding auto scaling, improving monitoring, and building CI/CD pipelines.</p><p>The serverless nature of Fargate lets you focus on your application rather than infrastructure. You pay for the vCPU and memory your tasks request while they run, and scaling happens automatically based on your policies.</p><p>Now you have a clear roadmap. Pick your application and start building.</p>]]></content:encoded></item><item><title><![CDATA[Serverless patterns for managing AWS Lambda at scale (from my re:Invent 2025 talk)]]></title><description><![CDATA[As serverless applications scale, managing hundreds of AWS Lambda functions becomes increasingly complex.]]></description><link>https://blog.thecloudengineers.com/p/serverless-patterns-for-managing</link><guid isPermaLink="false">https://blog.thecloudengineers.com/p/serverless-patterns-for-managing</guid><dc:creator><![CDATA[Lefteris Karageorgiou]]></dc:creator><pubDate>Wed, 28 Jan 2026 10:30:59 GMT</pubDate><enclosure 
url="https://substack-post-media.s3.amazonaws.com/public/images/10a78211-c450-4cda-b62c-4e0b433e05f5_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>As serverless applications scale, managing hundreds of AWS Lambda functions becomes increasingly complex. After presenting this topic twice at AWS re:Invent 2025 to excellent reception, I wanted to share these insights more broadly. This article presents three proven architectural patterns that help teams organize Lambda functions at scale while reducing operational overhead.</p><h2>Introduction</h2><p>Building a serverless platform often starts with just a few AWS Lambda functions. Over time, however, as new features are added and the platform evolves, that number can quickly grow into the hundreds. At that scale, managing, deploying, and maintaining Lambda functions becomes increasingly complex.</p><p>This is a common challenge organizations face as they scale serverless architectures. As the number of Lambda functions increases, so does the operational overhead. 
One of the most frequent questions we hear from customers is:</p><p><strong>&#8220;How should we properly structure and organize a large number of Lambda functions?&#8221;</strong></p><p>In this article, we present three practical architectural patterns that help teams organize Lambda functions at scale, enabling them to grow serverless applications while keeping complexity under control.</p><h2>The Starting Point: Single-Purpose Lambda Functions</h2><p>When organizations first adopt AWS Lambda, they typically follow a straightforward approach: one function per responsibility. We refer to these as <em>single-purpose Lambda functions</em>.</p><p>Consider a typical e-commerce architecture:</p><ul><li><p><strong>GetProduct</strong> &#8211; Retrieve product details</p></li><li><p><strong>AddCart</strong> &#8211; Add items to a shopping cart</p></li><li><p><strong>RemoveCart</strong> &#8211; Remove items from the cart</p></li><li><p><strong>GetCart</strong> &#8211; View the cart</p></li><li><p><strong>SubmitOrder</strong> &#8211; Submit an order</p></li><li><p><strong>ReserveInventory</strong> &#8211; Reserve inventory</p></li><li><p><strong>ProcessPayment</strong> &#8211; Process payment</p></li><li><p><strong>FulfilOrder</strong> &#8211; Complete the order</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!h8bJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7afc975-81da-4f17-98e2-20c21a20d3ad_2424x1150.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!h8bJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7afc975-81da-4f17-98e2-20c21a20d3ad_2424x1150.png 424w, 
https://substackcdn.com/image/fetch/$s_!h8bJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7afc975-81da-4f17-98e2-20c21a20d3ad_2424x1150.png 848w, https://substackcdn.com/image/fetch/$s_!h8bJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7afc975-81da-4f17-98e2-20c21a20d3ad_2424x1150.png 1272w, https://substackcdn.com/image/fetch/$s_!h8bJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7afc975-81da-4f17-98e2-20c21a20d3ad_2424x1150.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!h8bJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7afc975-81da-4f17-98e2-20c21a20d3ad_2424x1150.png" width="1456" height="691" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b7afc975-81da-4f17-98e2-20c21a20d3ad_2424x1150.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:691,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:207412,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.thecloudengineers.com/i/186058233?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7afc975-81da-4f17-98e2-20c21a20d3ad_2424x1150.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!h8bJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7afc975-81da-4f17-98e2-20c21a20d3ad_2424x1150.png 424w, https://substackcdn.com/image/fetch/$s_!h8bJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7afc975-81da-4f17-98e2-20c21a20d3ad_2424x1150.png 848w, https://substackcdn.com/image/fetch/$s_!h8bJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7afc975-81da-4f17-98e2-20c21a20d3ad_2424x1150.png 1272w, https://substackcdn.com/image/fetch/$s_!h8bJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7afc975-81da-4f17-98e2-20c21a20d3ad_2424x1150.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>Amazon API Gateway acts as the front door, exposing HTTP endpoints and routing requests to the appropriate Lambda functions. Each function persists its data in a dedicated DynamoDB table.</p><p>This approach is perfectly valid and widely used. However, as organizations add features and scale their platforms, they often end up with hundreds or even thousands of Lambda functions in a single repository.</p><p>At scale, this typically results in:</p><ul><li><p>A large, monolithic repository</p></li><li><p>A single, complex CI/CD pipeline</p></li><li><p>High deployment risk, where a failure impacts the entire system</p></li></ul><p>To address these challenges, let&#8217;s explore three organizational patterns.</p><h2>Pattern 1: Microservices</h2><p>When all Lambda functions reside in a single repository backed by one AWS SAM template, complexity escalates quickly. Deployments become slow and risky, and ownership becomes unclear.</p><p>The solution is to apply <strong>microservices principles and bounded contexts</strong> to your serverless architecture.</p><p>Consider the cart-related functions: <code>AddCart</code>, <code>RemoveCart</code>, and <code>GetCart</code>. These functions:</p><ul><li><p>Operate on the same domain (shopping cart)</p></li><li><p>Share the same data model and DynamoDB table</p></li><li><p>Are typically owned by the same team</p></li></ul><p>These functions clearly belong together. Instead of managing them independently, group them into a dedicated service, such as a <code>cart-service</code>.</p><p>Repeat this process for other domains, including product, order, inventory, and payment. 
Each domain becomes its own service with:</p><ul><li><p>A dedicated repository</p></li><li><p>Its own AWS SAM template</p></li><li><p>An independent CI/CD pipeline</p></li></ul><p>In a true microservices architecture, services expose functionality through APIs rather than sharing infrastructure resources directly. As a result, SAM templates are typically self-contained and do not reference resources across services.</p><h3>Benefits</h3><p>This approach enables:</p><ul><li><p><strong>Independent development and deployment</strong> &#8211; Teams can ship changes without impacting unrelated services</p></li><li><p><strong>Clear ownership</strong> &#8211; Each service has a well-defined boundary and responsible team</p></li><li><p><strong>Reduced coupling</strong> &#8211; Services only include the dependencies they require</p></li></ul><p>This is the foundation for scaling serverless systems both technically and organizationally.</p><h2>Pattern 2: Lambda Web Adapter</h2><p>Even with improved organization, teams may still manage a large number of Lambda functions. In many cases, these functions can be consolidated.</p><p>This is where the <a href="https://github.com/awslabs/aws-lambda-web-adapter">Lambda Web Adapter pattern</a> becomes valuable. Instead of creating one Lambda function per HTTP endpoint, you run a traditional web framework inside a single Lambda function and handle routing internally.</p><p>For example, <code>AddCart</code> and <code>RemoveCart</code> both update the same DynamoDB table and operate within the same domain. These can be combined into a single <code>manage_cart</code> function.</p><p>Inside that function, you can use familiar frameworks such as Express.js, Next.js, Flask, Django, Spring Boot, ASP.NET, or Laravel. 
The Lambda Web Adapter makes this possible.</p><h3>Benefits</h3><p>This pattern offers several advantages:</p><ul><li><p><strong>Fewer cold starts</strong> &#8211; A single function serves several routes, so its execution environments are reused more often</p></li><li><p><strong>Shared initialization</strong> &#8211; SDK clients and database connections are initialized once</p></li><li><p><strong>Familiar development model</strong> &#8211; Teams can use established web frameworks</p></li><li><p><strong>Simpler testing</strong> &#8211; Routing and logic can be tested locally without mocking Lambda events</p></li></ul><h3>A Word of Caution</h3><p>Consolidation should be applied carefully. Overuse can result in a monolithic Lambda function that negates the benefits of serverless design.</p><p>Keep functions aligned to a bounded context, and continuously monitor:</p><ul><li><p>Function size</p></li><li><p>Memory usage</p></li><li><p>Execution time</p></li></ul><p>Remember that a failure in a consolidated function can impact multiple operations.</p><h3>When to Consolidate Functions</h3><p>Consolidation is appropriate when functions:</p><ul><li><p>Share common dependencies or downstream services</p></li><li><p>Have similar memory and performance requirements</p></li><li><p>Share IAM permissions</p></li><li><p>Have comparable initialization costs</p></li></ul><p>If these factors align, consolidation is often beneficial.</p><h2>Pattern 3: API Gateway Direct Integration</h2><p>A common anti-pattern in serverless architectures is the use of Lambda functions that simply proxy requests to downstream services without applying any business logic.</p><p>For example, a <code>GetCart</code> function that only retrieves an item from DynamoDB and returns it without transformation adds little value.</p><p>In such cases, the Lambda function can be removed entirely and replaced with an <strong>API Gateway direct integration</strong>.</p><p>With direct integrations, API Gateway communicates directly with AWS services such as DynamoDB and 
Step Functions, eliminating the need for an intermediate Lambda function.</p><h3>Benefits</h3><p>This pattern provides:</p><ul><li><p><strong>Lower cost</strong> &#8211; No Lambda invocation charges</p></li><li><p><strong>Lower latency</strong> &#8211; One less hop in the request path</p></li><li><p><strong>No cold starts</strong> &#8211; There is no function to initialize</p></li><li><p><strong>Reduced operational overhead</strong> &#8211; Fewer functions to manage</p></li></ul><h3>When to Use This Pattern</h3><p>Use direct integrations when:</p><ul><li><p>Performing simple CRUD operations</p></li><li><p>No business logic or complex validation is required</p></li><li><p>The goal is to minimize Lambda usage</p></li></ul><p>Avoid this pattern when:</p><ul><li><p>Complex transformations are needed</p></li><li><p>Business logic or orchestration is involved</p></li><li><p>Advanced error handling is required</p></li></ul><h2>Recap</h2><p>To organize Lambda functions effectively at scale, consider the following patterns:</p><ol><li><p><strong>Microservices Organization</strong> &#8211; Group related functions by domain into independent services and repositories</p></li><li><p><strong>Lambda Web Adapter</strong> &#8211; Consolidate related functions inside a single Lambda function</p></li><li><p><strong>API Gateway Direct Integration</strong> &#8211; Eliminate unnecessary Lambda functions for simple operations</p></li></ol><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5zpJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5c6de11-4fbd-4fe7-b54b-08d17292fa5f_2424x1150.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!5zpJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5c6de11-4fbd-4fe7-b54b-08d17292fa5f_2424x1150.png 424w, https://substackcdn.com/image/fetch/$s_!5zpJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5c6de11-4fbd-4fe7-b54b-08d17292fa5f_2424x1150.png 848w, https://substackcdn.com/image/fetch/$s_!5zpJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5c6de11-4fbd-4fe7-b54b-08d17292fa5f_2424x1150.png 1272w, https://substackcdn.com/image/fetch/$s_!5zpJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5c6de11-4fbd-4fe7-b54b-08d17292fa5f_2424x1150.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5zpJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5c6de11-4fbd-4fe7-b54b-08d17292fa5f_2424x1150.png" width="1456" height="691" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b5c6de11-4fbd-4fe7-b54b-08d17292fa5f_2424x1150.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:691,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:247577,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.thecloudengineers.com/i/186058233?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5c6de11-4fbd-4fe7-b54b-08d17292fa5f_2424x1150.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5zpJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5c6de11-4fbd-4fe7-b54b-08d17292fa5f_2424x1150.png 424w, https://substackcdn.com/image/fetch/$s_!5zpJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5c6de11-4fbd-4fe7-b54b-08d17292fa5f_2424x1150.png 848w, https://substackcdn.com/image/fetch/$s_!5zpJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5c6de11-4fbd-4fe7-b54b-08d17292fa5f_2424x1150.png 1272w, https://substackcdn.com/image/fetch/$s_!5zpJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5c6de11-4fbd-4fe7-b54b-08d17292fa5f_2424x1150.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>Applying these patterns allows teams to move from:</p><ul><li><p>A single repository with scattered Lambda functions</p></li><li><p>To a domain-driven, multi-repository architecture</p></li><li><p>To fewer, more meaningful Lambda functions</p></li><li><p>And, where appropriate, to no Lambda functions at all</p></li></ul><h2>Conclusion</h2><p>Organizing Lambda functions at scale does not need to be overwhelming. By grouping functions by domain, consolidating where appropriate, and leveraging API Gateway direct integrations, you can transform an unmanageable serverless environment into a clean, scalable, and maintainable architecture.</p><p>Start by reviewing your existing Lambda landscape. Identify shared domains, look for consolidation opportunities, and evaluate where direct integrations can reduce unnecessary compute. These incremental changes can significantly improve both developer productivity and operational efficiency.</p>]]></content:encoded></item><item><title><![CDATA[Serverless Contact Form with API Gateway, Lambda, and SES: A Complete Guide]]></title><description><![CDATA[Static sites have become incredibly popular for portfolios, landing pages, and business websites.]]></description><link>https://blog.thecloudengineers.com/p/serverless-contact-form-with-api</link><guid isPermaLink="false">https://blog.thecloudengineers.com/p/serverless-contact-form-with-api</guid><dc:creator><![CDATA[Lefteris Karageorgiou]]></dc:creator><pubDate>Wed, 21 Jan 2026 10:48:58 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/e628674b-1c37-4daf-83e1-cbf5319b323b_3840x2160.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Static sites have become incredibly popular for portfolios, landing pages, and business websites. Services like S3 static hosting and CloudFront make deployment simple and cheap. However, there is one common challenge: handling user input.</p><p>A contact form seems straightforward until you remember that static sites have no server to process form submissions. You need somewhere to receive the data, validate it, and do something useful with it.</p><p>This article walks through building a serverless contact form backend using AWS services. 
By the end, you will have a solution that receives form submissions, sends email notifications, and costs nearly nothing to operate.</p><h2>Architecture Overview</h2><p>The serverless contact form backend uses three core AWS services working together.</p><p><strong>Amazon API Gateway</strong> provides an HTTPS endpoint that your frontend form can call. It receives the POST request containing the form data and passes it along for processing.</p><p><strong>AWS Lambda</strong> runs your backend logic. It receives the form data from API Gateway, validates the input, formats an email, and triggers the email delivery. You pay only when the function executes.</p><p><strong>Amazon Simple Email Service (SES)</strong> handles email delivery. Lambda tells SES what to send and where, and SES manages the actual delivery, retries, and bounce handling.</p><p>The flow works like this. A visitor fills out your contact form and clicks submit. JavaScript on your static site sends a POST request to your API Gateway endpoint. API Gateway triggers your Lambda function with the form data. Lambda validates the input and calls SES to send an email. SES delivers the email to your inbox. 
Lambda returns a success response through API Gateway to your frontend. Your static site displays a thank-you message.</p><p>This entire process typically completes in under one second and costs a fraction of a cent.</p><h2>Setting Up Amazon SES</h2><p>Start with SES because email delivery requires verification before you can use it.</p><p><strong>Understanding the SES Sandbox</strong></p><p>New AWS accounts start with SES in sandbox mode. This is a restricted environment that prevents abuse. In sandbox mode, you can only send emails to verified email addresses. This is fine for testing but not for production.</p><p>To exit sandbox mode, you submit a request through the AWS console. AWS reviews your use case and typically approves legitimate requests within 24 hours. You do not need production access to build and test your contact form, but you will need it before going live.</p><p><strong>Verifying Your Email Address</strong></p><p>Navigate to SES in the AWS console and go to verified identities. Add the email address where you want to receive contact form submissions. AWS sends a verification email with a link you must click.</p><p>For sending emails, you have two options. You can verify a specific email address that will appear in the From field. Alternatively, you can verify an entire domain, which allows sending from any address at that domain.</p><p>Domain verification requires adding DNS records to prove ownership. If you manage your domain through Route 53, AWS can add these records automatically. For external DNS providers, you manually add the DNS records AWS provides (CNAME records when the domain is verified through Easy DKIM, or a TXT record with the older verification method).</p><p><strong>Choosing a Region</strong></p><p>SES is not available in all AWS regions. Check which regions support SES and choose one close to your users or in the same region as your other resources. 
Common choices include us-east-1, us-west-2, and eu-west-1.</p><p>Remember that your Lambda function must call SES in the region where you verified your email addresses.</p><h2>Creating the AWS Lambda Function</h2><p>The Lambda function is the brain of your contact form backend. It receives form submissions, validates them, and orchestrates email delivery.</p><p><strong>Function Design</strong></p><p>Keep the function focused on a single responsibility. It should accept form data, validate required fields exist and contain reasonable values, format a readable email, call SES to send it, and return an appropriate response.</p><p>Resist the temptation to add features like database storage or complex analytics in this initial version. Start simple and extend later if needed.</p><p><strong>Input Validation</strong></p><p>Never trust user input. Your function should verify that required fields are present before processing. Check that email addresses look roughly valid. Enforce reasonable length limits on message content. Strip or escape any potentially dangerous content.</p><p>Validation protects you from malformed requests and reduces the chance of processing garbage data or abuse attempts.</p><p><strong>Email Formatting</strong></p><p>Consider what information you want in the notification email. Typically you include the sender name, their email address for replies, the message content, and a timestamp. You might also include metadata like which page the form was submitted from.</p><p>Format the email so it is easy to scan. A wall of text is harder to process than clearly labeled sections.</p><p><strong>Error Handling</strong></p><p>Your function should handle failures gracefully. SES might be temporarily unavailable. The input might be malformed. Network issues happen.</p><p>Return clear error responses that help your frontend display appropriate messages. Log errors with enough detail to debug problems later. 
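</p><p>The validation and error-handling guidance in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the field names (<code>name</code>, <code>email</code>, <code>message</code>), the length limit, and the <code>send_email</code> callable are all hypothetical. In a deployed function, <code>send_email</code> would wrap the SES call made through the AWS SDK.</p>

```python
import json
import logging
import re

logger = logging.getLogger(__name__)

# Hypothetical field names and limits, chosen for illustration only.
REQUIRED_FIELDS = ("name", "email", "message")
MAX_MESSAGE_LENGTH = 5000
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def text(form, field):
    """Return the field as stripped text; non-string values count as missing."""
    value = form.get(field)
    return value.strip() if isinstance(value, str) else ""


def validate(form):
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for field in REQUIRED_FIELDS:
        if not text(form, field):
            errors.append(f"{field} is required")
    email = text(form, "email")
    if email and not EMAIL_PATTERN.match(email):
        errors.append("email does not look valid")
    if len(text(form, "message")) > MAX_MESSAGE_LENGTH:
        errors.append("message is too long")
    return errors


def respond(status, payload):
    """Build a response in the shape a Lambda proxy integration returns."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }


def handle(event, send_email):
    """Validate the submission, send it, and map failures to safe responses."""
    try:
        form = json.loads(event.get("body") or "{}")
    except json.JSONDecodeError:
        return respond(400, {"error": "Request body must be valid JSON"})
    if not isinstance(form, dict):
        return respond(400, {"error": "Request body must be a JSON object"})

    errors = validate(form)
    if errors:
        return respond(400, {"error": "Invalid input", "details": errors})

    try:
        send_email(form)
    except Exception:
        # Full details go to the logs; the caller only sees a generic message.
        logger.exception("Email delivery failed")
        return respond(500, {"error": "Could not send your message"})
    return respond(200, {"message": "Thanks, your message was sent"})
```

<p>Passing the sender in as a parameter is a design choice made for this sketch: it keeps the handler testable without AWS credentials.</p><p>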
Avoid exposing internal error details to users as this can reveal information useful to attackers.</p><p><strong>IAM Permissions</strong></p><p>Your Lambda function needs permission to call SES. Create an IAM role with the minimum required permissions. The function needs ses:SendEmail for the specific verified identities you are using. It does not need full SES access.</p><p>Follow the principle of least privilege. Grant only what the function needs to operate.</p><h2>Configuring Amazon API Gateway</h2><p>API Gateway creates the public endpoint that your frontend calls.</p><p><strong>Choosing an API Type</strong></p><p>API Gateway offers two main options for this use case.</p><p>REST APIs are the traditional option with more features and configuration options. They work well and have extensive documentation.</p><p>HTTP APIs are newer, cheaper, and faster. They have fewer features but everything you need for a simple contact form. For most contact form backends, HTTP APIs are the better choice.</p><p><strong>Creating the Endpoint</strong></p><p>Create a new API and add a single route. You need a POST method at a path like /contact or /submit. Connect this route to your Lambda function.</p><p>Configure the integration so API Gateway passes the request body to Lambda. The default settings usually work fine for JSON payloads.</p><p><strong>Enabling CORS</strong></p><p>Cross-Origin Resource Sharing is essential when your frontend and backend are on different domains. Your static site at yourdomain.com needs permission to call your API at some-id.execute-api.region.amazonaws.com.</p><p>Configure CORS on your API Gateway to allow requests from your static site domain. 
Specify the allowed origin, allowed methods (POST and OPTIONS), and allowed headers (Content-Type at minimum).</p><p>Without proper CORS configuration, browsers block the request and your form silently fails.</p><p><strong>Deploying the API</strong></p><p>API Gateway requires deployment to a stage before your endpoint becomes accessible. Create a stage named prod or live, or, with an HTTP API, use the $default stage with automatic deployment enabled so that changes publish without a separate deploy step. After deployment, you receive an invoke URL that your frontend uses.</p><p>Save this URL. You will configure your frontend form to send submissions here.</p><h2>Connecting Your Frontend</h2><p>With the backend complete, update your static site to use it.</p><p><strong>Form Structure</strong></p><p>Your HTML form needs fields for whatever information you want to collect. Common fields include name, email, and message. Add any other fields relevant to your use case.</p><p>The form does not need an action attribute or traditional submit behavior. JavaScript handles the submission.</p><p><strong>JavaScript Submission</strong></p><p>Write JavaScript that intercepts the form submission, prevents the default browser behavior, collects the form field values, and sends them to your API endpoint as a JSON POST request.</p><p>Handle the response appropriately. On success, show a thank you message and clear the form. On failure, display an error message asking the user to try again or contact you directly.</p><p><strong>User Experience Considerations</strong></p><p>Disable the submit button while the request is processing to prevent duplicate submissions. Show a loading indicator so users know something is happening. Provide clear feedback on both success and failure.</p><p>Consider what happens if JavaScript fails to load or the user has it disabled. Progressive enhancement with a fallback email link improves accessibility.</p><h2>Security Considerations</h2><p>A public form endpoint attracts abuse. 
Plan for it.</p><p><strong>Rate Limiting</strong></p><p>Configure throttling on your API Gateway to limit requests per second. This prevents a single source from overwhelming your backend or running up your AWS bill.</p><p>Start with conservative limits. A legitimate contact form rarely receives more than a few submissions per minute. You can always increase limits if needed.</p><p><strong>Spam Prevention</strong></p><p>Bots constantly scan the internet for forms to abuse. Several techniques help reduce spam.</p><p>Honeypot fields are hidden form fields that humans never see or fill out. Bots often fill every field they find. If the honeypot field contains data, reject the submission.</p><p>Time-based validation rejects submissions that happen too quickly. A human takes at least several seconds to fill out a form. A bot submits instantly. Record when the page loaded and reject submissions that arrive suspiciously fast.</p><p>CAPTCHA services like reCAPTCHA add a challenge that bots struggle to complete. They add friction for users but dramatically reduce automated spam.</p><p>Consider implementing at least one of these techniques from the start. Cleaning up after a spam attack is more work than preventing it.</p><p><strong>Input Sanitization</strong></p><p>Sanitize all input before including it in emails. This prevents injection attacks and ensures your emails render correctly. Be especially careful with HTML content if you send HTML-formatted emails.</p><h2>Cost Analysis</h2><p>Serverless pricing aligns well with contact form usage patterns.</p><p><strong>Expected Costs</strong></p><p>API Gateway HTTP APIs cost one dollar per million requests. Lambda charges based on execution time and memory, with a generous free tier of one million requests per month. SES costs ten cents per thousand emails.</p><p>For a typical small business website receiving a few hundred contact form submissions monthly, the entire backend costs pennies. 
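</p><p>To make that concrete, here is a rough Python calculation using the prices quoted above. The figure of 300 submissions per month is an assumed example, each submission is assumed to cost one API request and one email, and Lambda is taken as free on the assumption that usage stays inside the free tier. Actual prices vary by region and change over time, so treat this as a sketch rather than a quote.</p>

```python
# Prices as quoted in this article; verify against current AWS pricing.
API_GATEWAY_PER_MILLION_REQUESTS = 1.00  # USD, HTTP API
SES_PER_THOUSAND_EMAILS = 0.10           # USD
LAMBDA_FREE_REQUESTS_PER_MONTH = 1_000_000


def monthly_cost(submissions):
    """Estimate monthly cost in USD: one request and one email per submission."""
    if submissions > LAMBDA_FREE_REQUESTS_PER_MONTH:
        raise ValueError("estimate assumes Lambda stays within the free tier")
    api_gateway = submissions / 1_000_000 * API_GATEWAY_PER_MILLION_REQUESTS
    ses = submissions / 1_000 * SES_PER_THOUSAND_EMAILS
    return round(api_gateway + ses, 4)


print(monthly_cost(300))  # 0.0303
```

<p>At this assumed volume the total is roughly three cents a month, almost all of it from SES.</p><p>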
Many sites stay within free tier limits indefinitely.</p><p><strong>Cost Controls</strong></p><p>Set up billing alerts to notify you of unexpected charges. A sudden spike in requests could indicate abuse or a misconfigured frontend.</p><p>The rate limiting configured earlier also protects your budget. Even if someone attempts to abuse your endpoint, throttling limits how many requests actually process.</p><h2>Testing Your Setup</h2><p>Test thoroughly before announcing your new contact form.</p><p><strong>Manual Testing</strong></p><p>Submit test entries through your actual form. Verify emails arrive with correct formatting. Test with various input lengths and special characters. Confirm error cases display appropriate messages.</p><p><strong>Edge Cases</strong></p><p>Test what happens with empty required fields. Submit extremely long messages. Include special characters and emoji. Try submitting while offline. Each test might reveal handling you need to improve.</p><p><strong>Monitoring</strong></p><p>After launch, monitor your Lambda function logs for errors. Watch SES for bounces or complaints. Set up CloudWatch alarms to alert you if error rates spike.</p><h2>Going Further</h2><p>Once your basic contact form works, consider enhancements.</p><p>Store submissions in DynamoDB for a searchable archive. Add SNS to send notifications through multiple channels. Implement different notification routing based on form fields. Build an admin dashboard to view and manage submissions.</p><p>Each enhancement adds complexity. Add them only when you have a clear need.</p><h2>Conclusion</h2><p>A serverless contact form backend solves a real problem elegantly. You get reliable form processing without maintaining servers, pay only for actual usage, and keep full control over your data.</p><p>The combination of API Gateway, Lambda, and SES provides a robust foundation that scales from zero to thousands of submissions without configuration changes. 
For most static sites, this backend costs nothing or nearly nothing to operate.</p><p>Build it once, deploy it, and focus on the rest of your site knowing that contact form submissions will reliably reach your inbox.</p>]]></content:encoded></item><item><title><![CDATA[10 Things I Wish I Knew Before Starting My Cloud Journey]]></title><description><![CDATA[Eleven years ago, I was staring at the AWS console for the first time, completely overwhelmed.]]></description><link>https://blog.thecloudengineers.com/p/10-things-i-wish-i-knew-before-starting</link><guid isPermaLink="false">https://blog.thecloudengineers.com/p/10-things-i-wish-i-knew-before-starting</guid><dc:creator><![CDATA[Lefteris Karageorgiou]]></dc:creator><pubDate>Wed, 14 Jan 2026 10:30:32 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/f418d0c5-d160-4bc2-87a7-4cb3d112f319_1200x630.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Eleven years ago, I was staring at the AWS console for the first time, completely overwhelmed. 
There were dozens of services I&#8217;d never heard of, acronyms that meant nothing to me, and a nagging voice in my head saying I wasn&#8217;t smart enough to figure this out.</p><p>Today, I work with cloud infrastructure daily. I&#8217;ve built serverless applications, designed architectures for production workloads, and helped others navigate the same confusion I once felt.</p><p>Looking back, I realize the technical skills were the easy part. The harder lessons were about mindset, consistency, and understanding how careers actually work. The things nobody tells you when you&#8217;re starting out.</p><p>If I could sit down with my past self on Day 1, here&#8217;s what I&#8217;d say.</p><h2>1. Stop Waiting to Feel Ready</h2><p>You will never feel ready.</p><p>I spent months telling myself I needed to finish one more course before I could start building. I needed to understand networking better before I touched AWS. I needed to be more comfortable with Linux before I could deploy anything.</p><p>This is a trap. Readiness is a feeling that comes after you start, not before.</p><p>The first time I tried to deploy a simple web application to EC2, I had no idea what I was doing. 
I didn&#8217;t understand security groups. I couldn&#8217;t figure out why my instance wasn&#8217;t reachable. I spent an entire weekend troubleshooting something that should have taken an hour.</p><p>But here&#8217;s what happened: by Monday, I understood security groups. Not because I read about them, but because I had broken something and needed to fix it.</p><p>Your first project will confuse you. Your second will frustrate you. Your third will still have problems. But by your fifth project, something shifts. You start recognizing patterns. Error messages that once looked like gibberish begin making sense. You develop instincts.</p><p>That confidence you&#8217;re waiting for? It&#8217;s on the other side of starting.</p><p>The course you&#8217;re watching right now is valuable. But it becomes ten times more valuable when you pause it, open your AWS console, and try to build what you just learned. Understanding and doing are completely different skills. You need both, and you can only get the second one by starting before you&#8217;re ready.</p><h2>2. Projects Matter More Than Everything Else</h2><p>Let me be direct: not one certification, not one course, not one bootcamp did more for my career than building real projects.</p><p>I&#8217;m not saying certifications are worthless. They have their place. They give you structured learning paths, validate your knowledge to employers who use them as filters, and force you to learn topics you might otherwise skip.</p><p>But here&#8217;s what certifications can&#8217;t do: they can&#8217;t teach you what happens when things break in unexpected ways. They can&#8217;t give you the experience of debugging a problem for three hours only to discover it was a typo in an environment variable. They can&#8217;t simulate the feeling of deploying something to production and watching real users interact with it.</p><p>Projects give you stories to tell in interviews. 
When someone asks about a challenging problem you solved, you need a real answer. &#8220;I studied this topic in a course&#8221; doesn&#8217;t compare to &#8220;I built an application that processed user uploads, and when it started failing at scale, I had to redesign the architecture to handle the load.&#8221;</p><p>Here&#8217;s my framework for learning through projects:</p><p><strong>Build something.</strong> It doesn&#8217;t need to be original or impressive. Clone an existing idea. Build a to-do app, a URL shortener, a simple API. The goal isn&#8217;t to create something the world has never seen, it&#8217;s to learn by doing.</p><p><strong>Break it.</strong> Intentionally push your project to its limits. What happens when you send a thousand requests at once? What happens when the database goes down? What happens when you feed it unexpected input? Breaking things teaches you how they work.</p><p><strong>Fix it.</strong> This is where the real learning happens. Debugging forces you to understand systems at a deeper level than building ever does. Every error you resolve adds to your mental library of solutions.</p><p><strong>Repeat.</strong> Each project should be slightly more ambitious than the last. Add a new service you haven&#8217;t used. Implement a pattern you read about. Increase the complexity gradually.</p><p>Document everything as you go. Take screenshots. Write notes about what you tried and what worked. This documentation becomes portfolio material, blog content, and interview preparation all at once.</p><h2>3. Your Future Network Matters More Than Your Current Skills</h2><p>When I started in tech, I thought success was purely about technical ability. Learn enough skills, pass enough certifications, and opportunities would appear.</p><p>I was wrong.</p><p>The opportunities that changed my career came through people. A former colleague mentioned my name when their company was hiring. 
Someone I&#8217;d interacted with on LinkedIn sent me a message about a role they thought I&#8217;d be perfect for. A connection I&#8217;d helped with a technical question months earlier returned the favor with an introduction.</p><p>Your network isn&#8217;t just about who you know, it&#8217;s about who knows what you&#8217;re capable of.</p><p>This doesn&#8217;t mean you need to become a networking expert or attend every meetup in your city. It means being visible and helpful in small, consistent ways.</p><p><strong>Comment on posts.</strong> When someone shares something interesting, add a thoughtful comment. Not &#8220;Great post!&#8221; but something that shows you actually read and thought about it. Ask a follow-up question. Share a related experience. This puts you on people&#8217;s radar.</p><p><strong>Post your progress.</strong> You don&#8217;t need to be an expert to share what you&#8217;re learning. &#8220;Today I finally understood how IAM policies work&#8221; is valuable content. Someone else is struggling with the same thing and will appreciate your explanation. Someone more experienced might offer additional insights. Both outcomes benefit you.</p><p><strong>Ask questions publicly.</strong> Many people are afraid to ask questions because they don&#8217;t want to look uninformed. But asking good questions demonstrates curiosity and engagement. It also invites helpful people into your orbit.</p><p><strong>Help others.</strong> Answer questions when you can. Share resources you&#8217;ve found useful. Celebrate other people&#8217;s wins. Generosity in professional communities compounds over time in ways that are hard to predict but impossible to ignore.</p><p>The cloud community is remarkably welcoming to beginners. People remember what it was like to start out and genuinely want to help. But they can only help you if they know you exist.</p><h2>4. LinkedIn Isn&#8217;t Optional</h2><p>I resisted LinkedIn for years. It felt performative. Self-promotional. 
Uncomfortable.</p><p>Then I realized something: LinkedIn isn&#8217;t social media in the traditional sense. It&#8217;s infrastructure for your career.</p><p>Think about what LinkedIn actually is. It&#8217;s your resume, but dynamic and always accessible. It&#8217;s your portfolio, where you can showcase projects and share your work. It&#8217;s your reputation, built through your posts, comments, and endorsements. It&#8217;s your discoverability, where recruiters and hiring managers search for candidates.</p><p>When a recruiter searches for &#8220;Cloud Engineer,&#8221; they&#8217;re not searching your personal website. They&#8217;re searching LinkedIn. When a hiring manager wants to learn more about you before an interview, they&#8217;re checking LinkedIn. When someone you met at a conference wants to stay in touch, they&#8217;re connecting on LinkedIn.</p><p>You don&#8217;t need to become a LinkedIn influencer. You don&#8217;t need to post every day or chase viral content. But you do need to:</p><p><strong>Complete your profile.</strong> A professional photo, a headline that describes what you do, and a summary that tells your story. List your projects, certifications, and relevant experience. Make it easy for people to understand who you are and what you&#8217;re looking for.</p><p><strong>Show your work.</strong> When you complete a project, write about it. When you pass a certification, share what you learned. When you solve an interesting problem, explain your approach. These posts serve as evidence of your capabilities.</p><p><strong>Engage consistently.</strong> Even ten minutes a day makes a difference. Comment on a few posts. Congratulate people on their achievements. Share an article you found valuable. Consistency matters more than volume.</p><p>People can&#8217;t help you if they can&#8217;t see you. LinkedIn makes you visible.</p><h2>5. Momentum Beats Perfection</h2><p>You&#8217;re going to overthink your first GitHub repository. 
You&#8217;ll worry the code isn&#8217;t clean enough, the documentation isn&#8217;t thorough enough, the project isn&#8217;t impressive enough.</p><p>You&#8217;re going to overthink your first LinkedIn post. You&#8217;ll rewrite it five times, wonder if anyone will care, and debate whether you should even hit publish.</p><p>You&#8217;re going to overthink your first job application. You&#8217;ll convince yourself you&#8217;re not qualified, that you&#8217;ll embarrass yourself in the interview, that you should wait until you&#8217;re more prepared.</p><p>Here&#8217;s what I wish someone had told me: done is better than perfect, and published is better than polished.</p><p>The first version of anything is supposed to be rough. That&#8217;s not a flaw, it&#8217;s the process. You can&#8217;t edit something that doesn&#8217;t exist. You can&#8217;t improve a project you never built. You can&#8217;t get feedback on a post you never published.</p><p>Every expert you admire has a graveyard of mediocre first attempts. The difference between them and people who never made it isn&#8217;t talent, it&#8217;s that they kept going.</p><p>Momentum has a compounding effect. Your first repository teaches you how Git works. Your second teaches you about documentation. Your third teaches you about code organization. Each iteration improves because you&#8217;re learning by doing, not waiting to learn everything before you start.</p><p>The same applies to content. Your first LinkedIn post might get three likes. Your tenth will be better because you&#8217;ve learned what resonates with your audience. Your fiftieth will feel natural because you&#8217;ve developed your voice through practice.</p><p>Perfection is a form of procrastination. It feels productive because you&#8217;re &#8220;working on&#8221; something, but nothing is actually shipping. Momentum gets you results.</p><p>Still hit publish. Still apply. Still show up.</p><h2>6. 
Interviews Aren&#8217;t Exams - They&#8217;re Conversations</h2><p>I failed my first cloud interview badly. I treated it like a certification exam, trying to recall memorized facts about AWS services. When the interviewer asked about my experience with a service I&#8217;d never used, I froze. I didn&#8217;t know the &#8220;right answer.&#8221;</p><p>Here&#8217;s what I didn&#8217;t understand: interviews aren&#8217;t about having all the answers. They&#8217;re about demonstrating how you think.</p><p>Hiring managers know you won&#8217;t have experience with every service. They know you&#8217;ll encounter problems you&#8217;ve never seen before. What they want to know is how you&#8217;ll handle those situations.</p><p>When you don&#8217;t know something, the worst response is pretending you do. The best response is showing your thought process: &#8220;I haven&#8217;t used that service directly, but based on what I know about similar services, I&#8217;d approach it by...&#8221;</p><p>Interviewers are evaluating several things at once:</p><p><strong>How you think.</strong> Can you break down complex problems? Do you ask clarifying questions? Can you reason through unfamiliar territory?</p><p><strong>How you communicate.</strong> Can you explain technical concepts clearly? Do you listen to questions carefully? Can you admit when you don&#8217;t know something?</p><p><strong>How you solve problems.</strong> What&#8217;s your approach when you&#8217;re stuck? How do you prioritize when there are multiple possible solutions? How do you validate that your solution actually works?</p><p><strong>How you&#8217;d be to work with.</strong> Are you collaborative? Do you seem coachable? Would the team enjoy having you in meetings and Slack channels?</p><p>Prepare for interviews by practicing how you talk about your experience, not just what you know. Use the STAR method (Situation, Task, Action, Result) to structure stories about projects you&#8217;ve worked on. 
Practice explaining technical decisions you&#8217;ve made and why you made them.</p><p>The interview is a conversation about whether you and the company are a good fit for each other. You&#8217;re evaluating them just as much as they&#8217;re evaluating you. That mindset shift alone will make you more relaxed and more authentic.</p><h2>7. The Cloud Isn&#8217;t Hard - It&#8217;s Unfamiliar</h2><p>When I first opened the AWS console, I was genuinely intimidated. There were over two hundred services. The documentation was dense. Every tutorial seemed to assume knowledge I didn&#8217;t have.</p><p>I thought the problem was that cloud computing was inherently difficult and maybe I wasn&#8217;t cut out for it.</p><p>I was wrong. The problem wasn&#8217;t difficulty, it was unfamiliarity.</p><p>There&#8217;s an enormous difference between hard and unfamiliar. Hard means something is objectively complex, requiring exceptional intelligence or talent to understand. Unfamiliar just means you haven&#8217;t been exposed to it yet.</p><p>Most of cloud computing is unfamiliar, not hard.</p><p>When you first learn about VPCs, subnets, and routing tables, it feels overwhelming. Six months later, you set them up without thinking. The concepts didn&#8217;t get easier, you got familiar with them.</p><p>When you first deploy a Lambda function, you don&#8217;t understand cold starts, execution environments, or IAM roles. After you&#8217;ve deployed fifty functions, these concepts are second nature.</p><p>This reframe matters because it changes your emotional response to confusion. If you believe cloud is hard and you&#8217;re confused, you might conclude you&#8217;re not smart enough. If you believe cloud is unfamiliar and you&#8217;re confused, you know you just need more exposure.</p><p>Give yourself thirty days of consistent learning and building. 
Not passive learning where you watch videos and nod along, but active learning where you&#8217;re hands-on in the console, making mistakes, and fixing them.</p><p>What feels impossible today becomes your new normal tomorrow. Every expert you admire went through the same process of confusion becoming clarity. The only difference is they stuck with it long enough.</p><h2>8. The Fastest Way to Stand Out Is Going Deep</h2><p>When you&#8217;re starting out, it&#8217;s tempting to learn a little bit of everything. You want to be well-rounded. You want to have an answer for any question that might come up in an interview.</p><p>This is a mistake.</p><p>The market doesn&#8217;t reward generalists who are mediocre at everything. It rewards specialists who are exceptional at something.</p><p>Think about it from a hiring manager&#8217;s perspective. They have a specific problem to solve. Maybe they need someone to build serverless applications. Maybe they need someone to secure their AWS environment. Maybe they need someone to build CI/CD pipelines.</p><p>When they&#8217;re hiring, they want someone who has gone deep in that specific area. Someone who has encountered the edge cases, understands the tradeoffs, and can hit the ground running.</p><p>&#8220;I have experience with Lambda, EC2, S3, RDS, DynamoDB, CloudFront, Route 53, IAM, VPC, ECS, EKS, and Step Functions&#8221; sounds impressive but says nothing about your actual expertise.</p><p>&#8220;I specialize in serverless architectures. I&#8217;ve built event-driven systems with Lambda, designed APIs with API Gateway, and optimized costs for high-throughput workloads&#8221; tells a story. 
It positions you as the go-to person for a specific type of problem.</p><p>Pick a lane:</p><p><strong>Serverless</strong> if you&#8217;re drawn to event-driven architectures and want to minimize operational overhead.</p><p><strong>Security</strong> if you&#8217;re detail-oriented and interested in protecting systems from threats.</p><p><strong>DevOps/Platform Engineering</strong> if you enjoy building tools and processes that help other developers move faster.</p><p><strong>Data Engineering</strong> if you&#8217;re interested in building pipelines that move and transform large amounts of data.</p><p><strong>Networking</strong> if you want to understand the foundational infrastructure that everything else runs on.</p><p>You can always expand later. But early in your career, depth beats breadth. Master one lane first, then broaden from a position of strength.</p><h2>9. You Don&#8217;t Need to Be the Smartest - Just the One Who Doesn&#8217;t Quit</h2><p>I&#8217;ve watched people who were more naturally talented than me quit because they got frustrated. I&#8217;ve watched people who started after me give up because they weren&#8217;t seeing results fast enough.</p><p>The people who succeed in cloud aren&#8217;t necessarily the smartest or most talented. They&#8217;re the ones who keep showing up.</p><p>This career path has a high dropout rate because the early stages are uncomfortable. You feel confused constantly. Progress seems slow. Imposter syndrome tells you everyone else is figuring this out faster than you are.</p><p>What you don&#8217;t see is that everyone else feels the same way. The difference between those who make it and those who don&#8217;t isn&#8217;t intelligence, it&#8217;s persistence.</p><p>Some practical strategies for not quitting:</p><p><strong>Set small, achievable goals.</strong> &#8220;Become a cloud engineer&#8221; is overwhelming. &#8220;Complete one hands-on tutorial this week&#8221; is manageable. 
Stack enough small wins and you&#8217;ll look up to find you&#8217;ve made massive progress.</p><p><strong>Track your progress visibly.</strong> Keep a learning journal. Save screenshots of things you&#8217;ve built. Document problems you&#8217;ve solved. When you&#8217;re feeling discouraged, look back at where you started. You&#8217;ve come further than you think.</p><p><strong>Find a community.</strong> Learning alone is hard. Find a Discord server, a study group, or even one other person who&#8217;s on the same journey. Accountability and encouragement make a huge difference.</p><p><strong>Expect plateaus.</strong> Progress isn&#8217;t linear. You&#8217;ll have weeks where everything clicks and weeks where nothing makes sense. The plateaus are normal. Keep going through them.</p><p><strong>Remember your why.</strong> On the hard days, reconnect with the reason you started. Financial security for your family. A more fulfilling career. The ability to build things that matter. Whatever it is, keep it visible.</p><p>The cloud industry is full of people who aren&#8217;t geniuses. They&#8217;re just people who refused to quit.</p><h2>10. Your Life Can Look Completely Different in 6-12 Months</h2><p>This isn&#8217;t motivational fluff. I&#8217;ve seen it happen repeatedly.</p><p>I&#8217;ve seen people go from help desk to cloud engineer in eight months. I&#8217;ve seen people with no technical background land their first AWS role in a year. I&#8217;ve seen people double their salaries by gaining cloud skills and changing companies.</p><p>Six to twelve months is not a long time. But it&#8217;s enough time to completely transform your career if you&#8217;re consistent.</p><p>Think about where you were a year ago. Think about what you&#8217;ve learned since then, even without focused effort. 
Now imagine what would happen if you spent the next year intentionally building cloud skills, creating projects, growing your network, and positioning yourself for opportunities.</p><p>The compound effect of daily effort is staggering. One hour a day for a year is 365 hours, more than many college courses. Fifteen minutes a day engaging on LinkedIn builds a network of hundreds of relevant connections. One project a month gives you a portfolio of twelve things you&#8217;ve built.</p><p>Most people overestimate what they can do in a week and underestimate what they can do in a year.</p><p>Your future self will thank you or blame you for the decisions you make today. A year from now, you&#8217;ll have either made progress or made excuses. The time will pass regardless.</p><p>It starts with one decision: committing to this.</p><p>Not &#8220;I&#8217;ll try to learn cloud when I have time.&#8221; Not &#8220;I&#8217;ll start seriously studying after this busy period.&#8221; Not &#8220;I&#8217;ll build projects once I feel more confident.&#8221;</p><p>A real commitment. A non-negotiable decision that you&#8217;re going to do this, that you&#8217;re going to keep showing up even when it&#8217;s hard, and that you&#8217;re going to trust the process even when progress feels slow.</p><p>Make that decision today.</p><h2>The Journey Ahead</h2><p>Starting in cloud is challenging. You&#8217;ll face confusion, frustration, and self-doubt. There will be days when you question whether you&#8217;re on the right path.</p><p>But you&#8217;re also starting at the best possible time. Cloud adoption is accelerating. The demand for cloud skills continues to grow. Companies are struggling to find qualified people. 
The opportunity is real and it&#8217;s not going away.</p><p>The only question is whether you&#8217;ll position yourself to capture it.</p><p>Everything I&#8217;ve shared in this article comes down to a few core principles: start before you&#8217;re ready, learn by building, make yourself visible, choose depth over breadth, and don&#8217;t quit.</p><p>None of this requires exceptional talent. It requires consistency, patience, and the willingness to be uncomfortable while you&#8217;re learning.</p><p>You have everything you need to start. The resources are available. The community is welcoming. The opportunity is waiting.</p><p>Now it&#8217;s on you.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.thecloudengineers.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading The Cloud Engineers! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[5 Essential Traits Every Cloud Engineer Should Master in 2026]]></title><description><![CDATA[At AWS re:Invent 2025, Amazon CTO Werner Vogels delivered a powerful keynote about the &#8220;Renaissance Developer&#8221; in the AI era.]]></description><link>https://blog.thecloudengineers.com/p/5-essential-traits-every-cloud-engineer</link><guid isPermaLink="false">https://blog.thecloudengineers.com/p/5-essential-traits-every-cloud-engineer</guid><dc:creator><![CDATA[Lefteris Karageorgiou]]></dc:creator><pubDate>Wed, 07 Jan 2026 10:30:45 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/d7dda85e-0f05-4573-a021-28570f6f26b8_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>At AWS re:Invent 2025, Amazon CTO Werner Vogels delivered a powerful keynote about the &#8220;Renaissance Developer&#8221; in the AI era. His vision resonated deeply with me, and after years of building and scaling cloud infrastructure, I&#8217;ve witnessed firsthand how these exact qualities separate good engineers from exceptional ones.</p><p>The cloud engineering landscape is transforming rapidly. As we enter 2026, technical skills alone won&#8217;t cut it anymore. The engineers who thrive are those who embody a holistic set of traits that go far beyond knowing how to configure an EC2 instance or write a CloudFormation template.</p><p>Here are the five essential traits every Cloud Engineer must master:</p><h2>1. Cultivate Relentless Curiosity</h2><p>&#8220;We are not what we know, but what we are willing to learn.&#8221;</p><p>The cloud landscape doesn&#8217;t just evolve&#8212;it transforms at breakneck speed. New services launch quarterly, best practices shift, and yesterday&#8217;s cutting-edge architecture becomes today&#8217;s legacy system. Curiosity isn&#8217;t optional; it&#8217;s your competitive advantage.</p><p>The curious engineer doesn&#8217;t wait for formal training. They experiment with new services in sandbox environments, read whitepapers during lunch breaks, and ask &#8220;why&#8221; when everyone else accepts &#8220;that&#8217;s how we&#8217;ve always done it.&#8221; This trait keeps you ahead of the curve and ensures you&#8217;re solving problems with the best tools available, not just the familiar ones.</p><h2>2. Think in Systems, Not Services</h2><p>Building resilient cloud infrastructure requires seeing the forest, not just the trees. It&#8217;s tempting to focus on individual components&#8212;optimizing that Lambda function, tuning that database, securing that API gateway. 
But true cloud engineering mastery comes from understanding how everything interconnects.</p><p>Systems thinking means considering:</p><p>&#8226; How does this service failure cascade through the architecture?</p><p>&#8226; What are the upstream and downstream dependencies?</p><p>&#8226; How does this design decision impact scalability, cost, and reliability simultaneously?</p><p>&#8226; What are the feedback loops in our system?</p><p>When you think in systems, you design for resilience, anticipate bottlenecks before they occur, and create architectures that scale gracefully under pressure.</p><h2>3. Communicate with Clarity</h2><p>Here&#8217;s an uncomfortable truth: unclear communication causes more production incidents than bad code.</p><p>Whether you&#8217;re documenting architecture decisions, explaining technical trade-offs to non-technical stakeholders, or writing runbooks for on-call engineers, communication is non-negotiable. The best technical solution means nothing if your team can&#8217;t understand it, maintain it, or build upon it.</p><p>Strong communicators:</p><p>&#8226; Write documentation that their future selves will thank them for</p><p>&#8226; Translate complex technical concepts into business value</p><p>&#8226; Create architecture diagrams that tell a story</p><p>&#8226; Ask clarifying questions before making assumptions</p><p>&#8226; Give and receive feedback constructively</p><p>Remember: every line of code you write is communication with future developers. Every architecture decision you make needs to be communicated to stakeholders. Master this skill, and you&#8217;ll reduce mistakes, accelerate onboarding, and build better systems.</p><h2>4. Embrace True Ownership</h2><p>&#8220;You build it, you own it.&#8221;</p><p>This Amazon principle has become a cornerstone of modern cloud engineering, and for good reason. 
Ownership transforms how you approach every aspect of your work.</p><p>When you truly own a system, you don&#8217;t just write code and throw it over the wall. You:</p><p>&#8226; Design with operational excellence in mind from day one</p><p>&#8226; Implement comprehensive monitoring and alerting</p><p>&#8226; Write runbooks and disaster recovery procedures</p><p>&#8226; Respond to incidents with urgency and accountability</p><p>&#8226; Continuously improve based on lessons learned</p><p>Ownership creates a virtuous cycle: better design leads to fewer incidents, which leads to more time for improvement, which leads to even better systems. The best cloud engineers I&#8217;ve worked with take pride in their systems&#8217; reliability and performance because they know they&#8217;re accountable for the outcomes.</p><h2>5. Become a Polymath</h2><p>The T-shaped engineer concept isn&#8217;t new, but it&#8217;s more critical than ever in cloud engineering. You need deep expertise in at least one area&#8212;whether that&#8217;s serverless architectures, security, networking, or data engineering&#8212;combined with broad knowledge across the entire cloud ecosystem.</p><p>Why? Because modern cloud solutions rarely fit neatly into a single domain. A serverless application still needs proper networking, security controls, database design, and observability. 
If you only understand Lambda functions but not VPCs, IAM policies, or DynamoDB partition keys, you&#8217;ll build fragile systems with hidden vulnerabilities.</p><p>Broaden your &#8220;T&#8221; by:</p><p>&#8226; Learning adjacent technologies to your specialty</p><p>&#8226; Understanding the fundamentals of compute, storage, networking, and databases</p><p>&#8226; Exploring how different AWS services integrate and complement each other</p><p>&#8226; Studying architecture patterns across different domains</p><p>&#8226; Contributing to projects outside your comfort zone</p><p>The polymath engineer brings versatility that modern cloud environments demand. They can architect end-to-end solutions, spot cross-domain optimization opportunities, and adapt quickly when requirements change.</p><h2>The Complete Cloud Engineer</h2><p>The best cloud engineers I&#8217;ve worked with embody all five traits. They&#8217;re not just technically proficient, they&#8217;re well-rounded professionals who drive real business impact. They ask insightful questions, design holistic solutions, communicate effectively with diverse audiences, take ownership of outcomes, and bring versatile expertise to every challenge.</p><p>As we move into 2026, the gap between engineers who master these traits and those who don&#8217;t will only widen. The good news? Unlike innate talent, these are all learnable skills. 
Start with one trait, practice deliberately, and build from there.</p><p>The Renaissance Developer that Werner Vogels described isn&#8217;t a distant ideal, it&#8217;s the Cloud Engineer the industry needs right now.</p>]]></content:encoded></item><item><title><![CDATA[The Cloud Engineers in 2025 - Year in Review]]></title><description><![CDATA[My passion for helping professionals level up their cloud skills drove me to start this newsletter in early 2025.]]></description><link>https://blog.thecloudengineers.com/p/the-cloud-engineers-in-2025-year</link><guid isPermaLink="false">https://blog.thecloudengineers.com/p/the-cloud-engineers-in-2025-year</guid><dc:creator><![CDATA[Lefteris Karageorgiou]]></dc:creator><pubDate>Wed, 31 Dec 2025 10:30:20 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/91ff4381-7358-4077-a8dc-6755a634a170_4096x2304.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>My passion for helping professionals level up their cloud skills drove me to start this newsletter in early 2025.</p><p>The results after 12 months? 
50 posts published, 3,200 subscribers, and over 60,000 views across all content.</p><p>A special thanks to every one of you who subscribed, read, and engaged&#8212;your support means everything.</p><p>In this final article of 2025, I&#8217;m looking back on the year and sharing the top 10 articles based on views and 5 low-view articles worth reading. Make sure to check them out!</p><h2>Top 10 Most-Viewed Articles</h2><ol><li><p><a href="https://blog.thecloudengineers.com/p/free-pdf-guide-5-aws-projects-to">5 AWS Projects To Get You Hired | FREE PDF Guide</a></p></li><li><p><a href="https://blog.thecloudengineers.com/p/how-to-nail-your-cloud-engineering">How to Nail Your Cloud Engineering Interview with the STAR Method</a></p></li><li><p><a href="https://blog.thecloudengineers.com/p/from-beginner-to-aws-certified-a">From Beginner to AWS Certified: A Roadmap for the Solutions Architect Associate Exam</a></p></li><li><p><a href="https://blog.thecloudengineers.com/p/aws-budgets-for-beginners-how-to">AWS Budgets for Beginners: How to Monitor Your Cloud Spending and Avoid Overages</a></p></li><li><p><a href="https://blog.thecloudengineers.com/p/building-your-first-event-driven">Building Your First 
Event-driven Application on AWS</a></p></li><li><p><a href="https://blog.thecloudengineers.com/p/how-much-do-cloud-engineers-really">How Much Do Cloud Engineers Really Make? A Breakdown by Role and Experience</a></p></li><li><p><a href="https://blog.thecloudengineers.com/p/idempotency-patterns-serverless-applications">Idempotency Patterns for Serverless Applications</a></p></li><li><p><a href="https://blog.thecloudengineers.com/p/secure-file-uploads-to-amazon-s3">Secure File Uploads to Amazon S3 Using Presigned URLs</a></p></li><li><p><a href="https://blog.thecloudengineers.com/p/building-a-3-tier-architecture-with">Building a 3-Tier Architecture with Serverless on AWS</a></p></li><li><p><a href="https://blog.thecloudengineers.com/p/rto-vs-rpo-designing-practical-dr">RTO vs RPO: Designing Practical DR Strategies on AWS</a></p></li></ol><h2>Low-View Articles Worth Reading</h2><ol><li><p><a href="https://blog.thecloudengineers.com/p/how-to-ace-the-aws-cloud-practitioner">How to Ace the AWS Cloud Practitioner Certification: A Practical Roadmap</a></p></li><li><p><a href="https://blog.thecloudengineers.com/p/building-your-personal-brand-in-the">Building Your Personal Brand in the Cloud: Creating an AWS-Powered Resume</a></p></li><li><p><a href="https://blog.thecloudengineers.com/p/build-a-personal-blog-with-wordpress">Build a Personal Blog with WordPress on AWS: A Comprehensive Guide</a></p></li><li><p><a href="https://blog.thecloudengineers.com/p/why-aws-step-functions-is-my-favourite">Why AWS Step Functions is My Favourite Orchestration Service for Microservices</a></p></li><li><p><a href="https://blog.thecloudengineers.com/p/creating-aws-architecture-diagrams">Creating AWS Architecture Diagrams with Amazon Q CLI and Exporting to Draw.io</a></p></li></ol><h2>Conclusion</h2><p>Wishing you all a very productive 2026 filled with growth and new opportunities. My goal for the coming year is simple: write more, share more, and build a thriving community of cloud engineers. 
</p><p>Let's level up together!</p>]]></content:encoded></item><item><title><![CDATA[5 Lambda Cost Mistakes I See Teams Make (And How to Fix Them)]]></title><description><![CDATA[Lambda&#8217;s pay-per-use model sounds simple: you pay for what you use.]]></description><link>https://blog.thecloudengineers.com/p/5-lambda-cost-mistakes-i-see-teams</link><guid isPermaLink="false">https://blog.thecloudengineers.com/p/5-lambda-cost-mistakes-i-see-teams</guid><dc:creator><![CDATA[Lefteris Karageorgiou]]></dc:creator><pubDate>Wed, 24 Dec 2025 10:30:34 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/eb14a261-e121-4d41-b004-8c1d1950031b_6000x4000.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Lambda&#8217;s pay-per-use model sounds simple: you pay for what you use. 
But after reviewing dozens of AWS bills with teams, I&#8217;ve found that &#8220;what you use&#8221; is often far more than &#8220;what you need.&#8221;</p><p>Here are the five most expensive mistakes I see repeatedly&#8212;and how to fix them.</p><h2>Mistake #1: Using the Default 128MB Memory</h2><p>This is the most common mistake, and it&#8217;s counterintuitive: <strong>allocating more memory can actually lower your bill.</strong></p><p>Many teams deploy Lambda functions with the default 128MB memory allocation, thinking they&#8217;re being cost-conscious. The problem is that Lambda allocates CPU power proportionally to memory. At 128MB, your function gets a tiny slice of CPU, which means slower execution times.</p><p>Lambda pricing is calculated by multiplying the memory allocated by the execution time by the price per GB-second. If doubling your memory cuts execution time by more than half, you save money.</p><p><strong>The Fix</strong></p><p>Use AWS Lambda Power Tuning, an open-source tool that tests your function across memory configurations and shows you the cost-performance tradeoff. 
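</p><p>To make that tradeoff concrete, here is a minimal Python sketch of the billing formula described above: memory allocated, times execution time, times the price per GB-second. The price constant, the function name <code>invocation_cost</code>, and both durations are assumptions for illustration, not measurements from a real workload, and the per-request charge is left out.</p>

```python
# Assumed on-demand price per GB-second; treat it as a placeholder and
# check the current Lambda pricing page for your region and architecture.
PRICE_PER_GB_SECOND = 0.0000166667


def invocation_cost(memory_mb, duration_ms):
    """Cost of one invocation: memory (GB) x duration (s) x price."""
    memory_gb = memory_mb / 1024
    duration_s = duration_ms / 1000
    return memory_gb * duration_s * PRICE_PER_GB_SECOND


# Hypothetical durations for the same function at two memory settings.
small = invocation_cost(memory_mb=128, duration_ms=1000)
large = invocation_cost(memory_mb=256, duration_ms=400)

print(f"128MB: ${small * 1_000_000:.2f} per million invocations")
print(f"256MB: ${large * 1_000_000:.2f} per million invocations")
```

<p>With these made-up durations, the 256MB configuration comes out cheaper because execution time fell by more than half. Power Tuning automates the same comparison using durations measured from real invocations. 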
You can deploy it directly from the AWS Serverless Application Repository by searching for &#8220;aws-lambda-power-tuning.&#8221;</p><p>Run it against your production functions quarterly. Memory requirements change as your code and data evolve.</p><h2>Mistake #2: Synchronous Chains of Lambda Functions</h2><p>I&#8217;ve seen architectures where Lambda A calls Lambda B, which calls Lambda C, all synchronously. While the request waterfall completes, you&#8217;re paying for all three functions simultaneously.</p><p>Picture this flow: API Gateway triggers Lambda A, which invokes Lambda B and waits three seconds for a response, which in turn invokes Lambda C and waits two seconds, which finally does one second of actual work. The total billed Lambda execution time is ten seconds, but the actual compute work being done might only be two seconds. You&#8217;re paying for wait time, not compute time.</p><p><strong>The Fix</strong></p><p>Break synchronous chains with asynchronous patterns using SQS, SNS, or EventBridge.</p><p>Instead of having Lambda A invoke Lambda B directly and wait for a response, have Lambda A send a message to an SQS queue and return immediately to the caller with a 202 Accepted status. Lambda B then processes messages from the queue independently.</p><p>This approach means each function only runs for the time it needs to do its own work. No function sits idle waiting for another.</p><p>If you need orchestration with waiting, branching logic, or error handling across multiple steps, use Step Functions. You pay for state transitions rather than idle Lambda time, and you get built-in retry logic and visualization of your workflows.</p><h2>Mistake #3: Logging Everything to CloudWatch</h2><p>CloudWatch Logs ingestion costs $0.50 per GB. That sounds cheap until you realize how much your Lambdas are logging.</p><p>I audited one team&#8217;s account and found a single function generating 50GB of logs per month&#8212;$25/month for one function&#8217;s logs. 
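</p><p>That figure follows directly from the ingestion price. As a rough sketch, the arithmetic looks like this in Python; the helper names and the second example&#8217;s numbers are hypothetical, the price is the one quoted above and may differ in your region, and storage charges are not included.</p>

```python
# Ingestion price quoted in the text; it varies by region, so treat it
# as an assumption rather than a fixed constant.
INGESTION_PRICE_PER_GB = 0.50


def monthly_log_cost(gb_ingested):
    """Monthly CloudWatch Logs ingestion cost for a given volume in GB."""
    return gb_ingested * INGESTION_PRICE_PER_GB


def monthly_log_volume_gb(kb_per_invocation, invocations_per_month):
    """Estimate monthly log volume from average log size per invocation."""
    return kb_per_invocation * invocations_per_month / (1024 * 1024)


# The audited function from the text: 50GB of logs per month.
print(monthly_log_cost(50))  # 25.0

# Hypothetical function: 5KB of logs per invocation, 10 million invocations.
volume = monthly_log_volume_gb(kb_per_invocation=5, invocations_per_month=10_000_000)
print(f"{volume:.1f}GB ingested, ${monthly_log_cost(volume):.2f} per month")
```

<p>The cause in that audit was mundane. 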
They were logging full request and response bodies for debugging purposes during development and forgot to remove it before going to production.</p><p><strong>The Fix</strong></p><p><strong>Use log levels properly.</strong> Configure your log level via environment variables so you can set DEBUG in development and INFO or WARN in production without changing code. Only log full payloads at the DEBUG level so they don&#8217;t appear in production logs.</p><p><strong>Set CloudWatch Logs retention policies.</strong> The default retention is &#8220;never expire,&#8221; which means you&#8217;re storing and paying for logs forever. In your infrastructure templates, set retention to 14 or 30 days for most functions. You can always set longer retention for audit-critical functions.</p><p><strong>Sample logs in high-throughput functions.</strong> If a function processes thousands of requests per minute, you don&#8217;t need detailed logs for every single one. Implement sampling where you only log full details for 10% of requests. This gives you enough data to debug issues while cutting log volume by 90%.</p><p><strong>Audit what you&#8217;re logging.</strong> Search your codebase for log statements that include full event objects, request bodies, or response payloads. Each of these can add kilobytes per invocation that compound quickly at scale.</p><h2>Mistake #4: Using Lambda for Long-Running Workloads</h2><p>Lambda charges per millisecond of execution. For quick, bursty workloads, it&#8217;s incredibly cost-effective. For long-running processes, it becomes expensive fast.</p><p>I worked with a team running data transformation jobs that took 10-14 minutes per execution. 
They hit Lambda&#8217;s 15-minute timeout regularly and were paying a premium for what was essentially a batch job.</p><p><strong>The Math</strong></p><p>For a job running 12 minutes at 2048MB memory, executed 100 times per day:</p><p><strong>Lambda cost:</strong> approximately $72 per month</p><p><strong>Fargate cost</strong> (equivalently sized, running only when needed): approximately $5.40 per month</p><p>That&#8217;s roughly a 13x difference.</p><p><strong>The Fix</strong></p><p>For workloads that exceed a one-minute runtime, evaluate AWS Fargate.</p><p>For workloads that can be parallelized, consider breaking them into smaller Lambda functions coordinated by Step Functions. You might process data faster and cheaper than a single long-running container by fanning out work across many concurrent short-lived functions.</p><h2>Mistake #5: Provisioned Concurrency &#8220;Just in Case&#8221;</h2><p>Provisioned Concurrency eliminates cold starts by keeping initialized execution environments ready. It&#8217;s great for latency-sensitive workloads. It&#8217;s terrible for your bill if misconfigured.</p><p>Provisioned Concurrency charges you whether the capacity is used or not. I&#8217;ve seen teams provision 100 concurrent executions &#8220;to be safe&#8221; for functions that peak at 20.</p><p>At 1024MB memory, keeping 100 provisioned instances running 24/7 for a month costs approximately <strong>$1,203</strong>&#8212;for capacity that sits mostly idle.</p><p><strong>The Fix</strong></p><p>Check your actual concurrency patterns in CloudWatch before deciding on provisioned concurrency. Look at the ConcurrentExecutions metric with the Maximum statistic over the past 30 days. You might find your function rarely exceeds 10 concurrent executions, making 100 provisioned instances wildly excessive.</p><h2>Conclusion</h2><p>Lambda cost optimization isn&#8217;t about using less serverless&#8212;it&#8217;s about using it more intelligently. 
The teams I see with the lowest bills per transaction aren&#8217;t the ones avoiding Lambda. They&#8217;re the ones who understand the pricing model and architect accordingly.</p><p>Start with Mistake #1. Run Lambda Power Tuning on your most-invoked functions this week. It takes 10 minutes and often pays for itself immediately.</p><p></p>]]></content:encoded></item><item><title><![CDATA[How Much Do Cloud Engineers Really Make? 
A Breakdown by Role and Experience]]></title><description><![CDATA[Cloud computing has revolutionized how businesses operate, and with it, the demand for skilled cloud professionals has skyrocketed.]]></description><link>https://blog.thecloudengineers.com/p/how-much-do-cloud-engineers-really</link><guid isPermaLink="false">https://blog.thecloudengineers.com/p/how-much-do-cloud-engineers-really</guid><dc:creator><![CDATA[Lefteris Karageorgiou]]></dc:creator><pubDate>Wed, 17 Dec 2025 10:31:06 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/44f83eaa-cdee-4238-9bdd-24e67fbfd48d_6199x4000.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Cloud computing has revolutionized how businesses operate, and with it, the demand for skilled cloud professionals has skyrocketed. If you&#8217;re considering a career in cloud engineering or looking to understand where you fit in the ecosystem, this comprehensive guide breaks down the various cloud engineering roles and their salary expectations.</p>
<h2>Understanding the Cloud Engineering Landscape</h2><p>Before diving into specific roles, it&#8217;s important to note that cloud engineering salaries vary significantly based on several factors:</p><ul><li><p><strong>Location</strong>: San Francisco and New York typically offer 20-40% higher salaries than mid-tier cities</p></li><li><p><strong>Company size</strong>: FAANG and enterprise companies generally pay more than startups</p></li><li><p><strong>Experience level</strong>: Years of experience dramatically impact compensation</p></li><li><p><strong>Certifications</strong>: AWS certifications can add $10,000-$20,000 to base salary</p></li><li><p><strong>Specialization</strong>: Niche skills like security or FinOps command premium rates</p></li></ul><p>All salary ranges mentioned below are for the United States market and include base salary (not total compensation with stock/bonuses), current as of 2025.</p><h2>Entry-Level Roles</h2><h3>1. Cloud Support Engineer</h3><p><strong>Salary Range</strong>: $60,000 - $85,000</p><p>Cloud Support Engineers are typically the first line of technical support for cloud infrastructure issues. 
They troubleshoot basic cloud service problems, assist with resource provisioning, and escalate complex issues.</p><p><strong>Key Responsibilities:</strong></p><ul><li><p>Responding to support tickets</p></li><li><p>Basic troubleshooting of EC2, S3, RDS issues</p></li><li><p>Documentation of solutions</p></li><li><p>Monitoring cloud resources</p></li></ul><p><strong>Required Skills:</strong></p><ul><li><p>Basic understanding of AWS services</p></li><li><p>Fundamental networking knowledge</p></li><li><p>Good communication skills</p></li><li><p>AWS Cloud Practitioner certification (recommended)</p></li></ul><h3>2. Junior Cloud Engineer</h3><p><strong>Salary Range</strong>: $70,000 - $95,000</p><p>Junior Cloud Engineers work under senior engineers to build and maintain cloud infrastructure. This is an excellent entry point for those with some technical background.</p><p><strong>Key Responsibilities:</strong></p><ul><li><p>Assisting with infrastructure deployment</p></li><li><p>Writing basic Infrastructure as Code (IaC)</p></li><li><p>Implementing monitoring and alerts</p></li><li><p>Maintaining documentation</p></li></ul><p><strong>Required Skills:</strong></p><ul><li><p>Basic scripting (Python, Bash)</p></li><li><p>Understanding of Linux/Windows systems</p></li><li><p>Familiarity with CI/CD concepts</p></li><li><p>AWS Solutions Architect Associate certification (preferred)</p></li></ul><h2>Mid-Level Roles</h2><h3>3. 
Cloud Engineer</h3><p><strong>Salary Range</strong>: $95,000 - $140,000</p><p>The standard Cloud Engineer role involves designing, implementing, and maintaining cloud infrastructure with moderate autonomy.</p><p><strong>Key Responsibilities:</strong></p><ul><li><p>Designing scalable cloud architectures</p></li><li><p>Implementing IaC with Terraform or CloudFormation</p></li><li><p>Automating deployment pipelines</p></li><li><p>Cost optimization initiatives</p></li><li><p>Security best practices implementation</p></li></ul><p><strong>Required Skills:</strong></p><ul><li><p>3-5 years of experience</p></li><li><p>Proficiency in IaC tools</p></li><li><p>Strong AWS services knowledge (EC2, VPC, IAM, S3, Lambda, ECS/EKS)</p></li><li><p>Scripting/programming skills</p></li><li><p>CI/CD pipeline experience</p></li></ul><h3>4. DevOps Engineer (Cloud Focus)</h3><p><strong>Salary Range</strong>: $100,000 - $145,000</p><p>DevOps Engineers bridge the gap between development and operations, with heavy focus on automation and CI/CD pipelines in cloud environments.</p><p><strong>Key Responsibilities:</strong></p><ul><li><p>Building and maintaining CI/CD pipelines</p></li><li><p>Container orchestration (Kubernetes, ECS)</p></li><li><p>Infrastructure automation</p></li><li><p>Application deployment strategies</p></li><li><p>Monitoring and observability implementation</p></li></ul><p><strong>Required Skills:</strong></p><ul><li><p>Strong scripting/programming abilities</p></li><li><p>Container technologies (Docker, Kubernetes)</p></li><li><p>CI/CD tools (Jenkins, GitLab CI, GitHub Actions)</p></li><li><p>Configuration management (Ansible, Chef, Puppet)</p></li><li><p>Cloud platform expertise</p></li></ul><h3>5. 
Cloud Security Engineer</h3><p><strong>Salary Range</strong>: $110,000 - $155,000</p><p>With security being paramount, Cloud Security Engineers focus specifically on securing cloud infrastructure and ensuring compliance.</p><p><strong>Key Responsibilities:</strong></p><ul><li><p>Implementing security controls and guardrails</p></li><li><p>Compliance monitoring and remediation</p></li><li><p>Security audits and assessments</p></li><li><p>Incident response</p></li><li><p>Identity and Access Management (IAM) policies</p></li></ul><p><strong>Required Skills:</strong></p><ul><li><p>Deep understanding of cloud security best practices</p></li><li><p>Experience with security tools (GuardDuty, Security Hub, AWS Config)</p></li><li><p>Compliance frameworks (SOC2, HIPAA, PCI-DSS)</p></li><li><p>Security certifications (AWS Security Specialty, CISSP)</p></li></ul><h2>Senior-Level Roles</h2><h3>6. Senior Cloud Engineer</h3><p><strong>Salary Range</strong>: $130,000 - $180,000</p><p>Senior Cloud Engineers are technical leaders who design complex systems and mentor junior team members.</p><p><strong>Key Responsibilities:</strong></p><ul><li><p>Architecting enterprise-scale solutions</p></li><li><p>Technical leadership and mentoring</p></li><li><p>Establishing cloud best practices</p></li><li><p>Multi-account/multi-region strategies</p></li><li><p>Disaster recovery planning</p></li></ul><p><strong>Required Skills:</strong></p><ul><li><p>5-8+ years of experience</p></li><li><p>Deep expertise in multiple AWS services</p></li><li><p>Strong architectural knowledge</p></li><li><p>Leadership and communication skills</p></li><li><p>Multiple AWS certifications</p></li></ul><h3>7. 
Cloud Architect</h3><p><strong>Salary Range</strong>: $140,000 - $200,000</p><p>Cloud Architects design comprehensive cloud solutions aligned with business objectives, focusing on scalability, reliability, and cost-efficiency.</p><p><strong>Key Responsibilities:</strong></p><ul><li><p>Designing end-to-end cloud solutions</p></li><li><p>Technology evaluation and selection</p></li><li><p>Stakeholder communication</p></li><li><p>Technical documentation and standards</p></li><li><p>Cross-team collaboration</p></li></ul><p><strong>Required Skills:</strong></p><ul><li><p>7-10+ years of experience</p></li><li><p>Broad and deep AWS knowledge</p></li><li><p>Business acumen</p></li><li><p>Excellent communication skills</p></li><li><p>AWS Solutions Architect Professional certification</p></li></ul><h3>8. Site Reliability Engineer (SRE)</h3><p><strong>Salary Range</strong>: $135,000 - $190,000</p><p>SREs apply software engineering principles to infrastructure and operations problems, focusing on system reliability and performance.</p><p><strong>Key Responsibilities:</strong></p><ul><li><p>Ensuring system reliability and uptime</p></li><li><p>Implementing SLOs/SLIs/SLAs</p></li><li><p>Incident management and postmortems</p></li><li><p>Performance optimization</p></li><li><p>Capacity planning</p></li></ul><p><strong>Required Skills:</strong></p><ul><li><p>Strong programming skills</p></li><li><p>Deep understanding of distributed systems</p></li><li><p>Observability tools (Prometheus, Grafana, Datadog)</p></li><li><p>On-call experience</p></li><li><p>Automation expertise</p></li></ul><h2>Specialized Roles</h2><h3>9. 
Cloud FinOps Engineer</h3><p><strong>Salary Range</strong>: $115,000 - $165,000</p><p>A relatively new but increasingly important role, FinOps Engineers focus on cloud cost optimization and financial accountability.</p><p><strong>Key Responsibilities:</strong></p><ul><li><p>Cloud cost analysis and optimization</p></li><li><p>Implementing tagging strategies</p></li><li><p>Chargeback/showback models</p></li><li><p>Reserved Instance and Savings Plan management</p></li><li><p>Cost anomaly detection and alerting</p></li></ul><p><strong>Required Skills:</strong></p><ul><li><p>Strong analytical skills</p></li><li><p>Deep understanding of AWS pricing models</p></li><li><p>Experience with cost management tools</p></li><li><p>Business communication skills</p></li><li><p>FinOps certification (FinOps Foundation)</p></li></ul><h3>10. Cloud Data Engineer</h3><p><strong>Salary Range</strong>: $120,000 - $175,000</p><p>Cloud Data Engineers build and maintain data infrastructure in the cloud, focusing on data pipelines, warehouses, and analytics platforms.</p><p><strong>Key Responsibilities:</strong></p><ul><li><p>Designing data architectures</p></li><li><p>Building ETL/ELT pipelines</p></li><li><p>Managing data warehouses (Redshift, Snowflake)</p></li><li><p>Data lake implementation</p></li><li><p>Data governance and quality</p></li></ul><p><strong>Required Skills:</strong></p><ul><li><p>Strong SQL and Python skills</p></li><li><p>Experience with AWS data services (S3, Glue, Athena, Redshift, Kinesis)</p></li><li><p>Data modeling expertise</p></li><li><p>Big data technologies (Spark, Kafka)</p></li></ul><h2>Leadership Roles</h2><h3>11. 
Lead Cloud Engineer / Engineering Manager</h3><p><strong>Salary Range</strong>: $150,000 - $210,000</p><p>These roles combine technical expertise with people management, overseeing cloud engineering teams.</p><p><strong>Key Responsibilities:</strong></p><ul><li><p>Team management and development</p></li><li><p>Technical strategy and roadmap</p></li><li><p>Budget management</p></li><li><p>Cross-functional collaboration</p></li><li><p>Hiring and performance management</p></li></ul><p><strong>Required Skills:</strong></p><ul><li><p>8-12+ years of experience</p></li><li><p>Proven leadership experience</p></li><li><p>Strong technical background</p></li><li><p>Project management skills</p></li><li><p>Strategic thinking</p></li></ul><h3>12. Principal Cloud Architect</h3><p><strong>Salary Range</strong>: $170,000 - $250,000+</p><p>Principal-level architects are technical authorities who influence technology decisions across entire organizations.</p><p><strong>Key Responsibilities:</strong></p><ul><li><p>Enterprise-wide architectural decisions</p></li><li><p>Technology vision and strategy</p></li><li><p>Standards and governance</p></li><li><p>Executive stakeholder management</p></li><li><p>Industry thought leadership</p></li></ul><p><strong>Required Skills:</strong></p><ul><li><p>12-15+ years of experience</p></li><li><p>Exceptional technical depth and breadth</p></li><li><p>Proven track record of large-scale implementations</p></li><li><p>Strong business acumen</p></li><li><p>Excellent communication and influence skills</p></li></ul><h3>13. 
Director of Cloud Engineering</h3><p><strong>Salary Range</strong>: $180,000 - $280,000+</p><p>Directors oversee multiple teams and are responsible for the overall cloud strategy and execution within an organization.</p><p><strong>Key Responsibilities:</strong></p><ul><li><p>Departmental strategy and vision</p></li><li><p>Budget planning and management</p></li><li><p>Organizational structure and hiring</p></li><li><p>Executive reporting</p></li><li><p>Vendor relationships</p></li></ul><p><strong>Required Skills:</strong></p><ul><li><p>12-15+ years of experience</p></li><li><p>Extensive leadership experience</p></li><li><p>Strategic planning abilities</p></li><li><p>Business and financial acumen</p></li><li><p>Strong executive presence</p></li></ul><h2>Tips for Maximizing Your Cloud Engineering Salary</h2><h3>1. Build a Portfolio</h3><p>Create public repositories showcasing your IaC templates, automation scripts, and projects. Contribute to open-source projects.</p><h3>2. Specialize</h3><p>While breadth is valuable, developing deep expertise in areas like security, FinOps, or Kubernetes can significantly boost your value.</p><h3>3. Stay Current</h3><p>Cloud services evolve rapidly. Regularly learn new services and best practices. Follow AWS announcements and implement new services in side projects.</p><h3>4. Invest in Certifications</h3><p>Start with AWS Solutions Architect Associate and progress to Professional level certifications. They provide both knowledge and market value.</p><h3>5. Develop Soft Skills</h3><p>Technical skills get you in the door, but communication, leadership, and business acumen accelerate your career.</p><h3>6. Network</h3><p>Attend cloud conferences, join cloud communities, engage on LinkedIn, and contribute to technical discussions.</p><h3>7. Negotiate</h3><p>Many candidates leave money on the table by not negotiating. 
Research market rates and don&#8217;t be afraid to counter-offer.</p><h2>Conclusion</h2><p>Cloud engineering offers excellent compensation across all experience levels, with clear paths for advancement. Whether you&#8217;re just starting out or looking to move into more specialized or leadership roles, understanding the landscape helps you make informed career decisions.</p><p>Remember that salary is just one component of compensation. Consider factors like:</p><ul><li><p>Learning opportunities</p></li><li><p>Work-life balance</p></li><li><p>Company culture</p></li><li><p>Benefits and equity</p></li><li><p>Remote work flexibility</p></li><li><p>Career growth potential</p></li></ul><p>The cloud engineering field is dynamic and rewarding, both intellectually and financially. By continuously learning, building expertise, and developing both technical and soft skills, you can position yourself for a lucrative and fulfilling career in cloud engineering.</p><p></p>]]></content:encoded></item></channel></rss>