<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[The Cloud Engineers]]></title><description><![CDATA[Level up your Cloud engineering knowledge with this weekly newsletter. Perfect for Cloud Engineers, DevOps, Developers, Architects, and other tech enthusiasts looking to advance their careers while mastering the latest Cloud technologies.]]></description><link>https://blog.thecloudengineers.com</link><image><url>https://substackcdn.com/image/fetch/$s_!Ka2m!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c6f36c3-e29e-40f9-ab14-7fc0e81d82d9_759x759.png</url><title>The Cloud Engineers</title><link>https://blog.thecloudengineers.com</link></image><generator>Substack</generator><lastBuildDate>Mon, 15 Jun 2026 17:10:04 GMT</lastBuildDate><atom:link href="https://blog.thecloudengineers.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Lefteris Karageorgiou]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[thecloudengineers@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[thecloudengineers@substack.com]]></itunes:email><itunes:name><![CDATA[Lefteris Karageorgiou]]></itunes:name></itunes:owner><itunes:author><![CDATA[Lefteris Karageorgiou]]></itunes:author><googleplay:owner><![CDATA[thecloudengineers@substack.com]]></googleplay:owner><googleplay:email><![CDATA[thecloudengineers@substack.com]]></googleplay:email><googleplay:author><![CDATA[Lefteris Karageorgiou]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[The 4-Step Roadmap to Break Into Cloud (That Actually Works)]]></title><description><![CDATA[Stop 
Collecting Certifications. Start Building a Career.]]></description><link>https://blog.thecloudengineers.com/p/the-4-step-roadmap-to-break-into</link><guid isPermaLink="false">https://blog.thecloudengineers.com/p/the-4-step-roadmap-to-break-into</guid><dc:creator><![CDATA[Lefteris Karageorgiou]]></dc:creator><pubDate>Wed, 10 Jun 2026 09:31:18 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/2d7b3d9b-f00d-42bc-993d-4b49f6186c82_1200x630.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Hey, it&#8217;s Lefteris &#128075; I&#8217;m the voice behind the weekly newsletter &#8220;The Cloud Engineers.&#8221;</em></p><p>Most people trying to break into cloud are stuck in an endless loop.</p><p>They watch tutorials. They collect certifications. They spin up a Lambda function, follow along with a YouTube video, tear it down, and call it &#8220;experience.&#8221;</p><p>Then they apply to 50, 100, maybe 200 jobs, and hear nothing back.</p><p>Here&#8217;s the uncomfortable truth: <strong>the market doesn&#8217;t reward what you know. It rewards what you can prove.</strong></p><p>Right now, thousands of professionals who want to switch to cloud careers are competing for the same roles with the same certifications, the same generic resumes, and zero evidence that they can build anything real.</p><p>The ones who break through do something different. 
They follow a system.</p><p>In this article, we&#8217;ll dive into that system, which consists of four steps.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!01wT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f3d5eb3-e864-4615-a17d-5709403bdbc0_1932x756.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!01wT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f3d5eb3-e864-4615-a17d-5709403bdbc0_1932x756.png 424w, https://substackcdn.com/image/fetch/$s_!01wT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f3d5eb3-e864-4615-a17d-5709403bdbc0_1932x756.png 848w, https://substackcdn.com/image/fetch/$s_!01wT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f3d5eb3-e864-4615-a17d-5709403bdbc0_1932x756.png 1272w, https://substackcdn.com/image/fetch/$s_!01wT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f3d5eb3-e864-4615-a17d-5709403bdbc0_1932x756.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!01wT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f3d5eb3-e864-4615-a17d-5709403bdbc0_1932x756.png" width="592" height="231.75824175824175" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1f3d5eb3-e864-4615-a17d-5709403bdbc0_1932x756.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:570,&quot;width&quot;:1456,&quot;resizeWidth&quot;:592,&quot;bytes&quot;:1712928,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.thecloudengineers.com/i/200731176?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f3d5eb3-e864-4615-a17d-5709403bdbc0_1932x756.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!01wT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f3d5eb3-e864-4615-a17d-5709403bdbc0_1932x756.png 424w, https://substackcdn.com/image/fetch/$s_!01wT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f3d5eb3-e864-4615-a17d-5709403bdbc0_1932x756.png 848w, https://substackcdn.com/image/fetch/$s_!01wT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f3d5eb3-e864-4615-a17d-5709403bdbc0_1932x756.png 1272w, https://substackcdn.com/image/fetch/$s_!01wT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f3d5eb3-e864-4615-a17d-5709403bdbc0_1932x756.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><h2>Step 1: Build Real-World Cloud Projects</h2><p>Not toy examples. 
Not tutorial clones.</p><p>Build production-style projects using industry best practices, the kind you can confidently walk through in an interview without breaking eye contact.</p><p><strong>That means proper architecture. Infrastructure as Code. CI/CD pipelines. Monitoring. Cost awareness. Security considerations. The full picture.</strong></p><p>When an interviewer asks, &#8220;Tell me about something you&#8217;ve built,&#8221; you should have a project so solid that your biggest challenge is deciding which part to talk about first.</p><h2>Step 2: Turn Those Projects Into a Recruiter-Ready Portfolio</h2><p>Building is only half the battle. If nobody can find your work, it doesn&#8217;t exist.</p><p>So showcase your projects properly. Upload clean, well-documented code to GitHub with strong READMEs. Write blog posts explaining your architecture decisions and trade-offs.</p><p><strong>I&#8217;ve seen candidates with mediocre projects outperform stronger builders simply because their work was visible and well presented.</strong></p><p>Also, optimize your LinkedIn profile. Treat it like a landing page, not a digital CV. Use a clear headline that states what you do and who you help. Post consistently about what you&#8217;re building and learning. Recruiters search LinkedIn daily, so make sure they find you.</p><h2>Step 3: Get Visible and Secure Interviews</h2><p>You can&#8217;t get hired from the shadows. Network with intent.</p><p>Engage on LinkedIn, join cloud communities, and attend local meetups.</p><p>But don&#8217;t just lurk. Comment on posts with genuine insights, share your learnings publicly, and connect with people doing the work you want to do.</p><p>I&#8217;ve seen more opportunities come from a single thoughtful comment than from hundreds of cold applications.</p><p>Showcase your projects publicly. 
Talk about what you&#8217;re building, what broke, and what you learned.</p><p><strong>This isn&#8217;t bragging, it&#8217;s signaling.</strong></p><p>You&#8217;re telling the market: &#8220;I&#8217;m here, I&#8217;m building, and I&#8217;m serious.&#8221;</p><p>The interviews will come. Not from luck, but from visibility.</p><h2>Step 4: Prepare for and Master Interviews</h2><p>Getting the interview isn&#8217;t the finish line. It&#8217;s the starting line.</p><p>Practice explaining your architecture decisions, trade-offs, and problem-solving approach until it becomes second nature.</p><p>Know why you chose DynamoDB over RDS. Know what you&#8217;d change if requirements shifted. Know how you&#8217;d scale under pressure.</p><p><strong>Walk in confident, not guessing.</strong></p><p>The candidates who land offers aren&#8217;t always the most technically brilliant. They&#8217;re the ones who communicate clearly, own their decisions, and show they think like engineers who ship to production.</p><h2>The Problem With Doing This Alone</h2><p>You already know what you need to do. 
The steps above aren&#8217;t a secret.</p><p><strong>But knowing and executing are two different things.</strong></p><p>Most people get stuck between Step 1 and Step 2 &#8212; building projects that aren&#8217;t strong enough or never making them visible.</p><p>Others network randomly, apply generically, and wonder why nothing lands.</p><p>What separates people who break in from people who stay stuck isn&#8217;t talent.</p><p><strong>It&#8217;s having a structured path, accountability, and someone who&#8217;s done it before showing you exactly where to focus.</strong></p><h2>That&#8217;s Why I Built the Cloud Career Bootcamp</h2><p>Over the past six years, I&#8217;ve personally guided more than 200 professionals through their transition into cloud careers using this system.</p><p>The results have been consistently strong.</p><p>People who felt their backgrounds were holding them back successfully landed cloud roles at companies like AWS, J.P. Morgan, Airbnb, Uber, Pfizer, and more.</p><p>Career changers who thought it was too late to pivot moved into cloud positions.</p><p>Instead of facing constant rejection, they started receiving multiple offers from companies eager to hire them.</p><p>Now, I&#8217;m putting together a program that walks you through this entire roadmap, step by step, with hands-on guidance, real feedback, and a community of people on the same path.</p><p>If you&#8217;re serious about breaking into cloud and you&#8217;re done spinning your wheels, join the waitlist:</p><p>&#128073; <strong><a href="https://learn.thecloudengineers.com/cloud-career-bootcamp-waitlist">Cloud Career Bootcamp &#8212; Join the Free Waitlist Now</a></strong></p><p>Spots will be limited.</p><p>Get on the list so you&#8217;ll be the first to know when doors open.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.thecloudengineers.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" 
data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading The Cloud Engineers! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[S3 Cost Traps: What Nobody Tells You About Lifecycle Policies]]></title><description><![CDATA[You&#8217;re running a production workload on AWS. You set up S3, maybe enabled versioning because &#8220;best practice,&#8221; and moved on. Six months later, your S3 bill is three times what you expected, and you have no idea why.]]></description><link>https://blog.thecloudengineers.com/p/s3-cost-traps-what-nobody-tells-you</link><guid isPermaLink="false">https://blog.thecloudengineers.com/p/s3-cost-traps-what-nobody-tells-you</guid><dc:creator><![CDATA[Lefteris Karageorgiou]]></dc:creator><pubDate>Wed, 03 Jun 2026 09:31:37 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/fb484cb2-0341-4b8d-a368-c96f6effd1a7_1200x630.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Hey, it&#8217;s Lefteris &#128075; I&#8217;m the voice behind the weekly newsletter &#8220;The Cloud Engineers.&#8221;</em></p><p>You&#8217;re running a production workload on AWS. You set up S3, maybe enabled versioning because &#8220;best practice,&#8221; and moved on. Six months later, your S3 bill is three times what you expected, and you have no idea why.</p><p>You&#8217;re not alone. S3 pricing is deceptively simple on the surface, but underneath it hides several cost traps that lifecycle policies are supposed to solve. The problem? 
Most teams either skip lifecycle rules entirely or configure them in ways that barely help.</p><p>Let&#8217;s talk about what&#8217;s silently eating your budget.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.thecloudengineers.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading The Cloud Engineers! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><h2>The Versioning Tax</h2><p>Versioning is great for data protection. Every overwrite or delete keeps the old object around, which means you can recover from accidental changes. What nobody emphasizes: <strong>every old version is a billable object</strong>.</p><p>If you&#8217;re writing logs, reports, or processed data that gets overwritten frequently, you could be storing 10x or 50x the &#8220;visible&#8221; data. The S3 console shows you the current objects. The old versions hide underneath, quietly accumulating storage charges.</p><p><strong>The fix:</strong> Always pair versioning with a lifecycle rule that expires non-current versions. Ask yourself: &#8220;Do I really need 90 days of old versions, or would 7 days cover any realistic recovery scenario?&#8221;</p><h2>The Multipart Upload Graveyard</h2><p>When a large upload fails halfway, the uploaded parts don&#8217;t disappear. 
They sit in your bucket as incomplete multipart uploads, invisible in the console&#8217;s normal view, but fully billable.</p><p>If you&#8217;re running data pipelines, ETL jobs, or any process that writes large objects and occasionally fails, these orphaned parts add up. Some teams discover gigabytes of phantom storage they never knew existed.</p><p><strong>The fix:</strong> Add a lifecycle rule to abort incomplete multipart uploads after a short window. 7 days is generous for most workloads; some teams set it to 1 day.</p><h2>The &#8220;I&#8217;ll Transition Everything to Glacier&#8221; Mistake</h2><p>A common first move: create a lifecycle rule that transitions all objects to S3 Glacier after 30 days. Sounds sensible: cold storage is cheap.</p><p>Here&#8217;s what catches people:</p><ul><li><p><strong>Minimum storage duration charges.</strong> S3 Glacier Flexible Retrieval has a 90-day minimum (Glacier Deep Archive: 180 days). Delete an object on day 45? You still pay for 90 days.</p></li><li><p><strong>Retrieval costs.</strong> If your application or downstream process ever needs those objects back, retrieval fees and restore times can surprise you.</p></li><li><p><strong>Small object overhead.</strong> Glacier Flexible Retrieval and Deep Archive add about 40 KB of metadata per object (32 KB billed at the archive rate, 8 KB at the Standard rate). If you&#8217;re storing millions of tiny files, the overhead alone can exceed what you&#8217;d pay in Standard.</p></li></ul><p><strong>The right question:</strong> Before transitioning, map your actual access patterns. If objects are never accessed after 30 days, Glacier Deep Archive might make sense. If they&#8217;re occasionally accessed, Infrequent Access (IA) with its simpler retrieval model is a safer bet.</p><h2>Lifecycle Rules That Don&#8217;t Actually Apply</h2><p>This one is subtle. You create a lifecycle rule, confirm it&#8217;s active, and assume it&#8217;s working. But lifecycle rules scope by <strong>prefix</strong> and <strong>tags</strong>. 
If your bucket structure changed after you wrote the rule &#8212; new prefixes, different naming conventions &#8212; your rule might be covering 10% of the bucket while the rest grows unchecked.</p><p><strong>The fix:</strong> Audit lifecycle rules quarterly. Use S3 Storage Lens to see the actual breakdown of storage classes, current vs. non-current versions, and incomplete multipart uploads across your buckets. If the numbers don&#8217;t match your expectations, your rules have gaps.</p><h2>A Mental Model for Getting This Right</h2><p>Think of lifecycle policies as a three-layer system:</p><ol><li><p><strong>Expiration layer</strong>: What can be deleted, and when? Non-current versions, expired delete markers, incomplete uploads.</p></li><li><p><strong>Transition layer</strong>: What should move to cheaper storage, and based on what access pattern evidence?</p></li><li><p><strong>Audit layer</strong>: How do you verify the rules are actually working as intended?</p></li></ol><p>Most teams only think about layer two and skip layers one and three entirely. That&#8217;s where the silent costs hide.</p><h2>The Takeaway</h2><p>S3 is not &#8220;set and forget.&#8221; The defaults are designed for durability, not cost efficiency. If you&#8217;re not actively managing object versions, failed uploads, and storage class transitions with lifecycle rules, and validating those rules still match reality, you&#8217;re overpaying.</p><p>Start with a single bucket. Check its Storage Lens dashboard. You&#8217;ll probably find at least one surprise waiting for you.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.thecloudengineers.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading The Cloud Engineers! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p>]]></content:encoded></item><item><title><![CDATA[5 Fundamental Architectures You Should Know for Cloud Interviews]]></title><description><![CDATA[If you want to break into a cloud role (Cloud Engineer, Cloud Developer, Software Engineer, DevOps, SRE, Solutions Architect, etc.) and you have an interview coming up, understanding cloud architecture is non-negotiable if you want to stand out.]]></description><link>https://blog.thecloudengineers.com/p/5-fundamental-architectures-you-should</link><guid isPermaLink="false">https://blog.thecloudengineers.com/p/5-fundamental-architectures-you-should</guid><dc:creator><![CDATA[Lefteris Karageorgiou]]></dc:creator><pubDate>Wed, 27 May 2026 09:31:12 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/2e8b6a0e-d410-4228-a371-7c7b8813f618_1200x630.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Hey, it&#8217;s Lefteris &#128075; I&#8217;m the voice behind the weekly newsletter &#8220;The Cloud Engineers.&#8221;</em></p><p>If you want to break into a cloud role (Cloud Engineer, Cloud Developer, Software Engineer, DevOps, SRE, Solutions Architect, etc.) and you have an interview coming up, understanding cloud architecture is non-negotiable if you want to stand out.</p><p>Most candidates focus on certifications or memorizing AWS services, but interviews are designed to test something deeper: how you think about building scalable, reliable, and resilient systems.</p><p>In this article, we&#8217;ll look at 5 fundamental architectures you should know. 
Master them, understand the trade-offs behind them, and you&#8217;ll stand out for the right reasons.</p><p>If you want hands-on practice with these 5 architectures, grab my free PDF, <a href="https://learn.thecloudengineers.com/5-aws-projects-to-get-you-hired">5 AWS Projects To Get You Hired</a>, featuring 5 real-world AWS projects with architecture diagrams and step-by-step implementation guides.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://learn.thecloudengineers.com/5-aws-projects-to-get-you-hired&quot;,&quot;text&quot;:&quot;Download the FREE PDF&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://learn.thecloudengineers.com/5-aws-projects-to-get-you-hired"><span>Download the FREE PDF</span></a></p><h2>1. 3-Tier Architecture</h2><p>This is the foundation. Frontend, backend, database: three distinct layers, each with a clear responsibility. Simple in concept, but interviewers expect you to go far beyond drawing three boxes on a whiteboard.</p><p><strong>Key AWS Services:</strong> CloudFront + S3 (frontend), ALB + EC2/ECS with Auto Scaling (backend), RDS Multi-AZ or Aurora (database).</p><p><strong>What interviewers expect you to know:</strong></p><ul><li><p>How to scale each tier independently: stateless backends behind a load balancer, read replicas for read-heavy database workloads.</p></li><li><p>Failure scenarios: what happens when an AZ goes down? 
How does Multi-AZ RDS failover work?</p></li><li><p>Caching strategies: ElastiCache between backend and database to reduce latency and DB load.</p></li><li><p>The difference between horizontal and vertical scaling, and why horizontal wins at scale.</p></li></ul><p>When discussing this architecture, show that you understand the <em>why</em> behind each layer&#8217;s separation, not just the <em>what</em>.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BV_L!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fceef382e-4853-4b95-ad3e-be06778106a3_602x352.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BV_L!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fceef382e-4853-4b95-ad3e-be06778106a3_602x352.png 424w, https://substackcdn.com/image/fetch/$s_!BV_L!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fceef382e-4853-4b95-ad3e-be06778106a3_602x352.png 848w, https://substackcdn.com/image/fetch/$s_!BV_L!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fceef382e-4853-4b95-ad3e-be06778106a3_602x352.png 1272w, https://substackcdn.com/image/fetch/$s_!BV_L!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fceef382e-4853-4b95-ad3e-be06778106a3_602x352.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BV_L!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fceef382e-4853-4b95-ad3e-be06778106a3_602x352.png" width="330" 
height="192.95681063122925" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ceef382e-4853-4b95-ad3e-be06778106a3_602x352.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:352,&quot;width&quot;:602,&quot;resizeWidth&quot;:330,&quot;bytes&quot;:223087,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.thecloudengineers.com/i/198673803?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fceef382e-4853-4b95-ad3e-be06778106a3_602x352.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!BV_L!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fceef382e-4853-4b95-ad3e-be06778106a3_602x352.png 424w, https://substackcdn.com/image/fetch/$s_!BV_L!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fceef382e-4853-4b95-ad3e-be06778106a3_602x352.png 848w, https://substackcdn.com/image/fetch/$s_!BV_L!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fceef382e-4853-4b95-ad3e-be06778106a3_602x352.png 1272w, https://substackcdn.com/image/fetch/$s_!BV_L!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fceef382e-4853-4b95-ad3e-be06778106a3_602x352.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><h2>2. Microservices</h2><p>Breaking a monolith into smaller, independent services sounds clean on paper. 
In practice, it introduces a whole new category of challenges.</p><p><strong>Key AWS Services:</strong> ECS or EKS, Lambda.</p><p><strong>What interviewers expect you to know:</strong></p><ul><li><p>Trade-offs vs. monoliths: when the operational overhead isn&#8217;t worth it (small teams, early-stage products).</p></li><li><p>Data ownership: each service owns its data store. No shared databases.</p></li><li><p>How you handle failures across service boundaries: circuit breakers, retries with exponential backoff, timeouts.</p></li><li><p>Observability: how do you trace a request that flows through 5 services?</p></li></ul><p>The key insight interviewers look for: you chose microservices because the problem demanded it, not because it was trendy.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FQNE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ff71116-5051-4e3d-86cb-89a25a512268_628x410.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FQNE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ff71116-5051-4e3d-86cb-89a25a512268_628x410.png 424w, https://substackcdn.com/image/fetch/$s_!FQNE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ff71116-5051-4e3d-86cb-89a25a512268_628x410.png 848w, https://substackcdn.com/image/fetch/$s_!FQNE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ff71116-5051-4e3d-86cb-89a25a512268_628x410.png 1272w, 
https://substackcdn.com/image/fetch/$s_!FQNE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ff71116-5051-4e3d-86cb-89a25a512268_628x410.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FQNE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ff71116-5051-4e3d-86cb-89a25a512268_628x410.png" width="350" height="228.5031847133758" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2ff71116-5051-4e3d-86cb-89a25a512268_628x410.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:410,&quot;width&quot;:628,&quot;resizeWidth&quot;:350,&quot;bytes&quot;:247069,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.thecloudengineers.com/i/198673803?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ff71116-5051-4e3d-86cb-89a25a512268_628x410.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!FQNE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ff71116-5051-4e3d-86cb-89a25a512268_628x410.png 424w, https://substackcdn.com/image/fetch/$s_!FQNE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ff71116-5051-4e3d-86cb-89a25a512268_628x410.png 848w, 
https://substackcdn.com/image/fetch/$s_!FQNE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ff71116-5051-4e3d-86cb-89a25a512268_628x410.png 1272w, https://substackcdn.com/image/fetch/$s_!FQNE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ff71116-5051-4e3d-86cb-89a25a512268_628x410.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><h2>3. Serverless</h2><p>No servers to patch, no infrastructure to manage. You can build entire applications without provisioning a single instance.</p><p><strong>Key AWS Services:</strong> Lambda, API Gateway.</p><p><strong>What interviewers expect you to know:</strong></p><ul><li><p>Cold starts: what causes them and how to mitigate them.</p></li><li><p>Cost model: pay-per-invocation is cheap at low scale but can surprise you with high-throughput workloads. Know the break-even point vs. 
containers.</p></li><li><p>When NOT to use serverless: long-running processes, workloads needing persistent connections, or latency-critical paths where cold starts are unacceptable.</p></li></ul><p>The interview-winning move is demonstrating that you can evaluate when serverless is the right tool and when it introduces more problems than it solves.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-4AC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F186b18d9-c743-40ac-b975-5e5ca054a59c_628x226.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-4AC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F186b18d9-c743-40ac-b975-5e5ca054a59c_628x226.png 424w, https://substackcdn.com/image/fetch/$s_!-4AC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F186b18d9-c743-40ac-b975-5e5ca054a59c_628x226.png 848w, https://substackcdn.com/image/fetch/$s_!-4AC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F186b18d9-c743-40ac-b975-5e5ca054a59c_628x226.png 1272w, https://substackcdn.com/image/fetch/$s_!-4AC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F186b18d9-c743-40ac-b975-5e5ca054a59c_628x226.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-4AC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F186b18d9-c743-40ac-b975-5e5ca054a59c_628x226.png" width="350" height="125.95541401273886" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/186b18d9-c743-40ac-b975-5e5ca054a59c_628x226.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:226,&quot;width&quot;:628,&quot;resizeWidth&quot;:350,&quot;bytes&quot;:134361,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.thecloudengineers.com/i/198673803?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F186b18d9-c743-40ac-b975-5e5ca054a59c_628x226.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-4AC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F186b18d9-c743-40ac-b975-5e5ca054a59c_628x226.png 424w, https://substackcdn.com/image/fetch/$s_!-4AC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F186b18d9-c743-40ac-b975-5e5ca054a59c_628x226.png 848w, https://substackcdn.com/image/fetch/$s_!-4AC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F186b18d9-c743-40ac-b975-5e5ca054a59c_628x226.png 1272w, https://substackcdn.com/image/fetch/$s_!-4AC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F186b18d9-c743-40ac-b975-5e5ca054a59c_628x226.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><h2>4. Event-Driven Architecture</h2><p>This is where modern cloud design becomes powerful. Instead of services calling each other directly, they produce and consume events asynchronously. 
The result: loose coupling, high scalability, and systems that are naturally resilient to spikes in traffic.</p><p><strong>Key AWS Services:</strong> EventBridge, SNS, SQS.</p><p><strong>What interviewers expect you to know:</strong></p><ul><li><p>The difference between queuing (SQS: each message goes to one consumer) and pub/sub (SNS: fan-out to many consumers).</p></li><li><p>Eventual consistency: data won&#8217;t be immediately up to date across services. You need to explain how your system handles this.</p></li><li><p>Idempotency: events can be delivered more than once. Your consumers must handle duplicates gracefully.</p></li><li><p>Dead-letter queues: what happens when an event fails processing repeatedly? How do you monitor and replay failed events?</p></li><li><p>A real scenario: an order is placed &#8594; event fires &#8594; inventory, notifications, and analytics services react independently without knowing about each other.</p></li></ul><p>The challenge is showing you can design systems where components don&#8217;t need to know about each other but still behave correctly as a whole.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!nD2p!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69b3eba8-4ac4-47ac-81dd-91ecd984a14a_720x336.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!nD2p!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69b3eba8-4ac4-47ac-81dd-91ecd984a14a_720x336.png 424w, https://substackcdn.com/image/fetch/$s_!nD2p!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69b3eba8-4ac4-47ac-81dd-91ecd984a14a_720x336.png 848w, 
https://substackcdn.com/image/fetch/$s_!nD2p!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69b3eba8-4ac4-47ac-81dd-91ecd984a14a_720x336.png 1272w, https://substackcdn.com/image/fetch/$s_!nD2p!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69b3eba8-4ac4-47ac-81dd-91ecd984a14a_720x336.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!nD2p!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69b3eba8-4ac4-47ac-81dd-91ecd984a14a_720x336.png" width="350" height="163.33333333333334" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/69b3eba8-4ac4-47ac-81dd-91ecd984a14a_720x336.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:336,&quot;width&quot;:720,&quot;resizeWidth&quot;:350,&quot;bytes&quot;:214449,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.thecloudengineers.com/i/198673803?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69b3eba8-4ac4-47ac-81dd-91ecd984a14a_720x336.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!nD2p!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69b3eba8-4ac4-47ac-81dd-91ecd984a14a_720x336.png 424w, 
https://substackcdn.com/image/fetch/$s_!nD2p!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69b3eba8-4ac4-47ac-81dd-91ecd984a14a_720x336.png 848w, https://substackcdn.com/image/fetch/$s_!nD2p!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69b3eba8-4ac4-47ac-81dd-91ecd984a14a_720x336.png 1272w, https://substackcdn.com/image/fetch/$s_!nD2p!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69b3eba8-4ac4-47ac-81dd-91ecd984a14a_720x336.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><h2>5. Containers</h2><p>Containers give you consistency across environments and fine-grained control over your runtime. They sit between the full control of EC2 and the abstraction of serverless.</p><p><strong>Key AWS Services:</strong> ECS, EKS, ECR.</p><p><strong>What interviewers expect you to know:</strong></p><ul><li><p>When containers beat serverless: long-running processes, specific runtime needs, workloads needing persistent connections, or apps being migrated from on-premises.</p></li><li><p>When serverless beats containers: short-lived, event-triggered functions with variable traffic.</p></li><li><p>Health checks, rolling deployments, and blue/green strategies for zero-downtime updates.</p></li><li><p>You don&#8217;t need to be a Kubernetes expert. 
But you should understand pods, services, and why teams choose (or avoid) K8s.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!aAsV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d1f25da-905b-45e9-8b73-13ce5fd50988_736x272.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!aAsV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d1f25da-905b-45e9-8b73-13ce5fd50988_736x272.png 424w, https://substackcdn.com/image/fetch/$s_!aAsV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d1f25da-905b-45e9-8b73-13ce5fd50988_736x272.png 848w, https://substackcdn.com/image/fetch/$s_!aAsV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d1f25da-905b-45e9-8b73-13ce5fd50988_736x272.png 1272w, https://substackcdn.com/image/fetch/$s_!aAsV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d1f25da-905b-45e9-8b73-13ce5fd50988_736x272.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!aAsV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d1f25da-905b-45e9-8b73-13ce5fd50988_736x272.png" width="350" height="129.34782608695653" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1d1f25da-905b-45e9-8b73-13ce5fd50988_736x272.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:272,&quot;width&quot;:736,&quot;resizeWidth&quot;:350,&quot;bytes&quot;:197437,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.thecloudengineers.com/i/198673803?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d1f25da-905b-45e9-8b73-13ce5fd50988_736x272.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!aAsV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d1f25da-905b-45e9-8b73-13ce5fd50988_736x272.png 424w, https://substackcdn.com/image/fetch/$s_!aAsV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d1f25da-905b-45e9-8b73-13ce5fd50988_736x272.png 848w, https://substackcdn.com/image/fetch/$s_!aAsV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d1f25da-905b-45e9-8b73-13ce5fd50988_736x272.png 1272w, https://substackcdn.com/image/fetch/$s_!aAsV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d1f25da-905b-45e9-8b73-13ce5fd50988_736x272.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><h2>Conclusion</h2><p>These five architectures cover the vast majority of what you&#8217;ll face in cloud interviews. You don&#8217;t need to memorise every AWS service. 
You need to understand patterns, trade-offs, and when to apply each one.</p><p>The candidates who stand out are the ones who can explain <em>why</em> they chose an architecture, not just draw it on a board. Show your reasoning. Discuss trade-offs. Acknowledge limitations.</p><p>Cloud interviews are rarely about knowing the &#8220;correct&#8221; AWS service. They&#8217;re about proving you can make good engineering decisions under constraints.</p><p>If you want hands-on practice with these 5 architectures, grab my free PDF, <a href="https://learn.thecloudengineers.com/5-aws-projects-to-get-you-hired">5 AWS Projects To Get You Hired</a>, featuring 5 real-world AWS projects with architecture diagrams and step-by-step implementation guides.</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://learn.thecloudengineers.com/5-aws-projects-to-get-you-hired&quot;,&quot;text&quot;:&quot;Download the FREE PDF&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://learn.thecloudengineers.com/5-aws-projects-to-get-you-hired"><span>Download the FREE PDF</span></a></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.thecloudengineers.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading The Cloud Engineers! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p>]]></content:encoded></item><item><title><![CDATA[Building Multi-Agent Workflows on Lambda Durable Functions]]></title><description><![CDATA[Multi-agent systems are everywhere now. Every team wants autonomous agents collaborating on complex tasks: research, planning, execution, validation.]]></description><link>https://blog.thecloudengineers.com/p/building-multi-agent-workflows-on</link><guid isPermaLink="false">https://blog.thecloudengineers.com/p/building-multi-agent-workflows-on</guid><dc:creator><![CDATA[Lefteris Karageorgiou]]></dc:creator><pubDate>Wed, 20 May 2026 09:30:48 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/cd5cd993-54d8-4858-ba63-b7c6f0c324d6_1200x630.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Hey, it&#8217;s Lefteris &#128075; I&#8217;m the voice behind the weekly newsletter &#8220;The Cloud Engineers.&#8221;</em></p><p>Multi-agent systems are everywhere now. Every team wants autonomous agents collaborating on complex tasks: research, planning, execution, validation. The problem? Orchestrating multiple agents that need to wait on each other, handle failures gracefully, and maintain state across long-running conversations is painful. We&#8217;ve been duct-taping this together with Step Functions, SQS queues, and DynamoDB state tables for too long.</p><p>Lambda Durable Functions change the game here. 
In this article, we will walk through why durable functions are a natural fit for multi-agent orchestration, design a document processing pipeline with four agents, and show how the architecture holds together without a single line of state management code.</p><h2>Why Durable Functions for Multi-Agent</h2><p>Standard Lambda functions run start-to-finish in a single invocation. If something fails midway, you retry everything. For a multi-agent workflow where Agent A researches, Agent B plans, Agent C executes, and Agent D validates, that&#8217;s unacceptable. You can&#8217;t re-run a 4-minute research phase because the validation agent hit a transient error.</p><p>Durable functions automatically checkpoint progress, suspend execution for up to one year during long-running tasks, and recover from failures. No custom state management. 
No DynamoDB tables tracking &#8220;which step are we on.&#8221; The runtime handles it.</p><p>This is exactly what multi-agent orchestration needs: reliable progress tracking across agents that may take seconds or minutes to respond, with automatic recovery when things go wrong.</p><h2>The Example: Document Processing Pipeline</h2><p>Let&#8217;s see an example of a document processing pipeline and how we can build it with Lambda durable functions. We have four agents and one durable function orchestrating them. The agents are:</p><ul><li><p><strong>Classifier Agent</strong> &#8212; Reads an incoming document, determines its type (invoice, contract, support ticket), and extracts metadata.</p></li><li><p><strong>Enrichment Agent</strong> &#8212; Takes the classification, pulls additional context from internal systems, and augments the document with business context.</p></li><li><p><strong>Decision Agent</strong> &#8212; Evaluates the enriched document against business rules and decides the routing: auto-approve, escalate, or reject.</p></li><li><p><strong>Action Agent</strong> &#8212; Executes the decision: files the document, notifies stakeholders, or triggers downstream workflows.</p></li></ul><h2>Architecture</h2><pre><code><code>&#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;
&#9474;                     Lambda Durable Function (Orchestrator)                  &#9474;
&#9474;                                                                             &#9474;
&#9474;  &#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;    &#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;    &#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;    &#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;   &#9474;
&#9474;  &#9474; Checkpoint 1&#9474;    &#9474; Checkpoint 2&#9474;    &#9474; Checkpoint 3&#9474;    &#9474; Checkpoint 4&#9474;   &#9474;
&#9474;  &#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9516;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;    &#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9516;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;    &#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9516;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;    &#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9516;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;   &#9474;
&#9474;         &#9474;                  &#9474;                  &#9474;                  &#9474;          &#9474;
&#9474;         &#9660;                  &#9660;                  &#9660;                  &#9660;          &#9474;
&#9474;  &#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;    &#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;    &#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;    &#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;   &#9474;
&#9474;  &#9474;  Step 1:    &#9474;    &#9474;  Step 2:    &#9474;    &#9474;  Step 3:    &#9474;    &#9474;  Step 4:    &#9474;   &#9474;
&#9474;  &#9474;  Classify   &#9474;&#9472;&#9472;&#9472;&#9654;&#9474;  Enrich     &#9474;&#9472;&#9472;&#9472;&#9654;&#9474;  Decide     &#9474;&#9472;&#9472;&#9472;&#9654;&#9474;  Act        &#9474;   &#9474;
&#9474;  &#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9516;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;    &#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9516;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;    &#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9516;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;    &#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9516;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;   &#9474;
&#9474;         &#9474;                  &#9474;                  &#9474;                  &#9474;          &#9474;
&#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9532;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9532;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9532;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9532;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;
          &#9474;                  &#9474;                  &#9474;                  &#9474;
          &#9660;                  &#9660;                  &#9660;                  &#9660;
&#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;  &#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;  &#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;  &#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;
&#9474;  Classifier     &#9474;  &#9474;  Enrichment     &#9474;  &#9474;  Decision       &#9474;  &#9474;  Action         &#9474;
&#9474;  Agent          &#9474;  &#9474;  Agent          &#9474;  &#9474;  Agent          &#9474;  &#9474;  Agent          &#9474;
&#9474;  (Lambda)       &#9474;  &#9474;  (Lambda)       &#9474;  &#9474;  (Lambda)       &#9474;  &#9474;  (Lambda)       &#9474;
&#9474;                 &#9474;  &#9474;                 &#9474;  &#9474;                 &#9474;  &#9474;                 &#9474;
&#9474;  - Reads doc    &#9474;  &#9474;  - Pulls context&#9474;  &#9474;  - Evaluates    &#9474;  &#9474;  - Files doc    &#9474;
&#9474;  - Classifies   &#9474;  &#9474;  - Calls APIs   &#9474;  &#9474;    rules        &#9474;  &#9474;  - Notifies     &#9474;
&#9474;  - Extracts     &#9474;  &#9474;  - Augments     &#9474;  &#9474;  - Routes       &#9474;  &#9474;  - Triggers     &#9474;
&#9474;    metadata     &#9474;  &#9474;    metadata     &#9474;  &#9474;                 &#9474;  &#9474;    downstream   &#9474;
&#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9516;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;  &#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9516;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;  &#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9516;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;  &#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9516;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;
         &#9474;                    &#9474;                    &#9474;                     &#9474;
         &#9660;                    &#9660;                    &#9660;                     &#9660;
&#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;  &#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;  &#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;  &#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;
&#9474;  Amazon Bedrock &#9474;  &#9474;  Internal APIs  &#9474;  &#9474;  Human Review   &#9474;  &#9474;  S3 / SNS /     &#9474;
&#9474;  (LLM)          &#9474;  &#9474;  (DynamoDB,     &#9474;  &#9474;  (Callback -    &#9474;  &#9474;  EventBridge    &#9474;
&#9474;                 &#9474;  &#9474;   other svcs)   &#9474;  &#9474;   Wait/Resume)  &#9474;  &#9474;                 &#9474;
&#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;  &#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;  &#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;  &#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;
</code></code></pre><h2>The Flow</h2><p>The durable function receives a document event. It invokes the Classifier Agent as a durable step, and the result is checkpointed. If Lambda recycles the execution environment after classification completes, the function resumes from that checkpoint, not from scratch.</p><p>The Classifier&#8217;s output feeds into the Enrichment Agent. Another durable step, another checkpoint. The Enrichment Agent might call external APIs that take 30 seconds. The durable function suspends, incurs no compute charges while waiting, and resumes when enrichment completes.</p><p>Here&#8217;s where it gets interesting. The Decision Agent might determine that a human needs to review this document. The durable function uses a wait that suspends execution entirely, for hours or days if needed, until a callback arrives with the human&#8217;s decision. No polling. No idle compute. The function simply resumes where it left off.</p><p>Finally, the Action Agent executes. If it fails, perhaps because a downstream system is temporarily unavailable, the durable function retries that specific step without re-running classification, enrichment, or decision. Four agents, one orchestration function, zero state management infrastructure.</p><h2>Why This Beats the Alternative</h2><p>Before durable functions, this same workflow required a Step Functions state machine, DynamoDB for intermediate state, SQS queues between agents, dead-letter queues for failures, and CloudWatch alarms for stuck executions. That&#8217;s five pieces of infrastructure to manage for what is conceptually a single workflow.</p><p>With durable functions, it&#8217;s one Lambda function. Same programming model you already know. Same event handler. Same integrations. The durability is built into the execution model itself.</p><h2>The Trade-Off</h2><p>Durable functions use a checkpoint-replay model. Every time execution resumes, it replays from the last checkpoint. 
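</p><p>To make the replay idea concrete, here is a toy, in-memory sketch in Python. It is not the AWS durable execution SDK: <code>DurableContext</code>, <code>step</code>, and the <code>journal</code> dictionary are hypothetical names, and the sketch assumes a runtime that re-runs the handler from the top and returns saved results for steps that already finished.</p><pre><code># Toy model of checkpoint-replay, for illustration only.
# DurableContext, step() and journal are hypothetical names, not an AWS API.


class DurableContext:
    def __init__(self, journal):
        # journal maps a step name to its saved result and outlives one attempt
        self.journal = journal

    def step(self, name, fn):
        # A step that already finished returns its saved result; fn is not called
        if name in self.journal:
            return self.journal[name]
        result = fn()
        self.journal[name] = result  # checkpoint the result after success
        return result


calls = []  # records every time an agent really runs


def agent(name, fail=False):
    def run():
        calls.append(name)
        if fail:
            raise RuntimeError(name + " is temporarily unavailable")
        return name + ":done"
    return run


def orchestrate(ctx, act_fails):
    ctx.step("classify", agent("classify"))
    ctx.step("enrich", agent("enrich"))
    ctx.step("decide", agent("decide"))
    return ctx.step("act", agent("act", fail=act_fails))


journal = {}
try:
    # First attempt: the Action Agent fails after three steps were checkpointed
    orchestrate(DurableContext(journal), act_fails=True)
except RuntimeError:
    pass

# Resume: the handler runs again from the top, but only "act" executes again
result = orchestrate(DurableContext(journal), act_fails=False)
print(calls)
print(result)
</code></pre><p>Under these assumptions, only <code>act</code> runs twice. The first three steps return their saved results, while the orchestration code around them executes again on every resume.</p><p>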
This means your orchestration logic must be deterministic: no random values and no reading the current time for branching decisions outside of durable steps. This is a constraint worth understanding upfront.</p><p>For multi-agent workflows specifically, this is rarely a problem. Your orchestration logic is typically: call agent, get result, pass to next agent. That&#8217;s inherently deterministic.</p><h2>When to Reach for This</h2><p>Multi-agent workflows that involve waiting on humans, on slow external systems, or on other agents are the sweet spot. If your agents all respond in under a second and never fail, you probably don&#8217;t need durability. But that&#8217;s not the real world.</p><p>In the real world, agents call LLMs that time out, external APIs that rate-limit, and humans who go to lunch. Durable functions handle all of that without you writing a single line of state management logic.</p>]]></content:encoded></item><item><title><![CDATA[How to Get the Most Out of Tech Events]]></title><description><![CDATA[I&#8217;ve attended a couple of tech events over the past two weeks, and they reminded me just how much value is sitting right there, if you know how to look for it. 
Most people show up, sit through sessions, and leave. But the real ROI of a tech event goes well beyond the agenda.]]></description><link>https://blog.thecloudengineers.com/p/how-to-get-the-most-out-of-tech-events</link><guid isPermaLink="false">https://blog.thecloudengineers.com/p/how-to-get-the-most-out-of-tech-events</guid><dc:creator><![CDATA[Lefteris Karageorgiou]]></dc:creator><pubDate>Wed, 13 May 2026 09:31:02 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/ffed0439-563d-4f70-8bcc-dc7b617a49b5_1731x909.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Hey, it&#8217;s Lefteris &#128075; I&#8217;m the voice behind the weekly newsletter &#8220;The Cloud Engineers.&#8221;</em></p><p>I&#8217;ve attended a couple of tech events over the past two weeks, and they reminded me just how much value is sitting right there, if you know how to look for it. Most people show up, sit through sessions, and leave. But the real ROI of a tech event goes well beyond the agenda.</p><p>Here&#8217;s what I&#8217;ve learned about making the most of them.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_hO2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc482e42c-d6c6-438b-b624-7130606d3ffa_1952x1468.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_hO2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc482e42c-d6c6-438b-b624-7130606d3ffa_1952x1468.png 424w, https://substackcdn.com/image/fetch/$s_!_hO2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc482e42c-d6c6-438b-b624-7130606d3ffa_1952x1468.png 848w, 
https://substackcdn.com/image/fetch/$s_!_hO2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc482e42c-d6c6-438b-b624-7130606d3ffa_1952x1468.png 1272w, https://substackcdn.com/image/fetch/$s_!_hO2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc482e42c-d6c6-438b-b624-7130606d3ffa_1952x1468.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_hO2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc482e42c-d6c6-438b-b624-7130606d3ffa_1952x1468.png" width="1456" height="1095" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c482e42c-d6c6-438b-b624-7130606d3ffa_1952x1468.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1095,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:4094003,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.thecloudengineers.com/i/195838348?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc482e42c-d6c6-438b-b624-7130606d3ffa_1952x1468.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_hO2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc482e42c-d6c6-438b-b624-7130606d3ffa_1952x1468.png 424w, 
https://substackcdn.com/image/fetch/$s_!_hO2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc482e42c-d6c6-438b-b624-7130606d3ffa_1952x1468.png 848w, https://substackcdn.com/image/fetch/$s_!_hO2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc482e42c-d6c6-438b-b624-7130606d3ffa_1952x1468.png 1272w, https://substackcdn.com/image/fetch/$s_!_hO2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc482e42c-d6c6-438b-b624-7130606d3ffa_1952x1468.png 1456w" sizes="100vw" fetchpriority="high"></picture>
</div></a></figure></div><h2>1. Do Your Research Beforehand</h2><p>Walking into an event blind is a missed opportunity. Before you even step through the door, check the agenda. Which sessions are actually worth your time? Which workshops align with what you&#8217;re working on right now? Who&#8217;s speaking, and more importantly, who&#8217;s attending?</p><p>Identify two or three people you&#8217;d genuinely like to connect with. Not just big names, but people whose work you follow, whose problems overlap with yours, or whose perspective you&#8217;d find valuable. Having that list in your head changes how you move through the event. You stop wandering and start being intentional.</p><h2>2. Don&#8217;t Just Attend &#8212; Connect</h2><p>This is where most people leave value on the table. They attend the sessions, they clap at the end, and they head to the next room. But the conversations that happen in the hallways, at the coffee station, or right after a talk? Those are often worth more than the talk itself.</p><p>Talk to speakers. Ask them a follow-up question. Share your perspective on something they said. Disagree, even &#8212; respectfully. The best conversations I&#8217;ve had at events started with &#8220;I see it slightly differently, here&#8217;s why.&#8221;</p><p>Networking isn&#8217;t about collecting business cards or LinkedIn connections. It&#8217;s about meaningful exchange. 
One real conversation with the right person can open a door that no session ever could.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!aAeh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea475762-9b7f-47a4-8f8f-bbb294cb6c5b_1952x1468.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!aAeh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea475762-9b7f-47a4-8f8f-bbb294cb6c5b_1952x1468.png 424w, https://substackcdn.com/image/fetch/$s_!aAeh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea475762-9b7f-47a4-8f8f-bbb294cb6c5b_1952x1468.png 848w, https://substackcdn.com/image/fetch/$s_!aAeh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea475762-9b7f-47a4-8f8f-bbb294cb6c5b_1952x1468.png 1272w, https://substackcdn.com/image/fetch/$s_!aAeh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea475762-9b7f-47a4-8f8f-bbb294cb6c5b_1952x1468.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!aAeh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea475762-9b7f-47a4-8f8f-bbb294cb6c5b_1952x1468.png" width="1456" height="1095" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ea475762-9b7f-47a4-8f8f-bbb294cb6c5b_1952x1468.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1095,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:4505185,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.thecloudengineers.com/i/195838348?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea475762-9b7f-47a4-8f8f-bbb294cb6c5b_1952x1468.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!aAeh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea475762-9b7f-47a4-8f8f-bbb294cb6c5b_1952x1468.png 424w, https://substackcdn.com/image/fetch/$s_!aAeh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea475762-9b7f-47a4-8f8f-bbb294cb6c5b_1952x1468.png 848w, https://substackcdn.com/image/fetch/$s_!aAeh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea475762-9b7f-47a4-8f8f-bbb294cb6c5b_1952x1468.png 1272w, https://substackcdn.com/image/fetch/$s_!aAeh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea475762-9b7f-47a4-8f8f-bbb294cb6c5b_1952x1468.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><h2>3. Participate Actively</h2><p>Side events, workshops, roundtables, and hackathons are where the real engagement happens. The more you put in, the more you get out. It sounds obvious, but most people default to passive attendance.</p><p>When you participate actively, you become memorable. People remember the person who asked the sharp question, who contributed to the workshop discussion, who showed up to the evening side event when everyone else went back to the hotel. That presence compounds over time.</p><h2>4. Capture and Apply Insights</h2><p>Take notes. Not on everything, but on what genuinely stands out. A new mental model. A tool you hadn&#8217;t heard of. A framing of a problem that clicked. A name someone mentioned three times.</p><p>The real value isn&#8217;t in the notes themselves. 
It&#8217;s in what you do with them afterward. Block time in the week after the event to review what you captured and identify one or two things you can actually apply. That&#8217;s the difference between an event that felt good and one that moved the needle.</p><h2>The Right Measure of Success</h2><p>Here&#8217;s a reframe that changed how I approach events: you don&#8217;t need to walk away with ten new contacts, a notebook full of insights, and a job offer to call it a success.</p><p><strong>If you leave with one valuable new connection, one useful takeaway, or simply a great experience that reminded you why you&#8217;re in this field &#8212; that&#8217;s a win</strong>. Set that bar, and you&#8217;ll almost always clear it.</p><p>The mistake is treating events as passive consumption. The sessions are the structure, but the value is in how you engage with everything around them. Come prepared, stay curious, and be willing to start a conversation.</p><p>That&#8217;s where the real return is.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.thecloudengineers.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading The Cloud Engineers! Subscribe for free to receive new posts and support my work.</p></div></div></div>]]></content:encoded></item><item><title><![CDATA[Spec-Driven Development on AWS]]></title><description><![CDATA[We&#8217;ve all been there. 
A service gets deployed, an API route goes live, and three weeks later someone asks, &#8220;Wait, what does this endpoint actually do?&#8221;]]></description><link>https://blog.thecloudengineers.com/p/spec-driven-development-on-aws</link><guid isPermaLink="false">https://blog.thecloudengineers.com/p/spec-driven-development-on-aws</guid><dc:creator><![CDATA[Lefteris Karageorgiou]]></dc:creator><pubDate>Wed, 06 May 2026 09:31:15 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/212c770b-a9f5-4658-a2ad-b9f28602fd65_1731x909.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Hey, it&#8217;s Lefteris &#128075; I&#8217;m the voice behind the weekly newsletter &#8220;The Cloud Engineers.&#8221;</em></p><p>We&#8217;ve all been there. A service gets deployed, an API route goes live, and three weeks later someone asks, <em>&#8220;Wait, what does this endpoint actually do?&#8221;</em> The answer lives in someone&#8217;s head, a Slack thread, or a Confluence page nobody has updated since the initial design meeting. That&#8217;s the problem <strong>Spec-Driven Development (SDD)</strong> solves, and on AWS, it changes how you build, test, and evolve systems regardless of whether you&#8217;re running Lambda, ECS, EKS, or EC2.</p><p>This hands-on <strong><a href="https://www.eventbrite.co.uk/e/hands-on-spec-driven-development-workshop-cohort-2-tickets-1985498625838?aff=lefteris">Spec-Driven Development workshop</a></strong> is built for that exact problem. 
Instead of watching demos, you&#8217;ll build a real application while learning how to define clear specs and guide AI to produce reliable, production-ready outputs.</p><p>After a sold-out first cohort, Cohort 2 is now open.</p><p><strong>&#128073; Register here:</strong> <a href="https://www.eventbrite.co.uk/e/hands-on-spec-driven-development-workshop-cohort-2-tickets-1985498625838?aff=lefteris">https://www.eventbrite.co.uk/e/hands-on-spec-driven-development-workshop-cohort-2-tickets-1985498625838?aff=lefteris</a></p><p>Use code <strong>SD40</strong> for 40% off</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!WjTi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa81f9755-3206-49cb-b25a-090fa001ea0c_1280x640.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!WjTi!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa81f9755-3206-49cb-b25a-090fa001ea0c_1280x640.png 424w, https://substackcdn.com/image/fetch/$s_!WjTi!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa81f9755-3206-49cb-b25a-090fa001ea0c_1280x640.png 848w, https://substackcdn.com/image/fetch/$s_!WjTi!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa81f9755-3206-49cb-b25a-090fa001ea0c_1280x640.png 1272w, https://substackcdn.com/image/fetch/$s_!WjTi!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa81f9755-3206-49cb-b25a-090fa001ea0c_1280x640.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!WjTi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa81f9755-3206-49cb-b25a-090fa001ea0c_1280x640.png" width="1280" height="640" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a81f9755-3206-49cb-b25a-090fa001ea0c_1280x640.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:640,&quot;width&quot;:1280,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:320866,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.thecloudengineers.com/i/195966748?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa81f9755-3206-49cb-b25a-090fa001ea0c_1280x640.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!WjTi!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa81f9755-3206-49cb-b25a-090fa001ea0c_1280x640.png 424w, https://substackcdn.com/image/fetch/$s_!WjTi!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa81f9755-3206-49cb-b25a-090fa001ea0c_1280x640.png 848w, https://substackcdn.com/image/fetch/$s_!WjTi!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa81f9755-3206-49cb-b25a-090fa001ea0c_1280x640.png 1272w, https://substackcdn.com/image/fetch/$s_!WjTi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa81f9755-3206-49cb-b25a-090fa001ea0c_1280x640.png 1456w" 
sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.eventbrite.co.uk/e/hands-on-spec-driven-development-workshop-cohort-2-tickets-1985498625838?aff=lefteris&quot;,&quot;text&quot;:&quot;Spec-Driven Workshop at 40% off&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://www.eventbrite.co.uk/e/hands-on-spec-driven-development-workshop-cohort-2-tickets-1985498625838?aff=lefteris"><span>Spec-Driven Workshop at 40% off</span></a></p><h2>What Is Spec-Driven 
Development?</h2><p>Spec-Driven Development is a practice where the contract, also known as the specification, is written before any implementation begins. The spec becomes the single source of truth. Everything else (the application logic, the infrastructure configuration, the event schemas) derives from it.</p><p>This isn&#8217;t documentation-first development in the old sense. It&#8217;s not about writing a Word document before you code. It&#8217;s about defining the shape of your system (inputs, outputs, events, errors) in a machine-readable format that your tooling, your tests, and your team can all reason about simultaneously.</p><h2>Why It Matters Across AWS Architectures</h2><p>The need for specs is a distributed systems problem, and AWS workloads are distributed by nature, whether you&#8217;re running microservices on ECS Fargate, a data pipeline on EMR, a real-time processing layer on Kinesis, or a containerized platform on EKS.</p><p>Every boundary between components is a contract. A service on ECS calling another service over HTTP, a Kafka consumer reading from MSK, an EKS pod publishing to an SQS queue: each of these is an implicit agreement about shape, behavior, and failure modes. Without a spec, that agreement is undocumented and fragile.</p><p>We&#8217;ve seen this break in predictable ways. A team running microservices on ECS changes the response schema of an internal API. The downstream service starts returning 500s. No contract test caught it. No spec was ever written. The failure surfaces in production, during peak traffic, after a deployment that passed all unit tests. A spec would have made that breaking change visible before it shipped.</p><p>The same pattern plays out in data engineering. A Glue job changes the shape of a Parquet file it writes to S3. The Athena queries downstream start failing. The schema was never formally defined; it was inferred from the data. 
An explicit schema contract, enforced at write time, would have caught the drift immediately.</p><h2>Where Kiro Changes the Game</h2><p>This is where AWS&#8217;s agentic IDE, <a href="https://kiro.dev/">Kiro</a>, becomes directly relevant to how we practice SDD.</p><p>Kiro inverts the model most AI coding tools use. Kiro starts with the spec. When you describe a feature or a system in natural language, Kiro doesn&#8217;t generate code. It generates a structured specification first. That spec lives in three artifacts: <code>requirements.md</code>, <code>design.md</code>, and <code>tasks.md</code>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gfi2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53172af3-9f76-4998-a810-c59b721dc379_1738x1160.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gfi2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53172af3-9f76-4998-a810-c59b721dc379_1738x1160.png 424w, https://substackcdn.com/image/fetch/$s_!gfi2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53172af3-9f76-4998-a810-c59b721dc379_1738x1160.png 848w, https://substackcdn.com/image/fetch/$s_!gfi2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53172af3-9f76-4998-a810-c59b721dc379_1738x1160.png 1272w, https://substackcdn.com/image/fetch/$s_!gfi2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53172af3-9f76-4998-a810-c59b721dc379_1738x1160.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!gfi2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53172af3-9f76-4998-a810-c59b721dc379_1738x1160.png" width="1456" height="972" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/53172af3-9f76-4998-a810-c59b721dc379_1738x1160.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:972,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!gfi2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53172af3-9f76-4998-a810-c59b721dc379_1738x1160.png 424w, https://substackcdn.com/image/fetch/$s_!gfi2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53172af3-9f76-4998-a810-c59b721dc379_1738x1160.png 848w, https://substackcdn.com/image/fetch/$s_!gfi2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53172af3-9f76-4998-a810-c59b721dc379_1738x1160.png 1272w, https://substackcdn.com/image/fetch/$s_!gfi2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53172af3-9f76-4998-a810-c59b721dc379_1738x1160.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>The <strong>requirements file</strong> expands your prompt into user stories with acceptance criteria written in EARS notation, a structured format that captures preconditions, triggers, and expected system responses, including edge cases that would otherwise surface during implementation. The <strong>design file</strong> produces a technical design document covering architecture decisions and sequence diagrams. The <strong>tasks file</strong> breaks the design into discrete, sequenced implementation steps with dependency tracking. Only after you review and approve those artifacts does Kiro begin writing code.</p><p>This is spec-driven development operationalized inside your IDE. The spec isn&#8217;t a side artifact you produce reluctantly. 
It&#8217;s the primary artifact the entire workflow is built around. Code becomes the build output of the spec, not the other way around.</p><p>For teams building on AWS, this matters because it forces the contract conversation to happen before a single line of application or infrastructure code is written. Whether you&#8217;re defining an API contract between two ECS services, an event schema for an EventBridge rule, or the interface between a CDK construct and the team consuming it, all of it gets defined and reviewed at the spec level, not discovered during a production incident.</p><h2>The Spec as the Center of Gravity</h2><p>The shift SDD requires is treating the spec as the artifact that drives everything else, not as something you generate after the fact.</p><p>When we start a new API, the OpenAPI spec is written first. The service is built to satisfy it. The tests validate against it. When a consumer team needs to integrate, we hand them the spec, not a Slack message. The same principle applies to infrastructure. Before a new CDK construct is published for internal use, its interface contract is defined and reviewed. Breaking changes are explicit. They show up in the diff before they show up in a broken pipeline.</p><p>This approach forces clarity early. You can&#8217;t write a spec for something you haven&#8217;t thought through. The act of speccing a service or an event forces you to answer questions you&#8217;d otherwise defer: What are the required fields? What are the error states? What does a partial failure look like? What&#8217;s the versioning strategy?</p><h2>The Discipline It Demands</h2><p>Adopting SDD, with or without Kiro, requires a workflow shift.</p><ul><li><p><strong>Design reviews happen at the spec level.</strong> Before any service is built, the team reviews the OpenAPI, AsyncAPI, or infrastructure spec. 
This is where architectural decisions get made, not in code review.</p></li><li><p><strong>Mocking becomes trivial.</strong> A spec-first API can be mocked immediately. Consumer teams don&#8217;t wait for implementation. Parallel development becomes the default.</p></li><li><p><strong>Breaking changes become visible.</strong> When the spec is versioned and diffed, breaking changes are explicit. A field removal or type change shows up in the diff before it shows up in a production incident.</p></li><li><p><strong>Onboarding accelerates.</strong> A new engineer joining the team can understand the system&#8217;s boundaries by reading the specs. The spec is the architecture, expressed precisely.</p></li></ul><p>The temptation is always to skip the spec and start building, especially under deadline pressure. But the cost of that shortcut compounds. Every undocumented contract is technical debt that accrues interest in the form of production incidents, integration failures, and onboarding friction. Kiro makes that discipline easier to maintain by making the spec the default starting point, not an afterthought.</p><p>Write the spec first. Build to it. Everything else follows.</p><div><hr></div><p>AI can get you to the first feature fast, but most developers struggle when the system starts to grow.</p><p>This hands-on <strong>Spec-Driven Development workshop</strong> is built for that exact problem. 
Instead of watching demos, you&#8217;ll build a real application while learning how to define clear specs and guide AI to produce reliable, production-ready outputs.</p><p>After a sold-out first cohort, Cohort 2 is now open.</p><p>Register here: <a href="https://www.eventbrite.co.uk/e/hands-on-spec-driven-development-workshop-cohort-2-tickets-1985498625838?aff=lefteris">https://www.eventbrite.co.uk/e/hands-on-spec-driven-development-workshop-cohort-2-tickets-1985498625838?aff=lefteris</a></p><p>Use code <strong>SDD40</strong> for 40% off</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://www.eventbrite.co.uk/e/hands-on-spec-driven-development-workshop-cohort-2-tickets-1985498625838?aff=lefteris&quot;,&quot;text&quot;:&quot;Spec-Driven Workshop at 40% off&quot;,&quot;action&quot;:null,&quot;class&quot;:&quot;button-wrapper&quot;}" data-component-name="ButtonCreateButton"><a class="button primary button-wrapper" href="https://www.eventbrite.co.uk/e/hands-on-spec-driven-development-workshop-cohort-2-tickets-1985498625838?aff=lefteris"><span>Spec-Driven Workshop at 40% off</span></a></p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.thecloudengineers.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading The Cloud Engineers! 
Subscribe for free to receive new posts and support my work.</p></div></div></div>]]></content:encoded></item><item><title><![CDATA[5 Hands-On AWS Projects That Will Prepare You for the SAA-C03]]></title><description><![CDATA[Most people study for the AWS Solutions Architect Associate by watching 40 hours of video and memorizing answers.]]></description><link>https://blog.thecloudengineers.com/p/5-hands-on-aws-projects-that-will</link><guid isPermaLink="false">https://blog.thecloudengineers.com/p/5-hands-on-aws-projects-that-will</guid><dc:creator><![CDATA[Lefteris Karageorgiou]]></dc:creator><pubDate>Wed, 29 Apr 2026 09:30:41 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/4172ed57-b685-4720-b287-bc0ef32c47f9_1731x909.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Most people study for the AWS Solutions Architect Associate by watching 40 hours of video and memorizing answers. Then they get to a real job and freeze.</p><p>I took a different approach. I built projects that forced me to understand the services deeply, not just what they do, but <em>why</em> you&#8217;d choose them, what breaks under load, and how the pieces connect. I passed the SAA-C03 and could talk confidently in interviews about real implementations. Here are the 5 projects I recommend.</p><div><hr></div><h1>Sponsored by Salesforce</h1><h3>Cross-Platform Consistency with Agentforce AXL</h3><p>If you&#8217;ve ever tried to maintain brand and logic parity for an AI agent across Slack, web, and mobile, you know the pain. Enter the Agentforce Experience Layer (AXL). 
This new abstraction layer allows you to define agent logic and UI components once and have them render natively across any surface&#8212;including third-party platforms like Microsoft Teams.<br><br><a href="https://www.salesforce.com/plus/experience/tdx_2026?d=701ed00000iqbXAAAY&amp;nc=7013y0000022u2tAAA&amp;utm_source=loomify&amp;utm_medium=tp_email&amp;utm_campaign=emea_is_cross-cloud_cross-industry&amp;utm_content=all-segments_pg-mtp_701ed00000iqbXAAAY_english_tdx-2026">Stream the AXL Orchestration Session</a></p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://www.salesforce.com/plus/experience/tdx_2026?d=701ed00000iqbXAAAY&amp;nc=7013y0000022u2tAAA&amp;utm_source=loomify&amp;utm_medium=tp_email&amp;utm_campaign=emea_is_cross-cloud_cross-industry&amp;utm_content=all-segments_pg-mtp_701ed00000iqbXAAAY_english_tdx-2026" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ViTl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd8bfbce-a145-4fb9-9cc9-252cd095bda9_2048x512.png 424w, https://substackcdn.com/image/fetch/$s_!ViTl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd8bfbce-a145-4fb9-9cc9-252cd095bda9_2048x512.png 848w, https://substackcdn.com/image/fetch/$s_!ViTl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd8bfbce-a145-4fb9-9cc9-252cd095bda9_2048x512.png 1272w, https://substackcdn.com/image/fetch/$s_!ViTl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd8bfbce-a145-4fb9-9cc9-252cd095bda9_2048x512.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!ViTl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd8bfbce-a145-4fb9-9cc9-252cd095bda9_2048x512.png" width="1456" height="364" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fd8bfbce-a145-4fb9-9cc9-252cd095bda9_2048x512.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:364,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:984968,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://www.salesforce.com/plus/experience/tdx_2026?d=701ed00000iqbXAAAY&amp;nc=7013y0000022u2tAAA&amp;utm_source=loomify&amp;utm_medium=tp_email&amp;utm_campaign=emea_is_cross-cloud_cross-industry&amp;utm_content=all-segments_pg-mtp_701ed00000iqbXAAAY_english_tdx-2026&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.thecloudengineers.com/i/194500231?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd8bfbce-a145-4fb9-9cc9-252cd095bda9_2048x512.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ViTl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd8bfbce-a145-4fb9-9cc9-252cd095bda9_2048x512.png 424w, https://substackcdn.com/image/fetch/$s_!ViTl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd8bfbce-a145-4fb9-9cc9-252cd095bda9_2048x512.png 848w, 
https://substackcdn.com/image/fetch/$s_!ViTl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd8bfbce-a145-4fb9-9cc9-252cd095bda9_2048x512.png 1272w, https://substackcdn.com/image/fetch/$s_!ViTl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd8bfbce-a145-4fb9-9cc9-252cd095bda9_2048x512.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><h3><a href="https://www.salesforce.com/plus/experience/tdx_2026?d=701ed00000iqbXAAAY&amp;nc=7013y0000022u2tAAA&amp;utm_source=loomify&amp;utm_medium=tp_email&amp;utm_campaign=emea_is_cross-cloud_cross-industry&amp;utm_content=all-segments_pg-mtp_701ed00000iqbXAAAY_english_tdx-2026">Watch the developer conference of the year. On demand on Salesforce+</a></h3><p>Agentic AI is changing the game and Agentforce is leading the way. Watch TDX on demand to explore dozens of sessions covering the latest innovations across Agentforce, Data 360, the core platform, vibe coding, Slack, and more. All free on Salesforce+.</p><p><strong>Watch TDX on Salesforce+ to:</strong></p><ul><li><p>Get roadmap insights from the leaders shaping what&#8217;s next</p></li><li><p>Access broadcast-only moments and exclusive interviews</p></li></ul><p>It all starts with the main keynote, where you&#8217;ll experience the future of software and learn how to build it. 
Watch it now and catch every moment at your own pace.</p><h3><strong>&#128073; <a href="https://www.salesforce.com/plus/experience/tdx_2026?d=701ed00000iqbXAAAY&amp;nc=7013y0000022u2tAAA&amp;utm_source=loomify&amp;utm_medium=tp_email&amp;utm_campaign=emea_is_cross-cloud_cross-industry&amp;utm_content=all-segments_pg-mtp_701ed00000iqbXAAAY_english_tdx-2026">Watch Now</a></strong></h3><div><hr></div><p>Let&#8217;s now go back to our article and see the 5 projects.</p><h2>Project 1: Three-Tier Web Application</h2><p><strong>What you build:</strong> A custom VPC with public and private subnets, an Application Load Balancer, an Auto Scaling Group, and an RDS database.</p><p><strong>Why it matters:</strong> This is the foundational architecture pattern behind almost every production web workload on AWS. The exam tests it constantly, and so does every technical interview.</p><p>The critical insight here is <em>traffic flow</em>. Your ALB lives in the public subnet and accepts inbound traffic on port 443. Your EC2 instances sit in private subnets and only accept traffic from the ALB&#8217;s security group, not from the internet. Your RDS instance sits in a separate private subnet and only accepts traffic from the EC2 security group. Nothing in the data tier is ever directly reachable.</p><p>When you build this yourself, you stop memorizing &#8220;databases should be private&#8221; and start understanding <em>why</em>: defense in depth, blast radius reduction, and compliance requirements that mandate network isolation. You also learn where Auto Scaling actually helps, horizontal scaling behind the ALB, and where it doesn&#8217;t, like a single-AZ RDS instance that becomes your bottleneck at 500 concurrent connections.</p><p><strong>What the exam will ask:</strong> Multi-AZ RDS failover behavior, ALB vs. NLB selection criteria, and how security group rules differ from NACLs. 
Build this project and those questions answer themselves.</p><h2>Project 2: Serverless Image Processing Pipeline</h2><p><strong>What you build:</strong> An S3 upload triggers Lambda, which resizes the image, stores metadata in DynamoDB, and sends an SNS notification.</p><p><strong>Why it matters:</strong> Event-driven architecture is one of the dominant patterns in modern cloud systems. This project teaches you how AWS services communicate asynchronously, and what happens when they don&#8217;t.</p><p>The flow looks simple: S3 event &#8594; Lambda &#8594; DynamoDB write + SNS publish. But building it forces you to confront real decisions. What&#8217;s your Lambda timeout? If image processing takes 8 seconds and you set 5, the invocation times out, and because S3 invokes Lambda asynchronously, the uploader never sees the error; it only surfaces in CloudWatch after the automatic retries. What&#8217;s your DynamoDB write capacity? One WCU covers one write per second for an item up to 1 KB, so if a burst delivers 200 images per second and your table is provisioned for 50 WCU, you&#8217;ll hit throttling. What happens to the SNS notification if the DynamoDB write fails? You&#8217;ll need to think about idempotency and partial failure handling.</p><p><strong>What the exam will ask:</strong> S3 event notification targets, Lambda concurrency limits, DynamoDB capacity modes, and SNS delivery guarantees. This project covers all of them.</p><h2>Project 3: Disaster Recovery Solution</h2><p><strong>What you build:</strong> Cross-region S3 replication, automated RDS backups, and Route 53 failover routing.</p><p><strong>Why it matters:</strong> DR is one of the most heavily tested topics on the SAA-C03, and it&#8217;s also one of the most misunderstood in practice. Most candidates can recite the four DR strategies. Few can explain the trade-offs that determine which one you&#8217;d actually choose.</p><p>Building this project forces you to internalize the RTO/RPO trade-off with real numbers. Cross-region S3 replication gives you an RPO of minutes for object storage: most objects replicate within 15 minutes, and S3 Replication Time Control backs that window with an SLA. 
RDS is a different story. Automated backups support point-in-time recovery, typically to within the last five minutes, but if you rely on daily snapshots alone your RPO stretches to 24 hours, and surviving a Regional outage still takes a cross-region copy such as a read replica or Aurora Global Database. Route 53 health checks with failover routing can redirect traffic in about a minute, depending on the health check interval, failure threshold, and record TTL, but only if your secondary environment is already running (warm standby) or fully active (multi-site active-active).</p><p>The exam distinguishes between backup/restore (hours of RTO, lowest cost), pilot light (tens of minutes of RTO, minimal running infrastructure), warm standby (minutes of RTO, scaled-down but live), and multi-site active-active (near-zero RTO, highest cost). Build this project and you&#8217;ll understand why a financial services company might choose multi-site active-active at something like $40,000/month while a content platform settles for backup/restore at around $200/month.</p><p><strong>What the exam will ask:</strong> S3 replication configuration, RDS backup retention, Route 53 routing policies, and how to calculate RTO/RPO for each DR tier.</p><h2>Project 4: Hybrid Storage Solution</h2><p><strong>What you build:</strong> AWS Storage Gateway connected to on-premises systems, S3 lifecycle policies moving data through storage tiers, and IAM policies controlling access.</p><p><strong>Why it matters:</strong> Many cloud engineers underestimate how much enterprise workload still runs on-premises. Storage Gateway is the bridge, and understanding it makes you sound experienced in interviews because it signals you&#8217;ve thought about migration, not just greenfield architecture.</p><p>Storage Gateway has three modes: File Gateway (NFS/SMB access to S3), Volume Gateway (iSCSI block storage backed by S3), and Tape Gateway (virtual tape library for backup software). The exam tests which mode fits which scenario. Build a File Gateway and you&#8217;ll understand why a media company with 200 TB of on-premises video assets uses it to extend its NAS to S3 without rewriting its editing workflows.</p><p>S3 lifecycle policies complete the picture. 
Moving objects from S3 Standard to S3 Standard-IA after 30 days saves roughly 46% on storage costs. Moving them on to S3 Glacier Instant Retrieval after 90 days cuts the Standard-IA price by a further 68%. For a workload storing 10 TB of infrequently accessed data, that&#8217;s the difference between about $230/month in S3 Standard and about $40/month in Glacier Instant Retrieval, before retrieval and request charges. The exam will ask you to design the right lifecycle policy for a given access pattern. Build this project and you&#8217;ll answer from experience, not memorization.</p><p><strong>What the exam will ask:</strong> Storage Gateway modes, S3 storage class trade-offs, lifecycle policy configuration, and IAM policy structure for cross-account S3 access.</p><h2>Project 5: Containerized Microservices Application</h2><p><strong>What you build:</strong> Two services deployed on ECS with Fargate, task definitions with resource limits, ALB path-based routing to each service, and CloudWatch Container Insights for logging and metrics.</p><p><strong>Why it matters:</strong> Containers are now a core SAA-C03 topic, and ECS with Fargate is the AWS-native answer to &#8220;I want containers without managing servers.&#8221; Building this project teaches you the ECS mental model (clusters, services, task definitions, and tasks) and how each piece maps to the infrastructure underneath.</p><p>The ALB routing piece is where most candidates get confused. Path-based routing lets you send <code>/api/orders/*</code> to your orders service and <code>/api/inventory/*</code> to your inventory service, both running as separate ECS services behind a single ALB. Each service has its own target group, its own task definition with CPU and memory limits, and its own auto-scaling policy based on CPU utilization or request count. 
When you build this, you understand why a task definition with 256 CPU units and 512MB memory will throttle under load before it scales, and how to set the right CloudWatch alarm threshold to trigger scaling before users notice.</p><p>CloudWatch Container Insights gives you container-level CPU, memory, network, and disk metrics without any instrumentation. You&#8217;ll see exactly which task is consuming resources and correlate it with application logs in the same console. That operational visibility is what separates a working prototype from a production-ready deployment.</p><p><strong>What the exam will ask:</strong> ECS vs. EKS selection criteria, Fargate vs. EC2 launch type trade-offs, ALB target group configuration, and CloudWatch metrics for container workloads.</p><h2>Conclusion</h2><p>Watching videos teaches you what AWS services do. Building projects teaches you what they cost, where they fail, and why you&#8217;d choose one over another. Every question on the SAA-C03 is ultimately asking: <em>given these constraints, what&#8217;s the right architecture decision?</em></p><p>These five projects give you the intuition to answer that question, not just on the exam, but in the room when a customer asks why their database is the bottleneck, why their DR plan won&#8217;t meet their RTO, or why their serverless pipeline is costing more than expected.</p><p>Build the projects. Pass the exam. Show up to the interview ready to talk about real systems.</p>]]></content:encoded></item><item><title><![CDATA[Building Production-Grade API Security]]></title><description><![CDATA[We built a customer-facing API processing thousands of transactions per minute.]]></description><link>https://blog.thecloudengineers.com/p/building-production-grade-api-security</link><guid isPermaLink="false">https://blog.thecloudengineers.com/p/building-production-grade-api-security</guid><dc:creator><![CDATA[Lefteris Karageorgiou]]></dc:creator><pubDate>Wed, 22 Apr 2026 09:31:00 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/70ba8566-be57-4213-9d70-2324df9b85d3_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>We built a customer-facing API processing thousands of transactions per minute. The challenge wasn&#8217;t just handling the volume, it was protecting against attacks while maintaining performance. Every blocked malicious request <strong>saves money</strong> and <strong>prevents potential breaches</strong>, but every millisecond of latency added by security layers impacts user experience.</p><h2>The Requirements</h2><p>We needed security that addressed four critical concerns:</p><p><strong>Application-layer protection</strong> against SQL injection, XSS, and other OWASP Top 10 attacks that target business logic rather than infrastructure.</p><p><strong>DDoS mitigation</strong> at both network and application layers without manual intervention. When attacks happen at 3 AM, automated response is non-negotiable.</p><p><strong>Rate limiting and throttling</strong> that works across a global user base. 
Simple IP-based limiting breaks with legitimate users behind corporate NATs or mobile carriers.</p><p><strong>Performance constraints</strong> of sub-200ms API response times even with all security layers active. Security cannot become the bottleneck.</p><p>We also needed complete observability into every attack attempt with real-time alerting when patterns emerge. The solution had to scale automatically, security couldn&#8217;t become the constraint during traffic spikes.</p><h2>The AWS Services</h2><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!SDHH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9fa540c-0aa5-4a6e-a01c-37ccf6f78927_1652x214.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!SDHH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9fa540c-0aa5-4a6e-a01c-37ccf6f78927_1652x214.png 424w, https://substackcdn.com/image/fetch/$s_!SDHH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9fa540c-0aa5-4a6e-a01c-37ccf6f78927_1652x214.png 848w, https://substackcdn.com/image/fetch/$s_!SDHH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9fa540c-0aa5-4a6e-a01c-37ccf6f78927_1652x214.png 1272w, https://substackcdn.com/image/fetch/$s_!SDHH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9fa540c-0aa5-4a6e-a01c-37ccf6f78927_1652x214.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!SDHH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9fa540c-0aa5-4a6e-a01c-37ccf6f78927_1652x214.png" width="1456" height="189" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a9fa540c-0aa5-4a6e-a01c-37ccf6f78927_1652x214.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:189,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:53419,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.thecloudengineers.com/i/193558909?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9fa540c-0aa5-4a6e-a01c-37ccf6f78927_1652x214.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!SDHH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9fa540c-0aa5-4a6e-a01c-37ccf6f78927_1652x214.png 424w, https://substackcdn.com/image/fetch/$s_!SDHH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9fa540c-0aa5-4a6e-a01c-37ccf6f78927_1652x214.png 848w, https://substackcdn.com/image/fetch/$s_!SDHH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9fa540c-0aa5-4a6e-a01c-37ccf6f78927_1652x214.png 1272w, 
https://substackcdn.com/image/fetch/$s_!SDHH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9fa540c-0aa5-4a6e-a01c-37ccf6f78927_1652x214.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p><strong>API Gateway</strong> serves as the entry point, handling authentication, request validation, and throttling before requests reach backend functions. It rejects malformed requests before any backend code runs.</p><p><strong>AWS WAF</strong> sits in front of API Gateway, inspecting every HTTP request against managed rule sets and custom rules. It blocks attacks before they consume API Gateway capacity or cost Lambda invocations.</p><p><strong>AWS Shield Standard</strong> applies automatically at no extra cost, providing DDoS protection against common network and transport layer attacks. Shield Advanced is available for scenarios requiring dedicated DDoS response team support.</p><h2>The Flow</h2><p>When a request hits our API, the WAF web ACL attached to our regional API Gateway evaluates it first. If the request matches a block rule (suspicious patterns, known bad IPs, rate limit exceeded), it&#8217;s rejected with a 403 before API Gateway processes it.</p><p>For legitimate requests that pass WAF inspection, API Gateway applies its own throttling limits, validates the request structure, checks API keys or JWTs, and only then invokes backend functions. If anything fails validation, the request is rejected without consuming compute capacity.</p><p>This layered approach means attacks get stopped at the cheapest point possible. WAF blocks cost nothing beyond the WAF inspection fee. API Gateway rejections cost nothing beyond the API call. 
Only legitimate requests that pass all checks consume Lambda invocations.</p><h2>Deep Dive: WAF Rule Strategy</h2><p>We started with AWS Managed Rules, which caught roughly 90% of the common attacks in our traffic right away. But we hit false positives on legitimate API calls that included JSON payloads with special characters.</p><p>Our custom rules fall into three groups, evaluated in priority order:</p><p><strong>IP reputation list</strong> blocks known malicious IPs. We integrated threat intelligence feeds that update hourly. This stops repeat offenders before they even attempt an attack.</p><p><strong>Rate-based rules</strong> block IPs making more than 2,000 requests in five minutes. This catches credential stuffing attempts targeting our login endpoints. We observed attackers cycling through stolen credentials at about this rate: fast enough to test thousands of accounts but slow enough to avoid obvious detection.</p><p><strong>Geo-blocking rules</strong> target countries with no legitimate users but high attack traffic. This was controversial internally, but the data was clear: 95% of attacks came from regions with zero paying customers.</p><p>The critical decision: we set WAF to count mode initially, not block. We observed traffic patterns for two weeks, identified false positives, tuned rules, then switched to block mode. This kept us from accidentally blocking legitimate users during the learning phase.</p><h2>Deep Dive: API Gateway Security Features</h2><p>API Gateway provides multiple security layers beyond basic routing.</p><p><strong>Request validation</strong> happens before Lambda invocation, rejecting requests with malformed JSON, missing required parameters, or invalid data types. This keeps malformed payloads from reaching backend code. We defined JSON schemas for every endpoint. They are verbose to maintain, but they stop entire classes of attacks.</p><p><strong>Authentication</strong> integrates with multiple mechanisms. 
We use API keys to identify clients and attach them to usage plans (they are not a substitute for authentication), Lambda authorizers for custom logic, and Cognito user pools for OAuth 2.0 flows. Each request is authenticated before it reaches any backend function.</p><p><strong>Resource policies</strong> add another control layer. We restricted our internal APIs to specific VPCs, which blocks public internet access entirely while keeping the benefits of API Gateway&#8217;s managed service.</p><h2>Deep Dive: Throttling Strategy</h2><p>API Gateway&#8217;s throttling operates at multiple levels, and understanding each is critical.</p><p><strong>Account-level limits</strong> serve as a safety net, but real control comes from usage plans. Each usage plan has both a steady-state rate limit and a burst capacity. API Gateway throttles with a token bucket, so burst capacity lets clients briefly exceed their rate limit during traffic spikes. We set burst to 2x the rate limit, which handles legitimate spikes without throttling well-behaved clients.</p><p><strong>Custom throttling in WAF</strong> for specific endpoints adds another layer. Our login endpoints get rate-limited to 10 requests per minute per IP, far more restrictive than general API limits. This stops brute force attacks without impacting normal API usage.</p><p><strong>Method-level throttling</strong> provides even finer control. Read operations (GET) can have higher limits than write operations (POST, PUT, DELETE), reflecting their different resource consumption and risk profiles. We observed that 80% of attacks targeted write endpoints, so we throttled them more aggressively.</p><h2>Deep Dive: Shield Protection</h2><p>Shield Standard provides automatic protection against common DDoS attacks at the network and transport layers. It detects and mitigates SYN floods, UDP reflection attacks, and other volumetric attacks without any configuration.</p><p>The protection operates inline with minimal latency impact. 
Shield&#8217;s detection algorithms analyze traffic patterns in real time, distinguishing legitimate traffic spikes from attack patterns. When attacks are detected, mitigation happens automatically within seconds.</p><p>For our API, Shield Advanced wasn&#8217;t initially necessary. Our rule of thumb: Shield Advanced makes sense when API downtime costs exceed $3,000/month or when regulatory requirements mandate a dedicated security response. We added it after a competitor suffered a multi-day DDoS attack that made headlines.</p><h2>Monitoring and Observability</h2><p>WAF logs every blocked request to CloudWatch Logs. We stream these logs to S3 for long-term analysis. Analyzing them reveals attack patterns, identifies false positives, and guides rule tuning. We set up CloudWatch alarms for spikes in blocked requests: when blocks exceed 1,000 per minute, we get paged.</p><p>API Gateway metrics expose throttling events, 4xx/5xx error rates, and latency distributions. We correlate WAF blocks with API Gateway metrics to see the full security picture: which attacks reached API Gateway and which were blocked at WAF.</p><p>X-Ray tracing adds end-to-end visibility, showing exactly where requests spend time and where failures occur. This proved essential when debugging whether performance issues stemmed from security layers, backend code, or downstream services.</p><h2>Cost Breakdown</h2><p><strong>WAF costs</strong> run approximately $5/month base fee plus $1 per million requests plus $1 per rule per month. With 10 custom rules and 100 million requests monthly, that&#8217;s $5 + $100 + $10 = $115/month.</p><p><strong>API Gateway costs</strong> are $3.50 per million requests for REST APIs. At 100 million requests monthly, that&#8217;s $350/month. Caching can reduce backend invocations significantly.</p><p><strong>Shield Standard</strong> is included at no additional cost. 
Shield Advanced costs $3,000/month plus data transfer fees, which is only justified for high-value APIs where downtime is extremely costly.</p><p><strong>Lambda invocations</strong> saved by blocking malicious requests are the hidden cost savings. We block approximately 5 million malicious requests monthly, which would otherwise have cost roughly $1 in Lambda request charges alone, before execution time and potential data breach costs.</p><p>Total monthly cost for our security stack: approximately $465 for WAF and API Gateway ($115 + $350), blocking attacks that would cost far more in compute resources and potential breaches.</p><h2>Key Takeaways</h2><p>This architecture stops millions of malicious requests monthly. These are requests that would otherwise cost money and potentially compromise systems. The layered approach proves essential: WAF catches obvious attacks, API Gateway enforces business-logic throttling and authentication, and Shield protects against network-layer DDoS. No single layer would be sufficient.</p>]]></content:encoded></item><item><title><![CDATA[Tracing Serverless Applications with AWS X-Ray]]></title><description><![CDATA[In the world of serverless architectures, understanding how requests flow through your distributed system can feel like navigating a maze blindfolded.]]></description><link>https://blog.thecloudengineers.com/p/tracing-serverless-applications-with</link><guid isPermaLink="false">https://blog.thecloudengineers.com/p/tracing-serverless-applications-with</guid><dc:creator><![CDATA[Lefteris Karageorgiou]]></dc:creator><pubDate>Tue, 14 Apr 2026 09:29:34 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/74727264-8899-43c5-bb01-593e9b107220_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the world of serverless architectures, understanding how requests flow through your distributed system can feel like navigating a maze blindfolded. 
AWS X-Ray emerges as your guiding light, providing the visibility needed to trace requests as they traverse through Lambda functions, API Gateway endpoints, SQS queues, and other AWS services.</p><div><hr></div><h1>Sponsored by Salesforce</h1><p>The shift toward Agentic AI isn&#8217;t just another buzzword&#8212;it&#8217;s redefining how we design, architect, and build modern systems.</p><p>That&#8217;s exactly why I&#8217;ll be tuning into <a href="https://www.salesforce.com/plus/experience/tdx_2026?d=701ed00000iqbXAAAY&amp;nc=7013y0000022u2tAAA&amp;utm_source=loomify&amp;utm_medium=tp_email&amp;utm_campaign=emea_is_cross-cloud_cross-industry&amp;utm_content=all-segments_pg-mtp_701ed00000iqbXAAAY_english_tdx-2026">TDX on Salesforce+</a> to explore how Agentforce is shaping the next generation of the developer roadmap.</p><p>If you&#8217;re serious about staying ahead of the curve, I highly recommend grabbing a free spot and joining the sessions.</p><p>Plus, when you register, you&#8217;ll be automatically entered for a chance to win one of 20 AI exam vouchers (valued at $200)&#8212;available to legal residents of the U.S., Canada, New Zealand, and the U.K.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.salesforce.com/plus/experience/tdx_2026?d=701ed00000iqbXAAAY&amp;nc=7013y0000022u2tAAA&amp;utm_source=loomify&amp;utm_medium=tp_email&amp;utm_campaign=emea_is_cross-cloud_cross-industry&amp;utm_content=all-segments_pg-mtp_701ed00000iqbXAAAY_english_tdx-2026" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bzDZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5af9c7-a343-42b7-b65d-3d7223257119_2400x1256.png 424w, 
https://substackcdn.com/image/fetch/$s_!bzDZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5af9c7-a343-42b7-b65d-3d7223257119_2400x1256.png 848w, https://substackcdn.com/image/fetch/$s_!bzDZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5af9c7-a343-42b7-b65d-3d7223257119_2400x1256.png 1272w, https://substackcdn.com/image/fetch/$s_!bzDZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5af9c7-a343-42b7-b65d-3d7223257119_2400x1256.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bzDZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5af9c7-a343-42b7-b65d-3d7223257119_2400x1256.png" width="1456" height="762" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fe5af9c7-a343-42b7-b65d-3d7223257119_2400x1256.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:762,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2383485,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://www.salesforce.com/plus/experience/tdx_2026?d=701ed00000iqbXAAAY&amp;nc=7013y0000022u2tAAA&amp;utm_source=loomify&amp;utm_medium=tp_email&amp;utm_campaign=emea_is_cross-cloud_cross-industry&amp;utm_content=all-segments_pg-mtp_701ed00000iqbXAAAY_english_tdx-2026&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.thecloudengineers.com/i/191560663?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5af9c7-a343-42b7-b65d-3d7223257119_2400x1256.png&quot;,&quot;isProcessing&quot;:false,&quot;a
lign&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!bzDZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5af9c7-a343-42b7-b65d-3d7223257119_2400x1256.png 424w, https://substackcdn.com/image/fetch/$s_!bzDZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5af9c7-a343-42b7-b65d-3d7223257119_2400x1256.png 848w, https://substackcdn.com/image/fetch/$s_!bzDZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5af9c7-a343-42b7-b65d-3d7223257119_2400x1256.png 1272w, https://substackcdn.com/image/fetch/$s_!bzDZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5af9c7-a343-42b7-b65d-3d7223257119_2400x1256.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h3><a href="https://www.salesforce.com/plus/experience/tdx_2026?d=701ed00000iqbXAAAY&amp;nc=7013y0000022u2tAAA&amp;utm_source=loomify&amp;utm_medium=tp_email&amp;utm_campaign=emea_is_cross-cloud_cross-industry&amp;utm_content=all-segments_pg-mtp_701ed00000iqbXAAAY_english_tdx-2026">Stream the developer conference of the year. Live on Salesforce+.</a></h3><p>Agentic AI is changing the game and Agentforce is leading the way. Stream TDX to join dozens of sessions and virtual hands-on trainings that explore the latest innovations across Agentforce, Data 360, the core platform, vibe coding, Slack, and more. All free on Salesforce+.</p><p><strong>Tune in to TDX on Salesforce+ to:</strong></p><ul><li><p>Build hands-on skills with virtual trainings and live demos</p></li><li><p>Get roadmap insights from the leaders shaping what&#8217;s next</p></li><li><p>Access broadcast-only moments and exclusive interviews</p></li></ul><p>It all kicks off with the main keynote, where you&#8217;ll experience the future of software and learn how to build it. Add it to your calendar so you don&#8217;t miss a moment.</p><h3><strong>&#128073; <a href="https://www.salesforce.com/plus/experience/tdx_2026?d=701ed00000iqbXAAAY&amp;nc=7013y0000022u2tAAA&amp;utm_source=loomify&amp;utm_medium=tp_email&amp;utm_campaign=emea_is_cross-cloud_cross-industry&amp;utm_content=all-segments_pg-mtp_701ed00000iqbXAAAY_english_tdx-2026">Register for free</a></strong></h3><div><hr></div><h2>Why Tracing Matters in Serverless</h2><p>Serverless applications are inherently distributed. 
A single user request might trigger an API Gateway endpoint, invoke multiple Lambda functions, write to DynamoDB, publish messages to SNS, and queue tasks in SQS. When something goes wrong, or even when you&#8217;re optimizing performance, pinpointing the bottleneck becomes challenging without proper tracing.</p><p>X-Ray solves this by creating a complete map of your request journey, showing you exactly where time is spent, where errors occur, and how services interact with each other. It&#8217;s the difference between guessing and knowing.</p><h2>Understanding the Service Map</h2><p>The service map is X-Ray&#8217;s visual representation of your application architecture. It automatically discovers and displays all the services your application uses, showing the relationships between them. Each node represents a service, and the connections show how requests flow between them.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!scXp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07425509-6d62-43cb-841e-fa9ba083f774_2716x1364.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!scXp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07425509-6d62-43cb-841e-fa9ba083f774_2716x1364.png 424w, https://substackcdn.com/image/fetch/$s_!scXp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07425509-6d62-43cb-841e-fa9ba083f774_2716x1364.png 848w, https://substackcdn.com/image/fetch/$s_!scXp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07425509-6d62-43cb-841e-fa9ba083f774_2716x1364.png 
1272w, https://substackcdn.com/image/fetch/$s_!scXp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07425509-6d62-43cb-841e-fa9ba083f774_2716x1364.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!scXp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07425509-6d62-43cb-841e-fa9ba083f774_2716x1364.png" width="1456" height="731" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/07425509-6d62-43cb-841e-fa9ba083f774_2716x1364.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:731,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:287398,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.thecloudengineers.com/i/191561839?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07425509-6d62-43cb-841e-fa9ba083f774_2716x1364.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!scXp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07425509-6d62-43cb-841e-fa9ba083f774_2716x1364.png 424w, https://substackcdn.com/image/fetch/$s_!scXp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07425509-6d62-43cb-841e-fa9ba083f774_2716x1364.png 848w, 
https://substackcdn.com/image/fetch/$s_!scXp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07425509-6d62-43cb-841e-fa9ba083f774_2716x1364.png 1272w, https://substackcdn.com/image/fetch/$s_!scXp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F07425509-6d62-43cb-841e-fa9ba083f774_2716x1364.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>What makes this powerful is the real-time health indicators. 
You can immediately spot services with high error rates or elevated latency. The color coding provides instant visual feedback: green for successful calls, yellow for client errors (4xx), red for server faults (5xx), and purple for throttling. This bird&#8217;s-eye view helps you understand your system&#8217;s behavior at a glance.</p><h2>Traces and Segments: The Building Blocks</h2><p>Every sampled request that flows through your application produces a trace. Within each trace, individual services create segments that represent the work they perform. For Lambda functions, a segment captures the entire function execution. For downstream calls to DynamoDB or other services, subsegments provide granular detail.</p><p>The trace timeline shows you exactly how long each segment took, helping you identify slow operations. You can drill down into specific traces to see the complete request path, including all the metadata, annotations, and errors associated with each segment.</p><h2>Practical Tips for Effective Tracing</h2><p><strong>Choose your sampling strategy wisely.</strong> X-Ray uses sampling to balance cost and visibility. The default sampling rule captures the first request each second and 5% of additional requests. For production environments, this is usually sufficient. However, for critical workflows or during troubleshooting, consider creating custom sampling rules to capture 100% of requests for specific API paths, HTTP methods, or services.</p><p><strong>Use annotations for filtering and grouping.</strong> Annotations are indexed key-value pairs that you can use to filter traces in the X-Ray console. Add annotations for user IDs, transaction types, or feature flags. This makes it easy to analyze specific user journeys or compare performance across different code paths.</p><p><strong>Leverage metadata for context.</strong> Unlike annotations, metadata isn&#8217;t indexed but provides rich contextual information when viewing individual traces. 
Include relevant business context, configuration values, or debugging information that helps you understand what was happening during the request.</p><p><strong>Monitor cold starts separately.</strong> Lambda cold starts can significantly impact performance. Use X-Ray annotations to tag cold-start invocations, allowing you to analyze their frequency and impact separately from warm starts. This helps you make informed decisions about provisioned concurrency.</p><p><strong>Set up alarms on key metrics.</strong> X-Ray integrates with CloudWatch, allowing you to create alarms based on error rates, latency percentiles, or fault rates. Don&#8217;t wait to discover problems; let the alarms notify you when thresholds are breached.</p><p><strong>Trace asynchronous workflows.</strong> For event-driven architectures using SNS, SQS, or EventBridge, ensure you&#8217;re propagating trace context through message attributes. This maintains trace continuity across asynchronous boundaries, giving you end-to-end visibility even in complex event chains.</p><p><strong>Use trace groups for organization.</strong> As your application grows, create groups, which are defined by filter expressions, to organize and filter traces by environment, application component, or team ownership. This keeps your X-Ray console manageable and helps teams focus on their specific services.</p><h2>The Bottom Line</h2><p>AWS X-Ray transforms serverless observability from reactive debugging to proactive optimization. By providing clear visibility into request flows, performance characteristics, and error patterns, it empowers you to build more reliable and efficient serverless applications.</p><p>The key is to integrate X-Ray early in your development process, not as an afterthought when problems arise. 
With proper instrumentation and thoughtful use of annotations and sampling, X-Ray becomes an invaluable tool for understanding and improving your serverless architecture.</p>]]></content:encoded></item><item><title><![CDATA[API Gateway Canary Deployments: A Strategic Approach to Safe Releases]]></title><description><![CDATA[Canary deployments are one of the most powerful risk mitigation strategies available to cloud engineers working with AWS API Gateway and Lambda.]]></description><link>https://blog.thecloudengineers.com/p/api-gateway-canary-deployments-a</link><guid isPermaLink="false">https://blog.thecloudengineers.com/p/api-gateway-canary-deployments-a</guid><dc:creator><![CDATA[Lefteris Karageorgiou]]></dc:creator><pubDate>Wed, 08 Apr 2026 09:30:51 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/6ce77640-8a70-4628-aef8-53a009a8caa5_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Canary deployments are one of the most powerful risk mitigation strategies available to cloud engineers working with AWS API Gateway and Lambda. 
They allow you to test new versions of your APIs in production with real traffic while minimizing the blast radius of potential issues. Understanding how they work and when to use them can be the difference between a smooth rollout and a production incident.</p><div><hr></div><h1>Sponsored by Salesforce</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.salesforce.com/plus/experience/tdx_2026?d=701ed00000iqbXAAAY&amp;nc=7013y0000022u2tAAA&amp;utm_source=loomify&amp;utm_medium=tp_email&amp;utm_campaign=emea_is_cross-cloud_cross-industry&amp;utm_content=all-segments_pg-mtp_701ed00000iqbXAAAY_english_tdx-2026" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bzDZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5af9c7-a343-42b7-b65d-3d7223257119_2400x1256.png 424w, https://substackcdn.com/image/fetch/$s_!bzDZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5af9c7-a343-42b7-b65d-3d7223257119_2400x1256.png 848w, https://substackcdn.com/image/fetch/$s_!bzDZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5af9c7-a343-42b7-b65d-3d7223257119_2400x1256.png 1272w, https://substackcdn.com/image/fetch/$s_!bzDZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5af9c7-a343-42b7-b65d-3d7223257119_2400x1256.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bzDZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5af9c7-a343-42b7-b65d-3d7223257119_2400x1256.png" width="1456" height="762" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fe5af9c7-a343-42b7-b65d-3d7223257119_2400x1256.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:762,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2383485,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://www.salesforce.com/plus/experience/tdx_2026?d=701ed00000iqbXAAAY&amp;nc=7013y0000022u2tAAA&amp;utm_source=loomify&amp;utm_medium=tp_email&amp;utm_campaign=emea_is_cross-cloud_cross-industry&amp;utm_content=all-segments_pg-mtp_701ed00000iqbXAAAY_english_tdx-2026&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.thecloudengineers.com/i/191560663?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5af9c7-a343-42b7-b65d-3d7223257119_2400x1256.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bzDZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5af9c7-a343-42b7-b65d-3d7223257119_2400x1256.png 424w, https://substackcdn.com/image/fetch/$s_!bzDZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5af9c7-a343-42b7-b65d-3d7223257119_2400x1256.png 848w, https://substackcdn.com/image/fetch/$s_!bzDZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5af9c7-a343-42b7-b65d-3d7223257119_2400x1256.png 1272w, 
https://substackcdn.com/image/fetch/$s_!bzDZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5af9c7-a343-42b7-b65d-3d7223257119_2400x1256.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h3><a href="https://www.salesforce.com/plus/experience/tdx_2026?d=701ed00000iqbXAAAY&amp;nc=7013y0000022u2tAAA&amp;utm_source=loomify&amp;utm_medium=tp_email&amp;utm_campaign=emea_is_cross-cloud_cross-industry&amp;utm_content=all-segments_pg-mtp_701ed00000iqbXAAAY_english_tdx-2026">Stream the developer conference of the year. 
Live on Salesforce+.</a></h3><p>Agentic AI is changing the game and Agentforce is leading the way. Stream TDX to join dozens of sessions and virtual hands-on trainings that explore the latest innovations across Agentforce, Data 360, the core platform, vibe coding, Slack, and more. All free on Salesforce+.</p><p><strong>Tune in to TDX on Salesforce+ to:</strong></p><ul><li><p>Build hands-on skills with virtual trainings and live demos</p></li><li><p>Get roadmap insights from the leaders shaping what&#8217;s next</p></li><li><p>Access broadcast-only moments and exclusive interviews</p></li></ul><p>It all kicks off with the main keynote, where you&#8217;ll experience the future of software and learn how to build it. Add it to your calendar so you don&#8217;t miss a moment.</p><h3><strong>&#128073; <a href="https://www.salesforce.com/plus/experience/tdx_2026?d=701ed00000iqbXAAAY&amp;nc=7013y0000022u2tAAA&amp;utm_source=loomify&amp;utm_medium=tp_email&amp;utm_campaign=emea_is_cross-cloud_cross-industry&amp;utm_content=all-segments_pg-mtp_701ed00000iqbXAAAY_english_tdx-2026">Register for free</a></strong></h3><div><hr></div><h2>What Are Canary Deployments?</h2><p>A canary deployment is a progressive rollout strategy where you route a small percentage of production traffic to a new version of your application while the majority continues using the stable version. The term comes from the historical practice of using canaries in coal mines as early warning systems: if something goes wrong with the new version, only a small subset of users is affected.</p><p>In API Gateway, canary deployments work at the stage level. When you create a canary, you&#8217;re essentially splitting traffic between two versions of your API: the base deployment and the canary deployment. 
Each can point to different Lambda function versions or aliases, allowing you to test new code with real production traffic.</p><h2>How Canary Deployments Work with Lambda</h2><p>The integration between API Gateway canary deployments and Lambda is straightforward but powerful. When you deploy a canary in API Gateway, you specify a percentage of traffic to route to the canary version, typically starting with 5-10%. The remaining traffic continues flowing to your stable base deployment.</p><p>API Gateway uses a weighted random distribution to split traffic. This means if you set a 10% canary weight, approximately 10% of requests will hit your canary Lambda function version, while 90% go to the base version. The distribution happens at the request level, not the user level, so individual users might experience both versions during a canary deployment.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7zdz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46d21c3f-30ec-456f-acec-bdab37f4c268_882x632.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7zdz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46d21c3f-30ec-456f-acec-bdab37f4c268_882x632.png 424w, https://substackcdn.com/image/fetch/$s_!7zdz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46d21c3f-30ec-456f-acec-bdab37f4c268_882x632.png 848w, https://substackcdn.com/image/fetch/$s_!7zdz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46d21c3f-30ec-456f-acec-bdab37f4c268_882x632.png 1272w, 
https://substackcdn.com/image/fetch/$s_!7zdz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46d21c3f-30ec-456f-acec-bdab37f4c268_882x632.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7zdz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46d21c3f-30ec-456f-acec-bdab37f4c268_882x632.png" width="470" height="336.7800453514739" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/46d21c3f-30ec-456f-acec-bdab37f4c268_882x632.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:632,&quot;width&quot;:882,&quot;resizeWidth&quot;:470,&quot;bytes&quot;:57477,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.thecloudengineers.com/i/191560663?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46d21c3f-30ec-456f-acec-bdab37f4c268_882x632.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7zdz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46d21c3f-30ec-456f-acec-bdab37f4c268_882x632.png 424w, https://substackcdn.com/image/fetch/$s_!7zdz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46d21c3f-30ec-456f-acec-bdab37f4c268_882x632.png 848w, 
https://substackcdn.com/image/fetch/$s_!7zdz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46d21c3f-30ec-456f-acec-bdab37f4c268_882x632.png 1272w, https://substackcdn.com/image/fetch/$s_!7zdz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46d21c3f-30ec-456f-acec-bdab37f4c268_882x632.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The key is that both deployments exist simultaneously in the same stage. 
Your base deployment might point to Lambda alias &#8220;production&#8221; running version 5, while your canary points to alias &#8220;canary&#8221; running version 6. This allows you to validate new functionality with real traffic patterns before committing to a full rollout.</p><h2>Practical Tips for Success</h2><p><strong>Start Small and Increase Gradually</strong>: Begin by routing 5-10% of traffic to your canary. Monitor metrics closely for 15-30 minutes before increasing. A common progression is 10% &#8594; 25% &#8594; 50% &#8594; 100%, with a monitoring period between increases.</p><p><strong>Use Lambda Aliases, Not Versions Directly</strong>: Always point your API Gateway integrations to Lambda aliases rather than specific versions. This gives you the flexibility to change which version an alias points to without redeploying your API. Use one alias for production traffic and another for canary testing.</p><p><strong>Monitor the Right Metrics</strong>: Focus on error rates, latency percentiles (especially p99), and business-specific metrics. Don&#8217;t just watch averages: a 1% error rate on your canary means about 1 in 100 requests sent to the canary is failing. Set up CloudWatch alarms that compare canary metrics against baseline metrics.</p><p><strong>Implement Proper Logging</strong>: Ensure your Lambda functions log which version is running. Include version information in structured logs so you can quickly filter and compare behavior between canary and base deployments during troubleshooting.</p><p><strong>Have a Rollback Plan</strong>: Know how to quickly promote your canary to 100% if it&#8217;s successful, or delete it entirely if problems arise. Practice these operations in non-production environments. The API Gateway console and CLI both support these operations, but automation through CI/CD pipelines is ideal.</p><p><strong>Consider Stage Variables</strong>: Use API Gateway stage variables to pass configuration to your Lambda functions. 
This allows the same Lambda code to behave differently based on whether it&#8217;s handling base or canary traffic, which is useful for feature flags or environment-specific settings.</p><p><strong>Test with Realistic Load</strong>: Canary deployments are most valuable when they experience real production traffic patterns. Avoid deploying canaries during low-traffic periods if possible; you want enough requests to generate statistically significant results.</p><p><strong>Don&#8217;t Rush the Process</strong>: The entire point of canary deployments is risk reduction. If you promote a canary to 100% within minutes, you&#8217;re not giving yourself time to detect issues. Plan for canary periods of at least several hours for critical APIs.</p><h2>When to Use Canaries</h2><p>Canary deployments shine when rolling out significant changes: new business logic, performance optimizations, dependency updates, or architectural changes. They&#8217;re less critical for minor bug fixes or configuration changes, though the safety they provide is always valuable.</p><p>The overhead of managing canaries is minimal compared to the protection they offer. For production APIs serving real customers, canary deployments should be your default deployment strategy, not an exception.</p>]]></content:encoded></item><item><title><![CDATA[Multi-Region Serverless Architectures: Building Global Resilience with AWS]]></title><description><![CDATA[Building applications that serve users across the globe requires more than just deploying to a single region.]]></description><link>https://blog.thecloudengineers.com/p/multi-region-serverless-architectures</link><guid isPermaLink="false">https://blog.thecloudengineers.com/p/multi-region-serverless-architectures</guid><dc:creator><![CDATA[Lefteris Karageorgiou]]></dc:creator><pubDate>Wed, 01 Apr 2026 09:30:52 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/e78fa5ef-5ca1-4dda-9bbc-88d4166d3b1e_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Building applications that serve users across the globe requires more than just deploying to a single region. Multi-region architectures provide low latency, high availability, and disaster recovery capabilities that are essential for modern cloud applications. 
In this article, we&#8217;ll explore how to architect a truly global serverless application using AWS services.</p><div><hr></div><h1>Sponsored by Salesforce</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.salesforce.com/plus/experience/tdx_2026?d=701ed00000iqbXAAAY&amp;nc=7013y0000022u2tAAA&amp;utm_source=loomify&amp;utm_medium=tp_email&amp;utm_campaign=emea_is_cross-cloud_cross-industry&amp;utm_content=all-segments_pg-mtp_701ed00000iqbXAAAY_english_tdx-2026" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Eeh5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9131998b-e81c-414c-8b21-f717ed10b3e6_2400x1256.png 424w, https://substackcdn.com/image/fetch/$s_!Eeh5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9131998b-e81c-414c-8b21-f717ed10b3e6_2400x1256.png 848w, https://substackcdn.com/image/fetch/$s_!Eeh5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9131998b-e81c-414c-8b21-f717ed10b3e6_2400x1256.png 1272w, https://substackcdn.com/image/fetch/$s_!Eeh5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9131998b-e81c-414c-8b21-f717ed10b3e6_2400x1256.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Eeh5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9131998b-e81c-414c-8b21-f717ed10b3e6_2400x1256.png" width="1456" height="762" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9131998b-e81c-414c-8b21-f717ed10b3e6_2400x1256.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:762,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2386370,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://www.salesforce.com/plus/experience/tdx_2026?d=701ed00000iqbXAAAY&amp;nc=7013y0000022u2tAAA&amp;utm_source=loomify&amp;utm_medium=tp_email&amp;utm_campaign=emea_is_cross-cloud_cross-industry&amp;utm_content=all-segments_pg-mtp_701ed00000iqbXAAAY_english_tdx-2026&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.thecloudengineers.com/i/192000252?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9131998b-e81c-414c-8b21-f717ed10b3e6_2400x1256.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!Eeh5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9131998b-e81c-414c-8b21-f717ed10b3e6_2400x1256.png 424w, https://substackcdn.com/image/fetch/$s_!Eeh5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9131998b-e81c-414c-8b21-f717ed10b3e6_2400x1256.png 848w, https://substackcdn.com/image/fetch/$s_!Eeh5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9131998b-e81c-414c-8b21-f717ed10b3e6_2400x1256.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Eeh5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9131998b-e81c-414c-8b21-f717ed10b3e6_2400x1256.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h3><a href="https://www.salesforce.com/plus/experience/tdx_2026?d=701ed00000iqbXAAAY&amp;nc=7013y0000022u2tAAA&amp;utm_source=loomify&amp;utm_medium=tp_email&amp;utm_campaign=emea_is_cross-cloud_cross-industry&amp;utm_content=all-segments_pg-mtp_701ed00000iqbXAAAY_english_tdx-2026">Stream the developer conference of the year. 
Live on Salesforce+.</a></h3><p>Agentic AI is changing the game and Agentforce is leading the way. Stream TDX to join dozens of sessions and virtual hands-on trainings that explore the latest innovations across Agentforce, Data 360, the core platform, vibe coding, Slack, and more. All free on Salesforce+.</p><p><strong>Tune in to TDX on Salesforce+ to:</strong></p><ul><li><p>Build hands-on skills with virtual trainings and live demos</p></li><li><p>Get roadmap insights from the leaders shaping what&#8217;s next</p></li><li><p>Access broadcast-only moments and exclusive interviews</p></li></ul><p>It all kicks off with the main keynote, where you&#8217;ll experience the future of software and learn how to build it. Add it to your calendar so you don&#8217;t miss a moment.</p><h3><strong>&#128073; <a href="https://www.salesforce.com/plus/experience/tdx_2026?d=701ed00000iqbXAAAY&amp;nc=7013y0000022u2tAAA&amp;utm_source=loomify&amp;utm_medium=tp_email&amp;utm_campaign=emea_is_cross-cloud_cross-industry&amp;utm_content=all-segments_pg-mtp_701ed00000iqbXAAAY_english_tdx-2026">Register for free</a></strong></h3><div><hr></div><h2>Architecture Overview</h2><p>Our multi-region architecture consists of four key components working together:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!OPox!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac90e99c-dd7c-4e89-9cc2-70dd99b93b99_2000x958.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!OPox!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac90e99c-dd7c-4e89-9cc2-70dd99b93b99_2000x958.png 424w, 
https://substackcdn.com/image/fetch/$s_!OPox!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac90e99c-dd7c-4e89-9cc2-70dd99b93b99_2000x958.png 848w, https://substackcdn.com/image/fetch/$s_!OPox!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac90e99c-dd7c-4e89-9cc2-70dd99b93b99_2000x958.png 1272w, https://substackcdn.com/image/fetch/$s_!OPox!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac90e99c-dd7c-4e89-9cc2-70dd99b93b99_2000x958.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!OPox!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac90e99c-dd7c-4e89-9cc2-70dd99b93b99_2000x958.png" width="1456" height="697" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ac90e99c-dd7c-4e89-9cc2-70dd99b93b99_2000x958.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:697,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:166704,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.thecloudengineers.com/i/191559295?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac90e99c-dd7c-4e89-9cc2-70dd99b93b99_2000x958.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!OPox!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac90e99c-dd7c-4e89-9cc2-70dd99b93b99_2000x958.png 424w, https://substackcdn.com/image/fetch/$s_!OPox!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac90e99c-dd7c-4e89-9cc2-70dd99b93b99_2000x958.png 848w, https://substackcdn.com/image/fetch/$s_!OPox!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac90e99c-dd7c-4e89-9cc2-70dd99b93b99_2000x958.png 1272w, https://substackcdn.com/image/fetch/$s_!OPox!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac90e99c-dd7c-4e89-9cc2-70dd99b93b99_2000x958.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><ul><li><p><strong>Route 53</strong>: Provides intelligent DNS routing to direct users to the nearest regional endpoint</p></li><li><p><strong>API Gateway</strong>: Regional REST or HTTP APIs that serve as entry points in each region</p></li><li><p><strong>Lambda</strong>: Compute layer that processes requests with minimal latency</p></li><li><p><strong>DynamoDB Global Tables</strong>: Multi-region, fully replicated database with automatic conflict resolution</p></li></ul><p>This architecture provides active-active deployment across multiple AWS regions, ensuring that if one region experiences issues, traffic automatically fails over to healthy regions.</p><h2>The Architecture Flow</h2><p>At the heart of a multi-region serverless architecture lies a carefully orchestrated chain of AWS services working together to deliver a seamless global experience.</p><p><strong>Route 53: The Global Traffic Director</strong></p><p>Amazon Route 53 serves as the entry point for all user requests. Using health checks and routing policies like latency-based or geoproximity routing, Route 53 intelligently directs users to the nearest healthy regional endpoint. This ensures that a user in Tokyo connects to the Asia Pacific region while a user in Frankfurt connects to Europe, minimizing latency and improving user experience.</p><p><strong>API Gateway: Regional Entry Points</strong></p><p>Each region hosts its own API Gateway endpoint, acting as the front door for that region&#8217;s serverless infrastructure. API Gateway handles request validation, throttling, and authentication before passing requests to the compute layer. 
With features like request/response transformation and AWS WAF integration (both available on REST APIs), it provides a robust security perimeter for your application.</p><p><strong>Lambda: Distributed Compute</strong></p><p>Behind each regional API Gateway sit the AWS Lambda functions that execute your business logic. These functions are deployed identically across all regions, ensuring consistent behavior regardless of where the request originates. Lambda&#8217;s automatic scaling handles traffic spikes in each region independently, while its pay-per-use model keeps costs proportional to traffic even with a multi-region footprint.</p><p><strong>DynamoDB Global Tables: The Data Layer</strong></p><p>The foundation of this architecture is DynamoDB Global Tables, which provides fully managed, multi-region, multi-active database replication. When a Lambda function writes data in one region, DynamoDB automatically replicates that data to all other configured regions, typically within seconds. This means users can read and write data from their nearest region while the regional replicas converge to the same state shortly afterward.</p><h2>Key Architectural Considerations</h2><p><strong>Conflict Resolution</strong>: DynamoDB Global Tables uses a last-writer-wins reconciliation strategy. Your application design should account for this, potentially using timestamps or version numbers to handle concurrent updates across regions.</p><p><strong>Replication Lag</strong>: While DynamoDB replication is fast, there&#8217;s still a brief window where data might not be consistent across all regions. Design your application to handle eventual consistency gracefully.</p><p><strong>Regional Failover</strong>: Route 53 health checks continuously monitor your regional endpoints. 
If a region becomes unhealthy, traffic automatically routes to the next best region, providing transparent failover without user intervention.</p><p><strong>Cost Implications</strong>: Running infrastructure in multiple regions increases costs through data transfer charges and duplicated resources. Balance the number of regions against your actual user distribution and availability requirements.</p><h2>The Benefits</h2><p>This architecture delivers several compelling advantages. Users experience lower latency by connecting to nearby infrastructure. Your application remains available even if an entire AWS region experiences an outage. Data is protected through geographic redundancy. And perhaps most importantly, you can scale globally without redesigning your architecture.</p><h2>When to Use This Pattern</h2><p>Multi-region architectures make sense for applications with a global user base, strict availability requirements, or regulatory needs for data residency. However, they add complexity and cost, so evaluate whether your application truly needs global distribution or if a single-region deployment with good disaster recovery would suffice.</p><h2>Conclusion</h2><p>Building a multi-region serverless architecture with Route 53, API Gateway, Lambda, and DynamoDB Global Tables provides a robust foundation for globally distributed applications. This architecture delivers low-latency responses to users worldwide while maintaining high availability and disaster recovery capabilities.</p><p>The key to success is embracing eventual consistency, implementing comprehensive monitoring, and regularly testing your failover mechanisms. 
While the initial setup requires careful planning, the operational benefits of automatic scaling, reduced latency, and improved reliability make it worthwhile for applications serving a global user base.</p>]]></content:encoded></item><item><title><![CDATA[A Beginner’s Guide to Testing Serverless Applications]]></title><description><![CDATA[Testing Serverless Applications: A Comprehensive Guide]]></description><link>https://blog.thecloudengineers.com/p/testing-serverless-applications-a</link><guid isPermaLink="false">https://blog.thecloudengineers.com/p/testing-serverless-applications-a</guid><dc:creator><![CDATA[Lefteris Karageorgiou]]></dc:creator><pubDate>Wed, 25 Mar 2026 10:30:58 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/9d769570-0526-4121-bcc3-0712f708d23d_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Testing serverless applications presents unique challenges that differ significantly from traditional application testing. 
The ephemeral nature of Lambda functions, the distributed architecture of serverless systems, and the tight integration with managed AWS services require a deliberate testing strategy. This article explores comprehensive testing methodologies for serverless applications with practical tools and frameworks.</p><h2>The Serverless Testing Challenge</h2><p>Serverless architectures introduce complexity that makes testing more nuanced than it is for traditional applications. Lambda functions don&#8217;t run in isolation; they interact with API Gateway, DynamoDB, S3, SQS, SNS, EventBridge, and numerous other AWS services. The pay-per-invocation model means you can&#8217;t simply spin up a test environment and leave it running. Additionally, the distributed nature of serverless systems means that failures can cascade across multiple services in ways that are difficult to predict and reproduce.</p><p>The testing pyramid for serverless applications looks different from the one for traditional applications. 
While you still want a solid foundation of unit tests, the integration layer becomes significantly more important because so much of your application&#8217;s behavior depends on how services interact with each other.</p><h2>Unit Testing Lambda Functions</h2><p>Unit testing Lambda functions focuses on testing your business logic in isolation from AWS services. The key principle is to separate your core logic from the Lambda handler and AWS SDK calls. This separation allows you to test your business logic without needing to mock AWS services or make actual API calls.</p><p><strong>Testing Frameworks</strong>: Use <strong>Jest</strong> or <strong>Mocha</strong> for Node.js Lambda functions, <strong>pytest</strong> for Python, <strong>JUnit</strong> for Java, or <strong>Go&#8217;s built-in testing package</strong> for Go-based functions. These frameworks provide the foundation for writing and running your unit tests with features like test discovery, assertions, and test reporting.</p><p>Your Lambda handler should be thin, primarily responsible for parsing events, calling your business logic, and formatting responses. The actual business logic should live in separate modules that can be tested independently. This architectural pattern makes your code more testable and maintainable.</p><p>When unit testing, focus on testing edge cases, error conditions, and business rule validation. Test how your code handles malformed input, missing required fields, and unexpected data types. These tests should run quickly and require no external dependencies, making them ideal for rapid feedback during development.</p><p><strong>Mocking Tools</strong>: Mock external dependencies at the boundaries of your application using tools like <strong>aws-sdk-mock</strong> for JavaScript, or <strong>moto</strong> for Python. Rather than mocking the entire AWS SDK, create abstraction layers or interfaces that represent the operations your code needs to perform. 
This approach makes your tests more resilient to changes in the AWS SDK and keeps your business logic decoupled from infrastructure concerns.</p><h2>Integration Testing Strategies</h2><p>Integration testing for serverless applications verifies that your Lambda functions work correctly with actual AWS services. This is where you test the contracts between your code and services like DynamoDB, S3, SQS, and API Gateway. Integration tests are more expensive and slower than unit tests, but they catch issues that unit tests cannot.</p><p><strong>Integration Testing Tools</strong>: Leverage <strong>AWS SAM CLI</strong> for local testing and deployment, <strong>Serverless Framework&#8217;s invoke local</strong> command, or <strong>LocalStack</strong> for comprehensive AWS service emulation. For end-to-end testing, consider <strong>Postman</strong> or <strong>Newman</strong> for API testing, and <strong>aws-testing-library</strong> for programmatic integration tests.</p><p>There are several approaches to integration testing in serverless environments. One strategy involves deploying to a dedicated testing environment in AWS and running tests against real services using <strong>AWS CDK</strong> or <strong>Terraform</strong> for infrastructure provisioning. This approach provides the highest confidence that your application will work in production, but it&#8217;s also the slowest and most expensive option.</p><p>Another approach uses local emulation tools like <strong>LocalStack</strong>, <strong>SAM Local</strong>, or <strong>Serverless Offline</strong> that simulate AWS services on your development machine. These tools provide a faster feedback loop than deploying to AWS, but they may not perfectly replicate the behavior of actual AWS services. 
They&#8217;re best used for rapid iteration during development, with periodic validation against real AWS services.</p><p><strong>Contract Testing</strong>: Consider implementing contract testing using <strong>Pact</strong> or <strong>Spring Cloud Contract</strong> for your Lambda functions. Contract tests verify that your functions correctly handle the event formats they receive from triggers like API Gateway, EventBridge, or SQS. They also verify that your functions produce output in the format expected by downstream services. This approach helps catch breaking changes early.</p><p>Integration tests should cover the critical paths through your application, the workflows that represent your core business value. Test how your functions handle retries, how they behave under throttling conditions, and how they recover from transient failures using tools like <strong>Chaos Toolkit</strong> or <strong>AWS Fault Injection Simulator</strong> for chaos engineering experiments. These scenarios are difficult to test with unit tests alone.</p><h2>Local Development Strategies</h2><p>Effective local development for serverless applications requires tools and workflows that provide rapid feedback without requiring constant deployment to AWS. The goal is to enable developers to iterate quickly while maintaining confidence that their code will work when deployed.</p><p><strong>Local Development Tools</strong>: Use <strong>AWS SAM CLI</strong> with <code>sam local start-api</code> and <code>sam local invoke</code> commands, <strong>Serverless Framework</strong> with the <code>serverless-offline</code> plugin, or <strong>LocalStack</strong> for comprehensive local AWS service emulation. 
<strong>Docker</strong> is essential for containerizing your development environment to match Lambda&#8217;s execution environment.</p><p>Local development environments should replicate the Lambda execution environment as closely as possible using <strong>Docker images based on AWS Lambda base images</strong>. This includes matching the runtime version, environment variables, memory configuration, and timeout settings. Discrepancies between local and deployed environments are a common source of bugs that only appear in production.</p><p>Consider using <strong>Docker Compose</strong> to orchestrate containerized development environments that mirror the Lambda runtime. This approach ensures consistency across your development team and reduces &#8220;works on my machine&#8221; issues. Containers can include all necessary dependencies, tools, and configurations, making onboarding new developers faster and more reliable.</p><p><strong>Debugging Tools</strong>: Local development should support debugging with breakpoints and step-through execution using <strong>VS Code&#8217;s built-in debugger</strong>, <strong>AWS Toolkit for VS Code</strong>, <strong>IntelliJ IDEA with AWS Toolkit</strong>, or <strong>PyCharm&#8217;s remote debugging</strong>. The ability to pause execution, inspect variables, and step through code line by line is invaluable for understanding complex issues. This capability is especially important when working with asynchronous operations and event-driven workflows.</p><h2>Conclusion</h2><p>Testing serverless applications requires a comprehensive strategy that spans unit testing, integration testing, local development, and production monitoring. 
The distributed nature of serverless architectures and the tight integration with managed services make testing more complex than traditional applications, but the right tools and approach can provide confidence in your application&#8217;s reliability and correctness.</p>]]></content:encoded></item><item><title><![CDATA[AWS Lambda Durable Functions: What They Are and When to Use Them]]></title><description><![CDATA[AWS Lambda Durable Functions represent a significant evolution in serverless application development, introduced at AWS re:Invent 2025.]]></description><link>https://blog.thecloudengineers.com/p/aws-lambda-durable-functions-what</link><guid isPermaLink="false">https://blog.thecloudengineers.com/p/aws-lambda-durable-functions-what</guid><dc:creator><![CDATA[Lefteris Karageorgiou]]></dc:creator><pubDate>Wed, 18 Mar 2026 10:30:37 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/74bda787-af1c-4ac2-ae46-475bb18c3b27_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>AWS Lambda Durable Functions represent a significant evolution in serverless application development, introduced at AWS re:Invent 2025. 
This new capability enables developers to build reliable, multi-step applications that can run for extended periods without paying for idle compute time while waiting for external events or human decisions.</p><p>In this article, we'll cover what Durable Functions are, when you should use them, and how they compare with AWS Step Functions, so you can choose the right orchestration tool for your serverless applications.</p><h2>What Are Lambda Durable Functions?</h2><p>Lambda Durable Functions are regular Lambda functions enhanced with stateful execution capabilities. They allow you to write complex, long-running workflows directly in your application code using familiar programming languages, rather than defining workflows in a separate orchestration language.</p><p>The key innovation lies in <strong>automatic checkpointing</strong>. As your function executes, AWS Lambda automatically saves progress at strategic points. If your workflow needs to wait for an external event, such as a webhook callback, a manual approval, or a scheduled delay, the function pauses without consuming compute resources. 
When the awaited event occurs, execution resumes from the last checkpoint with full context preserved.</p><p>This approach embeds orchestration logic directly into your Lambda code, eliminating the need to manage state manually or architect separate state management systems. The result is a serverless-first approach that reduces operational overhead while maintaining the scalability and pay-per-use economics that make serverless attractive.</p><h2>When to Use Durable Functions</h2><p>Lambda Durable Functions excel in specific scenarios where application logic naturally involves multiple steps with waiting periods:</p><p><strong>AI and Machine Learning Workflows</strong> - Orchestrating multi-stage AI pipelines where each step might involve model inference, data transformation, or human review. The ability to pause between stages without incurring costs makes this particularly economical for batch processing scenarios.</p><p><strong>Order Processing and E-commerce</strong> - Managing order fulfillment workflows that span inventory checks, payment processing, shipping coordination, and customer notifications. These workflows often involve waiting for external system responses or scheduled delivery windows.</p><p><strong>Approval and Human-in-the-Loop Processes</strong> - Building loan applications, expense approvals, or content moderation systems where workflows must pause for human decisions before proceeding. 
The year-long execution window accommodates even the most extended approval cycles.</p><p><strong>Event-Driven Application Logic</strong> - Coordinating complex business processes that respond to multiple events over time, such as customer onboarding journeys, subscription lifecycle management, or multi-step data processing pipelines.</p><p><strong>Application-Centric Orchestration</strong> - When your workflow logic is tightly coupled with your application code and benefits from being expressed in the same programming language as the rest of your business logic.</p><h2>How Durable Functions Compare with Step Functions</h2><p>Both Lambda Durable Functions and AWS Step Functions solve the problem of orchestrating multi-step workflows, but they approach it from fundamentally different philosophical and architectural standpoints.</p><p><strong>Architectural Philosophy</strong></p><p>Step Functions is built for <strong>infrastructure orchestration</strong>. It excels at coordinating disparate AWS services, providing a visual workflow designer, and offering a declarative approach using Amazon States Language (ASL). The workflow definition lives separately from your application code, making it ideal for workflows that need to be understood and modified by operations teams or non-technical stakeholders.</p><p>Durable Functions, by contrast, are optimized for <strong>application orchestration</strong>. The workflow logic lives within your Lambda function code, written in your preferred programming language. This code-first approach makes it natural for developers who want to keep orchestration logic alongside business logic, use familiar debugging tools, and leverage existing programming constructs like loops, conditionals, and error handling.</p><p><strong>Developer Experience</strong></p><p>With Step Functions, you define workflows in ASL&#8212;a JSON-based state machine definition language. 
While powerful and expressive, this requires learning a new syntax and switching contexts between your application code and workflow definitions. Local testing typically involves additional tooling or mocking frameworks.</p><p>Durable Functions let you write workflows as regular code. If you know how to write a Lambda function, you already know how to write a durable function. The workflow logic uses standard programming constructs, making it more intuitive for developers and easier to test locally using familiar development tools.</p><p><strong>Visibility and Monitoring</strong></p><p>Step Functions provides superior visual representation of workflows. The built-in graphical interface shows execution flow, making it easy for stakeholders to understand complex processes at a glance. This visualization is particularly valuable for compliance, auditing, and operational monitoring.</p><p>Durable Functions trade some of this visual clarity for code simplicity. While you can still monitor executions through CloudWatch and Lambda insights, the workflow structure isn&#8217;t as immediately apparent to non-technical observers.</p><p><strong>Use Case Alignment</strong></p><p>Choose <strong>Step Functions</strong> when you need to:</p><ul><li><p>Orchestrate multiple AWS services (Lambda, ECS, Glue, etc.)</p></li><li><p>Provide workflow visibility to non-technical stakeholders</p></li><li><p>Build workflows that benefit from visual design and monitoring</p></li><li><p>Implement complex branching, parallel execution, or error handling patterns that benefit from declarative definition</p></li><li><p>Maintain clear separation between orchestration and business logic</p></li></ul><p>Choose <strong>Durable Functions</strong> when you need to:</p><ul><li><p>Keep workflow logic tightly integrated with application code</p></li><li><p>Leverage existing programming language features and libraries</p></li><li><p>Simplify development by avoiding context switching between code and 
ASL</p></li><li><p>Build workflows where the orchestration is primarily application logic rather than service coordination</p></li><li><p>Optimize for developer velocity and code maintainability within Lambda</p></li></ul><p><strong>Cost Considerations</strong></p><p>Neither service charges for compute while a workflow sits idle waiting. The pricing models differ, however. Step Functions (Standard workflows) charges per state transition, while Durable Functions follow Lambda&#8217;s standard pricing model with additional charges for state persistence. For workflows with many state transitions, Durable Functions may offer cost advantages. For workflows coordinating many AWS services, Step Functions&#8217; integration capabilities may provide better value.</p><h2>Conclusion</h2><p>Lambda Durable Functions and Step Functions aren&#8217;t competitors&#8212;they&#8217;re complementary tools in your serverless toolkit. Durable Functions shine when you want to write workflow logic as code within your Lambda functions, keeping orchestration close to your application logic. Step Functions excels when you need to coordinate multiple AWS services with clear visual representation and operational oversight.</p><p>The choice ultimately depends on your team&#8217;s preferences, the nature of your workflows, and whether you value code-centric development or service-centric orchestration. Many organizations will find themselves using both, selecting the right tool for each specific use case.</p>]]></content:encoded></item><item><title><![CDATA[Accelerating CloudFormation and CDK Development with Kiro]]></title><description><![CDATA[If you&#8217;ve spent any meaningful time building on AWS, you know the drill: you open your IDE, stare at a blank CDK stack or an empty CloudFormation template, and then spend the next 20 minutes hunting through documentation tabs, Stack Overflow threads, and GitHub samples just to get started.]]></description><link>https://blog.thecloudengineers.com/p/accelerating-cloudformation-and-cdk</link><guid isPermaLink="false">https://blog.thecloudengineers.com/p/accelerating-cloudformation-and-cdk</guid><dc:creator><![CDATA[Lefteris Karageorgiou]]></dc:creator><pubDate>Wed, 11 Mar 2026 10:30:55 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/61662d6c-d61a-43a2-a5a1-fd19913966b0_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>If you&#8217;ve spent any meaningful time building on AWS, you know the drill: you open your IDE, stare at a blank CDK stack or an empty CloudFormation template, and then spend the next 20 minutes hunting through documentation tabs, Stack Overflow threads, and GitHub samples just to get started. It&#8217;s not that the tools are bad; CDK and CloudFormation are genuinely powerful. 
It&#8217;s that the cognitive overhead of <em>using</em> them well is enormous.</p><p>That&#8217;s exactly the problem Kiro is designed to solve.</p><h2>What Is Kiro?</h2><p><a href="https://kiro.dev">Kiro</a> is an AI-powered IDE built for developers who work with cloud infrastructure. It&#8217;s not just a code editor with a chatbot bolted on; it&#8217;s a development environment that understands your infrastructure context and actively helps you build, validate, and troubleshoot it.</p><p>One of Kiro&#8217;s most compelling features is its <strong>Powers</strong> system. Powers are purpose-built AI capabilities that you can install directly into your IDE, each one tailored to a specific domain or technology. Think of them as expert co-pilots for different parts of your stack.</p><p>And for AWS engineers, there&#8217;s one Power that deserves your full attention.</p><h2>The &#8220;Build AWS Infrastructure with CDK and CloudFormation&#8221; Power</h2><p>Available at <a href="https://kiro.dev/powers/">kiro.dev/powers</a>, this Power is built specifically for AWS infrastructure development. 
Its description is straightforward, but the implications are significant:</p><blockquote><p><em>&#8220;Build well-architected AWS infrastructure with CDK using latest documentation, best practices, and code samples. Validate CloudFormation templates, check resource configuration security compliance, and troubleshoot deployments.&#8221;</em></p></blockquote><p>This isn&#8217;t a generic AI assistant that happens to know some AWS. This is a focused, infrastructure-aware capability that integrates directly into your development workflow.</p><h2>What This Power Actually Does</h2><p>Under the hood, this Kiro Power is backed by the <strong>AWS Infrastructure as Code (IaC) MCP Server</strong>, a tool built on the Model Context Protocol (MCP) that connects AI assistants to your local AWS development environment.</p><p>The Power gives Kiro access to a set of specialized tools organized into two categories:</p><p><strong>Documentation and Knowledge Tools</strong></p><ul><li><p>Search CDK documentation and best practices</p></li><li><p>Read specific CDK documentation pages</p></li><li><p>Search CDK samples and constructs</p></li><li><p>Search CloudFormation documentation</p></li></ul><p><strong>Validation and Troubleshooting Tools</strong></p><ul><li><p>Validate CloudFormation templates before deployment</p></li><li><p>Check resource configuration for security compliance</p></li><li><p>Get pre-deployment validation instructions</p></li><li><p>Troubleshoot CloudFormation deployment failures</p></li></ul><p>Together, these capabilities cover the full lifecycle of infrastructure development, from the moment you start designing a stack to the moment you&#8217;re debugging a failed deployment.</p><h2>Why This Changes Your CDK and CloudFormation Workflow</h2><h3><strong>You Stop Guessing at Best Practices</strong></h3><p>One of the most common pain points with CDK is knowing <em>how</em> to use it correctly, not just <em>that</em> you can use it. 
Should you use a low-level L1 construct (the <code>Cfn*</code> classes) or a higher-level L2 construct? What&#8217;s the right pattern for a serverless API? What are the security defaults you should always set?</p><p>With this Power active, Kiro can answer these questions in context, pulling from the latest AWS documentation and surfacing the right patterns for your specific use case. You&#8217;re not getting generic advice; you&#8217;re getting guidance grounded in current AWS best practices.</p><h3><strong>You Catch Errors Before They Reach AWS</strong></h3><p>CloudFormation template errors are notoriously painful to debug. You deploy, wait several minutes, and then get a cryptic rollback message. With Kiro&#8217;s validation tools, you can catch configuration issues and security compliance gaps <em>before</em> you ever run <code>cdk deploy</code> or submit a stack.</p><p>This alone can save significant time in any infrastructure development cycle.</p><h3><strong>You Troubleshoot Deployments Faster</strong></h3><p>When deployments do fail, Kiro can help you understand why. Rather than manually cross-referencing CloudTrail logs and CloudFormation events, you can ask Kiro directly, and it will surface the relevant error context and suggest a fix. For example, if CloudTrail shows an <code>AccessDenied</code> error for a specific API call, Kiro can identify the missing permission and tell you exactly what needs to change in your deployment role.</p><h3><strong>You Learn While You Build</strong></h3><p>If you&#8217;re newer to CDK or expanding into unfamiliar AWS services, this Power doubles as a learning accelerator. 
You can ask Kiro to show you how to build a specific architecture pattern, and it will search CDK constructs and samples to give you several implementation approaches with context, rather than a single answer.</p><h2>Security and Local Execution</h2><p>A reasonable concern with any AI-assisted development tool is: <em>where does my code go?</em></p><p>The IaC MCP Server that powers this Kiro capability runs <strong>entirely on your local machine</strong>. Your CloudFormation templates and CDK code are not sent to external services. The only external calls made are for documentation searches. AWS credentials are used from your existing local configuration, the same way the AWS CLI works, and communication between the server and Kiro happens over standard input/output with no network ports opened.</p><p>For teams working in regulated environments or with sensitive infrastructure, this local execution model is a meaningful architectural choice.</p><h2>Getting Started</h2><p>Installing the Power is straightforward from within the Kiro IDE. Navigate to the <strong>Powers</strong> panel in the sidebar, find &#8220;Build AWS infrastructure with CDK and CloudFormation&#8221; under the Recommended section, and install it. 
Once active, Kiro gains access to all the CDK and CloudFormation tools described above.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gKuA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7a5ced4-9829-4d3d-aece-93364b9a2e18_2026x1390.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gKuA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7a5ced4-9829-4d3d-aece-93364b9a2e18_2026x1390.png 424w, https://substackcdn.com/image/fetch/$s_!gKuA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7a5ced4-9829-4d3d-aece-93364b9a2e18_2026x1390.png 848w, https://substackcdn.com/image/fetch/$s_!gKuA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7a5ced4-9829-4d3d-aece-93364b9a2e18_2026x1390.png 1272w, https://substackcdn.com/image/fetch/$s_!gKuA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7a5ced4-9829-4d3d-aece-93364b9a2e18_2026x1390.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!gKuA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7a5ced4-9829-4d3d-aece-93364b9a2e18_2026x1390.png" width="1456" height="999" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e7a5ced4-9829-4d3d-aece-93364b9a2e18_2026x1390.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:999,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:250935,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.thecloudengineers.com/i/189967684?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7a5ced4-9829-4d3d-aece-93364b9a2e18_2026x1390.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!gKuA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7a5ced4-9829-4d3d-aece-93364b9a2e18_2026x1390.png 424w, https://substackcdn.com/image/fetch/$s_!gKuA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7a5ced4-9829-4d3d-aece-93364b9a2e18_2026x1390.png 848w, https://substackcdn.com/image/fetch/$s_!gKuA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7a5ced4-9829-4d3d-aece-93364b9a2e18_2026x1390.png 1272w, https://substackcdn.com/image/fetch/$s_!gKuA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe7a5ced4-9829-4d3d-aece-93364b9a2e18_2026x1390.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2>The Bigger Picture</h2><p>CDK and CloudFormation are not going away. If anything, infrastructure as code is becoming more central to how teams build and operate on AWS, not less. The question isn&#8217;t whether you&#8217;ll use these tools; it&#8217;s how efficiently you&#8217;ll use them.</p><p>Kiro&#8217;s CDK and CloudFormation Power represents a meaningful shift in that equation. 
It brings documentation, validation, compliance checking, and troubleshooting directly into your development environment, reducing the context-switching that slows down infrastructure work and helping you build well-architected systems faster.</p><p>If you&#8217;re serious about AWS infrastructure development, this is worth adding to your workflow.</p>]]></content:encoded></item><item><title><![CDATA[When NOT to Go Serverless: 5 Use Cases Where It’s the Wrong Choice]]></title><description><![CDATA[Serverless is one of the most transformative paradigms in cloud computing.]]></description><link>https://blog.thecloudengineers.com/p/when-not-to-go-serverless-5-use-cases</link><guid isPermaLink="false">https://blog.thecloudengineers.com/p/when-not-to-go-serverless-5-use-cases</guid><dc:creator><![CDATA[Lefteris Karageorgiou]]></dc:creator><pubDate>Wed, 04 Mar 2026 10:30:18 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/255b3b3c-480f-4606-b519-65b7ef22ad5e_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Serverless is one of the most transformative paradigms in cloud computing. 
AWS Lambda, API Gateway, DynamoDB, and the broader serverless ecosystem let you build and scale applications without ever thinking about servers. It&#8217;s fast to develop, often cheap to run, and incredibly powerful.</p><p>But here&#8217;s the thing: <strong>serverless is not a silver bullet.</strong></p><p>After years of building on AWS, I&#8217;ve seen teams adopt serverless for everything &#8212; and sometimes pay the price in performance, cost, complexity, or all three. The best architects don&#8217;t just know <em>when</em> to use a tool. They know <em>when not to</em>.</p><p>In this article, I&#8217;ll walk you through <strong>five real-world use cases where going serverless is the wrong choice</strong>, and what you should consider instead.</p><h2>1. Long-Running, Compute-Heavy Workloads</h2><h3>The Problem</h3><p>AWS Lambda has a <strong>hard timeout of 15 minutes</strong>. If your workload takes longer than that, Lambda simply isn&#8217;t an option. 
But even workloads that <em>fit</em> within 15 minutes can be a poor match.</p><p>Think about:</p><ul><li><p><strong>Video transcoding or rendering</strong></p></li><li><p><strong>Large-scale data transformations (ETL)</strong></p></li><li><p><strong>Machine learning model training</strong></p></li><li><p><strong>Complex PDF generation from massive datasets</strong></p></li></ul><p>These tasks are CPU-intensive and often run for extended periods. Shoehorning them into Lambda means you&#8217;ll spend your time fighting the platform instead of leveraging it.</p><h3>Why Serverless Fails Here</h3><ul><li><p><strong>15-minute max execution time</strong> is a hard ceiling you can&#8217;t negotiate.</p></li><li><p><strong>CPU is allocated proportionally to memory</strong> in Lambda. To get meaningful compute power, you might need to allocate 4&#8211;10 GB of memory &#8212; even if you only need 512 MB of RAM. You&#8217;re paying for memory you don&#8217;t use just to get CPU.</p></li><li><p><strong>No GPU access.</strong> If your workload benefits from GPU acceleration, Lambda is out of the question.</p></li><li><p>Orchestrating many short Lambda functions to chunk the work adds <strong>significant complexity</strong> and potential failure points.</p></li></ul><h3>What to Use Instead</h3><ul><li><p><strong>AWS Fargate (ECS)</strong> &#8212; Containerized long-running tasks without managing servers</p></li><li><p><strong>EC2 Spot Instances</strong> &#8212; Cost-effective heavy compute (batch jobs, rendering)</p></li><li><p><strong>AWS Batch</strong> &#8212; Managed batch processing at scale</p></li><li><p><strong>SageMaker</strong> &#8212; ML model training specifically</p></li><li><p><strong>Step Functions + Fargate</strong> &#8212; Orchestrated pipelines with long-running steps</p></li></ul><h3>A Practical Example</h3><p>Let&#8217;s say you&#8217;re processing uploaded video files. 
Instead of Lambda:</p><p><code>S3 Upload Event &#8594; EventBridge &#8594; Fargate Task (ECS)</code></p><p>Fargate gives you configurable vCPU and memory, no time limits, and you only pay for the duration of the task. You still get the &#8220;no servers to manage&#8221; benefit without Lambda&#8217;s constraints.</p><h2>2. High-Throughput, Latency-Sensitive Applications</h2><h3>The Problem</h3><p>You&#8217;re building an application that needs to handle <strong>thousands of requests per second</strong> with <strong>consistent, single-digit millisecond response times</strong>. Think:</p><ul><li><p><strong>Real-time bidding platforms</strong></p></li><li><p><strong>High-frequency trading APIs</strong></p></li><li><p><strong>Multiplayer game backends</strong></p></li><li><p><strong>Real-time fraud detection at the edge</strong></p></li></ul><h3>Why Serverless Fails Here</h3><p>The biggest villain here is the <strong>cold start</strong>.</p><p>When Lambda spins up a new execution environment, there&#8217;s an initialization delay.</p><ul><li><p>Python, Node.js: 100&#8211;300 ms</p></li><li><p>Java, .NET: 500 ms &#8211; 2+ seconds</p></li></ul><p>For most applications, cold starts are a minor nuisance. For latency-critical systems, they&#8217;re a dealbreaker.</p><p>Yes, <strong>Provisioned Concurrency</strong> exists, and <strong>SnapStart</strong> helps with Java. But:</p><ul><li><p>Provisioned Concurrency is essentially <strong>pre-warming containers</strong> &#8212; which defeats the core serverless value proposition. 
You&#8217;re paying for idle compute.</p></li><li><p>You still can&#8217;t guarantee <strong>tail latency</strong> (p99/p99.9) the way you can with a warm, long-running process.</p></li><li><p>At high throughput, the <strong>API Gateway overhead</strong> (typically 10&#8211;30 ms) adds up and is hard to eliminate.</p></li></ul><h3>What to Use Instead</h3><ul><li><p><strong>ECS/EKS on Fargate or EC2</strong> &#8212; Long-running containers with persistent connections and predictable latency.</p></li><li><p><strong>EC2 with Auto Scaling</strong> &#8212; Maximum control over the runtime, networking, and performance tuning.</p></li><li><p><strong>ElastiCache (Redis)</strong> &#8212; Pair with containers for ultra-low-latency data access.</p></li></ul><h3>The Key Takeaway</h3><p>If your SLA says <strong>&#8220;p99 latency under 10 ms&#8221;</strong>, serverless will make that extremely difficult and expensive to achieve. Use always-on compute with connection pooling and warm caches.</p><h2>3. Applications with Complex or Stateful Workflows (That Need Persistent Connections)</h2><h3>The Problem</h3><p>Serverless functions are <strong>stateless and ephemeral</strong> by design. Each invocation is independent. There&#8217;s no shared memory, no local disk that persists, and no guarantee that the same instance handles consecutive requests.</p><p>This becomes painful when you need:</p><ul><li><p><strong>WebSocket connections with complex server-side state</strong></p></li><li><p><strong>Long-lived database connection pools</strong></p></li><li><p><strong>In-memory caching across requests</strong></p></li><li><p><strong>Sticky sessions</strong></p></li></ul><h3>Why Serverless Fails Here</h3><p><strong>Database connections are the classic pain point.</strong> Every Lambda invocation may open a new connection to your relational database. 
Under load, you can easily exhaust your database&#8217;s connection limit.</p><pre><code><code>100 concurrent Lambda invocations = 100 database connections
1,000 concurrent invocations = &#128165; database connection limit exceeded</code></code></pre><p><strong>RDS Proxy</strong> helps by pooling connections, but it:</p><ul><li><p>Adds cost</p></li><li><p>Adds latency (~5&#8211;15 ms per query overhead)</p></li><li><p>Is another component to configure and monitor</p></li><li><p>Doesn&#8217;t eliminate the fundamental mismatch &#8212; it just papers over it</p></li></ul><p><strong>In-memory state is another problem.</strong> If your application logic depends on keeping data in memory across requests (e.g., a game server tracking player positions, a collaborative editing backend), Lambda can&#8217;t help you. You&#8217;d have to externalize all state to DynamoDB, ElastiCache, or S3 &#8212; adding latency and complexity for every operation.</p><h3>What to Use Instead</h3><ul><li><p><strong>ECS/Fargate</strong> with long-running containers that maintain connection pools and in-memory state.</p></li><li><p><strong>EC2</strong> for maximum control over memory, connections, and runtime behavior.</p></li><li><p><strong>ElastiCache</strong> if you need shared in-memory state across instances.</p></li><li><p><strong>AppSync</strong> for managed GraphQL with real-time subscriptions (a serverless option that <em>does</em> handle WebSockets well, but for a specific use case).</p></li></ul><h3>When It&#8217;s a Judgment Call</h3><p>If you&#8217;re building a standard REST API with a relational database, serverless <em>can</em> work &#8212; especially with RDS Proxy and moderate traffic. But if your application is <strong>connection-heavy or state-heavy</strong>, you&#8217;ll spend more time fighting the architecture than building features.</p><h2>4. Predictable, Always-On, Steady-State Workloads</h2><h3>The Problem</h3><p>Serverless pricing is brilliant for <strong>spiky, unpredictable, or low-traffic workloads</strong>. You pay per invocation, per millisecond of compute. 
When traffic is zero, your bill is zero.</p><p>But what about workloads that run <strong>24/7 at a consistent, high volume</strong>?</p><ul><li><p>A backend API serving <strong>millions of requests per day, evenly distributed</strong></p></li><li><p>A stream processor consuming from Kinesis <strong>around the clock</strong></p></li><li><p>A WebSocket server with <strong>thousands of persistent connections</strong></p></li></ul><h3>Why Serverless Fails Here</h3><p>Let&#8217;s do some back-of-the-envelope math.</p><p><strong>Scenario:</strong> An API handling 50 million requests/month, average execution time of 200 ms, with 512 MB memory allocated.</p><p><strong>Lambda Cost:</strong></p><pre><code><code>Compute: 50M &#215; 0.2s &#215; 0.5 GB = 5,000,000 GB-seconds
Cost:    5,000,000 &#215; $0.0000166667 = ~$83.33
Requests: 50M &#215; $0.20/million     = ~$10.00
API Gateway (HTTP API): 50M &#215; $1.00/million = ~$50.00
                                    --------
Total:                              ~$143/month</code></code></pre><p><strong>Fargate Equivalent (2 tasks, 0.5 vCPU, 1 GB each):</strong></p><pre><code><code>vCPU:   2 &#215; 0.5 &#215; $0.04048/hr &#215; 730 hrs = ~$29.55
Memory: 2 &#215; 1 GB &#215; $0.004445/hr &#215; 730 hrs = ~$6.49
ALB:    ~$16.20 + LCU charges (~$5-10)     = ~$25.00
                                              --------
Total:                                       ~$61/month</code></code></pre><p><strong>That&#8217;s a ~57% cost reduction with Fargate</strong> &#8212; and the gap widens as traffic increases.</p><h3>The Rule of Thumb</h3><p>Ask yourself:</p><blockquote><p><em><strong>&#8220;Will my compute utilization be consistently above 20-30%?&#8221;</strong></em></p></blockquote><p>If yes, reserved/always-on compute (Fargate, EC2 Reserved Instances, or Savings Plans) will almost always be cheaper.</p><h3>What to Use Instead</h3><ul><li><p><strong>Fargate with Auto Scaling </strong>&#8212; Steady containerized workloads with occasional spikes</p></li><li><p><strong>EC2 Reserved Instances / Savings Plans </strong>&#8212; Predictable, long-term workloads at the best price</p></li><li><p><strong>EKS </strong>&#8212; When you need Kubernetes orchestration at scale</p></li></ul><h3>The Hybrid Approach</h3><p>The best architectures often <strong>mix serverless and containers</strong>:</p><ul><li><p>Use <strong>Lambda</strong> for event-driven, asynchronous tasks (S3 triggers, SNS handlers, cron jobs).</p></li><li><p>Use <strong>Fargate/EC2</strong> for the high-throughput, steady-state API layer.</p></li></ul><p>You get the best of both worlds.</p><h2>5. Complex Local Development and Debugging Requirements</h2><h3>The Problem</h3><p>This one is less about production and more about <strong>developer experience</strong> &#8212; which matters more than many teams admit.</p><p>Serverless applications are inherently <strong>distributed</strong>. 
A single user action might involve:</p><p><code>API Gateway &#8594; Lambda &#8594; DynamoDB &#8594; DynamoDB Stream &#8594; Lambda &#8594; SNS &#8594; Lambda &#8594; SQS &#8594; Lambda</code></p><p>Try debugging that locally.</p><h3>Why Serverless Fails Here</h3><ul><li><p><strong>Local emulation is imperfect.</strong> Tools like SAM CLI and LocalStack are helpful but don&#8217;t perfectly replicate IAM permissions, event source mappings, service integrations, or edge cases.</p></li><li><p><strong>&#8220;Console log&#8221; debugging</strong> is still the primary debugging strategy for many Lambda developers. Attaching a real debugger with breakpoints is possible but cumbersome.</p></li><li><p><strong>Integration testing is hard.</strong> You often need a deployed AWS environment to truly test your application, which slows down the feedback loop.</p></li><li><p><strong>Distributed tracing adds overhead.</strong> You&#8217;ll need X-Ray or third-party tools to understand request flows across services.</p></li><li><p><strong>Each developer needs their own cloud sandbox.</strong> Unlike a containerized app where you can <code>docker-compose up</code> and have a full environment locally, serverless often requires deploying to AWS &#8212; leading to shared-environment conflicts or the cost of per-developer AWS accounts.</p></li></ul><h3>When This Matters Most</h3><ul><li><p><strong>Large teams</strong> where developer productivity at scale is critical</p></li><li><p><strong>Complex, multi-service architectures</strong> with many interconnected functions</p></li><li><p><strong>Regulated industries</strong> where you need to reproduce and debug production issues quickly</p></li><li><p><strong>Teams new to serverless</strong> with existing expertise in containers or traditional architectures</p></li></ul><h3>What to Use Instead</h3><p>If developer experience is a priority and your team is more productive with containers:</p><ul><li><p><strong>Docker Compose + 
ECS/Fargate</strong> &#8212; Develop locally with Docker, deploy to Fargate. Nearly identical environments.</p></li><li><p><strong>EKS with Skaffold or Telepresence</strong> &#8212; Hot-reload Kubernetes development.</p></li></ul><h3>A Balanced View</h3><p>To be fair, the serverless developer experience is <strong>improving rapidly</strong>. Tools like <strong>SST (Serverless Stack)</strong>, <strong>AWS SAM Accelerate</strong>, and <strong>Lambda Powertools</strong> have made development significantly better. If you&#8217;re starting a greenfield project with a small team, serverless DX is often <em>good enough</em>.</p><p>But for large, complex systems &#8212; especially if your team has deep container expertise &#8212; forcing a serverless-first approach can meaningfully slow development velocity.</p><h2>So... When SHOULD You Go Serverless?</h2><p>After all that, let&#8217;s end on a positive note. Serverless is the <strong>right choice</strong> when:</p><ol><li><p>Your traffic is <strong>spiky, unpredictable, or low-volume</strong></p></li><li><p>You want to <strong>minimize operational overhead</strong> (no patching, no capacity planning)</p></li><li><p>You&#8217;re building <strong>event-driven architectures</strong> (S3 triggers, message processing, webhooks)</p></li><li><p>You need to <strong>ship fast</strong> with a small team</p></li><li><p>Your workloads are <strong>short-lived</strong> (seconds, not minutes)</p></li><li><p>You want <strong>automatic scaling</strong> from zero to massive without configuration</p></li><li><p>You&#8217;re building <strong>microservices, APIs, or background jobs</strong> with moderate latency requirements</p></li></ol><h2>The Bottom Line</h2><p>The best cloud architects aren&#8217;t loyal to any single paradigm. They pick the right tool for the job.</p><p>Serverless is incredible &#8212; I use it extensively and recommend it often. 
But blindly applying it to every problem leads to over-engineered, expensive, and frustrating architectures.</p><p><strong>Before going serverless, ask yourself:</strong></p><ol><li><p>Does my workload fit within Lambda&#8217;s execution limits?</p></li><li><p>Can I tolerate cold start latency?</p></li><li><p>Is my application fundamentally stateless?</p></li><li><p>Is my traffic pattern spiky or variable?</p></li><li><p>Can my team debug and develop effectively in a serverless model?</p></li></ol><p>If the answer to any of these is <strong>&#8220;no&#8221;</strong>, consider containers, EC2, or a hybrid approach. There&#8217;s no shame in using a server. After all, serverless still runs on them.</p>]]></content:encoded></item><item><title><![CDATA[Securing Your AWS Account in the First 24 Hours (A Checklist)]]></title><description><![CDATA[You just created a new AWS account.]]></description><link>https://blog.thecloudengineers.com/p/securing-your-aws-account-in-the</link><guid isPermaLink="false">https://blog.thecloudengineers.com/p/securing-your-aws-account-in-the</guid><dc:creator><![CDATA[Lefteris Karageorgiou]]></dc:creator><pubDate>Wed, 25 Feb 2026 10:31:32 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/28b5c701-ea1a-49ec-a505-6f3b387f2420_2912x1632.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>You just created a new AWS account. Maybe it&#8217;s for a personal project, a startup, or a new workload at your company. Whatever the reason, the clock is ticking. An unsecured AWS account is a magnet for attackers, and the damage can happen fast, from cryptomining charges to full data breaches.</p><p>The good news? You can dramatically reduce your risk by following a clear set of steps in the first 24 hours. This article is that checklist.</p><p>Let&#8217;s get started.</p>
<h2>Why the First 24 Hours Matter</h2><p>New AWS accounts are often in their most vulnerable state right after creation. Default settings are permissive, there&#8217;s no monitoring in place, and credentials are frequently mishandled. Automated bots constantly scan for exposed keys and misconfigured resources. Stories of developers waking up to five-figure AWS bills after accidentally committing credentials to GitHub are not myths; they happen regularly.</p><p>Treating security as a &#8220;day one&#8221; priority is not paranoia. It&#8217;s engineering discipline.</p><h2>Phase 1: Protect the Root Account</h2><p>The root account is the single most powerful identity in your AWS environment. It has unrestricted access to everything, and it cannot be limited by IAM policies. Your first actions should be to lock it down.</p><h3>1. Set a Strong, Unique Password for the Root User</h3><p>The password you chose during sign-up might have been rushed. Go to the account settings and make sure the root password is long, random, and stored in a reputable password manager. Do not reuse a password from another service.</p><h3>2. Enable Multi-Factor Authentication (MFA) on the Root Account</h3><p>This is the single most important step on this entire list. Go to the IAM console, select the root user, and enable MFA. A hardware security key (such as a YubiKey) is the strongest option. A virtual MFA app like Google Authenticator or Authy is a solid alternative. Avoid SMS-based MFA if possible, as it is vulnerable to SIM-swapping attacks.</p><h3>3. 
Do Not Generate Access Keys for the Root Account</h3><p>The root user should never have programmatic access keys. If keys already exist for the root account, delete them immediately. There is no legitimate operational reason to use root access keys.</p><h3>4. Store Root Credentials Securely and Rarely Use Them</h3><p>After setting up the root account, you should almost never log into it again. Store the credentials and MFA recovery codes in a secure vault (a physical safe or an encrypted password manager with restricted access). The root account should only be used for the handful of tasks that specifically require it, such as changing the account&#8217;s support plan or closing the account.</p><h2>Phase 2: Set Up Proper IAM Identity Management</h2><p>Now that root is locked away, you need a secure way for humans and services to interact with the account.</p><h3>5. Create an IAM Admin User or Use AWS IAM Identity Center</h3><p>Create a dedicated IAM user (or, even better, use IAM Identity Center if you&#8217;re working within AWS Organizations) with administrative permissions. This is the identity you&#8217;ll use for day-to-day management &#8212; not root.</p><p>If you&#8217;re managing a single account, an IAM user in the &#8220;Administrators&#8221; group is sufficient. If you&#8217;re managing multiple accounts, invest the time to set up IAM Identity Center (formerly AWS SSO) from the start. It will save you enormous headaches later.</p><h3>6. Enable MFA for All Human IAM Users</h3><p>Every person who can log into the AWS console should have MFA enabled. No exceptions. Consider creating an IAM policy that denies all actions until MFA is activated, which encourages users to set it up immediately upon first login.</p><h3>7. Apply the Principle of Least Privilege</h3><p>Resist the temptation to give every user or role full admin access. Start with the minimum permissions required for each person or service, and expand only when there is a justified need. 
Use AWS managed policies as a starting point, but review what they allow. The &#8220;PowerUserAccess&#8221; policy, for example, is far more permissive than most people realize.</p><h3>8. Create an IAM Password Policy</h3><p>Configure a strong password policy for IAM users in your account. Enforce minimum length (at least 14 characters), require a mix of character types, and enable password expiration if your organization requires it. This is found under IAM &gt; Account settings.</p><h2>Phase 3: Enable Logging and Monitoring</h2><p>You can&#8217;t protect what you can&#8217;t see. Visibility into account activity is non-negotiable.</p><h3>9. Enable AWS CloudTrail in All Regions</h3><p>CloudTrail records every API call made in your account. It&#8217;s your audit trail, your forensic evidence, and your early warning system. Create a trail that applies to all regions and logs to a dedicated S3 bucket. Enable log file validation so you can verify that logs haven&#8217;t been tampered with.</p><p>This is one of those services that costs very little but provides enormous value. Do not skip it.</p><h3>10. Enable CloudTrail Log File Integrity Validation</h3><p>When you set up CloudTrail, enable digest file delivery. This allows you to later verify that no log files were modified or deleted after delivery. In the event of a security incident, you need to trust your logs.</p><h3>11. Turn On AWS Config</h3><p>AWS Config continuously monitors and records your resource configurations and evaluates them against rules. Enable it in all regions. It answers questions like: &#8220;When did this security group change?&#8221; and &#8220;Is this S3 bucket publicly accessible?&#8221;</p><p>You can start with the AWS Config conformance packs, which provide pre-built rule sets for common compliance frameworks.</p><h3>12. Enable Amazon GuardDuty</h3><p>GuardDuty is a threat detection service that analyzes CloudTrail logs, VPC flow logs, and DNS logs to identify malicious activity. 
It can detect things like cryptocurrency mining, compromised credentials, and unusual API calls. Enabling it takes about 30 seconds, and it starts working immediately.</p><p>There is a 30-day free trial, and the ongoing cost is typically very reasonable relative to the protection it provides.</p><h3>13. Set Up Billing Alerts and Anomaly Detection</h3><p>Security incidents often show up as billing anomalies first. Go to the Billing and Cost Management console and enable the following:</p><ul><li><p><strong>AWS Budgets</strong>: Create a monthly budget with a threshold alert. Even a simple &#8220;notify me if my spend exceeds $50&#8221; can save you from a surprise bill.</p></li><li><p><strong>Cost Anomaly Detection</strong>: This service uses machine learning to detect unusual spending patterns and alert you automatically.</p></li></ul><p>An unexpected spike in EC2 or Lambda charges could be the first sign that someone is using your account for unauthorized purposes.</p><h2>Phase 4: Secure Your Network Defaults</h2><p>Every new AWS account comes with default networking resources that are more permissive than you might want.</p><h3>14. Review and Restrict Default Security Groups</h3><p>Every VPC comes with a default security group that allows all outbound traffic and allows all inbound traffic from resources assigned to the same security group. While this isn&#8217;t publicly open, it&#8217;s more permissive than most workloads need. Adopt the practice of creating custom, purpose-specific security groups and never attaching the default security group to resources.</p><h3>15. Delete or Ignore the Default VPC (Use Custom VPCs Instead)</h3><p>The default VPC in each region comes with public subnets and an internet gateway already attached. It&#8217;s designed for convenience, not security. 
For any real workload, create a custom VPC with a deliberate network design &#8212; private subnets for backend services, public subnets only where necessary, and proper route table configuration.</p><p>You don&#8217;t necessarily need to delete the default VPC (some AWS services expect it to exist), but make a rule to never deploy production workloads into it.</p><h3>16. Block Public Access to S3 at the Account Level</h3><p>One of the most common AWS security headlines involves publicly exposed S3 buckets. AWS now provides an account-level setting called &#8220;S3 Block Public Access.&#8221; Enable all four options at the account level. If a specific bucket later needs public access (for static website hosting, for example), you can override it deliberately for that one bucket. The default should be locked down.</p><h2>Phase 5: Establish Account-Level Protections</h2><p>These steps create safety nets that protect you across the entire account.</p><h3>17. Enable EBS Default Encryption</h3><p>Navigate to the EC2 console and enable default EBS encryption for each region you plan to use. This ensures that every new EBS volume is encrypted automatically, without requiring the person launching the instance to remember to check a box. You can use the AWS-managed key or specify a customer-managed KMS key.</p><h3>18. Configure Account-Level Contacts</h3><p>Under Account Settings, make sure the security, billing, and operations contact fields are filled in with valid email addresses. AWS uses these contacts to reach you in the event of abuse notifications or security issues. If these emails go to an unmonitored inbox, you might miss a critical warning.</p><h3>19. Enable AWS Security Hub (Optional but Recommended)</h3><p>Security Hub aggregates findings from GuardDuty, Config, IAM Access Analyzer, and other services into a single dashboard. It also runs automated security checks against industry benchmarks like CIS AWS Foundations. 
If you&#8217;ve already enabled the services listed above, Security Hub ties everything together and gives you a centralized security posture score.</p><h3>20. Enable IAM Access Analyzer</h3><p>IAM Access Analyzer helps you identify resources in your account that are shared with an external entity &#8212; for example, an S3 bucket policy that grants access to another AWS account, or an IAM role that can be assumed externally. Enable it in each region. It runs continuously and alerts you when it finds unintended external access.</p><h2>Phase 6: Plan for the Worst</h2><p>Security isn&#8217;t just about prevention. You also need to prepare for the possibility that something goes wrong.</p><h3>21. Document Your Incident Response Plan</h3><p>Even a simple one-page document is better than nothing. Answer these questions in advance:</p><ul><li><p>Who is responsible for responding to a security alert?</p></li><li><p>How do we revoke compromised credentials?</p></li><li><p>Where are our logs stored and how do we access them?</p></li><li><p>Who do we contact at AWS for support?</p></li></ul><p>Under pressure during an incident is the worst time to figure out these answers.</p><h3>22. Know How to Contact AWS Support</h3><p>If your account is compromised, you may need to contact AWS quickly. Make sure you know how to open a support case, and consider whether your current support plan (Basic, Developer, Business, or Enterprise) provides the response time you&#8217;d need in an emergency. Business-level support or above gives you 24/7 access to Cloud Support Engineers.</p><h3>23. Create an Emergency &#8220;Break Glass&#8221; Procedure for Root Access</h3><p>Since you&#8217;ve locked down root access (as described in Phase 1), document the procedure for accessing it in an emergency. Who has access to the password vault? Where is the MFA device stored? What approvals are needed? 
This should be written down, tested, and reviewed periodically.</p><h2>Final Thoughts</h2><p>None of these steps are difficult. Most take less than five minutes. But together, they form a security foundation that will protect you from the vast majority of common AWS threats.</p><p>The most expensive security incident is the one you could have prevented with ten minutes of configuration on day one. Don&#8217;t learn that lesson the hard way.</p><p>Bookmark this checklist. Share it with your team. Run through it every time a new account is created. Your future self will thank you.</p>]]></content:encoded></item><item><title><![CDATA[SQL or NoSQL on AWS? 
A Practical Decision Framework for Cloud Engineers]]></title><description><![CDATA[Choosing the right database is one of the most impactful architectural decisions you&#8217;ll make on AWS.]]></description><link>https://blog.thecloudengineers.com/p/sql-or-nosql-on-aws-a-practical-decision</link><guid isPermaLink="false">https://blog.thecloudengineers.com/p/sql-or-nosql-on-aws-a-practical-decision</guid><dc:creator><![CDATA[Lefteris Karageorgiou]]></dc:creator><pubDate>Wed, 18 Feb 2026 10:30:12 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/426be2dc-4473-4a56-a06e-50eb87344778_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Choosing the right database is one of the most impactful architectural decisions you&#8217;ll make on AWS. Pick the wrong one, and you&#8217;ll face performance bottlenecks, skyrocketing costs, and painful migrations down the road.</p><p>In this article, we&#8217;ll break down the differences between SQL and NoSQL databases on AWS, explore the main services available, and give you a practical decision framework so you can choose with confidence.</p>
<h2>Understanding the Fundamentals</h2><p>Before diving into AWS services, let&#8217;s make sure we&#8217;re on the same page about what sets these two paradigms apart.</p><h3>SQL (Relational Databases)</h3><p>Relational databases store data in <strong>tables</strong> with predefined <strong>schemas</strong> made up of rows and columns. They use <strong>Structured Query Language (SQL)</strong> to query and manipulate data and enforce relationships between tables through foreign keys and joins.</p><p><strong>Key characteristics:</strong></p><ul><li><p><strong>Structured schema:</strong> You define your tables, columns, and data types upfront.</p></li><li><p><strong>ACID compliance:</strong> Transactions are Atomic, Consistent, Isolated, and Durable.</p></li><li><p><strong>Strong consistency:</strong> Reads always return the most recent write.</p></li><li><p><strong>Vertical scaling:</strong> Traditionally scaled by adding more power (CPU, RAM) to a single server.</p></li><li><p><strong>Complex queries:</strong> Excellent support for JOINs, aggregations, and ad-hoc queries.</p></li></ul><h3>NoSQL (Non-Relational Databases)</h3><p>NoSQL databases break away from the rigid tabular model. 
They come in several flavors (document, key-value, wide-column, and graph), each optimized for different access patterns.</p><p><strong>Key characteristics:</strong></p><ul><li><p><strong>Flexible schema:</strong> Fields can vary from record to record.</p></li><li><p><strong>BASE model:</strong> Basically Available, Soft state, Eventually consistent.</p></li><li><p><strong>Horizontal scaling:</strong> Designed to scale out across many nodes.</p></li><li><p><strong>High throughput:</strong> Optimized for massive read/write workloads.</p></li><li><p><strong>Simple queries:</strong> Best suited for known access patterns rather than ad-hoc queries.</p></li></ul><h2>SQL Databases on AWS</h2><h3>Amazon RDS (Relational Database Service)</h3><p>Amazon RDS is a <strong>managed service</strong> that supports several popular relational engines, including:</p><ul><li><p>MySQL</p></li><li><p>PostgreSQL</p></li><li><p>MariaDB</p></li><li><p>Oracle</p></li><li><p>SQL Server</p></li><li><p>Amazon Aurora (MySQL &amp; PostgreSQL compatible)</p></li></ul><p><strong>When to use RDS:</strong></p><ul><li><p>Your data is highly relational with complex relationships.</p></li><li><p>You need ACID-compliant transactions (e.g., financial systems).</p></li><li><p>Your team is experienced with SQL.</p></li><li><p>You want a managed service with a traditional relational model.</p></li></ul><p><strong>Example use cases:</strong> E-commerce order management, HR systems, CMS platforms, ERP systems.</p><h3>Amazon Aurora</h3><p>Aurora deserves special attention. 
It&#8217;s <strong>AWS&#8217;s cloud-native relational database</strong>, compatible with MySQL and PostgreSQL but re-engineered for the cloud.</p><p><strong>Why Aurora stands out:</strong></p><ul><li><p>Up to <strong>5x the throughput</strong> of standard MySQL and up to <strong>3x</strong> that of standard PostgreSQL, according to AWS.</p></li><li><p>Storage <strong>auto-scales</strong> up to 128 TB.</p></li><li><p><strong>6-way replication</strong> across 3 Availability Zones.</p></li><li><p>Supports up to <strong>15 read replicas</strong> with minimal lag.</p></li><li><p><strong>Aurora Serverless v2</strong> scales capacity automatically based on demand.</p></li></ul><p><strong>When to use Aurora over standard RDS:</strong></p><ul><li><p>You need higher performance and availability.</p></li><li><p>Your workload has unpredictable traffic (Aurora Serverless).</p></li><li><p>You want automatic storage scaling.</p></li></ul><h2>NoSQL Databases on AWS</h2><h3>Amazon DynamoDB</h3><p>DynamoDB is AWS&#8217;s flagship <strong>fully managed key-value and document database</strong>.
It&#8217;s designed for applications that need consistent, single-digit millisecond latency at any scale.</p><p><strong>Key features:</strong></p><ul><li><p><strong>Serverless:</strong> No servers to manage, patch, or provision.</p></li><li><p><strong>Auto-scaling:</strong> Throughput adjusts to traffic automatically (on-demand mode).</p></li><li><p><strong>Global Tables:</strong> Multi-region, multi-active replication.</p></li><li><p><strong>DAX (DynamoDB Accelerator):</strong> In-memory caching for microsecond reads.</p></li><li><p><strong>Streams:</strong> Capture item-level changes for event-driven architectures.</p></li></ul><p><strong>When to use DynamoDB:</strong></p><ul><li><p>You need extreme scale and low latency.</p></li><li><p>Your access patterns are well-defined and predictable.</p></li><li><p>You&#8217;re building serverless applications (Lambda + API Gateway + DynamoDB).</p></li><li><p>You need a simple key-value or document store.</p></li></ul><p><strong>Example use cases:</strong> Gaming leaderboards, IoT telemetry, session management, shopping carts, real-time bidding.</p><h3>Amazon DocumentDB (MongoDB Compatible)</h3><p>If your team already uses <strong>MongoDB</strong>, DocumentDB provides a managed, MongoDB-compatible document database.</p><p><strong>When to use DocumentDB:</strong></p><ul><li><p>You&#8217;re migrating from MongoDB.</p></li><li><p>You need flexible JSON document storage with rich query capabilities.</p></li><li><p>You want managed infrastructure without running your own MongoDB clusters.</p></li></ul><h3>Amazon ElastiCache (Redis &amp; Memcached)</h3><p>ElastiCache provides <strong>in-memory key-value stores</strong> for caching and real-time workloads.</p><p><strong>When to use ElastiCache:</strong></p><ul><li><p>You need <strong>sub-millisecond</strong> response times.</p></li><li><p>You&#8217;re caching frequently accessed data from another database.</p></li><li><p>You need real-time features like session stores, pub/sub 
messaging, or leaderboards.</p></li></ul><h3>Amazon Neptune</h3><p>Neptune is a <strong>graph database</strong> for highly connected datasets.</p><p><strong>When to use Neptune:</strong></p><ul><li><p>Social networks, recommendation engines, fraud detection.</p></li><li><p>Your queries involve traversing complex relationships between entities.</p></li></ul><h2><strong>Head-to-Head Comparison</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!MPXz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febd580c1-7881-4c15-bc6f-fc530431f447_1634x882.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!MPXz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febd580c1-7881-4c15-bc6f-fc530431f447_1634x882.png 424w, https://substackcdn.com/image/fetch/$s_!MPXz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febd580c1-7881-4c15-bc6f-fc530431f447_1634x882.png 848w, https://substackcdn.com/image/fetch/$s_!MPXz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febd580c1-7881-4c15-bc6f-fc530431f447_1634x882.png 1272w, https://substackcdn.com/image/fetch/$s_!MPXz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febd580c1-7881-4c15-bc6f-fc530431f447_1634x882.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!MPXz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febd580c1-7881-4c15-bc6f-fc530431f447_1634x882.png" 
width="1456" height="786" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ebd580c1-7881-4c15-bc6f-fc530431f447_1634x882.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:786,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:226962,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.thecloudengineers.com/i/188352806?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febd580c1-7881-4c15-bc6f-fc530431f447_1634x882.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!MPXz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febd580c1-7881-4c15-bc6f-fc530431f447_1634x882.png 424w, https://substackcdn.com/image/fetch/$s_!MPXz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febd580c1-7881-4c15-bc6f-fc530431f447_1634x882.png 848w, https://substackcdn.com/image/fetch/$s_!MPXz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febd580c1-7881-4c15-bc6f-fc530431f447_1634x882.png 1272w, https://substackcdn.com/image/fetch/$s_!MPXz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febd580c1-7881-4c15-bc6f-fc530431f447_1634x882.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2>Decision Framework: 7 Questions to Ask Yourself</h2><p>Use these questions to guide your choice:</p><h3>1. How relational is your data?</h3><p>Highly relational (many joins, foreign keys, normalized tables) &#8594; SQL<br>Loosely structured or denormalized &#8594; NoSQL</p><h3>2. How complex are your queries?</h3><p>Need ad-hoc queries, complex reporting, aggregations &#8594; SQL<br>Known, simple access patterns (get by key, query by index) &#8594; NoSQL</p><h3>3. What&#8217;s your scale?</h3><p>Moderate, predictable traffic &#8594; SQL (RDS/Aurora) handles this well<br>Massive, spiky, or unpredictable traffic &#8594; NoSQL (DynamoDB) scales seamlessly</p><h3>4. What are your latency requirements?</h3><p>Single-digit millisecond &#8594; Both work<br>Microsecond &#8594; DynamoDB + DAX or ElastiCache</p><h3>5.
Do you need ACID transactions?</h3><p>Complex multi-table transactions &#8594; SQL<br>Simple transactions &#8594; DynamoDB Transactions may be sufficient</p><h3>6. Are you building serverless?</h3><p>DynamoDB integrates natively with Lambda, API Gateway, and EventBridge.<br>RDS in serverless architectures can be problematic (connection limits), though Aurora Serverless v2 and RDS Proxy help.</p><h3>7. What&#8217;s your budget model preference?</h3><p>Predictable monthly cost &#8594; RDS (reserved instances)<br>Pay-per-use &#8594; DynamoDB on-demand mode</p><h2>Real-World Scenarios</h2><h3>Scenario 1: E-Commerce Platform</h3><p><strong>Requirements:</strong> Product catalog, orders, inventory, payments.</p><p><strong>Recommendation: Aurora PostgreSQL + DynamoDB (hybrid)</strong></p><ul><li><p>Use <strong>Aurora</strong> for orders and payments (ACID transactions, complex queries).</p></li><li><p>Use <strong>DynamoDB</strong> for the product catalog and shopping cart (high read throughput, flexible attributes).</p></li><li><p>Use <strong>ElastiCache Redis</strong> for session management and caching.</p></li></ul><h3>Scenario 2: Real-Time Gaming Backend</h3><p><strong>Requirements:</strong> Leaderboards, player profiles, matchmaking, millions of concurrent users.</p><p><strong>Recommendation: DynamoDB</strong></p><ul><li><p>Scales horizontally to handle millions of players.</p></li><li><p>Single-digit millisecond latency for leaderboard reads/writes.</p></li><li><p>Global Tables for multi-region play.</p></li></ul><h3>Scenario 3: Financial Reporting System</h3><p><strong>Requirements:</strong> Complex queries, regulatory compliance, audit trails, historical analysis.</p><p><strong>Recommendation: Aurora PostgreSQL</strong></p><ul><li><p>Full ACID compliance for transaction integrity.</p></li><li><p>Complex SQL queries for financial reporting.</p></li><li><p>Point-in-time recovery for audit requirements.</p></li></ul><h3>Scenario 4: IoT Data 
Platform</h3><p><strong>Requirements:</strong> Millions of devices sending telemetry every second.</p><p><strong>Recommendation: DynamoDB + Amazon Timestream</strong></p><ul><li><p><strong>DynamoDB</strong> for device state and metadata.</p></li><li><p><strong>Timestream</strong> (purpose-built time-series DB) for telemetry data.</p></li></ul><h2>The Hybrid Approach: Why Not Both?</h2><p>In practice, many production systems on AWS use <strong>both SQL and NoSQL</strong> databases. This is called <strong>polyglot persistence</strong>: using the best database for each specific job.</p><pre><code><code>&#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;     &#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;     &#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;
&#9474;   Frontend   &#9474;&#9472;&#9472;&#9472;&#9472;&#9654;&#9474;   API Gateway +  &#9474;&#9472;&#9472;&#9472;&#9472;&#9654;&#9474;  DynamoDB    &#9474;
&#9474;   (React)    &#9474;     &#9474;   Lambda         &#9474;     &#9474;  (Catalog,   &#9474;
&#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;     &#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9516;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;     &#9474;   Sessions)  &#9474;
                            &#9474;                 &#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;
                            &#9474;
                            &#9660;
                     &#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;     &#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;
                     &#9474;   ECS / EKS      &#9474;&#9472;&#9472;&#9472;&#9472;&#9654;&#9474;  Aurora      &#9474;
                     &#9474;   (Order Service)&#9474;     &#9474;  (Orders,    &#9474;
                     &#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;     &#9474;   Payments)  &#9474;
                                              &#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;</code></code></pre><p><strong>Tips for a hybrid approach:</strong></p><ul><li><p>Use <strong>DynamoDB</strong> for high-throughput, simple-access workloads.</p></li><li><p>Use <strong>Aurora</strong> for transactional and reporting workloads.</p></li><li><p>Use <strong>ElastiCache</strong> as a caching layer in front of either.</p></li><li><p>Use <strong>Amazon EventBridge</strong> or <strong>DynamoDB Streams</strong> to sync data between databases when needed.</p></li></ul><h2>Final Thoughts</h2><p>There&#8217;s no universal &#8220;best&#8221; database, only the <strong>best database for your specific use case</strong>. The most important thing is to <strong>start with your access patterns and requirements</strong>, not with the technology.</p><p>Ask yourself:</p><ul><li><p><em>What does my data look like?</em></p></li><li><p><em>How will I query it?</em></p></li><li><p><em>How much will it scale?</em></p></li><li><p><em>What consistency guarantees do I need?</em></p></li></ul><p>Let the answers guide you to the right choice. And remember, on AWS, you&#8217;re never locked into one paradigm. Embrace polyglot persistence and use the right tool for each job.</p>
]]></content:encoded></item><item><title><![CDATA[AWS Networking 101: Everything You Need to Know About VPCs]]></title><description><![CDATA[If you&#8217;re starting your AWS journey, you&#8217;ve probably heard about VPC (Virtual Private Cloud) and wondered what all the fuss is about.]]></description><link>https://blog.thecloudengineers.com/p/aws-networking-101-everything-you</link><guid isPermaLink="false">https://blog.thecloudengineers.com/p/aws-networking-101-everything-you</guid><dc:creator><![CDATA[Lefteris Karageorgiou]]></dc:creator><pubDate>Wed, 11 Feb 2026 10:30:40 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/850dc490-8acc-4099-8af5-ee55c615f423_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>If you&#8217;re starting your AWS journey, you&#8217;ve probably heard about VPC (Virtual Private Cloud) and wondered what all the fuss is about. Maybe you&#8217;ve seen network diagrams that look like spaghetti, or you&#8217;ve tried to follow documentation that assumes you already know what a subnet is. Perhaps you&#8217;ve even accidentally launched resources into the default VPC without fully understanding what was happening behind the scenes, and that&#8217;s completely normal.</p><p>Here&#8217;s the truth: VPCs are simpler than they seem. Once you understand the core concepts and how they fit together, everything else clicks into place. The terminology (CIDR blocks, route tables, gateways) can feel overwhelming at first, but each of these is just a small, logical piece of a bigger puzzle.
This guide will walk you through VPCs step-by-step, with visual explanations that make sense. By the end, you won&#8217;t just understand VPCs, you&#8217;ll understand <em>why</em> they&#8217;re designed the way they are.</p><h2><strong>What Is a VPC, Really?</strong></h2><p>Think of a VPC as your own private section of the AWS cloud. It&#8217;s like having your own data center, but virtual. Inside this space, you can launch AWS resources (like EC2 instances, databases, and Lambda functions) and control exactly how they communicate with each other and the outside world.</p><p>To make this even more concrete, imagine you&#8217;re renting an entire floor in a massive office building. The building is AWS&#8217;s cloud infrastructure, shared among millions of tenants. But your floor? That&#8217;s entirely yours. You decide how to partition the rooms, who gets access to what, which doors lead outside, and which rooms are locked away from public view.
Other tenants can&#8217;t wander into your space, and you can&#8217;t wander into theirs, unless you both explicitly agree to connect your floors together.</p><p>Every AWS account comes with a <strong>default VPC</strong> in each region, pre-configured so you can start launching resources immediately. However, for production workloads, you&#8217;ll almost always want to create a <strong>custom VPC</strong> where you have full control over the network design. The default VPC is convenient for experimentation, but it makes assumptions about your architecture that may not align with your security or organizational needs.</p><p><strong>The key benefit:</strong> Complete control over your network environment: IP addresses, subnets, route tables, and network gateways. You decide what&#8217;s exposed, what&#8217;s hidden, and how every piece communicates.</p><h2><strong>The Building Blocks</strong></h2><p>Let&#8217;s break down the essential components you need to understand:</p><h3><strong>1. VPC (Virtual Private Cloud)</strong></h3><p><strong>What it is:</strong> The container for your entire network. It&#8217;s the foundational layer upon which everything else is built.</p><p><strong>Key details:</strong></p><ul><li><p>You define an IP address range using <strong>CIDR notation</strong> (e.g., <code>10.0.0.0/16</code>). CIDR stands for Classless Inter-Domain Routing, and it&#8217;s simply a compact way of expressing a range of IP addresses. The <code>/16</code> part is called the prefix length and determines how many IP addresses are available within the range. A smaller number after the slash means a larger network: <code>/16</code> gives you 65,536 addresses, while <code>/24</code> gives you only 256.</p></li><li><p>Choosing your CIDR block is one of the most important early decisions because <strong>you cannot change it after creation</strong> (though you can add secondary CIDR blocks later).
If you choose a range that&#8217;s too small, you&#8217;ll run out of IP addresses as your infrastructure grows. If you choose one that overlaps with another VPC or your on-premises network, you&#8217;ll run into conflicts when you try to connect them later.</p></li><li><p>Each VPC is <strong>isolated from other VPCs by default</strong>. This is a fundamental security property. Two VPCs can even use the exact same CIDR block without interfering with each other, and they cannot communicate unless you explicitly set up a connection between them (using VPC Peering or a Transit Gateway). Keep in mind that VPC Peering requires non-overlapping CIDR blocks, which is another reason to plan your ranges carefully. This isolation means a misconfiguration in one VPC won&#8217;t accidentally leak traffic into another.</p></li><li><p>A VPC exists within a single <strong>AWS Region</strong> but spans all of that region&#8217;s Availability Zones. This is important: your VPC isn&#8217;t confined to a single data center. It&#8217;s a regional construct that gives you the ability to distribute resources across multiple physically separated facilities.</p></li></ul><p><strong>Visual concept:</strong></p><pre><code><code>&#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;
&#9474;         VPC (10.0.0.0/16)         &#9474;
&#9474;                                   &#9474;
&#9474;  [Your AWS resources live here]   &#9474;
&#9474;                                   &#9474;
&#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;</code></code></pre><h3><strong>2. Subnets</strong></h3><p><strong>What they are:</strong> Subdivisions of your VPC&#8217;s IP address range. If the VPC is your floor in the office building, subnets are the individual rooms you create within it.</p><p><strong>Why you need them:</strong> To organize resources and control access to the internet. Subnets give you the ability to apply different network rules and routing behaviors to different groups of resources. Without subnets, every resource in your VPC would have to follow the same networking rules, and that&#8217;s not practical when some resources need public exposure and others need to stay hidden.</p><p><strong>Two types:</strong></p><ul><li><p><strong>Public subnets:</strong> Resources here can communicate directly with the internet. These are for things that need to be reachable by users or external systems, web servers, load balancers, bastion hosts, and similar front-facing infrastructure.</p></li><li><p><strong>Private subnets:</strong> Resources here cannot directly access the internet and, crucially, cannot be directly reached from the internet. This makes them far more secure. These are ideal for databases, application servers, internal microservices, caches, and anything that doesn&#8217;t need a public-facing presence.</p></li></ul><p>It&#8217;s important to understand that <strong>there&#8217;s nothing inherently &#8220;public&#8221; or &#8220;private&#8221; about a subnet itself</strong>. What makes a subnet public or private is its <strong>route table configuration</strong>, specifically, whether its route table includes a route to an Internet Gateway. 
This is a common source of confusion for beginners who think the distinction is set at creation time.</p><p>Each subnet lives within a <strong>single Availability Zone</strong> and cannot span multiple AZs. This is by design, it allows AWS to provide strong isolation guarantees. If one AZ experiences issues, subnets in other AZs remain unaffected. This is why best practice dictates creating matching subnets across multiple Availability Zones, so your application can survive the loss of an entire data center.</p><p>Each subnet also has a small number of <strong>reserved IP addresses</strong> that AWS uses for internal purposes. In every subnet, the first four IP addresses and the last IP address are reserved. For a <code>/24</code> subnet (256 addresses), that means 251 are actually usable. This is a small detail, but it matters when you&#8217;re planning tightly-sized subnets.</p><p><strong>Visual concept:</strong></p><pre><code><code>&#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;
&#9474;              VPC (10.0.0.0/16)               &#9474;
&#9474;                                              &#9474;
&#9474;  &#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;  &#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;  &#9474;
&#9474;  &#9474;  Public Subnet   &#9474;  &#9474;  Private Subnet  &#9474;  &#9474;
&#9474;  &#9474;  10.0.1.0/24     &#9474;  &#9474;  10.0.2.0/24     &#9474;  &#9474;
&#9474;  &#9474;                  &#9474;  &#9474;                  &#9474;  &#9474;
&#9474;  &#9474;  [Web Servers]   &#9474;  &#9474;  [Databases]     &#9474;  &#9474;
&#9474;  &#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;  &#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;  &#9474;
&#9474;                                              &#9474;
&#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;</code></code></pre><p><strong>Best practice:</strong> Create subnets in multiple Availability Zones for high availability. A common pattern is to have at least two public subnets and two private subnets, each pair spread across two different AZs. Some organizations go further and use three AZs for even greater resilience.</p><h3><strong>3. Internet Gateway (IGW)</strong></h3><p><strong>What it is:</strong> The door between your VPC and the public internet. It&#8217;s a horizontally scaled, redundant, and highly available component managed entirely by AWS.</p><p><strong>What it does:</strong> Allows resources in public subnets to communicate with the internet, both sending requests out and receiving requests in. Without an Internet Gateway, your VPC is a completely sealed-off network with no path to the outside world.</p><p>The Internet Gateway serves two critical functions. First, it acts as a <strong>target in your route table</strong> for internet-bound traffic, giving your resources a path to follow when they need to reach an external address. Second, it performs <strong>Network Address Translation (NAT)</strong> for instances that have been assigned a public IPv4 address. When traffic leaves your instance, the IGW replaces the private IP with the public IP; when responses come back, it reverses the translation. This is why simply attaching an IGW isn&#8217;t enough, your instances also need public IP addresses (or Elastic IPs) and the correct route table entries to actually use it.</p><p><strong>Visual concept:</strong></p><pre><code><code>                    Internet
                       &#8597;
              &#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;
              &#9474; Internet       &#9474;
              &#9474; Gateway (IGW)  &#9474;
              &#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;
                       &#8597;
&#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;
&#9474;           VPC (10.0.0.0/16)        &#9474;
&#9474;                                    &#9474;
&#9474;  &#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;                &#9474;
&#9474;  &#9474; Public Subnet  &#9474;                &#9474;
&#9474;  &#9474; 10.0.1.0/24    &#9474;                &#9474;
&#9474;  &#9474;                &#9474;                &#9474;
&#9474;  &#9474; [Web Server]   &#9474;                &#9474;
&#9474;  &#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;                &#9474;
&#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;</code></code></pre><p><strong>Key point:</strong> You attach <strong>one IGW per VPC</strong>. You cannot attach multiple Internet Gateways to the same VPC, and you cannot share an IGW across VPCs. It&#8217;s a one-to-one relationship. Without it, nothing in your VPC can reach the internet, and nothing on the internet can reach your VPC. It&#8217;s also worth noting that the IGW itself doesn&#8217;t impose bandwidth limits or become a bottleneck; AWS manages its scaling transparently.</p><h3><strong>4. Route Tables</strong></h3><p><strong>What they are:</strong> Sets of rules (called routes) that determine where network traffic is directed. Every single packet of data that moves through your VPC is guided by a route table.</p><p><strong>How they work:</strong> Each subnet is associated with exactly one route table, and that route table tells traffic where to go based on its destination. When a resource in your subnet sends a packet, the VPC examines the destination IP address, looks it up in the route table, and forwards the packet to the matching target. If there are multiple matching routes, the most specific route (the one with the longest prefix) wins.</p><p>Every VPC comes with a <strong>main route table</strong> automatically. Any subnet that isn&#8217;t explicitly associated with a custom route table will use the main route table by default. This is a subtle but important point: it means newly created subnets silently inherit the main route table&#8217;s behavior. For this reason, many practitioners recommend keeping the main route table restrictive (private) and explicitly assigning custom route tables with internet access only to the subnets that need it.
This way, if someone creates a new subnet and forgets to assign a route table, it defaults to private rather than accidentally becoming public.</p><p><strong>Example routes:</strong></p><ul><li><p><code>10.0.0.0/16 &#8594; local</code> : Traffic destined for any IP within the VPC stays inside the VPC. This route is automatically present in every route table and <strong>cannot be removed</strong>. It&#8217;s what allows your resources to communicate with each other across subnets.</p></li><li><p><code>0.0.0.0/0 &#8594; igw-xxxxx</code> : All other traffic (anything not matching a more specific route) goes to the Internet Gateway. This is the route that makes a subnet &#8220;public.&#8221; The <code>0.0.0.0/0</code> destination is a catch-all, often called the <strong>default route</strong>.</p></li></ul><p><strong>Visual concept:</strong></p><pre><code><code>Public Subnet Route Table:
&#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9516;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;
&#9474; Destination     &#9474; Target       &#9474;
&#9500;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9532;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9508;
&#9474; 10.0.0.0/16     &#9474; local        &#9474;
&#9474; 0.0.0.0/0       &#9474; igw-xxxxx    &#9474;
&#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9524;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;

Private Subnet Route Table:
&#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9516;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;
&#9474; Destination     &#9474; Target       &#9474;
&#9500;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9532;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9508;
&#9474; 10.0.0.0/16     &#9474; local        &#9474;
&#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9524;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;</code></code></pre><p><strong>The difference:</strong> Public subnets have a route to the IGW; private subnets don&#8217;t. That single route entry is literally the only thing that separates a &#8220;public&#8221; subnet from a &#8220;private&#8221; one. Route tables can also contain routes pointing to other targets like NAT Gateways, VPC Peering connections, Transit Gateways, VPN connections, and VPC endpoints, which is how more complex architectures are built.</p><h3><strong>5. NAT Gateway</strong></h3><p><strong>What it is:</strong> A managed service that allows resources in private subnets to initiate outbound connections to the internet (for software updates, API calls, downloading patches, etc.) <strong>without</strong> exposing them to inbound internet traffic. It performs Network Address Translation, replacing the private IP address of the originating resource with its own public IP address before forwarding the request to the internet.</p><p><strong>Why you need it:</strong> Your database in a private subnet needs to download security patches, your application server needs to call a third-party API, or your Lambda function in a private subnet needs to reach an external webhook. But you don&#8217;t want any of these resources to be directly reachable from the internet. The NAT Gateway solves this by acting as a one-way intermediary, it allows traffic <strong>out</strong> but blocks unsolicited traffic <strong>in</strong>.</p><p><strong>Visual concept:</strong></p><pre><code><code>                    Internet
                       &#8597;
              &#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;
              &#9474; Internet       &#9474;
              &#9474; Gateway (IGW)  &#9474;
              &#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;
                       &#8597;
&#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;
&#9474;              VPC (10.0.0.0/16)               &#9474;
&#9474;                                              &#9474;
&#9474;  &#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;    &#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;   &#9474;
&#9474;  &#9474; Public Subnet  &#9474;    &#9474; Private Subnet  &#9474;   &#9474;
&#9474;  &#9474; 10.0.1.0/24    &#9474;    &#9474; 10.0.2.0/24     &#9474;   &#9474;
&#9474;  &#9474;                &#9474;    &#9474;                 &#9474;   &#9474;
&#9474;  &#9474; [NAT Gateway]  &#9474;&#8592;&#9472;&#9472;&#9472;&#9474; [Database]      &#9474;   &#9474;
&#9474;  &#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;    &#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;   &#9474;
&#9474;                                              &#9474;
&#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;</code></code></pre><p><strong>How it works:</strong></p><ol><li><p>The database initiates an outbound connection (e.g., to download a security patch).</p></li><li><p>The private subnet&#8217;s route table sends internet-bound traffic (<code>0.0.0.0/0</code>) to the NAT Gateway in the public subnet.</p></li><li><p>The NAT Gateway replaces the database&#8217;s private IP address with its own public Elastic IP address and forwards the traffic to the Internet Gateway.</p></li><li><p>The IGW sends the request out to the internet.</p></li><li><p>The response comes back through the same path in reverse, IGW to NAT Gateway, which translates the address back and forwards the response to the database.</p></li></ol><p>The key security property here is that while the database can <strong>reach out</strong> to the internet, no one on the internet can <strong>initiate</strong> a connection to the database. The NAT Gateway will simply drop any unsolicited inbound traffic.</p><p><strong>Important architectural consideration:</strong> A NAT Gateway is deployed into a <strong>specific subnet</strong> within a <strong>specific Availability Zone</strong>. If that AZ goes down, resources in private subnets in other AZs that depend on that NAT Gateway will lose their internet access. This is why production architectures deploy a NAT Gateway in each AZ and configure each private subnet to route through the NAT Gateway in its own AZ. 
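</p><p>As a rough illustration of that per-AZ design, the following Python sketch models which NAT Gateway each private subnet depends on and what happens when one AZ fails. The subnet names, AZ names, and NAT identifiers are made up for the example; in a real VPC this wiring lives in route tables, not in application code.</p><pre><code><code># Hypothetical inventory: each private subnet and the AZ it lives in.
PRIVATE_SUBNETS = {"private-a": "az-a", "private-b": "az-b"}


def subnets_with_internet(nat_by_subnet, nat_az, failed_az):
    """Return private subnets that still have outbound internet access.

    A subnet keeps access only if its own AZ is up and the AZ hosting
    the NAT Gateway it routes through is up as well.
    """
    healthy = []
    for subnet, az in PRIVATE_SUBNETS.items():
        nat = nat_by_subnet[subnet]
        if az != failed_az and nat_az[nat] != failed_az:
            healthy.append(subnet)
    return healthy


# Design 1: a single NAT Gateway in AZ A shared by both private subnets.
shared = {"private-a": "nat-a", "private-b": "nat-a"}
# Design 2: one NAT Gateway per AZ; each subnet routes within its own AZ.
per_az = {"private-a": "nat-a", "private-b": "nat-b"}
nat_location = {"nat-a": "az-a", "nat-b": "az-b"}

print(subnets_with_internet(shared, nat_location, "az-a"))  # []
print(subnets_with_internet(per_az, nat_location, "az-a"))  # ['private-b']
</code></code></pre><p>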
This adds cost but ensures that an AZ failure doesn&#8217;t cascade into a networking failure for your private resources.</p><p><strong>Cost note:</strong> NAT Gateways cost approximately $0.045/hour (about $32/month; exact pricing varies by region) plus data processing charges of $0.045 per GB. In a multi-AZ setup with two NAT Gateways, that&#8217;s roughly $64/month before any data processing charges. For learning environments or development workloads, you can use a <strong>NAT Instance</strong> (a regular EC2 instance configured to perform NAT) to save money, though it won&#8217;t offer the same availability, bandwidth, or managed maintenance that a NAT Gateway provides. Some teams also explore <strong>VPC endpoints</strong> as a way to avoid NAT Gateway costs entirely for traffic destined to supported AWS services like S3 or DynamoDB.</p><h3><strong>6. Security Groups</strong></h3><p><strong>What they are:</strong> Virtual firewalls that control inbound and outbound traffic at the <strong>instance level</strong> (more precisely, at the network interface level). Every EC2 instance, RDS database, Lambda function in a VPC, and many other AWS resources are associated with one or more Security Groups.</p><p><strong>How they work:</strong> You define rules that <strong>allow</strong> specific traffic. This is an important distinction: Security Groups are <strong>allow-only</strong>. You cannot write a rule that explicitly denies traffic. If traffic doesn&#8217;t match any allow rule, it is implicitly denied. This makes Security Groups relatively straightforward to reason about: if it&#8217;s not explicitly permitted, it&#8217;s blocked.</p><p><strong>Example rules:</strong></p><pre><code><code>Web Server Security Group:
Inbound:
- Allow HTTP (port 80) from 0.0.0.0/0 (anywhere)
- Allow HTTPS (port 443) from 0.0.0.0/0
- Allow SSH (port 22) from your IP only

Outbound:
- Allow all traffic (default)</code></code></pre><pre><code><code>Database Security Group:
Inbound:
- Allow MySQL (port 3306) from Web Server Security Group only

Outbound:
- Allow all traffic</code></code></pre><p>One of the most powerful features of Security Groups is the ability to <strong>reference other Security Groups</strong> as sources or destinations in your rules. In the example above, the database Security Group doesn&#8217;t allow MySQL traffic from a specific IP address; instead, it allows it from any resource associated with the Web Server Security Group. This means if you add a new web server and attach the same Security Group, it automatically gains database access without any rule changes. This creates a dynamic, self-maintaining security model that scales with your infrastructure.</p><p><strong>Key concept:</strong> Security Groups are <strong>stateful</strong>: if you allow inbound traffic on a port, the corresponding response traffic is automatically allowed outbound, regardless of the outbound rules. Likewise, if an instance initiates an outbound connection, the response is automatically allowed inbound. This means you don&#8217;t need to worry about creating matching rules for both directions of a conversation, which significantly reduces the complexity of your firewall configuration.</p><p>Another important behavior: Security Groups <strong>evaluate all rules before making a decision</strong>. Unlike NACLs (which process rules in order and stop at the first match), Security Groups aggregate all rules and allow traffic if <strong>any</strong> rule permits it. This means you never have to worry about rule ordering within a Security Group.</p><h3><strong>7. Network ACLs (NACLs)</strong></h3><p><strong>What they are:</strong> An additional layer of security that operates at the <strong>subnet level</strong>. 
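</p><p>The difference in how the two layers evaluate rules is easiest to see side by side. The following Python sketch is a deliberately simplified model, not an AWS API: rules match on port only (real rules also match protocol and source or destination), and the rule numbers and ports are illustrative.</p><pre><code><code>def security_group_allows(allow_rules, port):
    """Security Group model: allow-only, order is irrelevant.

    Traffic passes if ANY rule permits it; otherwise it is implicitly denied.
    """
    return any(port in rule_ports for rule_ports in allow_rules)


def nacl_allows(numbered_rules, port):
    """NACL model: rules are checked in ascending rule number.

    The first matching rule decides; if nothing matches, traffic is denied.
    """
    for _number, action, rule_ports in sorted(numbered_rules, key=lambda r: r[0]):
        if port in rule_ports:
            return action == "allow"
    return False


sg_rules = [{80, 443}, {22}]
nacl_rules = [
    (200, "allow", {22, 80, 443}),
    (100, "deny", {22}),  # lower number, so it is evaluated first
]

print(security_group_allows(sg_rules, 22))    # True
print(security_group_allows(sg_rules, 3306))  # False
print(nacl_allows(nacl_rules, 22))            # False
print(nacl_allows(nacl_rules, 443))           # True
</code></code></pre><p>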
While Security Groups wrap around individual resources, NACLs wrap around entire subnets, filtering traffic as it enters and exits the subnet boundary.</p><p><strong>Difference from Security Groups:</strong></p><ul><li><p><strong>NACLs are stateless</strong>: you must explicitly write rules for both inbound <strong>and</strong> outbound traffic. If you allow inbound HTTP traffic on port 80, you must also create an outbound rule allowing the response traffic (which typically uses an ephemeral port in the range 1024&#8211;65535). Forgetting the outbound rule is one of the most common NACL mistakes, and it results in traffic appearing to be silently dropped.</p></li><li><p><strong>NACLs support both allow and deny rules</strong>, unlike Security Groups, which only support allow rules. This makes NACLs useful for explicitly blocking specific IP addresses or ranges; for example, blocking a known malicious IP that&#8217;s attempting to probe your infrastructure.</p></li><li><p><strong>NACLs process rules in order</strong>, starting from the lowest numbered rule and stopping at the first match. This means rule ordering matters significantly. A common pattern is to number your specific allow/deny rules in increments of 100 (100, 200, 300), leaving gaps so you can insert new rules later without renumbering everything.</p></li><li><p><strong>NACLs apply to all resources within a subnet</strong> automatically. You don&#8217;t attach a NACL to an individual instance; every resource in the subnet is subject to the NACL&#8217;s rules, with no exceptions.</p></li><li><p><strong>Each subnet must be associated with exactly one NACL.</strong> If you don&#8217;t explicitly assign one, the subnet uses the VPC&#8217;s <strong>default NACL</strong>, which allows all inbound and outbound traffic. 
Custom NACLs, by contrast, <strong>deny all traffic by default</strong> until you add rules, which is another common gotcha for beginners.</p></li></ul><p>Think of NACLs and Security Groups as two concentric walls of defense. Traffic entering a subnet first passes through the NACL. If the NACL allows it, the traffic then reaches the Security Group attached to the target resource. Both must permit the traffic for it to get through. This is an example of <strong>defense in depth</strong>: even if one layer is misconfigured, the other layer can still provide protection.</p><p><strong>When to use them:</strong> Most beginners can stick with Security Groups for day-to-day access control. NACLs are most useful when you need subnet-wide rules, when you need to explicitly <strong>deny</strong> traffic from specific sources, or when your organization&#8217;s compliance requirements mandate multiple layers of network filtering. In many production environments, NACLs are kept relatively permissive while Security Groups do the heavy lifting of fine-grained access control.</p><h2><strong>Putting It All Together: A Complete VPC Architecture</strong></h2><p>Here&#8217;s a production-ready VPC setup:</p><pre><code><code>                         Internet
                            &#8597;
                   &#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;
                   &#9474; Internet       &#9474;
                   &#9474; Gateway (IGW)  &#9474;
                   &#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;
                            &#8597;
&#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;
&#9474;                  VPC (10.0.0.0/16)                  &#9474;
&#9474;                                                     &#9474;
&#9474;  Availability Zone A          Availability Zone B   &#9474;
&#9474;  &#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;        &#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;   &#9474;
&#9474;  &#9474; Public Subnet    &#9474;        &#9474; Public Subnet    &#9474;   &#9474;
&#9474;  &#9474; 10.0.1.0/24      &#9474;        &#9474; 10.0.3.0/24      &#9474;   &#9474;
&#9474;  &#9474;                  &#9474;        &#9474;                  &#9474;   &#9474;
&#9474;  &#9474; [Web Server A]   &#9474;        &#9474; [Web Server B]   &#9474;   &#9474;
&#9474;  &#9474; [NAT Gateway A]  &#9474;        &#9474; [NAT Gateway B]  &#9474;   &#9474;
&#9474;  &#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;        &#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;   &#9474;
&#9474;          &#8597;                           &#8597;              &#9474;
&#9474;  &#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;        &#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;   &#9474;
&#9474;  &#9474; Private Subnet   &#9474;        &#9474; Private Subnet   &#9474;   &#9474;
&#9474;  &#9474; 10.0.2.0/24      &#9474;        &#9474; 10.0.4.0/24      &#9474;   &#9474;
&#9474;  &#9474;                  &#9474;        &#9474;                  &#9474;   &#9474;
&#9474;  &#9474; [Database A]     &#9474;        &#9474; [Database B]     &#9474;   &#9474;
&#9474;  &#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;        &#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;   &#9474;
&#9474;                                                     &#9474;
&#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;</code></code></pre><p><strong>Why this architecture works:</strong></p><ul><li><p><strong>High availability:</strong> Resources are distributed across multiple Availability Zones. If AZ-A experiences a hardware failure, a power outage, or a network issue, AZ-B continues to serve traffic. Your application stays online because it doesn&#8217;t depend on any single physical location.</p></li><li><p><strong>Security:</strong> Databases sit in private subnets with no direct internet access. The only way to reach them is through the web servers (controlled by Security Groups) or via internal networking. This dramatically reduces the attack surface for your most sensitive data.</p></li><li><p><strong>Internet access:</strong> NAT Gateways in each AZ ensure that private resources can still reach the internet for necessary outbound tasks, operating system patches, third-party API calls, package downloads, without compromising their isolation from inbound internet traffic. Each private subnet routes through the NAT Gateway in its own AZ, so an AZ failure doesn&#8217;t cut off internet access for the other AZ&#8217;s private resources.</p></li><li><p><strong>Scalability:</strong> An Application Load Balancer (which would sit in the public subnets) distributes incoming user traffic across web servers in both AZs. As demand grows, you can add more instances behind the load balancer, or use Auto Scaling Groups to add and remove them automatically based on metrics like CPU utilization or request count.</p></li></ul><p>This architecture is often extended with additional subnet tiers. 
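</p><p>Adding tiers is mostly a matter of carving more non-overlapping CIDR blocks out of the VPC range. As a rough sketch using Python&#8217;s standard <code>ipaddress</code> module, the four subnets from the diagram can be validated and the unused /24 blocks listed as shown here. Treating the first free blocks as candidates for a new tier is an assumption made for illustration, not a recommendation.</p><pre><code><code>import ipaddress

vpc = ipaddress.ip_network("10.0.0.0/16")

# The four subnets from the diagram above.
existing = {
    "public-a": ipaddress.ip_network("10.0.1.0/24"),
    "private-a": ipaddress.ip_network("10.0.2.0/24"),
    "public-b": ipaddress.ip_network("10.0.3.0/24"),
    "private-b": ipaddress.ip_network("10.0.4.0/24"),
}
nets = list(existing.values())

# Every subnet must sit inside the VPC range...
assert all(net.subnet_of(vpc) for net in nets)

# ...and no two subnets may overlap.
assert not any(
    a.overlaps(b) for i, a in enumerate(nets) for b in nets[i + 1:]
)

# List the /24 blocks in the VPC that no existing subnet uses yet.
free = [
    candidate
    for candidate in vpc.subnets(new_prefix=24)
    if not any(candidate.overlaps(net) for net in nets)
]
print(free[0], free[1])  # 10.0.0.0/24 10.0.5.0/24
</code></code></pre><p>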
Many organizations add a <strong>third tier</strong>, sometimes called an &#8220;isolated&#8221; or &#8220;data&#8221; tier, for resources that should have no internet access at all, not even outbound. You might also see dedicated subnets for things like ElastiCache clusters, internal load balancers, or container orchestration infrastructure. The pattern stays the same; you&#8217;re just adding more rooms to the floor plan.</p><h2><strong>Conclusion</strong></h2><p>VPCs don&#8217;t have to be intimidating. At their core, they&#8217;re built from a handful of straightforward concepts: a <strong>VPC</strong> gives you an isolated network, <strong>subnets</strong> help you organize and secure your resources, an <strong>Internet Gateway</strong> connects you to the outside world, <strong>route tables</strong> direct traffic, <strong>NAT Gateways</strong> let private resources reach the internet safely, and <strong>Security Groups</strong> and <strong>NACLs</strong> act as your gatekeepers.</p><p>Once you understand how these pieces snap together, you&#8217;re not just following tutorials; you&#8217;re making informed decisions about how your infrastructure should work. You&#8217;ll understand <em>why</em> a database should live in a private subnet, <em>why</em> a route table needs a specific entry, and <em>why</em> Security Groups reference each other instead of hard-coded IP addresses.</p><p><strong>Here&#8217;s what I&#8217;d recommend as your next steps:</strong></p><ul><li><p><strong>Get hands-on.</strong> Create a VPC from scratch in the AWS Console (not the default one). Set up a public subnet, a private subnet, an Internet Gateway, and a route table. Launch an EC2 instance in the public subnet and verify you can SSH into it. Then launch one in the private subnet and observe that you can&#8217;t. Break things on purpose: remove the route to the IGW and see what happens. Remove the Security Group rule for SSH and watch the connection fail. 
These small experiments will teach you more than reading ever could.</p></li><li><p><strong>Start simple.</strong> You don&#8217;t need multi-AZ NAT Gateways on day one. Build the basic architecture first: one public subnet, one private subnet, one AZ. Get comfortable with how traffic flows. Then layer in high availability by duplicating the setup across a second AZ. Add a load balancer. Add a NAT Gateway. Each addition will make more sense because you understand the foundation it&#8217;s building on.</p></li><li><p><strong>Explore Infrastructure as Code.</strong> Once you&#8217;re comfortable building VPCs manually, try recreating the same setup with <strong>CloudFormation</strong> or <strong>Terraform</strong>. Defining your VPC in code forces you to understand every dependency and every relationship between components; you can&#8217;t click &#8220;next&#8221; and let the console fill in defaults. It&#8217;ll solidify your understanding and make your infrastructure repeatable, version-controlled, and shareable with your team.</p></li><li><p><strong>Learn about VPC connectivity options.</strong> Once your single-VPC architecture is solid, start exploring how VPCs connect to each other and to the outside world. <strong>VPC Peering</strong> lets two VPCs communicate directly. <strong>Transit Gateway</strong> simplifies networking when you have many VPCs. <strong>VPN connections</strong> and <strong>AWS Direct Connect</strong> link your VPC to on-premises data centers. <strong>VPC Endpoints</strong> let your private resources access AWS services like S3 and DynamoDB without traversing the internet at all. Each of these builds directly on the concepts you&#8217;ve learned here.</p></li></ul><p>The production-ready architecture we walked through above isn&#8217;t just a diagram; it&#8217;s the foundation that most real-world AWS applications are built on. 
Every load balancer, container cluster, and serverless application you&#8217;ll work with in the future sits inside a VPC. By understanding these fundamentals now, you&#8217;re setting yourself up to build cloud infrastructure that&#8217;s secure, scalable, and resilient from the start.</p>]]></content:encoded></item><item><title><![CDATA[Building a Containerized Application with ECS and Fargate]]></title><description><![CDATA[Running containers in production can feel overwhelming when you first approach AWS.]]></description><link>https://blog.thecloudengineers.com/p/building-a-containerized-application</link><guid isPermaLink="false">https://blog.thecloudengineers.com/p/building-a-containerized-application</guid><dc:creator><![CDATA[Lefteris Karageorgiou]]></dc:creator><pubDate>Wed, 04 Feb 2026 10:30:21 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/41e07d5c-9502-4385-8cd6-ae90c9240aa2_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Running containers in production can feel overwhelming when you first approach AWS. You hear terms like ECS, EKS, Fargate, EC2 launch types, task definitions, and services. 
It gets confusing fast.</p><p>This article walks you through the complete flow of deploying a containerized application using Amazon ECS with Fargate. No servers to manage. No cluster capacity to worry about. Just your application running in containers.</p><p>By the end, you will understand each component and how they connect together.</p><h2>What Are ECS and Fargate?</h2><h3>Amazon ECS</h3><p>Elastic Container Service (ECS) is a container orchestration platform from AWS. It handles the scheduling, deployment, and management of your containers. Think of it as the brain that decides where and how your containers run.</p><h3>AWS Fargate</h3><p>Fargate is a serverless compute engine for containers. Instead of provisioning EC2 instances and managing their capacity, you simply tell AWS how much CPU and memory your container needs. 
Fargate handles the underlying infrastructure.</p><p>When you combine ECS with Fargate, you get container orchestration without server management.</p><h2>Architecture Overview</h2><p>Before diving into the steps, let us look at the components involved.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!SXZ9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f3de7c6-4eee-4988-a6eb-45908092f7d8_1750x920.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!SXZ9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f3de7c6-4eee-4988-a6eb-45908092f7d8_1750x920.png 424w, https://substackcdn.com/image/fetch/$s_!SXZ9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f3de7c6-4eee-4988-a6eb-45908092f7d8_1750x920.png 848w, https://substackcdn.com/image/fetch/$s_!SXZ9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f3de7c6-4eee-4988-a6eb-45908092f7d8_1750x920.png 1272w, https://substackcdn.com/image/fetch/$s_!SXZ9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f3de7c6-4eee-4988-a6eb-45908092f7d8_1750x920.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!SXZ9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f3de7c6-4eee-4988-a6eb-45908092f7d8_1750x920.png" width="1456" height="765" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5f3de7c6-4eee-4988-a6eb-45908092f7d8_1750x920.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:765,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:136151,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.thecloudengineers.com/i/186718635?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f3de7c6-4eee-4988-a6eb-45908092f7d8_1750x920.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!SXZ9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f3de7c6-4eee-4988-a6eb-45908092f7d8_1750x920.png 424w, https://substackcdn.com/image/fetch/$s_!SXZ9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f3de7c6-4eee-4988-a6eb-45908092f7d8_1750x920.png 848w, https://substackcdn.com/image/fetch/$s_!SXZ9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f3de7c6-4eee-4988-a6eb-45908092f7d8_1750x920.png 1272w, https://substackcdn.com/image/fetch/$s_!SXZ9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f3de7c6-4eee-4988-a6eb-45908092f7d8_1750x920.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2>The Complete Flow</h2><h3>Step 1: Prepare Your Application for Containerization</h3><p>Start with your application. Whether it is a web API, a backend service, or a worker process, you need to package it into a container image.</p><p>Create a Dockerfile in your project root. This file contains instructions for building your container image. Define the base image, copy your application files, install dependencies, and specify the command to run your application.</p><p>Keep images small. Use multi-stage builds if needed. Smaller images mean faster deployments and lower costs.</p><h3>Step 2: Create an ECR Repository</h3><p>Amazon Elastic Container Registry stores your container images. Before pushing images, you need a repository.</p><p>Navigate to the ECR console and create a new private repository. 
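</p><p>The repository name ends up in every image reference you build, push, and deploy, so it helps to see the shape of a private ECR image URI. The small Python sketch below composes one; the account ID, region, repository name, and tag are placeholders chosen for illustration.</p><pre><code><code>def ecr_image_uri(account_id, region, repository, tag):
    """Compose a private ECR image URI from its parts."""
    registry = account_id + ".dkr.ecr." + region + ".amazonaws.com"
    return registry + "/" + repository + ":" + tag


# Placeholder account ID, region, repository name, and tag.
print(ecr_image_uri("123456789012", "eu-west-1", "orders-api", "1.4.2"))
# 123456789012.dkr.ecr.eu-west-1.amazonaws.com/orders-api:1.4.2
</code></code></pre><p>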
Choose a meaningful name that reflects your application. Enable image scanning to automatically check for vulnerabilities when images are pushed.</p><p>Configure lifecycle policies to automatically clean up old images. This prevents storage costs from growing indefinitely.</p><h3>Step 3: Build and Push Your Image</h3><p>Build your container image locally using Docker. Tag the image with your ECR repository URI. The tag should include a version identifier such as a git commit hash or semantic version number.</p><p>Authenticate Docker with ECR using the AWS CLI. This generates a temporary authorization token that allows Docker to push images.</p><p>Push the tagged image to your ECR repository. Verify the push succeeded by checking the repository in the console.</p><h3>Step 4: Set Up the Network Infrastructure</h3><p>Your containers need a network to run in. Create a VPC or use an existing one.</p><p>Design your subnets carefully. Place your Fargate tasks in private subnets. They do not need to accept traffic directly from the internet. Place the Application Load Balancer in public subnets so it can receive traffic from the internet.</p><p>Create a NAT Gateway in a public subnet. Unless you use VPC endpoints, your tasks in private subnets need this to pull images from ECR and access other AWS services.</p><p>Configure security groups. Create one for the load balancer that allows inbound traffic on its listener ports (80 for HTTP, 443 for HTTPS). Create another for the Fargate tasks that allows inbound traffic only on your application port and only from the load balancer security group.</p><h3>Step 5: Create IAM Roles</h3><p>ECS tasks use two distinct IAM roles.</p><p>The task execution role allows ECS to pull images from ECR, send logs to CloudWatch, and retrieve secrets from Secrets Manager or Parameter Store. AWS provides a managed policy called <code>AmazonECSTaskExecutionRolePolicy</code> that covers pulling images and writing logs. Retrieving secrets requires additional permissions scoped to the specific secrets or parameters.</p><p>The task role is what your application uses at runtime. Attach policies based on what AWS services your application needs to access. 
If your app reads from S3, add S3 read permissions. If it writes to DynamoDB, add those permissions. Follow the principle of least privilege.</p><h3>Step 6: Create an ECS Cluster</h3><p>The cluster is a logical grouping for your services and tasks. Navigate to the ECS console and create a new cluster.</p><p>Give it a descriptive name. Enable Container Insights if you want detailed metrics and logs. This adds cost but provides valuable observability.</p><p>With Fargate, the cluster is essentially just a namespace. There are no EC2 instances to manage.</p><h3>Step 7: Define Your Task Definition</h3><p>The task definition is a blueprint for your container. It describes how your container should run.</p><p>Specify the following:</p><p><strong>Family name</strong>: A unique identifier for your task definition. New revisions will share this name.</p><p><strong>Launch type</strong>: Select Fargate.</p><p><strong>CPU and memory</strong>: Choose appropriate values based on your application requirements. Fargate has specific combinations available.</p><p><strong>Container definition</strong>: Provide the ECR image URI, container port mappings, environment variables, and log configuration.</p><p><strong>Networking mode</strong>: Use awsvpc mode. This is required for Fargate and gives each task its own elastic network interface.</p><p><strong>Log configuration</strong>: Send logs to CloudWatch Logs. Create a log group beforehand or let ECS create one automatically.</p><h3>Step 8: Create an Application Load Balancer</h3><p>The load balancer distributes traffic across your tasks and provides a stable endpoint.</p><p>Create an Application Load Balancer in your public subnets. Configure listeners for HTTP on port 80 or HTTPS on port 443. For production, always use HTTPS with an ACM certificate.</p><p>Create a target group with the target type set to IP. This is required for Fargate. 
Configure health checks to verify your application is responding correctly.</p><p>Set the health check path to an endpoint your application exposes. Configure appropriate thresholds and intervals based on how quickly your application starts.</p><h3>Step 9: Create the ECS Service</h3><p>The service maintains the desired number of tasks and integrates with the load balancer.</p><p>Navigate to your cluster and create a new service. Select the task definition you created earlier.</p><p>Configure the following:</p><p><strong>Launch type</strong>: Fargate.</p><p><strong>Desired count</strong>: The number of tasks you want running. Start with two, spread across availability zones, for high availability.</p><p><strong>Deployment configuration</strong>: Use rolling updates. Set minimum healthy percent to 100 and maximum percent to 200. ECS then starts new tasks before stopping old ones, so capacity does not drop during a deployment.</p><p><strong>Network configuration</strong>: Select your VPC, private subnets, and the security group for tasks. Enable auto-assign public IP only if tasks are in public subnets.</p><p><strong>Load balancing</strong>: Select the Application Load Balancer and target group you created.</p><p><strong>Service discovery</strong>: Optionally enable Cloud Map integration for internal service-to-service communication.</p><h3>Step 10: Configure Auto Scaling</h3><p>Your application should scale based on demand. ECS Service Auto Scaling adjusts the desired count automatically.</p><p>Configure target tracking scaling policies. Common metrics include:</p><p><strong>CPU utilization</strong>: Scale when average CPU across tasks exceeds a threshold.</p><p><strong>Memory utilization</strong>: Scale when average memory utilization across tasks exceeds a threshold.</p><p><strong>Request count per target</strong>: Scale based on how many requests each task handles.</p><p>Set minimum and maximum task counts. The minimum ensures you always have capacity. 
The maximum prevents runaway scaling and unexpected costs.</p><h3>Step 11: Set Up Logging and Monitoring</h3><p>Observability is critical for production workloads.</p><p>Your tasks should already send logs to CloudWatch Logs based on the task definition configuration. Create CloudWatch alarms for key metrics such as CPU utilization, memory utilization, and running task count.</p><p>Enable Container Insights on your cluster for enhanced metrics. This provides task-level and container-level visibility.</p><p>Set up CloudWatch dashboards to visualize your application health at a glance.</p><h3>Step 12: Verify the Deployment</h3><p>Once the service is running, verify everything works correctly.</p><p>Check the ECS console. Your service should show the desired number of running tasks. If tasks are failing, check the stopped tasks for error messages.</p><p>Check the target group health. All registered targets should be healthy. If not, verify security group rules and health check configuration.</p><p>Access your application through the load balancer DNS name. Verify it responds as expected.</p><h2>Deployment Flow for Updates</h2><p>Once your initial deployment is complete, updating your application follows this flow:</p><ol><li><p>Make changes to your application code</p></li><li><p>Build a new container image with a new tag</p></li><li><p>Push the image to ECR</p></li><li><p>Create a new task definition revision pointing to the new image</p></li><li><p>Update the ECS service to use the new task definition</p></li><li><p>ECS performs a rolling deployment automatically</p></li><li><p>Old tasks drain connections and stop</p></li><li><p>New tasks start and register with the load balancer</p></li></ol><p>This process can be automated with CI/CD pipelines using services like CodePipeline, GitHub Actions, or GitLab CI.</p><h2>Best Practices</h2><p><strong>Keep Images Lean</strong></p><p>Smaller images deploy faster. Use minimal base images. 
Remove unnecessary files and dependencies.</p><p><strong>Use Specific Image Tags</strong></p><p>Never use the <code>latest</code> tag in production. Always use specific versions so deployments are predictable and rollbacks are possible.</p><p><strong>Implement Health Checks</strong></p><p>Define health checks at both the container level and load balancer level. This ensures traffic only routes to healthy tasks.</p><p><strong>Store Secrets Securely</strong></p><p>Never bake secrets into container images. Use Secrets Manager or Parameter Store. Reference them in your task definition and inject them as environment variables at runtime.</p><p><strong>Plan for Failure</strong></p><p>Run tasks across multiple availability zones. Set appropriate desired counts. Configure auto scaling to handle load spikes.</p><p><strong>Monitor Costs</strong></p><p>Fargate pricing is based on the vCPU and memory you allocate to each task. Right-size your task definitions. Use Fargate Spot for workloads that tolerate interruption to reduce costs significantly.</p><h2>Common Pitfalls to Avoid</h2><p><strong>Incorrect security group rules</strong>: Tasks cannot pull images or pass load balancer health checks if security groups block the required traffic.</p><p><strong>Missing NAT Gateway</strong>: Tasks in private subnets cannot reach ECR or other AWS services without a NAT Gateway or VPC endpoints.</p><p><strong>Health check misconfiguration</strong>: If the health check path returns errors or times out, tasks will be marked unhealthy and constantly replaced.</p><p><strong>Insufficient task role permissions</strong>: Your application will fail to access AWS services if the task role lacks required permissions.</p><p><strong>Forgetting to update the service</strong>: Creating a new task definition revision does not automatically deploy it. You must update the service.</p><h2>Conclusion</h2><p>Deploying containers on ECS with Fargate removes the burden of managing servers while giving you full control over your application. 
The initial setup involves several components, but each serves a clear purpose.</p><p>Start with a simple deployment. Get it working end to end. Then iterate by adding auto scaling, improving monitoring, and building CI/CD pipelines.</p><p>The serverless nature of Fargate lets you focus on your application rather than infrastructure. You pay for the vCPU and memory allocated to your tasks while they run, and scaling happens automatically based on your policies.</p><p>Now you have a clear roadmap. Pick your application and start building.</p>]]></content:encoded></item><item><title><![CDATA[Serverless patterns for managing AWS Lambda at scale (from my re:Invent 2025 talk)]]></title><description><![CDATA[As serverless applications scale, managing hundreds of AWS Lambda functions becomes increasingly complex.]]></description><link>https://blog.thecloudengineers.com/p/serverless-patterns-for-managing</link><guid isPermaLink="false">https://blog.thecloudengineers.com/p/serverless-patterns-for-managing</guid><dc:creator><![CDATA[Lefteris Karageorgiou]]></dc:creator><pubDate>Wed, 28 Jan 2026 10:30:59 GMT</pubDate><enclosure 
url="https://substack-post-media.s3.amazonaws.com/public/images/10a78211-c450-4cda-b62c-4e0b433e05f5_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>As serverless applications scale, managing hundreds of AWS Lambda functions becomes increasingly complex. After presenting this topic twice at AWS re:Invent 2025 to excellent reception, I wanted to share these insights more broadly. This article presents three proven architectural patterns that help teams organize Lambda functions at scale while reducing operational overhead.</p><h2>Introduction</h2><p>Building a serverless platform often starts with just a few AWS Lambda functions. Over time, however, as new features are added and the platform evolves, that number can quickly grow into the hundreds. At that scale, managing, deploying, and maintaining Lambda functions becomes increasingly complex.</p><p>This is a common challenge organizations face as they scale serverless architectures. As the number of Lambda functions increases, so does the operational overhead. 
One of the most frequent questions we hear from customers is:</p><p><strong>&#8220;How should we properly structure and organize a large number of Lambda functions?&#8221;</strong></p><p>In this article, we present three practical architectural patterns that help teams organize Lambda functions at scale, enabling them to grow serverless applications while keeping complexity under control.</p><h2>The Starting Point: Single-Purpose Lambda Functions</h2><p>When organizations first adopt AWS Lambda, they typically follow a straightforward approach: one function per responsibility. We refer to these as <em>single-purpose Lambda functions</em>.</p><p>Consider a typical e-commerce architecture:</p><ul><li><p><strong>GetProduct</strong> &#8211; Retrieve product details</p></li><li><p><strong>AddCart</strong> &#8211; Add items to a shopping cart</p></li><li><p><strong>RemoveCart</strong> &#8211; Remove items from the cart</p></li><li><p><strong>GetCart</strong> &#8211; View the cart</p></li><li><p><strong>SubmitOrder</strong> &#8211; Submit an order</p></li><li><p><strong>ReserveInventory</strong> &#8211; Reserve inventory</p></li><li><p><strong>ProcessPayment</strong> &#8211; Process payment</p></li><li><p><strong>FulfilOrder</strong> &#8211; Complete the order</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!h8bJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7afc975-81da-4f17-98e2-20c21a20d3ad_2424x1150.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!h8bJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7afc975-81da-4f17-98e2-20c21a20d3ad_2424x1150.png 424w, 
https://substackcdn.com/image/fetch/$s_!h8bJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7afc975-81da-4f17-98e2-20c21a20d3ad_2424x1150.png 848w, https://substackcdn.com/image/fetch/$s_!h8bJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7afc975-81da-4f17-98e2-20c21a20d3ad_2424x1150.png 1272w, https://substackcdn.com/image/fetch/$s_!h8bJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7afc975-81da-4f17-98e2-20c21a20d3ad_2424x1150.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!h8bJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7afc975-81da-4f17-98e2-20c21a20d3ad_2424x1150.png" width="1456" height="691" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b7afc975-81da-4f17-98e2-20c21a20d3ad_2424x1150.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:691,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:207412,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.thecloudengineers.com/i/186058233?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7afc975-81da-4f17-98e2-20c21a20d3ad_2424x1150.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!h8bJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7afc975-81da-4f17-98e2-20c21a20d3ad_2424x1150.png 424w, https://substackcdn.com/image/fetch/$s_!h8bJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7afc975-81da-4f17-98e2-20c21a20d3ad_2424x1150.png 848w, https://substackcdn.com/image/fetch/$s_!h8bJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7afc975-81da-4f17-98e2-20c21a20d3ad_2424x1150.png 1272w, https://substackcdn.com/image/fetch/$s_!h8bJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7afc975-81da-4f17-98e2-20c21a20d3ad_2424x1150.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Amazon API Gateway acts as the front door, exposing HTTP endpoints and routing requests to the appropriate Lambda functions. Each function persists its data in a dedicated DynamoDB table.</p><p>This approach is perfectly valid and widely used. However, as organizations add features and scale their platforms, they often end up with hundreds or even thousands of Lambda functions in a single repository.</p><p>At scale, this typically results in:</p><ul><li><p>A large, monolithic repository</p></li><li><p>A single, complex CI/CD pipeline</p></li><li><p>High deployment risk, where a failure impacts the entire system</p></li></ul><p>To address these challenges, let&#8217;s explore three organizational patterns.</p><h2>Pattern 1: Microservices</h2><p>When all Lambda functions reside in a single repository backed by one AWS SAM template, complexity escalates quickly. Deployments become slow and risky, and ownership becomes unclear.</p><p>The solution is to apply <strong>microservices principles and bounded contexts</strong> to your serverless architecture.</p><p>Consider the cart-related functions: <code>AddCart</code>, <code>RemoveCart</code>, and <code>GetCart</code>. These functions:</p><ul><li><p>Operate on the same domain (shopping cart)</p></li><li><p>Share the same data model and DynamoDB table</p></li><li><p>Are typically owned by the same team</p></li></ul><p>These functions clearly belong together. Instead of managing them independently, group them into a dedicated service, such as a <code>cart-service</code>.</p><p>Repeat this process for other domains, including product, order, inventory, and payment. 
Each domain becomes its own service with:</p><ul><li><p>A dedicated repository</p></li><li><p>Its own AWS SAM template</p></li><li><p>An independent CI/CD pipeline</p></li></ul><p>In a true microservices architecture, services expose functionality through APIs rather than sharing infrastructure resources directly. As a result, SAM templates are typically self-contained and do not reference resources across services.</p><h3>Benefits</h3><p>This approach enables:</p><ul><li><p><strong>Independent development and deployment</strong> &#8211; Teams can ship changes without impacting unrelated services</p></li><li><p><strong>Clear ownership</strong> &#8211; Each service has a well-defined boundary and responsible team</p></li><li><p><strong>Reduced coupling</strong> &#8211; Services only include the dependencies they require</p></li></ul><p>This is the foundation for scaling serverless systems both technically and organizationally.</p><h2>Pattern 2: Lambda Web Adapter</h2><p>Even with improved organization, teams may still manage a large number of Lambda functions. In many cases, these functions can be consolidated.</p><p>This is where the <a href="https://github.com/awslabs/aws-lambda-web-adapter">Lambda Web Adapter pattern</a> becomes valuable. Instead of creating one Lambda function per HTTP endpoint, you run a traditional web framework inside a single Lambda function and handle routing internally.</p><p>For example, <code>AddCart</code> and <code>RemoveCart</code> both update the same DynamoDB table and operate within the same domain. These can be combined into a single <code>manage_cart</code> function.</p><p>Inside that function, you can use familiar frameworks such as Express.js, Next.js, Flask, Django, Spring Boot, ASP.NET, or Laravel. 
The Lambda Web Adapter makes this possible by translating incoming Lambda events into ordinary HTTP requests for the framework.</p><h3>Benefits</h3><p>This pattern offers several advantages:</p><ul><li><p><strong>Fewer cold starts</strong> &#8211; A single function serves multiple endpoints, so its execution environments are reused more often</p></li><li><p><strong>Shared initialization</strong> &#8211; SDK clients and database connections are initialized once</p></li><li><p><strong>Familiar development model</strong> &#8211; Teams can use established web frameworks</p></li><li><p><strong>Simpler testing</strong> &#8211; Routing and logic can be tested locally without mocking Lambda events</p></li></ul><h3>A Word of Caution</h3><p>Consolidation should be applied carefully. Overuse can result in a monolithic Lambda function that negates the benefits of serverless design.</p><p>Keep functions aligned to a bounded context, and continuously monitor:</p><ul><li><p>Function size</p></li><li><p>Memory usage</p></li><li><p>Execution time</p></li></ul><p>Remember that a failure in a consolidated function can impact multiple operations.</p><h3>When to Consolidate Functions</h3><p>Consolidation is appropriate when functions:</p><ul><li><p>Share common dependencies or downstream services</p></li><li><p>Have similar memory and performance requirements</p></li><li><p>Share IAM permissions</p></li><li><p>Have comparable initialization costs</p></li></ul><p>If these factors align, consolidation is often beneficial.</p><h2>Pattern 3: API Gateway Direct Integration</h2><p>A common anti-pattern in serverless architectures is the use of Lambda functions that simply proxy requests to downstream services without applying any business logic.</p><p>For example, a <code>GetCart</code> function that only retrieves an item from DynamoDB and returns it without transformation adds little value.</p><p>In such cases, the Lambda function can be removed entirely and replaced with an <strong>API Gateway direct integration</strong>.</p><p>With direct integrations, API Gateway communicates directly with AWS services such as DynamoDB or 
Step Functions, eliminating the need for an intermediate Lambda function.</p><h3>Benefits</h3><p>This pattern provides:</p><ul><li><p><strong>Lower cost</strong> &#8211; No Lambda invocation charges</p></li><li><p><strong>Lower latency</strong> &#8211; One fewer hop in the request path</p></li><li><p><strong>No cold starts</strong> &#8211; API Gateway is always available</p></li><li><p><strong>Reduced operational overhead</strong> &#8211; Fewer functions to manage</p></li></ul><h3>When to Use This Pattern</h3><p>Use direct integrations when:</p><ul><li><p>Performing simple CRUD operations</p></li><li><p>No business logic or complex validation is required</p></li><li><p>The goal is to minimize Lambda usage</p></li></ul><p>Avoid this pattern when:</p><ul><li><p>Complex transformations are needed</p></li><li><p>Business logic or orchestration is involved</p></li><li><p>Advanced error handling is required</p></li></ul><h2>Recap</h2><p>To organize Lambda functions effectively at scale, consider the following patterns:</p><ol><li><p><strong>Microservices Organization</strong> &#8211; Group related functions by domain into independent services and repositories</p></li><li><p><strong>Lambda Web Adapter</strong> &#8211; Consolidate related functions inside a single Lambda function</p></li><li><p><strong>API Gateway Direct Integration</strong> &#8211; Eliminate unnecessary Lambda functions for simple operations</p></li></ol><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5zpJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5c6de11-4fbd-4fe7-b54b-08d17292fa5f_2424x1150.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!5zpJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5c6de11-4fbd-4fe7-b54b-08d17292fa5f_2424x1150.png 424w, https://substackcdn.com/image/fetch/$s_!5zpJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5c6de11-4fbd-4fe7-b54b-08d17292fa5f_2424x1150.png 848w, https://substackcdn.com/image/fetch/$s_!5zpJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5c6de11-4fbd-4fe7-b54b-08d17292fa5f_2424x1150.png 1272w, https://substackcdn.com/image/fetch/$s_!5zpJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5c6de11-4fbd-4fe7-b54b-08d17292fa5f_2424x1150.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5zpJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5c6de11-4fbd-4fe7-b54b-08d17292fa5f_2424x1150.png" width="1456" height="691" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b5c6de11-4fbd-4fe7-b54b-08d17292fa5f_2424x1150.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:691,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:247577,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.thecloudengineers.com/i/186058233?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5c6de11-4fbd-4fe7-b54b-08d17292fa5f_2424x1150.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5zpJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5c6de11-4fbd-4fe7-b54b-08d17292fa5f_2424x1150.png 424w, https://substackcdn.com/image/fetch/$s_!5zpJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5c6de11-4fbd-4fe7-b54b-08d17292fa5f_2424x1150.png 848w, https://substackcdn.com/image/fetch/$s_!5zpJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5c6de11-4fbd-4fe7-b54b-08d17292fa5f_2424x1150.png 1272w, https://substackcdn.com/image/fetch/$s_!5zpJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5c6de11-4fbd-4fe7-b54b-08d17292fa5f_2424x1150.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" 
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Applying these patterns allows teams to move from:</p><ul><li><p>A single repository with scattered Lambda functions</p></li><li><p>To a domain-driven, multi-repository architecture</p></li><li><p>To fewer, more meaningful Lambda functions</p></li><li><p>And, where appropriate, to no Lambda functions at all</p></li></ul><h2>Conclusion</h2><p>Organizing Lambda functions at scale does not need to be overwhelming. By grouping functions by domain, consolidating where appropriate, and leveraging API Gateway direct integrations, you can transform an unmanageable serverless environment into a clean, scalable, and maintainable architecture.</p><p>Start by reviewing your existing Lambda landscape. Identify shared domains, look for consolidation opportunities, and evaluate where direct integrations can reduce unnecessary compute. These incremental changes can significantly improve both developer productivity and operational efficiency.</p>]]></content:encoded></item></channel></rss>