AWS Networking 101: Everything You Need to Know About VPCs
If you’re starting your AWS journey, you’ve probably heard about VPC (Virtual Private Cloud) and wondered what all the fuss is about. Maybe you’ve seen network diagrams that look like spaghetti, or you’ve tried to follow documentation that assumes you already know what a subnet is. Perhaps you’ve even accidentally launched resources into the default VPC without fully understanding what was happening behind the scenes, and that’s completely normal.
Here’s the truth: VPCs are simpler than they seem. Once you understand the core concepts and how they fit together, everything else clicks into place. The terminology (CIDR blocks, route tables, gateways) can feel overwhelming at first, but each of these is just a small, logical piece of a bigger puzzle. This guide will walk you through VPCs step by step, with a diagram for each component. By the end, you won’t just understand VPCs; you’ll understand why they’re designed the way they are.
What Is a VPC, Really?
Think of a VPC as your own private section of the AWS cloud. It’s like having your own data center, but virtual. Inside this space, you can launch AWS resources (like EC2 instances, databases, and Lambda functions) and control exactly how they communicate with each other and the outside world.
To make this even more concrete, imagine you’re renting an entire floor in a massive office building. The building is AWS’s cloud infrastructure, shared among millions of tenants. But your floor? That’s entirely yours. You decide how to partition the rooms, who gets access to what, which doors lead outside, and which rooms are locked away from public view. Other tenants can’t wander into your space, and you can’t wander into theirs, unless you both explicitly agree to connect your floors together.
Every AWS account comes with a default VPC in each region, pre-configured so you can start launching resources immediately. However, for production workloads, you’ll almost always want to create a custom VPC where you have full control over the network design. The default VPC is convenient for experimentation, but it makes assumptions about your architecture that may not align with your security or organizational needs.
The key benefit: Complete control over your network environment, including IP addresses, subnets, route tables, and network gateways. You decide what’s exposed, what’s hidden, and how every piece communicates.
The Building Blocks
Let’s break down the essential components you need to understand:
1. VPC (Virtual Private Cloud)
What it is: The container for your entire network. It’s the foundational layer upon which everything else is built.
Key details:
You define an IP address range using CIDR notation (e.g., 10.0.0.0/16). CIDR stands for Classless Inter-Domain Routing, and it’s simply a compact way of expressing a range of IP addresses. The /16 part is called the prefix length and determines how many IP addresses are available within the range. A smaller number after the slash means a larger network: /16 gives you 65,536 addresses, while /24 gives you only 256.
Choosing your CIDR block is one of the most important early decisions because you cannot change the primary block after creation (though you can add secondary CIDR blocks later). If you choose a range that’s too small, you’ll run out of IP addresses as your infrastructure grows. If you choose one that overlaps with another VPC or your on-premises network, you’ll run into conflicts when you try to connect them later.
Each VPC is isolated from other VPCs by default. This is a fundamental security property. Two VPCs can even use the exact same CIDR block without interfering with each other, and no VPC can talk to another unless you explicitly set up a connection between them (using VPC Peering or a Transit Gateway). Keep in mind that VPC Peering requires non-overlapping CIDR blocks, which is one more reason to plan your ranges up front. This isolation means a misconfiguration in one VPC won’t accidentally leak traffic into another.
A VPC exists within a single AWS Region but spans all of that region’s Availability Zones. This is important: your VPC isn’t confined to a single data center. It’s a regional construct that gives you the ability to distribute resources across multiple physically separated facilities.
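The prefix-length arithmetic and the overlap problem described above can be checked with Python’s standard `ipaddress` module. This is a minimal sketch: the `overlaps_any` helper and the list of ranges already in use are hypothetical, chosen only for illustration.

```python
import ipaddress

vpc = ipaddress.ip_network("10.0.0.0/16")
small = ipaddress.ip_network("10.0.1.0/24")

# A smaller prefix length means a larger network.
print(vpc.num_addresses)    # 65536
print(small.num_addresses)  # 256


def overlaps_any(candidate, existing):
    """Return the existing CIDR blocks that overlap the candidate block."""
    cand = ipaddress.ip_network(candidate)
    return [e for e in existing if cand.overlaps(ipaddress.ip_network(e))]


# Hypothetical ranges already used elsewhere (another VPC, an office network).
in_use = ["10.0.0.0/16", "192.168.0.0/24"]

print(overlaps_any("10.0.128.0/17", in_use))  # ['10.0.0.0/16']
print(overlaps_any("10.1.0.0/16", in_use))    # []
```

Running a check like this before creating a VPC is a cheap way to avoid a range you will regret when you later need to connect networks.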
Visual concept:
┌───────────────────────────────────┐
│ VPC (10.0.0.0/16) │
│ │
│ [Your AWS resources live here] │
│ │
└───────────────────────────────────┘
2. Subnets
What they are: Subdivisions of your VPC’s IP address range. If the VPC is your floor in the office building, subnets are the individual rooms you create within it.
Why you need them: To organize resources and control access to the internet. Subnets give you the ability to apply different network rules and routing behaviors to different groups of resources. Without subnets, every resource in your VPC would have to follow the same networking rules, and that’s not practical when some resources need public exposure and others need to stay hidden.
Two types:
Public subnets: Resources here can communicate directly with the internet. These are for things that need to be reachable by users or external systems: web servers, load balancers, bastion hosts, and similar front-facing infrastructure.
Private subnets: Resources here cannot directly access the internet and, crucially, cannot be directly reached from the internet. This makes them far more secure. These are ideal for databases, application servers, internal microservices, caches, and anything that doesn’t need a public-facing presence.
It’s important to understand that there’s nothing inherently “public” or “private” about a subnet itself. What makes a subnet public or private is its route table configuration: specifically, whether its route table includes a route to an Internet Gateway. This is a common source of confusion for beginners who think the distinction is set at creation time.
Each subnet lives within a single Availability Zone and cannot span multiple AZs. This is by design: it allows AWS to provide strong isolation guarantees. If one AZ experiences issues, subnets in other AZs remain unaffected. This is why best practice dictates creating matching subnets across multiple Availability Zones, so your application can survive the loss of an entire Availability Zone.
Each subnet also has a small number of reserved IP addresses that AWS uses for internal purposes. In every subnet, the first four IP addresses and the last IP address are reserved. For a /24 subnet (256 addresses), that means 251 are actually usable. This is a small detail, but it matters when you’re planning tightly-sized subnets.
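To see how subnets carve up the VPC range and what the five reserved addresses cost you, here is a small sketch using the standard `ipaddress` module. The `usable_addresses` helper is a name made up for this example; the constant simply encodes the "first four plus the last" rule stated above.

```python
import ipaddress

# First four addresses plus the last one in every subnet, as described above.
AWS_RESERVED_PER_SUBNET = 5


def usable_addresses(cidr):
    """Addresses left for your own resources in a subnet of the given size."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET


vpc = ipaddress.ip_network("10.0.0.0/16")

# Split the VPC range into /24 subnets; each one is a candidate "room".
subnets = list(vpc.subnets(new_prefix=24))
print(len(subnets))             # 256
print(subnets[1], subnets[2])   # 10.0.1.0/24 10.0.2.0/24

print(usable_addresses("10.0.1.0/24"))  # 251
print(usable_addresses("10.0.0.0/28"))  # 11
```

The last line shows why tight sizing matters: in a 16-address subnet, nearly a third of the range is reserved.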
Visual concept:
┌──────────────────────────────────────────────┐
│ VPC (10.0.0.0/16) │
│ │
│ ┌──────────────────┐ ┌──────────────────┐ │
│ │ Public Subnet │ │ Private Subnet │ │
│ │ 10.0.1.0/24 │ │ 10.0.2.0/24 │ │
│ │ │ │ │ │
│ │ [Web Servers] │ │ [Databases] │ │
│ └──────────────────┘ └──────────────────┘ │
│ │
└──────────────────────────────────────────────┘
Best practice: Create subnets in multiple Availability Zones for high availability. A common pattern is to have at least two public subnets and two private subnets, each pair spread across two different AZs. Some organizations go further and use three AZs for even greater resilience.
3. Internet Gateway (IGW)
What it is: The door between your VPC and the public internet. It’s a horizontally scaled, redundant, and highly available component managed entirely by AWS.
What it does: Allows resources in public subnets to communicate with the internet, both sending requests out and receiving requests in. Without an Internet Gateway, your VPC is a completely sealed-off network with no path to the outside world.
The Internet Gateway serves two critical functions. First, it acts as a target in your route table for internet-bound traffic, giving your resources a path to follow when they need to reach an external address. Second, it performs one-to-one Network Address Translation (NAT) for instances that have been assigned a public IPv4 address. When traffic leaves your instance, the IGW replaces the private IP with the public IP; when responses come back, it reverses the translation. This is why simply attaching an IGW isn’t enough: your instances also need public IP addresses (or Elastic IPs) and the correct route table entries to actually use it.
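The three conditions in the paragraph above (an attached IGW, a route to it, and a public address) can be written down as a tiny model. This is an illustrative sketch only: `Instance`, `can_use_igw`, and the addresses are hypothetical and do not correspond to any AWS API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instance:
    private_ip: str
    public_ip: Optional[str] = None  # public IPv4 or Elastic IP, if assigned


def can_use_igw(vpc_has_igw, subnet_routes_to_igw, instance):
    """All three conditions must hold for traffic to flow through the IGW."""
    return vpc_has_igw and subnet_routes_to_igw and instance.public_ip is not None


def source_seen_by_internet(instance):
    """One-to-one NAT: the private source address is swapped for the public one."""
    return instance.public_ip


web = Instance("10.0.1.10", "203.0.113.25")  # example addresses
worker = Instance("10.0.1.11")               # no public IP assigned

print(can_use_igw(True, True, web))      # True
print(can_use_igw(True, True, worker))   # False: no public IP
print(can_use_igw(True, False, web))     # False: no route to the IGW
print(source_seen_by_internet(web))      # 203.0.113.25
```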
Visual concept:
Internet
↕
┌────────────────┐
│ Internet │
│ Gateway (IGW) │
└────────────────┘
↕
┌────────────────────────────────────┐
│ VPC (10.0.0.0/16) │
│ │
│ ┌────────────────┐ │
│ │ Public Subnet │ │
│ │ 10.0.1.0/24 │ │
│ │ │ │
│ │ [Web Server] │ │
│ └────────────────┘ │
└────────────────────────────────────┘
Key point: You attach one IGW per VPC. You cannot attach multiple Internet Gateways to the same VPC, and you cannot share an IGW across VPCs. It’s a one-to-one relationship. Without it, nothing in your VPC can reach the internet, and nothing on the internet can reach your VPC. It’s also worth noting that the IGW itself doesn’t impose bandwidth limits or become a bottleneck; AWS manages its scaling transparently.
4. Route Tables
What they are: Sets of rules (called routes) that determine where network traffic is directed. Every single packet of data that moves through your VPC is guided by a route table.
How they work: Each subnet is associated with exactly one route table, and that route table tells traffic where to go based on its destination. When a resource in your subnet sends a packet, the VPC examines the destination IP address, looks it up in the route table, and forwards the packet to the matching target. If there are multiple matching routes, the most specific route (the one with the longest prefix) wins.
Every VPC comes with a main route table automatically. Any subnet that isn’t explicitly associated with a custom route table will use the main route table by default. This is a subtle but important point: newly created subnets silently inherit the main route table’s behavior. For this reason, many practitioners recommend keeping the main route table restrictive (private) and explicitly assigning custom route tables with internet access only to the subnets that need it. This way, if someone creates a new subnet and forgets to assign a route table, it defaults to private rather than accidentally becoming public.
Example routes:
10.0.0.0/16 → local: Traffic destined for any IP within the VPC stays inside the VPC. This route is automatically present in every route table and cannot be removed. It’s what allows your resources to communicate with each other across subnets.
0.0.0.0/0 → igw-xxxxx: All other traffic (anything not matching a more specific route) goes to the Internet Gateway. This is the route that makes a subnet “public.” The 0.0.0.0/0 destination is a catch-all, often called the default route.
Visual concept:
Public Subnet Route Table:
┌─────────────────┬──────────────┐
│ Destination │ Target │
├─────────────────┼──────────────┤
│ 10.0.0.0/16 │ local │
│ 0.0.0.0/0 │ igw-xxxxx │
└─────────────────┴──────────────┘
Private Subnet Route Table:
┌─────────────────┬──────────────┐
│ Destination │ Target │
├─────────────────┼──────────────┤
│ 10.0.0.0/16 │ local │
└─────────────────┴──────────────┘
The difference: Public subnets have a route to the IGW; private subnets don’t. That single route entry is literally the only thing that separates a “public” subnet from a “private” one. Route tables can also contain routes pointing to other targets like NAT Gateways, VPC Peering connections, Transit Gateways, VPN connections, and VPC endpoints, which is how more complex architectures are built.
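The lookup rule (most specific route wins) and the public/private distinction can both be sketched in a few lines of Python. This is a simplified model, not how AWS implements routing: the route tables are plain dictionaries copied from the two tables above, and `lookup` and `is_public` are hypothetical helper names.

```python
import ipaddress


def lookup(route_table, destination_ip):
    """Return the target of the longest-prefix route matching the destination."""
    ip = ipaddress.ip_address(destination_ip)
    matches = [
        (ipaddress.ip_network(dest), target)
        for dest, target in route_table.items()
        if ip in ipaddress.ip_network(dest)
    ]
    if not matches:
        return None  # no route: the packet has nowhere to go
    # The most specific route is the one with the longest prefix.
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def is_public(route_table):
    """A subnet is 'public' when its route table points at an Internet Gateway."""
    return any(target.startswith("igw-") for target in route_table.values())


public_rt = {"10.0.0.0/16": "local", "0.0.0.0/0": "igw-xxxxx"}
private_rt = {"10.0.0.0/16": "local"}

print(lookup(public_rt, "10.0.2.15"))   # local (the /16 beats the /0)
print(lookup(public_rt, "8.8.8.8"))     # igw-xxxxx
print(lookup(private_rt, "8.8.8.8"))    # None
print(is_public(public_rt), is_public(private_rt))  # True False
```

Note that 10.0.2.15 matches both routes in the public table; the local route wins only because /16 is more specific than /0.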
5. NAT Gateway
What it is: A managed service that allows resources in private subnets to initiate outbound connections to the internet (for software updates, API calls, downloading patches, etc.) without exposing them to inbound internet traffic. It performs Network Address Translation, replacing the private IP address of the originating resource with its own public IP address before forwarding the request to the internet.
Why you need it: Your database in a private subnet needs to download security patches, your application server needs to call a third-party API, or your Lambda function in a private subnet needs to reach an external webhook. But you don’t want any of these resources to be directly reachable from the internet. The NAT Gateway solves this by acting as a one-way intermediary: it allows traffic out but blocks unsolicited traffic in.
Visual concept:
Internet
↕
┌────────────────┐
│ Internet │
│ Gateway (IGW) │
└────────────────┘
↕
┌──────────────────────────────────────────────┐
│ VPC (10.0.0.0/16) │
│ │
│ ┌────────────────┐ ┌─────────────────┐ │
│ │ Public Subnet │ │ Private Subnet │ │
│ │ 10.0.1.0/24 │ │ 10.0.2.0/24 │ │
│ │ │ │ │ │
│ │ [NAT Gateway] │←───│ [Database] │ │
│ └────────────────┘ └─────────────────┘ │
│ │
└──────────────────────────────────────────────┘
How it works:
The database initiates an outbound connection (e.g., to download a security patch).
The private subnet’s route table sends internet-bound traffic (0.0.0.0/0) to the NAT Gateway in the public subnet.
The NAT Gateway replaces the database’s private IP address with its own public Elastic IP address and forwards the traffic to the Internet Gateway.
The IGW sends the request out to the internet.
The response comes back through the same path in reverse: the IGW hands it to the NAT Gateway, which translates the address back and forwards the response to the database.
The key security property here is that while the database can reach out to the internet, no one on the internet can initiate a connection to the database. The NAT Gateway will simply drop any unsolicited inbound traffic.
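The steps above boil down to connection tracking: an outbound request creates a mapping, and only replies that match an existing mapping are let back in. The toy class below sketches that idea under simplifying assumptions. `NatGateway` here is a hypothetical model, the port numbering is arbitrary, and the IP addresses are documentation examples.

```python
class NatGateway:
    """Toy model of source NAT with connection tracking (not an AWS API)."""

    def __init__(self, public_ip):
        self.public_ip = public_ip
        # (remote_ip, remote_port, nat_port) -> (private_ip, private_port)
        self._connections = {}
        self._next_port = 1024  # arbitrary starting port for this sketch

    def outbound(self, private_ip, private_port, remote_ip, remote_port):
        """Rewrite the source address and remember how to undo it."""
        nat_port = self._next_port
        self._next_port += 1
        self._connections[(remote_ip, remote_port, nat_port)] = (private_ip, private_port)
        return (self.public_ip, nat_port, remote_ip, remote_port)

    def inbound(self, remote_ip, remote_port, nat_port):
        """Return the private destination for a reply, or None to drop the packet."""
        return self._connections.get((remote_ip, remote_port, nat_port))


nat = NatGateway("203.0.113.10")

# The database opens a connection to a patch server.
packet = nat.outbound("10.0.2.25", 40000, "198.51.100.7", 443)
print(packet)                                # ('203.0.113.10', 1024, '198.51.100.7', 443)

# The reply matches the tracked connection and is translated back.
print(nat.inbound("198.51.100.7", 443, 1024))  # ('10.0.2.25', 40000)

# An unsolicited packet from a different host matches nothing and is dropped.
print(nat.inbound("192.0.2.99", 443, 1024))    # None
```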
Important architectural consideration: A NAT Gateway is deployed into a specific subnet within a specific Availability Zone. If that AZ goes down, resources in private subnets in other AZs that depend on that NAT Gateway will lose their internet access. This is why production architectures deploy a NAT Gateway in each AZ and configure each private subnet to route through the NAT Gateway in its own AZ. This adds cost but ensures that an AZ failure doesn’t cascade into a networking failure for your private resources.
Cost note: NAT Gateways cost approximately $0.045/hour (about $32/month) plus data processing charges of $0.045 per GB; exact prices vary by Region. In a multi-AZ setup with two NAT Gateways, that’s roughly $65/month before any data processing. For learning environments or development workloads, you can use a NAT Instance (a regular EC2 instance configured to perform NAT) to save money, though it won’t offer the same availability, bandwidth, or managed maintenance that a NAT Gateway provides. Some teams also explore VPC endpoints as a way to avoid NAT Gateway costs entirely for traffic destined to supported AWS services like S3 or DynamoDB.
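The arithmetic behind those figures is easy to reproduce. The sketch below uses the rates quoted in the cost note and assumes a 30-day month; the function name and the 500 GB workload are made up for illustration, and real bills depend on current Regional pricing.

```python
HOURLY_RATE = 0.045        # USD per gateway-hour, as quoted above
PER_GB_RATE = 0.045        # USD per GB processed, as quoted above
HOURS_PER_MONTH = 24 * 30  # assumption: a 30-day month


def monthly_nat_cost(gateways, gb_processed):
    """Estimated monthly cost: fixed hourly charge plus data processing."""
    fixed = gateways * HOURLY_RATE * HOURS_PER_MONTH
    data = gb_processed * PER_GB_RATE
    return round(fixed + data, 2)


print(monthly_nat_cost(1, 0))    # 32.4
print(monthly_nat_cost(2, 0))    # 64.8
print(monthly_nat_cost(2, 500))  # 87.3
```

The fixed charge applies whether or not any traffic flows, which is why idle development environments are a common place to trim NAT Gateways.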
6. Security Groups
What they are: Virtual firewalls that control inbound and outbound traffic at the instance level (more precisely, at the network interface level). Every EC2 instance, RDS database, Lambda function in a VPC, and many other AWS resources are associated with one or more Security Groups.
How they work: You define rules that allow specific traffic. This is an important distinction: Security Groups are allow-only. You cannot write a rule that explicitly denies traffic. If traffic doesn’t match any allow rule, it is implicitly denied. This makes Security Groups relatively straightforward to reason about: if it’s not explicitly permitted, it’s blocked.
Example rules:
Web Server Security Group:
Inbound:
- Allow HTTP (port 80) from 0.0.0.0/0 (anywhere)
- Allow HTTPS (port 443) from 0.0.0.0/0
- Allow SSH (port 22) from your IP only
Outbound:
- Allow all traffic (default)
Database Security Group:
Inbound:
- Allow MySQL (port 3306) from Web Server Security Group only
Outbound:
- Allow all traffic
One of the most powerful features of Security Groups is the ability to reference other Security Groups as sources or destinations in your rules. In the example above, the database Security Group doesn’t allow MySQL traffic from a specific IP address; it allows it from any resource associated with the Web Server Security Group. This means if you add a new web server and attach the same Security Group, it automatically gains database access without any rule changes. This creates a dynamic, self-maintaining security model that scales with your infrastructure.
Key concept: Security Groups are stateful. If you allow inbound traffic on a port, the corresponding response traffic is automatically allowed outbound, regardless of the outbound rules. Likewise, if an instance initiates an outbound connection, the response is automatically allowed inbound. This means you don’t need to worry about creating matching rules for both directions of a conversation, which significantly reduces the complexity of your firewall configuration.
Another important behavior: Security Groups evaluate all rules before making a decision. Unlike NACLs (which process rules in order and stop at the first match), Security Groups aggregate all rules and allow traffic if any rule permits it. This means you never have to worry about rule ordering within a Security Group.
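The inbound rules from the example can be modeled to show the two behaviors described here: traffic is allowed if any rule permits it, and a rule’s source may be another Security Group rather than an IP range. This is a simplified sketch, not the AWS evaluation engine. The `Rule` class, the group name `web-sg`, and the address standing in for "your IP" are all hypothetical.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    port: int
    source: str  # a CIDR block, or the name of another security group


def allows_inbound(rules, port, source_ip, source_groups):
    """Allow-only evaluation: any matching rule permits the traffic."""
    for rule in rules:
        if rule.port != port:
            continue
        if rule.source in source_groups:
            return True  # the sender belongs to the referenced group
        try:
            if ipaddress.ip_address(source_ip) in ipaddress.ip_network(rule.source):
                return True
        except ValueError:
            pass  # the source was a group name, not a CIDR block
    return False  # nothing matched, so the traffic is implicitly denied


web_sg = [Rule(80, "0.0.0.0/0"), Rule(443, "0.0.0.0/0"), Rule(22, "203.0.113.5/32")]
db_sg = [Rule(3306, "web-sg")]

print(allows_inbound(web_sg, 443, "198.51.100.20", set()))   # True
print(allows_inbound(web_sg, 22, "198.51.100.20", set()))    # False
print(allows_inbound(db_sg, 3306, "10.0.1.10", {"web-sg"}))  # True
print(allows_inbound(db_sg, 3306, "10.0.1.99", set()))       # False
```

In the third call the sender’s IP address never appears in a rule; membership in `web-sg` is what grants access, which is why a newly added web server needs no rule changes.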
7. Network ACLs (NACLs)
What they are: An additional layer of security that operates at the subnet level. While Security Groups wrap around individual resources, NACLs wrap around entire subnets, filtering traffic as it enters and exits the subnet boundary.
Difference from Security Groups:
NACLs are stateless: you must explicitly write rules for both inbound and outbound traffic. If you allow inbound HTTP traffic on port 80, you must also create an outbound rule allowing the response traffic (which typically uses an ephemeral port in the range 1024–65535). Forgetting the outbound rule is one of the most common NACL mistakes, and it results in traffic appearing to be silently dropped.
NACLs support both allow and deny rules, unlike Security Groups which only support allow rules. This makes NACLs useful for explicitly blocking specific IP addresses or ranges, such as a known malicious IP that’s attempting to probe your infrastructure.
NACLs process rules in order, starting from the lowest numbered rule and stopping at the first match. This means rule ordering matters significantly. A common pattern is to use low-numbered rules (like 100, 200, 300) for your specific allow/deny rules, leaving gaps so you can insert new rules later without renumbering everything.
NACLs apply to all resources within a subnet automatically. You don’t attach a NACL to an individual instance; every resource in the subnet is subject to the NACL’s rules, with no exceptions.
Each subnet must be associated with exactly one NACL. If you don’t explicitly assign one, the subnet uses the VPC’s default NACL, which allows all inbound and outbound traffic. Custom NACLs, by contrast, deny all traffic by default until you add rules, which is another common gotcha for beginners.
Think of NACLs and Security Groups as two concentric walls of defense. Traffic entering a subnet first passes through the NACL. If the NACL allows it, the traffic then reaches the Security Group attached to the target resource. Both must permit the traffic for it to get through. This is an example of defense in depth: even if one layer is misconfigured, the other layer can still provide protection.
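Ordered, first-match evaluation and statelessness are easier to see in code. The sketch below is a simplified model with hypothetical rule tuples and example addresses. It evaluates rules from the lowest number upward, stops at the first match, and falls through to an implicit deny.

```python
import ipaddress


def nacl_decision(rules, peer_ip, port):
    """Evaluate (number, cidr, (low_port, high_port), action) rules in order."""
    ip = ipaddress.ip_address(peer_ip)
    for _number, cidr, (low, high), action in sorted(rules, key=lambda r: r[0]):
        if ip in ipaddress.ip_network(cidr) and low <= port <= high:
            return action  # first match wins; later rules are never consulted
    return "deny"  # implicit final rule


inbound = [
    (100, "198.51.100.7/32", (0, 65535), "deny"),  # block one known bad host
    (200, "0.0.0.0/0", (80, 80), "allow"),         # HTTP from everyone else
]
# Stateless: replies need their own outbound rule covering ephemeral ports.
outbound = [(100, "0.0.0.0/0", (1024, 65535), "allow")]

print(nacl_decision(inbound, "198.51.100.7", 80))   # deny (rule 100 matches first)
print(nacl_decision(inbound, "192.0.2.1", 80))      # allow
print(nacl_decision(inbound, "192.0.2.1", 22))      # deny (no rule matches)
print(nacl_decision(outbound, "192.0.2.1", 51000))  # allow (reply to the client)
print(nacl_decision([], "192.0.2.1", 51000))        # deny (forgotten outbound rule)
```

Swapping the rule numbers in `inbound` would let the blocked host through on port 80, which is exactly why ordering matters here and not in Security Groups.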
When to use them: Most beginners can stick with Security Groups for day-to-day access control. NACLs are most useful when you need subnet-wide rules, when you need to explicitly deny traffic from specific sources, or when your organization’s compliance requirements mandate multiple layers of network filtering. In many production environments, NACLs are kept relatively permissive while Security Groups do the heavy lifting of fine-grained access control.
Putting It All Together: A Complete VPC Architecture
Here’s a production-ready VPC setup:
Internet
↕
┌────────────────┐
│ Internet │
│ Gateway (IGW) │
└────────────────┘
↕
┌─────────────────────────────────────────────────────┐
│ VPC (10.0.0.0/16) │
│ │
│ Availability Zone A Availability Zone B │
│ ┌──────────────────┐ ┌──────────────────┐ │
│ │ Public Subnet │ │ Public Subnet │ │
│ │ 10.0.1.0/24 │ │ 10.0.3.0/24 │ │
│ │ │ │ │ │
│ │ [Web Server A] │ │ [Web Server B] │ │
│ │ [NAT Gateway A] │ │ [NAT Gateway B] │ │
│ └──────────────────┘ └──────────────────┘ │
│ ↕ ↕ │
│ ┌──────────────────┐ ┌──────────────────┐ │
│ │ Private Subnet │ │ Private Subnet │ │
│ │ 10.0.2.0/24 │ │ 10.0.4.0/24 │ │
│ │ │ │ │ │
│ │ [Database A] │ │ [Database B] │ │
│ └──────────────────┘ └──────────────────┘ │
│ │
└─────────────────────────────────────────────────────┘
Why this architecture works:
High availability: Resources are distributed across multiple Availability Zones. If AZ-A experiences a hardware failure, a power outage, or a network issue, AZ-B continues to serve traffic. Your application stays online because it doesn’t depend on any single physical location.
Security: Databases sit in private subnets with no direct internet access. The only way to reach them is through the web servers (controlled by Security Groups) or via internal networking. This dramatically reduces the attack surface for your most sensitive data.
Internet access: NAT Gateways in each AZ ensure that private resources can still reach the internet for necessary outbound tasks (operating system patches, third-party API calls, package downloads) without compromising their isolation from inbound internet traffic. Each private subnet routes through the NAT Gateway in its own AZ, so an AZ failure doesn’t cut off internet access for the other AZ’s private resources.
Scalability: An Application Load Balancer (which would sit in the public subnets) distributes incoming user traffic across web servers in both AZs. As demand grows, you can add more instances behind the load balancer, or use Auto Scaling Groups to add and remove them automatically based on metrics like CPU utilization or request count.
This architecture is often extended with additional subnet tiers. Many organizations add a third tier, sometimes called an “isolated” or “data” tier, for resources that should have no internet access at all, not even outbound. You might also see dedicated subnets for things like ElastiCache clusters, internal load balancers, or container orchestration infrastructure. The pattern stays the same; you’re just adding more rooms to the floor plan.
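Before adding rooms to the floor plan, it helps to check that the plan is consistent: every subnet must sit inside the VPC range, and no two subnets may overlap. The sketch below validates the four subnets from the diagram with the standard `ipaddress` module; the `validate` helper, the tier labels, and the extra test entries are hypothetical.

```python
import ipaddress
from itertools import combinations

vpc = ipaddress.ip_network("10.0.0.0/16")

# The four subnets from the diagram, keyed by (tier, availability zone).
plan = {
    ("public", "AZ-A"): "10.0.1.0/24",
    ("private", "AZ-A"): "10.0.2.0/24",
    ("public", "AZ-B"): "10.0.3.0/24",
    ("private", "AZ-B"): "10.0.4.0/24",
}


def validate(vpc, plan):
    """Return a list of human-readable problems; an empty list means the plan is OK."""
    nets = {key: ipaddress.ip_network(cidr) for key, cidr in plan.items()}
    problems = []
    for key, net in nets.items():
        if not net.subnet_of(vpc):
            problems.append(f"{key} {net} is outside {vpc}")
    for (k1, n1), (k2, n2) in combinations(nets.items(), 2):
        if n1.overlaps(n2):
            problems.append(f"{k1} {n1} overlaps {k2} {n2}")
    return problems


print(validate(vpc, plan))  # []

# Adding an isolated tier is just another non-overlapping entry.
extended = dict(plan)
extended[("isolated", "AZ-A")] = "10.0.5.0/24"
print(validate(vpc, extended))  # []

# A mistake: this range sits inside the AZ-B private subnet.
broken = dict(plan)
broken[("isolated", "AZ-B")] = "10.0.4.128/25"
print(len(validate(vpc, broken)))  # 1
```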
Conclusion
VPCs don’t have to be intimidating. At their core, they’re built from a handful of straightforward concepts: a VPC gives you an isolated network, subnets help you organize and secure your resources, an Internet Gateway connects you to the outside world, route tables direct traffic, NAT Gateways let private resources reach the internet safely, and Security Groups and NACLs act as your gatekeepers.
Once you understand how these pieces snap together, you’re not just following tutorials; you’re making informed decisions about how your infrastructure should work. You’ll understand why a database should live in a private subnet, why a route table needs a specific entry, and why Security Groups reference each other instead of hard-coded IP addresses.
Here’s what I’d recommend as your next steps:
Get hands-on. Create a VPC from scratch in the AWS Console (not the default one). Set up a public subnet, a private subnet, an Internet Gateway, and a route table. Launch an EC2 instance in the public subnet and verify you can SSH into it. Then launch one in the private subnet and observe that you can’t. Break things on purpose: remove the route to the IGW and see what happens. Remove the Security Group rule for SSH and watch the connection fail. These small experiments will teach you more than reading ever could.
Start simple. You don’t need multi-AZ NAT Gateways on day one. Build the basic architecture first: one public subnet, one private subnet, one AZ. Get comfortable with how traffic flows. Then layer in high availability by duplicating the setup across a second AZ. Add a load balancer. Add a NAT Gateway. Each addition will make more sense because you understand the foundation it’s building on.
Explore Infrastructure as Code. Once you’re comfortable building VPCs manually, try recreating the same setup with CloudFormation or Terraform. Defining your VPC in code forces you to understand every dependency and every relationship between components; you can’t click “next” and let the console fill in defaults. It’ll solidify your understanding and make your infrastructure repeatable, version-controlled, and shareable with your team.
Learn about VPC connectivity options. Once your single-VPC architecture is solid, start exploring how VPCs connect to each other and to the outside world. VPC Peering lets two VPCs communicate directly. Transit Gateway simplifies networking when you have many VPCs. VPN connections and AWS Direct Connect link your VPC to on-premises data centers. VPC Endpoints let your private resources access AWS services like S3 and DynamoDB without traversing the internet at all. Each of these builds directly on the concepts you’ve learned here.
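As a starting point for the Infrastructure as Code step, here is a minimal sketch that builds a CloudFormation template for the "start simple" layout (one public subnet, one private subnet) as a Python dictionary and prints it as JSON. The logical names such as `Vpc` and `PublicRouteTable` are arbitrary choices for this example, and the template has not been deployed or validated against a live account, so treat it as an outline to adapt.

```python
import json


def basic_vpc_template():
    """One VPC, one public and one private subnet, an IGW, and a public route table."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "Vpc": {
                "Type": "AWS::EC2::VPC",
                "Properties": {"CidrBlock": "10.0.0.0/16"},
            },
            "PublicSubnet": {
                "Type": "AWS::EC2::Subnet",
                "Properties": {"VpcId": {"Ref": "Vpc"}, "CidrBlock": "10.0.1.0/24"},
            },
            # No association below, so this subnet falls back to the main route table.
            "PrivateSubnet": {
                "Type": "AWS::EC2::Subnet",
                "Properties": {"VpcId": {"Ref": "Vpc"}, "CidrBlock": "10.0.2.0/24"},
            },
            "InternetGateway": {"Type": "AWS::EC2::InternetGateway"},
            "GatewayAttachment": {
                "Type": "AWS::EC2::VPCGatewayAttachment",
                "Properties": {
                    "VpcId": {"Ref": "Vpc"},
                    "InternetGatewayId": {"Ref": "InternetGateway"},
                },
            },
            "PublicRouteTable": {
                "Type": "AWS::EC2::RouteTable",
                "Properties": {"VpcId": {"Ref": "Vpc"}},
            },
            # The default route is what makes the associated subnet public.
            "DefaultRoute": {
                "Type": "AWS::EC2::Route",
                "DependsOn": "GatewayAttachment",
                "Properties": {
                    "RouteTableId": {"Ref": "PublicRouteTable"},
                    "DestinationCidrBlock": "0.0.0.0/0",
                    "GatewayId": {"Ref": "InternetGateway"},
                },
            },
            "PublicAssociation": {
                "Type": "AWS::EC2::SubnetRouteTableAssociation",
                "Properties": {
                    "SubnetId": {"Ref": "PublicSubnet"},
                    "RouteTableId": {"Ref": "PublicRouteTable"},
                },
            },
        },
    }


template = basic_vpc_template()
print(json.dumps(template, indent=2))
```

Writing the template out by hand makes the dependencies explicit: the route cannot exist until the gateway is attached, and the subnet only becomes public once it is associated with the route table that holds the default route.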
The production-ready architecture we walked through above isn’t just a diagram; it’s the foundation that most real-world AWS applications are built on. Every load balancer, container cluster, and serverless application you’ll work with in the future sits inside a VPC. By understanding these fundamentals now, you’re setting yourself up to build cloud infrastructure that’s secure, scalable, and resilient from the start.


