API Gateway Throttling: Protecting Your Backend Services
In your organisation, APIs serve as the front door to your business functionality. Without proper protections in place, a sudden surge in traffic—whether from legitimate users or malicious actors—can overwhelm your backend services, leading to poor performance, elevated costs, or even complete system failure.
Amazon API Gateway offers robust throttling capabilities that help you safeguard your backend services while maintaining a smooth experience for your users. Let's dive into how you can implement these protections effectively.
Why Throttling Matters
Before we explore the implementation details, let's look at why API throttling is critical:
Backend Protection: Prevents your Lambda functions, EC2 instances, or other backend services from being overwhelmed
Cost Control: Limits excessive AWS resource consumption during traffic spikes
Abuse Prevention: Mitigates the impact of denial-of-service attacks
Fair Usage: Ensures resources are fairly distributed among all API consumers
API Gateway Throttling Layers
API Gateway implements throttling at two distinct levels:
1. Account-Level Throttling
By default, AWS applies throttling limits across your entire AWS account within each region:
Steady-state rate: 10,000 requests per second (RPS)
Burst capacity: 5,000 requests (a short-lived allowance for spikes above the steady-state rate)
These default limits can be increased by contacting AWS Support, but they serve as a baseline protection for your account.
2. Stage/Method-Level Throttling
This is where the real power lies—you can define custom throttling at various levels:
API stage (e.g., prod, dev)
Specific method (e.g., GET /products)
API key (for client-specific throttling)
Implementing Throttling in API Gateway
Setting Up Method-Level Throttling
When you need to limit requests to a specific API method:
Navigate to your API in the API Gateway console
Go to the "Stages" section and select the deployed stage
Expand the method you want to throttle and enable the method-level override
Set the method's rate and burst limits and save (stage settings take effect without redeploying)
Remember that method-level throttling takes precedence over stage-level settings.
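The console steps above can also be scripted. A minimal sketch of the patch operations that boto3's `update_stage` expects; the API ID, stage name, and method path in the comment are placeholder assumptions:

```python
def method_throttle_patch_ops(resource_path, http_method, rate, burst):
    """Build update_stage patch operations for one method's throttling.

    Slashes in the resource path are escaped as '~1', following the
    API Gateway patch-path convention.
    """
    prefix = "/{}/{}/throttling".format(resource_path.replace("/", "~1"), http_method)
    return [
        {"op": "replace", "path": prefix + "/rateLimit", "value": str(rate)},
        {"op": "replace", "path": prefix + "/burstLimit", "value": str(burst)},
    ]


# Applied with boto3 (placeholder IDs), this would throttle GET /products:
#   boto3.client("apigateway").update_stage(
#       restApiId="abc123", stageName="prod",
#       patchOperations=method_throttle_patch_ops("/products", "GET", 10, 15))
print(method_throttle_patch_ops("/products", "GET", 10, 15)[0]["path"])
# /~1products/GET/throttling/rateLimit
```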
Implementing Stage-Level Throttling
To apply consistent throttling across an entire API stage:
Go to the "Stages" section of your API
Select the target stage
Go to "Stage Settings"
Configure your throttling settings:
Rate: steady-state requests per second
Burst: the maximum number of extra requests allowed in a short spike
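The rate and burst settings behave like a token bucket: the bucket holds up to "burst" tokens and refills at "rate" tokens per second, and each request consumes one token. A small simulation of that behavior (illustrative only, not AWS's internal implementation):

```python
# Minimal token-bucket simulation of rate/burst throttling semantics.

class TokenBucket:
    def __init__(self, rate, burst):
        self.rate = rate            # tokens refilled per second (Rate)
        self.capacity = burst       # bucket size (Burst)
        self.tokens = float(burst)
        self.last = 0.0

    def allow(self, now):
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False                # the caller would receive HTTP 429


bucket = TokenBucket(rate=10, burst=15)
# A spike of 20 requests at t=0: the first 15 pass (burst), 5 are throttled.
results = [bucket.allow(now=0.0) for _ in range(20)]
print(results.count(True))  # 15
```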
Client-Specific Throttling with API Keys
For more granular control based on API consumers:
Create usage plans in API Gateway
Define throttling limits within each usage plan
Generate API keys for your clients
Associate API keys with usage plans
Require an API key on the methods you want to protect
This approach lets you offer different tiers of access to your API—premium clients might get higher limits than free-tier users.
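Those wiring steps can be sketched with boto3. The helper below builds the throttle and quota arguments; the names and IDs in the comment are placeholders, not real resources:

```python
def tier_settings(rate, burst, daily_quota):
    """Throttle and quota arguments for apigateway create_usage_plan."""
    return {
        "throttle": {"rateLimit": rate, "burstLimit": burst},
        "quota": {"limit": daily_quota, "period": "DAY"},
    }


# With boto3 (placeholder names and IDs), the steps above look like:
#   client = boto3.client("apigateway")
#   plan = client.create_usage_plan(
#       name="free-tier",
#       apiStages=[{"apiId": "abc123", "stage": "prod"}],
#       **tier_settings(rate=5, burst=10, daily_quota=1000))
#   key = client.create_api_key(name="client-a", enabled=True)
#   client.create_usage_plan_key(usagePlanId=plan["id"],
#                                keyId=key["id"], keyType="API_KEY")
print(tier_settings(5, 10, 1000)["quota"])
# {'limit': 1000, 'period': 'DAY'}
```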
What Are API Keys?
API keys are alphanumeric string values that you distribute to your API consumers. They serve multiple purposes:
Client identification: Recognize which client is making requests
Access control: Restrict API access to authenticated clients
Usage tracking: Monitor consumption patterns by client
Throttling enforcement: Apply client-specific rate limits
API keys are passed in the x-api-key header with each request. While not a replacement for proper authentication mechanisms like OAuth or IAM, they provide a simple way to identify API consumers.
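On the client side, attaching the key is a one-liner. The endpoint URL and key below are placeholders for illustration:

```python
from urllib.request import Request

# Placeholder invoke URL and key, for illustration only.
req = Request("https://abc123.execute-api.us-east-1.amazonaws.com/prod/products")
req.add_header("x-api-key", "demo-key-123")  # API Gateway matches this header

# urllib stores header names in capitalized form.
print(req.get_header("X-api-key"))
# demo-key-123
```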
What Are Usage Plans?
Usage plans in API Gateway are a powerful feature that enables you to:
Define throttling limits for groups of API consumers
Set quotas for the maximum number of requests during a specified time period
Associate API stages and methods with the plan
Attach API keys to the plan
Think of usage plans as service tiers for your API—you might have a "Basic" plan with lower limits and a "Premium" plan with higher thresholds.
Practical Use Case: Multi-Tiered API for a Weather Service
Let's explore a practical example of when and how to implement API Gateway throttling.
Scenario
You've built a weather forecasting API that provides:
Current weather conditions
7-day forecasts
Historical weather data
Severe weather alerts
Your backend is powered by Lambda functions that query weather databases and third-party weather services. Your API is growing in popularity, and you need to implement tiered access while protecting your backend.
Implementation
1. Define Your Service Tiers
Create three usage plans:
Free Tier:
Rate: 5 requests per second
Burst: 10 requests
Quota: 1,000 requests per day
Access to basic endpoints only
Standard Tier:
Rate: 15 requests per second
Burst: 20 requests
Quota: 10,000 requests per day
Access to all endpoints except real-time alerts
Premium Tier:
Rate: 50 requests per second
Burst: 100 requests
Quota: Unlimited requests
Access to all endpoints with priority handling
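The three tiers above can be captured as data, with a toy quota check alongside (illustrative only; in practice API Gateway enforces quotas for you):

```python
# The tier limits from the scenario, as plain data.
TIERS = {
    "free":     {"rate": 5,  "burst": 10,  "daily_quota": 1_000},
    "standard": {"rate": 15, "burst": 20,  "daily_quota": 10_000},
    "premium":  {"rate": 50, "burst": 100, "daily_quota": None},  # unlimited
}


def within_quota(tier, requests_today):
    """True if another request fits in the tier's daily quota."""
    quota = TIERS[tier]["daily_quota"]
    return quota is None or requests_today < quota


print(within_quota("free", 999))       # True
print(within_quota("free", 1_000))     # False: quota exhausted
print(within_quota("premium", 10**9))  # True: no quota configured
```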
2. Protect Critical Endpoints
Apply method-level throttling to your most resource-intensive endpoint, the historical weather data:
Limit to 10 RPS regardless of tier to protect your database
Set burst capacity to 15 requests
3. Safeguard Third-Party Dependencies
For endpoints that call third-party weather services:
Implement stage-level throttling that aligns with your third-party API limits
This prevents cascading failures if the third-party experiences issues
4. Monitor and Adjust
Set up CloudWatch dashboards to monitor:
Overall API usage trends
Throttling events by endpoint and by client
Error rates and latency
Use this data to periodically adjust your throttling settings to match actual usage patterns.
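One way to get notified is a CloudWatch alarm on the stage's 4XXError metric, where 429 throttling responses are counted. A hedged sketch; the API and stage names are placeholders:

```python
def throttling_alarm_params(api_name, stage, threshold):
    """Arguments for cloudwatch put_metric_alarm on API Gateway 4XXError."""
    return {
        "AlarmName": "{}-{}-throttling".format(api_name, stage),
        "Namespace": "AWS/ApiGateway",
        "MetricName": "4XXError",   # includes 429 throttling responses
        "Dimensions": [
            {"Name": "ApiName", "Value": api_name},
            {"Name": "Stage", "Value": stage},
        ],
        "Statistic": "Sum",
        "Period": 300,              # 5-minute windows
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
    }


# Applied with boto3 (placeholder names):
#   boto3.client("cloudwatch").put_metric_alarm(
#       **throttling_alarm_params("weather-api", "prod", threshold=100))
print(throttling_alarm_params("weather-api", "prod", 100)["AlarmName"])
# weather-api-prod-throttling
```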
5. Special Handling for Weather Emergencies
Implement higher throttling limits for the severe weather alerts endpoint during emergencies:
Use automation (for example, a Lambda function that calls the API Gateway service API) to raise the throttling limits dynamically
This ensures critical information flows during high-demand periods
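A hypothetical sketch of such automation: a Lambda handler that raises the alerts endpoint's limits via `update_stage`. The endpoint path, API ID, and emergency limits are all placeholder assumptions:

```python
EMERGENCY_LIMITS = {"rate": 200, "burst": 400}    # placeholder emergency limits


def emergency_patch_ops(limits):
    """update_stage patch operations raising the alerts endpoint's limits."""
    prefix = "/~1alerts/GET/throttling"           # hypothetical GET /alerts
    return [
        {"op": "replace", "path": prefix + "/rateLimit", "value": str(limits["rate"])},
        {"op": "replace", "path": prefix + "/burstLimit", "value": str(limits["burst"])},
    ]


def handler(event, context):
    """Lambda entry point: apply the emergency limits to the prod stage."""
    import boto3  # available in the Lambda runtime

    boto3.client("apigateway").update_stage(
        restApiId="abc123",                       # placeholder API ID
        stageName="prod",
        patchOperations=emergency_patch_ops(EMERGENCY_LIMITS),
    )
```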
Benefits Realized
By implementing this tiered throttling strategy:
Backend Protection: During severe weather events when traffic spikes, your backend services remain responsive
Business Model Support: Throttling enforces your tiered pricing strategy
Cost Control: Prevents free-tier users from consuming excessive resources
Fair Usage: Ensures premium customers always get the throughput they've paid for
Best Practices for API Throttling
Start Conservative
Begin with stricter throttling limits and gradually loosen them based on real-world usage patterns. It's easier to increase limits than to recover from a service outage.
Monitor Throttling Metrics
API Gateway automatically publishes metrics to CloudWatch:
Count: total number of requests
4XXError: client errors (including 429 throttling responses)
Set up CloudWatch alarms on these metrics to be notified when your throttling is actively protecting your services.
Implement Client-Side Retry Logic
Advise your API consumers to implement exponential backoff retry strategies when they receive 429 "Too Many Requests" responses. This helps clients gracefully handle throttling events.
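A minimal sketch of that retry strategy, using full jitter; `fake_call` stands in for any client call that raises on a 429 (in real code you would inspect the HTTP status):

```python
import random
import time


class Throttled(Exception):
    """Raised when the server answers 429 Too Many Requests."""


def with_backoff(call, max_attempts=5, base=0.5, sleep=time.sleep):
    """Retry `call` on Throttled, waiting a random time up to base * 2^attempt."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Throttled:
            if attempt == max_attempts - 1:
                raise
            sleep(random.uniform(0, base * 2 ** attempt))


# Simulated client: the first two calls are throttled, the third succeeds.
responses = iter([Throttled(), Throttled(), "forecast data"])


def fake_call():
    r = next(responses)
    if isinstance(r, Exception):
        raise r
    return r


print(with_backoff(fake_call, sleep=lambda s: None))  # forecast data
```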
Consider Regional API Deployments
For high-traffic applications, deploying your API to multiple regions with Route 53 routing can distribute load and effectively increase your overall throughput.
When Throttling Occurs
When a request exceeds your defined throttling limits, API Gateway returns an HTTP 429 "Too Many Requests" error response. This happens without the request ever reaching your backend services, providing protection at the edge.
Conclusion
API Gateway's throttling capabilities offer an essential layer of protection for your backend services. By implementing thoughtful throttling strategies, you can ensure your applications remain responsive even during traffic spikes, protect yourself from malicious actors, and keep your AWS costs predictable.
Remember that throttling is just one aspect of a comprehensive API management strategy. Combine it with other API Gateway features like caching, request validation, and WAF integration for a robust API solution.
As your application evolves, regularly review your throttling configuration to ensure it aligns with your current business needs and usage patterns.