Key Takeaway
Exactly-once delivery is physically impossible because networks are unreliable. In distributed systems, you either get at-least-once (duplicates) or at-most-once (data loss). To simulate exactly-once results, you must accept retries and use idempotency keys to ensure side effects happen only once.
Every system diagram says it: "Exactly-once processing." Sounds perfect. Sounds safe. Sounds like what you want.
But in distributed systems, exactly-once is not a guarantee — it's an illusion.
Whether you're dealing with Stripe webhooks, Zapier automations, payment buttons, or AI agents — the fundamental challenge is the same: networks are unreliable, and retries are inevitable.
🚨 The Hard Truth
Networks fail. Packets drop. Connections reset. Responses get lost.
When two systems talk, there are only two things you can know:
- You sent the request.
- You didn't receive confirmation.
You can never know for sure whether the other side processed it. So systems retry. And retries mean duplicates are inevitable.
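To make this concrete, here is a toy sketch (all names hypothetical) of how a lost acknowledgement turns one logical request into several processed ones:

```python
def flaky_send(request, processed, ack_ok):
    """Simulate one network call: the server always processes the
    request, but the acknowledgement may be lost on the way back."""
    processed.append(request)  # server-side effect happens regardless
    return ack_ok              # whether the client ever sees the ack

def send_with_retries(request, processed, ack_outcomes):
    # Client logic: keep retrying until an ack arrives.
    for ack_ok in ack_outcomes:
        if flaky_send(request, processed, ack_ok):
            return
    raise TimeoutError("no ack received")

processed = []
# The first two acks are lost in transit; the third gets through.
send_with_retries("charge-user-42", processed, [False, False, True])
print(len(processed))  # 3 -- the server charged the user three times
```

The client did everything right. Without idempotency on the receiving side, the charge still ran three times.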
The Two Generals Problem:
This is a classic thought experiment in distributed systems. Two armies need to coordinate an attack, but their only communication is through messengers who might be captured. Even if General A receives confirmation from General B, General B doesn't know if General A received that confirmation. This requires another confirmation, which itself needs confirmation—an infinite loop. There is no algorithm that guarantees both parties know the message was received in an unreliable network.
This fundamental limitation of distributed systems is why exactly-once delivery is impossible at the network level.
🔁 What Systems Actually Guarantee
Most real-world systems choose at-least-once, because losing events is far worse than duplicating them.
| Guarantee | Risk | Reality |
|---|---|---|
| At-most-once | Data Loss | Messages may never arrive. |
| At-least-once | Duplicates | Messages arrive, possibly multiple times. |
| Exactly-once | Complexity | A combination of at-least-once + idempotency. |
🧠 Why "Exactly-Once" Marketing Misleads
What providers really mean when they claim "exactly-once" is: "We deliver at-least-once, and you must handle duplicates."
Even systems like Kafka only achieve exactly-once within strict internal boundaries. Once your system touches the outside world — external APIs, webhooks, or third-party services — you're back in retry-land.
Kafka's "Exactly-Once" Explained
Kafka's exactly-once semantics (EOS) work within a controlled environment:
What Kafka Actually Guarantees:
- Idempotent Producers: Kafka assigns each message a sequence number. If the producer retries, Kafka deduplicates based on this number.
- Transactional Reads/Writes: Kafka can write multiple messages atomically across partitions and read them transactionally.
- Consumer Group Offset Management: Offsets are committed transactionally with processing, preventing double-reads.
The boundary: This only works inside Kafka. Once your consumer calls an external API, sends an email, or updates a database, you need application-level idempotency.
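A minimal sketch of that application-level idempotency, with a hypothetical `call_external_api` and an in-memory dict standing in for Redis or a database table:

```python
seen = {}  # event_id -> result; in production use Redis or a DB table

def call_external_api(payload):
    # Hypothetical side effect outside Kafka's transactional boundary
    # (an email provider, a payment gateway, a webhook target, ...).
    return {"status": "sent", "payload": payload}

def handle_event(event_id, payload):
    """Run a consumed record's side effect at most once,
    even if Kafka redelivers the record."""
    if event_id in seen:
        return seen[event_id]  # duplicate delivery: replay the stored result
    result = call_external_api(payload)  # the call Kafka cannot dedupe for you
    seen[event_id] = result
    return result

first = handle_event("evt-1", {"email": "user@example.com"})
second = handle_event("evt-1", {"email": "user@example.com"})  # redelivered
assert first is second  # the external call ran exactly once
```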
How Other Message Queues Handle This
| System | Default Guarantee | How It Works |
|---|---|---|
| RabbitMQ | At-least-once | Messages redelivered if not acked; you must dedupe |
| AWS SQS | At-least-once | Retries on visibility timeout; duplicates possible |
| Google Pub/Sub | At-least-once | Redelivery on nack or timeout; idempotency required |
| Kafka | At-least-once* | EOS within Kafka; external calls need idempotency |
| Azure Service Bus | At-least-once | Peek-lock pattern; duplicates on timeout |
💥 Real-World Examples
This isn't theoretical. Every production system faces this challenge:
- Payment processing: A double-click on a checkout button triggers two charges because the network was slow to confirm the first request.
- Webhook handlers: Stripe sends the same webhook multiple times when your server responds slowly, creating duplicate subscription activations.
- Automation platforms: Zapier Zaps and Make.com scenarios run twice when API calls time out, sending duplicate notifications.
- Email systems: Network timeouts cause duplicate emails when the sending API doesn't confirm receipt fast enough.
- Database operations: Race conditions create duplicate records when two identical requests arrive milliseconds apart.
- AI agents: AI systems retry tool calls when responses are unclear or delayed, multiplying side effects.
✅ The Approach That Actually Works
You don't stop duplicates. You make duplicates harmless. That's idempotency.
Instead of System A → System B, you do: System A → Idempotency Layer → System B.
The layer accepts an idempotency key, stores the result of the first execution, and blocks duplicate side effects. Retries stop being dangerous.
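One way to see the difference, as a toy sketch with hypothetical names: a non-idempotent operation drifts further with every retry, while an idempotent one converges to the same final state.

```python
# Non-idempotent: every retry mutates state again.
balance = {"amount": 100}

def charge_increment():
    balance["amount"] -= 10  # a retry charges the user a second time

charge_increment()
charge_increment()  # duplicate request: balance is now 80, not 90

# Idempotent: retries converge to the same final state.
orders = {}

def charge_set(order_id):
    if order_id in orders:  # duplicate: no new side effect
        return orders[order_id]
    orders[order_id] = {"charged": 10}
    return orders[order_id]

charge_set("ord-1")
charge_set("ord-1")
charge_set("ord-1")
print(orders)  # {'ord-1': {'charged': 10}} -- one charge despite three calls
```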
Implementation Patterns
Here's how to implement idempotency in different scenarios:
Pattern 1: Redis-Based Idempotency (Node.js)
```javascript
const express = require('express');
const { createClient } = require('redis');

const app = express();
app.use(express.json());

const client = createClient();
client.connect(); // node-redis v4+ requires an explicit connect

async function executeIdempotent(key, action) {
  // Check if we've seen this key before
  const cached = await client.get(key);
  if (cached) {
    return JSON.parse(cached); // Return cached result; skip side effects
  }
  // Execute the action. Note: this check-then-set is not atomic;
  // under concurrent retries, pair it with SET NX or a lock.
  const result = await action();
  // Store the result with a 24h TTL
  await client.setEx(key, 86400, JSON.stringify(result));
  return result;
}

// Usage in a payment handler
app.post('/checkout', async (req, res) => {
  const idempotencyKey = req.headers['idempotency-key'];
  const result = await executeIdempotent(idempotencyKey, async () => {
    // This only runs once, even if the request is retried
    const charge = await stripe.charges.create({
      amount: req.body.amount,
      currency: 'usd',
      source: req.body.token,
    });
    await db.orders.create({
      chargeId: charge.id,
      userId: req.user.id,
    });
    return { orderId: charge.id, status: 'success' };
  });
  res.json(result);
});
```
Pattern 2: Database-Based Idempotency (Python)
```python
import json
from datetime import datetime, timedelta

def execute_idempotent(db, key, action):
    # Try to fetch an existing result for this key
    row = db.query(
        "SELECT result FROM idempotency_keys WHERE key = %s",
        (key,),
    ).first()
    if row:
        return json.loads(row['result'])

    # Execute the action
    action_result = action()

    # Store the result; ON CONFLICT guards against a concurrent
    # request that inserted the same key in the meantime
    db.execute(
        """INSERT INTO idempotency_keys (key, result, created_at)
           VALUES (%s, %s, %s)
           ON CONFLICT (key) DO NOTHING""",
        (key, json.dumps(action_result), datetime.now()),
    )

    # Expire old keys (better done in a background job than inline)
    db.execute(
        "DELETE FROM idempotency_keys WHERE created_at < %s",
        (datetime.now() - timedelta(days=1),),
    )
    return action_result

# Usage in a webhook handler
@app.route('/webhooks/stripe', methods=['POST'])
def stripe_webhook():
    event = request.json
    event_id = event['id']  # Stripe's unique event ID
    result = execute_idempotent(db, event_id, lambda: {
        'user_id': activate_subscription(event['data']['object']),
        'status': 'activated',
    })
    return jsonify(result)
```
Pattern 3: Using OnceOnly API
```javascript
// Delegate duplicate detection to OnceOnly (check-lock)
const key = `order-${userId}-${sessionId}`;

const lockRes = await fetch("https://api.onceonly.tech/v1/check-lock", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.ONCEONLY_API_KEY}`, // once_live_***
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    key,
    ttl: 3600,
    metadata: { userId, sessionId },
  }),
});
const lock = await lockRes.json();

if (lock.status === "duplicate") {
  // Don't re-run side effects. Return your cached result (DB/Redis) if you store it.
  return await getCachedResult(key);
}

// New action: execute once, then store the result under the same key.
const result = await processOrder({ userId, items: cartItems, total: cartTotal });
await saveCachedResult(key, result);
return result;
```
🧩 The Mental Model Shift
Stop thinking: "How do we prevent retries?"
Start thinking: "How do we make retries safe?"
Retries are built into the internet, cloud infrastructure, APIs, and AI systems. They're not bugs — they're reality. The physics of distributed systems require retries for reliability.
⚡ Engineering Principle
In distributed systems, the question is never "Will this request be retried?" The question is "When this request is retried, will my system handle it safely?" Build for retries from day one.
📊 Delivery Guarantees Compared
| Aspect | At-Most-Once | At-Least-Once | "Exactly-Once" |
|---|---|---|---|
| Delivery | 0 or 1 times | 1+ times | 1 time (illusion) |
| Data Loss Risk | High | None | None |
| Duplicate Risk | None | High | Handled |
| Complexity | Low | Medium | High |
| Real Implementation | Fire-and-forget | Retry without dedup | At-least-once + idempotency |
| Use Cases | Metrics, logs | Most systems | Payments, critical ops |
Stop fighting the network. Start using idempotency.
❓ Frequently Asked Questions
Why can't we just make networks reliable?
Networks are physical infrastructure spanning continents. Packets can be lost due to: hardware failures, routing issues, congestion, packet corruption, timeout windows being too short, or even cosmic rays flipping bits in memory. No amount of engineering can eliminate these physical realities—we can only build systems that handle them gracefully.
What about Kafka's "exactly-once" semantics?
Kafka achieves exactly-once within its internal boundaries: from producer to topic to consumer group. But once you call an external API, send a webhook, or trigger an email from your Kafka consumer, you're back to at-least-once delivery. Kafka's exactly-once is really "at-least-once with idempotent producers and transactional reads."
Is at-most-once ever the right choice?
Rarely. At-most-once (fire-and-forget) is acceptable only when losing data is preferable to processing it twice. Examples: real-time metrics where missing a few data points is acceptable, or logging systems where occasional log loss is tolerable. For anything involving money, user data, or state changes, at-most-once is dangerous.
How do distributed transactions (2PC) fit into this?
Two-phase commit (2PC) attempts to provide atomic operations across multiple systems but has serious downsides: it's slow, blocks resources during coordinator failures, and still doesn't solve the duplicate problem if a participant crashes after committing but before acknowledging. Modern systems prefer eventual consistency with idempotency over distributed transactions.
Can I use database transactions instead of idempotency?
Database transactions only ensure atomicity within the database. If your operation involves external services (payment gateways, email APIs, webhooks), transactions can't help. You might successfully roll back a database insert, but you can't roll back an email that's already been sent or a payment that's already been charged.
What systems commonly suffer from duplicate processing?
Nearly all production systems: payment webhooks, automation platforms like Zapier, AI agent tool calls, message queues, microservices architectures, and any API that experiences network failures. If your system doesn't handle duplicates gracefully, it will create them in production.
How long should I cache idempotency results?
It depends on your retry window. For HTTP APIs, 24 hours is typical. For webhooks, match the sending provider's retry period (Stripe retries for up to 3 days). For user-facing actions like checkout, 1-24 hours is usually sufficient. Weigh duplicate prevention against storage costs.
What's the difference between idempotency and deduplication?
Deduplication detects and discards duplicate messages. Idempotency ensures that processing the same message multiple times has the same effect as processing it once. Idempotency is stronger: even if duplicates reach your system, they don't cause problems. Deduplication alone can fail if the dedup check and the processing aren't atomic.
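A sketch of an atomic claim, using a lock-protected dict in place of Redis's `SET ... NX` (all names hypothetical): the dedup check and the decision to process are one atomic step, so concurrent duplicates cannot both win.

```python
import threading

claims = {}
claims_lock = threading.Lock()

def try_claim(key):
    """Atomically claim a key: only the first caller wins.
    (In Redis this is SET key value NX; here a lock and a dict stand in.)"""
    with claims_lock:
        if key in claims:
            return False
        claims[key] = "in_progress"
        return True

def process_once(key, action):
    # The claim IS the dedup check: there is no gap between
    # "have we seen this?" and "mark it as seen".
    if not try_claim(key):
        return "duplicate-skipped"
    return action()

results = [process_once("msg-7", lambda: "processed") for _ in range(3)]
print(results)  # ['processed', 'duplicate-skipped', 'duplicate-skipped']
```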
Do I need idempotency for read-only operations?
No. Idempotency matters for operations with side effects: writes, payments, emails, notifications, state changes. Read operations are naturally idempotent—fetching data multiple times doesn't change anything. However, you may still want caching for performance, which is a different concern.
Deep Dives by Use Case
Explore how exactly-once myths manifest in specific systems:
- → Stripe Webhooks Are NOT Exactly-Once
- → Why Zapier Zaps Run Twice (And How to Stop It)
- → Make.com Scenarios Running Twice? Here's Why
- → How to Stop AI Agents from Repeating Actions
- → Double-Click, Double-Charge: Why "One Click" Becomes Two
- → Webhook Retries Are Silent Killers
- → Why Your System Sends Duplicate Emails
- → Why Your API Creates Duplicate Records