Encryption · Architecture · Data Protection

Encryption Decisions in Modern Distributed Systems

10 min read · March 14, 2025

Encryption Is a System, Not a Feature

The most common mistake teams make with encryption is treating it as a checkbox. "Do we encrypt data at rest? Yes. Do we encrypt data in transit? Yes." That's a compliance answer, not an engineering one.

Real encryption strategy requires thinking about the entire lifecycle. Where are keys generated? Who has access to them? How are they rotated? What happens during an incident? How do you decrypt data when you need to investigate a breach? These questions rarely have simple answers, and the wrong answer to any of them can make your encryption worthless.

Encryption is fundamentally about trust management. You're deciding who can access what information, and you're using mathematical guarantees to enforce those decisions. But the math only works if the surrounding system is designed correctly. The strongest encryption algorithm in the world doesn't help if the key is stored in plaintext in an environment variable that every developer can read.

Key Lifecycle Realities

Key management is where encryption strategies succeed or fail. The algorithm matters far less than how you handle the keys.

Key generation should happen in hardware security modules or managed key services—never in application code. Application-generated keys are almost always weaker than they should be because of poor random number generation, predictable seeding, or inadequate entropy.

Key rotation is the operational reality that most teams underestimate. In theory, you rotate keys on a schedule. In practice, rotation means re-encrypting data with new keys, which has performance implications, requires careful coordination, and needs to handle failures gracefully. If your rotation process can leave data encrypted with a key that no longer exists, you have a data loss bug masquerading as a security feature.

Access control for keys is as important as the encryption itself. If your application servers have direct access to decryption keys, then compromising an application server compromises the data. Envelope encryption patterns, where data is encrypted with a data key that is itself encrypted with a master key, provide better isolation because the master key never needs to be on the application server.

Incident response with encrypted data is an area where poor planning causes real pain. When you need to investigate a security incident, you may need to decrypt data for forensics. If your key management process doesn't account for this, you'll either be unable to investigate or you'll have to make dangerous key management decisions under time pressure.

What to Encrypt (and When Not To)

Not all data needs the same level of encryption, and over-encrypting creates operational overhead that can actually reduce security by making key management more complex and error-prone.

Classify data by sensitivity and regulatory requirement. Personally identifiable information, financial data, and health records typically require encryption at rest by regulation. But encrypting every log entry, every cached value, and every temporary file creates complexity without proportional security benefit.

Consider the threat model. Encryption at rest protects against physical theft of storage media and unauthorized access to database files. It does not protect against application-level vulnerabilities that have legitimate access to decryption. If your threat model is primarily about application compromise, encryption at rest is necessary but not sufficient.

Field-level encryption provides granular control but at significant operational cost. Encrypting individual database columns means you can give some services access to some fields without exposing others. This is powerful for multi-tenant systems or systems with strict data isolation requirements. But it also means you can't index encrypted fields, can't search them efficiently, and have to manage keys per field or per tenant.

At-Rest vs In-Transit Thinking

In-transit encryption (TLS) is table stakes. There's no legitimate reason not to encrypt data in transit in a modern system. The performance overhead is negligible, the tooling is mature, and mutual TLS provides authentication in addition to encryption.

At-rest encryption comes in layers. Full-disk encryption protects against physical theft but provides no protection against anyone with system access. Database-level encryption (transparent data encryption) protects against unauthorized database file access. Application-level encryption protects data even from database administrators. Each layer addresses different threat vectors.

The gap between at-rest and in-transit is where data is most vulnerable. Data is decrypted for processing, held in memory, potentially logged, cached, or passed through queues. Understanding where data exists in cleartext is essential for a complete encryption strategy.

Common Failure Modes

Storing encryption keys alongside encrypted data is the most basic failure, but it still happens regularly, especially in development environments that get promoted to production.

Using encryption without authentication (ECB mode, unauthenticated CBC) allows attackers to modify ciphertext in predictable ways. Always use authenticated encryption modes like AES-GCM or constructions that provide both confidentiality and integrity.

Rolling your own encryption is a well-known anti-pattern, but "rolling your own" also includes choosing parameters, combining primitives, or implementing protocols yourself. Use established libraries with secure defaults. libsodium, Google's Tink, and the AWS Encryption SDK are examples of well-designed interfaces that reduce the chance of misuse.

Key reuse across environments (using production keys in staging) is a subtle but dangerous practice. Staging typically has weaker security controls, so if staging is compromised, the attacker has production keys.

Logging or caching decrypted data is easy to do accidentally. A well-meaning debug log that includes request bodies, a cache layer that stores decrypted values for performance—these create cleartext copies that bypass your encryption strategy entirely.

Operating Encryption in Production

Encryption adds operational complexity that needs to be planned for. Performance monitoring should track encryption and decryption latency. Key rotation should be automated and tested regularly. Key access should be audited continuously.

Build tooling for your team. If developers need to encrypt test data, give them tools that use proper keys and algorithms. If operations teams need to decrypt data for debugging, give them audited, controlled access paths rather than sharing keys informally.

Document your encryption decisions. Which algorithms, which modes, which key sizes, and why. When someone asks why you're using AES-256-GCM instead of ChaCha20-Poly1305, there should be a documented rationale. When standards change, you'll know what needs updating and why the original decision was made.

Work with me

If this resonates, let's connect on architecture, security, or engineering leadership.
