Cubist Blog - Intel SGX is broken (again)

Secure hardware can help protect cryptographic keys from attackers, third-party vendors, and even insider threats. Unfortunately, last week's disclosure of Downfall, the latest in a long line of SGX vulnerabilities, shows that not all “secure hardware” is actually secure.

If your key manager relies on SGX, for example, it may not be keeping your secret keys secret. Attackers could have exploited Downfall to bypass SGX's protections and recover keys at any time in the last few years; if not Downfall, attacks like Foreshadow and ÆPIC Leak could have accomplished the same thing.

At Cubist, we've avoided complex secure hardware like SGX in favor of lightweight TPM implementations. To understand why—and why you might want to avoid more complex hardware, too—we'll give a brief introduction to SGX, its security problems, and their underlying causes. Then, we'll describe the secure hardware we use at Cubist, our reasons for using it, and the security properties it provides.

What is SGX?

Intel's Software Guard eXtensions (SGX) extend the CPU with features intended to allow security-sensitive software to run inside of a logically isolated computing environment.

SGX was released in 2015 with the Skylake architecture and, at least initially, seemed like a giant leap for secure computing. Dozens of academic papers described applications of SGX to private analytics, network firewalls, databases, blockchains, and more, and commercial products soon followed.

Unfortunately, this enthusiasm was short-lived: as security experts dug into the design, it became clear that SGX had promised much more than it could deliver. Nowadays, SGX is widely regarded in the security community as a failure. Attacks on SGX have become so common that they're almost boring, and as a result only the cleverest or most devastating attacks (e.g., Downfall) are considered important research results.

What was SGX supposed to do?

To understand where things went off the rails, it's important to start with SGX's security properties—what it tries to guarantee—and its threat model—which attackers it protects against.

In theory, SGX is designed to allow security-sensitive code to be completely isolated, even from the operating system. This means SGX should let any user run code that remains correct and private, even if the computer's administrator actively attempts to interfere. This sounds really exciting! Unfortunately, it also falls apart in practice.

The problems with SGX start with its threat model: SGX does not attempt to protect against side-channel attacks, i.e., attacks in which a bad actor extracts information from a victim program indirectly by observing side effects (e.g., by observing the victim's sequence of memory accesses). It also does not attempt to defend against physical attacks (e.g., those that measure power consumption or cause glitches in the victim program's execution).
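To make the memory-access example concrete, here is a minimal Rust sketch of a secret-dependent table lookup. The names (`TracedTable`, `lookup`, `observed_trace`) are hypothetical, and the recorded trace is only a stand-in for what a real attacker would infer from cache or page-fault behavior. It illustrates the leak; it is not an attack.

```rust
/// A lookup table that records which indices are read. The trace models
/// what a side-channel observer (for example, a malicious OS watching
/// page faults or cache lines) could learn. This is an illustrative
/// assumption, not a measurement of real hardware.
pub struct TracedTable {
    entries: Vec<u8>,
    trace: Vec<usize>,
}

impl TracedTable {
    pub fn new(entries: Vec<u8>) -> Self {
        TracedTable { entries, trace: Vec::new() }
    }

    /// Read one entry and log the access.
    fn read(&mut self, index: usize) -> u8 {
        self.trace.push(index);
        self.entries[index]
    }

    /// Secret-dependent lookup: touches only the entry selected by the
    /// secret, so the access pattern reveals the secret index.
    pub fn lookup(&mut self, secret_index: usize) -> u8 {
        self.read(secret_index)
    }

    /// What the observer sees: the sequence of indices accessed.
    pub fn observed_trace(&self) -> &[usize] {
        &self.trace
    }
}
```

Calling `lookup(2)` on a four-entry table leaves a trace containing only index 2. The observer never sees the table's contents, yet learns exactly which entry the secret selected.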

Although these caveats may seem innocuous, looks are deceiving. First, programs that process secret data like cryptographic keys are especially sensitive to side-channel attacks—and the strongest possible side-channel attacker is an adversarial operating system. Thus, to defend against a malicious OS in practice, programmers can't rely on SGX; they have to do error-prone code hardening themselves, which is (at best) an open research question.
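As a rough illustration of the kind of hardening this requires, the Rust sketch below reads every table entry and selects the wanted one with a bit mask, so the sequence of memory accesses is the same for every secret. The function name `oblivious_lookup` and the returned trace are hypothetical and exist only for illustration. Source-level tricks like this also do not guarantee what the compiler or CPU will actually do, which is part of why such hardening is error-prone.

```rust
/// Lookup with a secret-independent access pattern: reads every entry
/// in order and keeps the wanted one with a bit mask.
/// Returns the selected entry plus the access trace (for illustration).
pub fn oblivious_lookup(entries: &[u8], secret_index: usize) -> (u8, Vec<usize>) {
    let mut result = 0u8;
    let mut trace = Vec::with_capacity(entries.len());
    for (i, &entry) in entries.iter().enumerate() {
        trace.push(i);
        // diff is zero only when i == secret_index.
        let diff = (i ^ secret_index) as u64;
        // For any non-zero diff, diff | -diff has its top bit set,
        // so is_match is 1 when diff == 0 and 0 otherwise.
        let is_match = ((diff | diff.wrapping_neg()) >> 63) ^ 1;
        // Expand the single bit into a full-byte mask: 0xFF or 0x00.
        let mask = (is_match as u8).wrapping_neg();
        // No branch on the secret: every entry is read and masked.
        result |= entry & mask;
    }
    (result, trace)
}
```

Looking up index 1 and index 3 in the same table returns different entries but produces identical traces. The cost is that every lookup now touches the whole table.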

Second, the fact that SGX does not defend against a wide range of physical attacks means that it does not actually protect sensitive applications from a malicious cloud operator. While this is not a problem per se—you certainly shouldn't use a cloud operator whom you suspect of being malicious!—there seems to be widespread misunderstanding on this point. Put simply: even if we ignore its long history of security failures—see the next section—using SGX does not eliminate trust in your operating system or cloud provider.

About those security failures...

Since 2018, nearly a dozen attacks on SGX have been described by security researchers. The upshot is that, to date, Intel has never shipped an SGX-enabled chip that was actually secure. An excellent high-level summary of the problems plaguing SGX is presented in the SGX.fail paper.

The most important thing to understand is that many of these attacks completely break the SGX security model. As one example, the Foreshadow work released in 2018 allowed attackers to extract the keys that SGX uses for data privacy and code integrity—and, as a result, to steal secrets or even masquerade as an SGX-enabled application. Foreshadow's results are not unique: every SGX-enabled processor Intel has shipped since 2015 has been (completely) compromised.

2023 continues the trend: last week, one of our academic collaborators, Daniel Moghimi, announced Downfall, a new microarchitectural attack that allows an attacker to bypass SGX (and every other isolation boundary) on Intel Core processors across five generations. His Black Hat talk and USENIX paper describe the attack in detail.

To its credit, Intel designed a mechanism by which some, though not all, vulnerabilities can be patched via BIOS updates. In principle this is great; in practice, the patching process is extremely slow. The SGX.fail paper shows that it's common for SGX applications to run on unpatched machines because the timeline from discovery to patch is generally a year or more. The combination of lengthy embargoes on new attacks, slow patch delivery, and long delays in BIOS patches from vendors means that real-world SGX applications have historically been exposed to one or more vulnerabilities that are known but still under wraps.

Why does this keep happening?

When a system's security posture turns into a game of whack-a-mole, it's good to step back and ask: what are the root causes, and can they be fixed?

When we were considering secure hardware implementations for CubeSigner, our key management product, we asked ourselves and our colleagues—who have discovered a large number of hardware attacks—exactly this question. Together, we came up with two related answers:

1. SGX simply tries to do too much. As other researchers pointed out from the start, the system is incredibly complex, which means that there are too many details that are too difficult to get right. The series of attack papers that followed speaks for itself. As CPUs get increasingly complex, this trend is not likely to change.
2. SGX tries to do everything inside the processor. It has no physical isolation from the rest of the CPU, which yields both a massive attack surface and a powerful toolbox for attackers. In contrast, systems like Nitro Cards and classical TPMs physically isolate the security processor from the main CPU and connect the two only over a narrow, well-defined interface. This has long been the gold standard for secure systems design; see, for example, the classic work by Rushby on Separation Kernels.

Cubist's secure hardware strategy

At Cubist, secure hardware is part of the foundation of our key management infrastructure. When evaluating secure hardware technologies, we ask ourselves two questions:

1. What properties should we rely on secure hardware to provide? And, just as importantly, what should we avoid asking secure hardware to do?
2. Which technologies provide us the properties we need while giving us high confidence that they will withstand attacks? Implicitly, this means we want to use the simplest and most mature technology we can.

Answering the first question: we write all cryptography-related code in safe Rust, and we carefully vet our software with a combination of formal verification and third-party audits to ensure that the software does not leak secrets. Thus, the property we need from the secure hardware is ensuring that only this vetted and approved code can ever touch secret keys. Specifically: if anyone attempts to access secrets with non-approved code, that non-approved code is cryptographically locked out.
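To give a feel for this property, here is a toy Rust model of measurement-bound key release, loosely in the style of a TPM's platform configuration registers. Everything in it is an assumption made for illustration: the names (`MeasurementRegister`, `SealedKey`) are hypothetical, `toy_hash` uses a non-cryptographic hash in place of something like SHA-256, and a plain comparison stands in for the cryptographic binding that real hardware enforces. It is a sketch of the idea, not a description of our implementation.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Toy stand-in for a cryptographic hash such as SHA-256.
/// DefaultHasher is NOT cryptographically secure; it only keeps this
/// sketch free of external dependencies.
fn toy_hash(parts: &[&[u8]]) -> u64 {
    let mut h = DefaultHasher::new();
    for p in parts {
        h.write(p);
    }
    h.finish()
}

/// A measurement register that can only be extended, never set directly.
pub struct MeasurementRegister {
    value: u64,
}

impl MeasurementRegister {
    pub fn new() -> Self {
        MeasurementRegister { value: 0 }
    }

    /// new_value = H(old_value || H(code))
    pub fn extend(&mut self, code: &[u8]) {
        let old = self.value.to_le_bytes();
        let digest = toy_hash(&[code]).to_le_bytes();
        self.value = toy_hash(&[&old[..], &digest[..]]);
    }

    pub fn value(&self) -> u64 {
        self.value
    }
}

/// A secret bound to the measurement of the approved code.
pub struct SealedKey {
    approved_measurement: u64,
    secret: Vec<u8>,
}

impl SealedKey {
    pub fn seal(secret: Vec<u8>, approved_measurement: u64) -> Self {
        SealedKey { approved_measurement, secret }
    }

    /// Release the secret only if the running code's measurement matches.
    /// Real hardware enforces this cryptographically, not with an `if`.
    pub fn unseal(&self, register: &MeasurementRegister) -> Option<&[u8]> {
        if register.value() == self.approved_measurement {
            Some(&self.secret)
        } else {
            None
        }
    }
}
```

In this model, a register extended with the approved code unseals the key, while a register extended with any other code gets `None`. Because the register can only be extended, code that loads later cannot rewrite the measurement to impersonate the approved software.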

The only other functionality we need from our secure hardware solution is remote attestation: the ability to cryptographically certify to our users that the correct software is being used to process their keys. Importantly, we do not rely on complex secure hardware to provide isolation from adversarial code. Instead, we physically isolate sensitive systems from potential attackers, preventing transient execution and side-channel attacks by construction.
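From a user's point of view, checking a remote attestation boils down to a few steps, sketched below in Rust. The document layout (`AttestationDoc`), the error type, and the `verify_attestation` function are all hypothetical simplifications. Signature checking in particular is passed in as a closure, because a real verifier would validate a certificate chain rooted in the hardware vendor, which this sketch does not attempt.

```rust
/// Hypothetical, heavily simplified attestation document.
pub struct AttestationDoc {
    /// Hash of the software the hardware observed being loaded.
    pub measurement: Vec<u8>,
    /// Challenge chosen by the verifier, echoed back to prove freshness.
    pub nonce: u64,
    /// Hardware-rooted signature over the fields above.
    pub signature: Vec<u8>,
}

#[derive(Debug, PartialEq)]
pub enum AttestationError {
    BadSignature,
    StaleNonce,
    UnexpectedMeasurement,
}

/// Accept the document only if it is authentic, fresh, and describes
/// the software the verifier expects.
pub fn verify_attestation<F>(
    doc: &AttestationDoc,
    expected_measurement: &[u8],
    expected_nonce: u64,
    signature_is_valid: F,
) -> Result<(), AttestationError>
where
    F: Fn(&AttestationDoc) -> bool,
{
    // 1. Authenticity: was this produced by genuine secure hardware?
    if !signature_is_valid(doc) {
        return Err(AttestationError::BadSignature);
    }
    // 2. Freshness: is this a reply to our challenge, not a replay?
    if doc.nonce != expected_nonce {
        return Err(AttestationError::StaleNonce);
    }
    // 3. Identity: is the measured software the version we expect?
    if doc.measurement.as_slice() != expected_measurement {
        return Err(AttestationError::UnexpectedMeasurement);
    }
    Ok(())
}
```

The signature is checked first so that the nonce and measurement are never trusted before the document's authenticity is established.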

To build a secure key manager, then, we do not need SGX-like functionality. The functionality we do need is simple, mature, and well understood, and has been available as part of the Trusted Platform Module specification for more than two decades.

The functionality we need is also implemented by the AWS Nitro system, which has two other significant upsides: first, the Nitro Security Module, the backbone of the Nitro Enclaves system, is physically isolated from the CPU and accessible only over a narrow interface—the gold standard in secure systems design, as we mentioned above. Second, and just as important, the Nitro system gives simple guarantees, not a (broken) kitchen sink!

Still, we're always looking to improve the security of our key manager. This means pushing forward the verification efforts we mentioned above. It also means teaming up with research groups in academia and industry to further analyze the security of both our system and the secure hardware on which it's built. We're excited to share more details as we continue on this journey!
