# Revisiting Isolated and Trusted Execution via Microarchitectural Cryptanalysis

Daniel Moghimi Worcester Polytechnic Institute

**Committee Members:** 

- Prof. Donald R. Brown (Department Head)
- Prof. Thomas Eisenbarth (Co-advisor)
- Prof. Simha Sethumadhavan (External Committee)
- Prof. Berk Sunar (Co-advisor)




December 4, 2020 PhD Defense



Single user, single task






Multiuser, multitask, several security domains







Secure Channel



- Architectural Isolation
  - Process-level Isolation
  - VM-level Isolation/Virtualization
  - In-process Isolation (Browser, JavaScript)



# Are we good with secure isolation?

# Security Failures - HeartBleed Example

- Vulnerability in OpenSSL Cryptographic Library
- Buffer over-read (an out-of-bounds read) leaking the private key
- It affected millions of computers.



- Buffer overflows have been well understood for decades.
- The price of a single line of unsanitized code: `memcpy(bp, pl, payload)`
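The effect of trusting an attacker-supplied length can be illustrated with a toy model. The sketch below is not OpenSSL code: the flat `MEMORY` array, the function names, and the layout (payload directly followed by secret bytes) are assumptions made purely for illustration.

```python
# Toy model of a heartbeat handler: process memory is one flat byte array,
# and the request payload sits right next to unrelated secret data.
# All names and the memory layout are hypothetical.
MEMORY = bytearray(b"PING" + b"-----PRIVATE KEY BYTES-----")
PAYLOAD_OFFSET = 0
ACTUAL_PAYLOAD_LEN = 4  # bytes the client really sent ("PING")


def heartbeat_vulnerable(claimed_len: int) -> bytes:
    """Echo `claimed_len` bytes, trusting the attacker-supplied length."""
    return bytes(MEMORY[PAYLOAD_OFFSET:PAYLOAD_OFFSET + claimed_len])


def heartbeat_fixed(claimed_len: int) -> bytes:
    """Echo the payload only if the claimed length fits what was received."""
    if claimed_len > ACTUAL_PAYLOAD_LEN:
        return b""  # discard malformed requests
    return bytes(MEMORY[PAYLOAD_OFFSET:PAYLOAD_OFFSET + claimed_len])


# A request that claims 20 bytes reads past the 4-byte payload.
print(heartbeat_vulnerable(20))
print(heartbeat_fixed(20))
```

The only difference between the two handlers is the single length check, which mirrors the "single line of unsanitized code" point above.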

































# **Cache Attacks and Microarchitectural Security**

- Software-based side-channel Attacks
- A user-level adversary leaks the data or secret of other users.
- Running specially-crafted software that exploits the behavior of the microarchitecture.
- Violating
  - process-level isolation
  - VM-level isolation

- Osvik et al., "Cache Attacks and Countermeasures," 2005
- Percival, "Cache Missing for Fun and Profit," 2005



- People have proposed ad-hoc Countermeasures, e.g.
  - Randomized Cache Access Pattern
  - Partitioned Cache
  - Constant-cache Access Pattern
  - Detection of Frequent Cache Misses

- Countermeasures are either not used or utterly ineffective.
- Why?

1. Earliness
2. Fuzzy Impact
3. Expertise & Tooling

# 1. Uncovering µ-Arch Side Channels

- There are many different types of cache attacks:
  - Flush+Reload (Flush+Flush)
  - Prime+Probe
  - Evict+Reload
- Cache attacks leak the memory access patterns of co-located victims with 64-byte granularity.
- Secret-dependent memory accesses leak some information about the secret. Examples:
  - AES: S-Box lookups
  - RSA: Table lookups in fixed-window Montgomery exponentiation
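To make the 64-byte granularity concrete, the following sketch computes which table indices are indistinguishable to a cache attacker because they fall into the same cache line. The table base address of 0 (i.e., a line-aligned table) and the function names are assumptions for illustration.

```python
CACHE_LINE_BYTES = 64


def cache_line_of(table_base: int, index: int, entry_bytes: int) -> int:
    """Cache line number touched by a lookup of table[index]."""
    return (table_base + index * entry_bytes) // CACHE_LINE_BYTES


def indices_sharing_line(index, entry_bytes, table_entries=256, table_base=0):
    """All indices that map to the same cache line as `index`."""
    target = cache_line_of(table_base, index, entry_bytes)
    return [i for i in range(table_entries)
            if cache_line_of(table_base, i, entry_bytes) == target]


# 4-byte entries (e.g., an AES T-table): 16 indices per line, so observing
# the line reveals the upper 4 bits of the index.
print([hex(i) for i in indices_sharing_line(0x5A, entry_bytes=4)])
# 1-byte entries (a plain S-box): 64 indices per line, only 2 bits leak.
print(len(indices_sharing_line(0x5A, entry_bytes=1)))
```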

#### **Cache Attacks - Cache Line Resolution**

| Address bits                   | Virtual vs. physical address        |
|--------------------------------|-------------------------------------|
| Rest of the bits (upper bits)  | Virtual != Physical                 |
| Least 12 bits                  | Virtual Address = Physical Address  |


L1 Cache Attacks


#### **CPU Memory Subsystem**



#### **CPU Memory Subsystem - Address Translation**




#### **CPU Memory Subsystem - Store Forwarding**




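- Address translation can be expensive.
- 4K aliasing: a load is assumed to depend on an earlier store when their addresses match in the least significant 12 bits, i.e., when they are a multiple of 4 KiB apart.
- The dependency is verified only after the execution!
- A false dependency forces re-execution of the load block.
  - This causes a timing delay and therefore a side channel.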
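The aliasing condition above reduces to a comparison of the page-offset bits. A minimal sketch, with a hypothetical helper name and example addresses chosen only for illustration:

```python
PAGE_OFFSET_MASK = 0xFFF  # least significant 12 bits


def is_4k_aliased(store_addr: int, load_addr: int) -> bool:
    """True if two distinct addresses share their low 12 bits."""
    same_offset = ((store_addr ^ load_addr) & PAGE_OFFSET_MASK) == 0
    return same_offset and store_addr != load_addr


# Two addresses exactly 2 * 4 KiB apart alias; a 4-byte neighbour does not.
print(is_4k_aliased(0x7F0000001040, 0x7F0000003040))
print(is_4k_aliased(0x1040, 0x1044))
```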



#### MemJam - Intra Cache Line Resolution





- Conflicted intra-cache-line leakage (4-byte granularity)
- Higher time $\rightarrow$ memory accesses with the same bits 3 to 12
- 4 bits of intra-cache-line leakage

# Why should we care about the improved resolution?

# MemJam - Attacking So-Called Constant Time AES

- Scatter-gather implementation of AES
  - Intel SGX Software Development Kit (SDK) and IPP Cryptography Library
  - The 256-byte S-box spans 4 cache lines
  - Cache-independent access pattern



64 Bytes

$$index = S^{-1}(c \oplus k) \longrightarrow index < 4$$

### AES Key Recovery
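The relation $index = S^{-1}(c \oplus k)$ with the predicate $index < 4$ suggests a simple statistical key-byte recovery. The sketch below simulates it end to end. The timing model (Gaussian noise plus a fixed penalty when the index falls into the conflicted 4-byte block), the sample count, and all function names are assumptions; real MemJam measurements are noisier and are not reproduced here.

```python
import random


def build_sbox():
    """Generate the AES S-box: GF(2^8) inverse followed by the affine map."""
    def gmul(a, b):
        product = 0
        for _ in range(8):
            if b & 1:
                product ^= a
            a <<= 1
            if a & 0x100:
                a ^= 0x11B  # AES reduction polynomial
            b >>= 1
        return product

    sbox = []
    for x in range(256):
        inv = 0 if x == 0 else next(y for y in range(1, 256) if gmul(x, y) == 1)
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_sbox()
INV_SBOX = [0] * 256
for value, substituted in enumerate(SBOX):
    INV_SBOX[substituted] = value


def simulate_timings(key_byte, n_samples, rng, penalty=10.0):
    """Synthetic (ciphertext byte, time) pairs; slower when index < 4."""
    samples = []
    for _ in range(n_samples):
        c = rng.randrange(256)
        t = rng.gauss(100.0, 5.0)
        if INV_SBOX[c ^ key_byte] < 4:
            t += penalty
        samples.append((c, t))
    return samples


def recover_key_byte(samples):
    """Pick the guess whose predicted 'index < 4' set is slowest on average."""
    total, count = [0.0] * 256, [0] * 256
    for c, t in samples:
        total[c] += t
        count[c] += 1
    all_t, all_n = sum(total), sum(count)
    best_guess, best_score = None, float("-inf")
    for guess in range(256):
        # Ciphertext bytes for which S^{-1}(c xor guess) < 4.
        hot = [SBOX[i] ^ guess for i in range(4)]
        hit_n = sum(count[c] for c in hot)
        if hit_n == 0 or hit_n == all_n:
            continue
        hit_t = sum(total[c] for c in hot)
        score = hit_t / hit_n - (all_t - hit_t) / (all_n - hit_n)
        if score > best_score:
            best_guess, best_score = guess, score
    return best_guess


samples = simulate_timings(0x3C, 20000, random.Random(1))
print(hex(recover_key_byte(samples)))
```

Bucketing the timings per ciphertext byte keeps the search over all 256 guesses cheap, since each guess only sums four buckets.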





## **US 7,603,527 B2** RESOLVING FALSE DEPENDENCIES OF SPECULATIVE LOAD INSTRUCTIONS

**Loosenet Check:** "an operation X may determine whether the lower portion of the virtual address of a speculative load instruction matches the lower portion of virtual addresses of older store operations"

**Store Forwarding:** "in an embodiment, the load instruction may have its input data forwarded from the store operation from which the load instruction depends at operation"

**Finenet Check:** "If there is a hit at operation X and a miss at operation Y, ... the physical addresses of the load and the store may be compared at an operation Z" "In one embodiment, if there is a hit at operation X and the physical address of the load or the store operations is not valid, the physical address check at operation Z may be considered as a hit" "In some embodiments, the physical address check at operation Z may use a partial physical address, e.g., base on data stored in the SAB. This makes the checking at operation Z conservative. Accordingly, in some embodiments, a match may occur on a partial address and block..."



### **SPOILER Attack**

**Dependency Resolution** 























# 2. Data Leakage via Automated Synthesis

### **Transient Execution Attacks**

- Data leakage as opposed to access-pattern leakage
- Spectre
  - Due to the CPU's branch predictor

- Meltdown
  - Due to the speculative behavior of the CPU's memory subsystem
  - Data leakage without any assumption about the victim software





### Meltdown

`char secret = *(char *) 0xffffff81a0123; printf("%c\n", secret);`







### Microarchitecture Data Sampling (MDS)

- Meltdown is fixed, but we could still leak data on the fixed CPUs:

  `char secret = *(char *) 0xffwhatevera0123;`

- Threat model: local adversary
  - Exploiting other threads (simultaneous multithreading, SMT)
  - Exploiting the previous process context (across a context switch)

#### **CPU Memory Subsystem - Leaky Buffers**

- Store Buffer (Fallout)
- Line Fill Buffer (ZombieLoad)

### Challenges with MDS Testing?

- Reproducing attacks is not reliable.
- No public tool to find new variants or to verify hardware patches.
- Impossible to quantify the impact of leakage.

















| Case | Preparation                                | Store                                  | Load        | Name                 |
|------|--------------------------------------------|----------------------------------------|-------------|----------------------|
| 1    | (access Ø, random instructions)            | -                                      | [icons]     | MLPDS                |
| 2    | (access Ø, random instructions)            | -                                      | AVX [icons] | MLPDS                |
| 3    | (access Ø, random instructions)            | -                                      | AVX [icons] | Medusa               |
| 4    | (access Ø, random instructions)            | -                                      | AVX [icons] | Medusa               |
| 5    | -                                          | store (to load)                        | [icons]     | S2L                  |
| 6    | (rep mov + store, store + fence + load)    | store (to load)                        | [icons]     | -                    |
| 7    | -                                          | store (4K Aliasing) [icons]            | [icons]     | MSBDS                |
| 8    | -                                          | store (4K Aliasing, to load) [icons]   | AVX [icons] | MSBDS, S2L           |
| 9    | (Sibling on/off)                           | store (random address) [icons]         | [icons]     | MSBDS                |
| 10   | (Sibling on/off + clflush (store address)) | store (Cache Offset of Load) [icons]   | [icons]     | MSBDS                |
| 11   | (Sibling on/off + repmov (to Load))        | store (to Load)                        | AVX [icons] | Medusa, MLPDS        |
| 12   | -                                          | Store (Unaligned to Load)              | [icons]     | Medusa               |
| 13   | (random instructions)                      | AVX Store (to Load)                    | [icons]     | Medusa, MLPDS, MSBDS |
| 14   | -                                          | random fill stores                     | [icons]     | MSBDS                |

[icons]: fault and assist icons from the original figure that did not survive extraction. Recoverable legend entries: Supervisor Protection Fault, AVX Alignment Fault, Non-present Page Fault.

#### Table 2: Leakage variants discovered by Transynther.



- Medusa only leaks write-combining (WC) data.
- Implicit WC, i.e., `rep movs` and `rep stos`, can be leaked.
  - Memory copy routines
  - File I/O
- Served by a write-combining buffer (or just the fill buffer).
- Three variants
  - Based on different ways of massaging the microarchitecture

- The OpenSSL Base64 decoder uses an inlined `memcpy` (`-Os`).
- Triggered during RSA key decoding from the PEM format:

-----BEGIN RSA PRIVATE KEY-----
MIICXQIBAAKBgQDmTvQjjtGtnlqMwmmaLW+YjbYTsNR8PGKXr78iYwrMV5Ye4VGy
BwS6qLD4s/EzCzGIDwkWCVx+gVHvh2wGW15Ddof0gVAtAMkR6gRABy4TkK+6YFSK
AyjmHvKCfFHvc9loeFGDyjmwFFkfdwzppXnH1Wwt00lnyCU1GbQ1w7AHuwIDAQAB
AoGBAMyDri7pQ29NBIfMmGQuFtw8c0R3EamlIdQbX7qUguFEoe2YHqjdrKho5oZj
nDu8o+Zzm5jzBSzdf7oZ4qaeekv0f0+ZSz6CKYLbuzG2IXUB8nHJ7NuH3lacfivD
V4Cfg0yFnTK+MDG/xTVqywrCTsslkTCYC/XZOXU5Xt5z32FZAkEA/nLWQhMC4YPM
0LqMtgKzfgQdJ7vbr43WVVNpC/dN/ibUASI/3YwY0uUtqSjillghIY7pRohrPJ6W
ntSJw0UAhQJBAOe2b9cfi0TFKXxyU4j315VkulFfTyL6GwXi/7mvpcDCixDLNRyk
uRigmdKjtIUrAX0pwjgXa6niqJ691jExez8CQQCcMZZAvTbZhHSn9LwHxqS0SIY1
K+ZxX5ogirFDPS5NQzyE7adSsntSioh6/LQKBX6BAR9FwtxBPACtwz5F9geZAkA8
a3z0SlvG04aC1cjkgUPsx6wxxbl79F2RhmSKRbvh7JiYk3RQ+L7vJgmWPGu5AcLM
oVPsjmbbkKfJZNTyVOW/AkABepEi++ZQQW0FXJWZ3nM+2CNcXYCtTgi4bGkvnZPp
/1pAy9rjeVJYhb8acTRnt+dU+uZ74CTtfuzUTZLOIuVe
-----END RSA PRIVATE KEY-----


### **OpenSSL RSA Key Recovery - Coppersmith Attack**

- Knowledge of at least 1/3 of P+Q.
- Creating an n-dimensional hidden number problem, where n depends on the number of recovered chunks.
- Feeding it to the lattice-based algorithm to find the short vector.


- MSBDS (Fallout) on Ice Lake
  - November 2019: Intel sent us an Ice Lake Machine
  - March 2020: Tested Transynther on the Ice Lake CPU
  - Mar 27, 2020: Reported MSBDS Leakage on Ice Lake
  - May 5, 2020: Intel Completed triage
    - MDS mitigations are not deployed properly
      - Chicken bits were not enabled for all mitigations.
      - OEMs shipped with old/wrong microcode.
    - Embargoed till July
  - July 13, 2020: MDS advisory and list of affected CPUs were updated.

| MC Version   | MC Date    | Vulnerable | Leakage, clflush (bytes/s) | Leakage, lock inc (bytes/s) | Leakage, Unmodified (bytes/s) |
|--------------|------------|------------|----------------------------|-----------------------------|-------------------------------|
| 0x32 (stock) | 2019-07-05 | Yes        | 577.87                     | 754.99                      | 1.58                          |
| 0x36         | 2019-07-18 | Yes        | 148.24                     | 529.84                      | 0.62                          |
| 0x46         | 2019-09-05 | Yes        | 130.15                     | 695.80                      | 0.11                          |
| 0x48         | 2019-09-12 | Yes        | 271.69                     | 620.07                      | 0.59                          |
| 0x50         | 2019-10-27 | Yes        | 96.54                      | 542.10                      | 0.25                          |
| 0x56         | 2019-11-05 | Yes        | 145.46                     | 751.40                      | 0.08                          |
| 0x5a         | 2019-11-19 | Yes        | 532.40                     | 645.32                      | 0.70                          |
| 0x66         | 2020-01-09 | No         | 0                          | 0                           | 0                             |
| 0x70         | 2020-02-17 | No         | 0                          | 0                           | 0                             |
| 0x82         | 2020-04-22 | No         | 0                          | 0                           | 0                             |
| 0x86         | 2020-05-05 | No         | 0                          | 0                           | 0                             |

| 057         | MDS_NO Bit in IA32_ARCH_CAPABILITIES MSR is Incorrectly Set                                                                                                       |
|-------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Problem     | MDS_NO bit (bit 5) in IA32_ARCH_CAPABILITIES MSR (10Ah) is set, incorrectly indicating full activation of all MDS (microarchitectural data sampling) mitigations. |
| Implication | Due to this erratum, the IA32_ARCH_CAPABILITIES MDS_NO bit incorrectly reports the activation of all MDS mitigations actions.                                     |
| Workaround  | It is possible for the BIOS to contain a workaround for this erratum.                                                                                             |
| Status      | For the steppings affected, refer to the Summary Table of Changes.                                                                                                |
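The erratum concerns a single capability bit, so software that relies on it can be sketched in a few lines. The MSR address (10Ah) and bit position (5) come from the erratum text above; the helper names are hypothetical, and `read_msr` assumes a Linux system with the `msr` kernel module loaded and root privileges.

```python
import os
import struct

IA32_ARCH_CAPABILITIES = 0x10A  # MSR address from the erratum text
MDS_NO_BIT = 5                  # bit position from the erratum text


def reports_mds_no(msr_value: int) -> bool:
    """True if the MDS_NO bit is set in an IA32_ARCH_CAPABILITIES value."""
    return bool((msr_value >> MDS_NO_BIT) & 1)


def read_msr(address: int, cpu: int = 0) -> int:
    """Read a 64-bit MSR through /dev/cpu/<cpu>/msr (Linux, root only)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        return struct.unpack("<Q", os.pread(fd, 8, address))[0]
    finally:
        os.close(fd)


# Decoding example values without touching the hardware:
print(reports_mds_no(0b100000))
print(reports_mds_no(0b011111))
```

As the erratum shows, a set bit only states what the CPU reports; it says nothing about whether leakage is actually absent, which is why an empirical leakage test remains necessary.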

#### Table 6: List of MDS-affected processors by Family/Model

| Family_Model | Step | Processor Families /<br>Processor Number Series                                                       | MFBDS | MSBDS | MLPDS |
|--------------|------|-------------------------------------------------------------------------------------------------------|-------|-------|-------|
| 06_7EH       | 5    | 10th Generation Intel® Core™ Processor Family based on Ice Lake (U, Y) microarchitecture | No    | Yes   | No    |



# 3. Hardware-based Trusted Computing

- We can **not** trust:
  - cloud providers.
  - software developers.
  - OEMs and computer manufacturers.
- Trusted Computing
  - Others can compute on our data without being given the data itself.
- Example Applications:
  - Privacy-Preserving machine learning
  - Digital rights management (DRM)
  - Anonymous blockchain transactions





Multiuser, multitask, several security domains









#### **Trusted Execution Environment (TEE) - Intel SGX**


#### System-level Threat to Trusted Execution Environments (T2)

- Intel Software Guard eXtensions (SGX)
- Enclave: a hardware-protected user-level software module
  - Mapped by the operating system
  - Loaded by the user program
  - Authenticated and encrypted by the CPU
- It **must** protect secrets against a system-level adversary

#### New Attacker Model:

Attacker gets full control over the OS



### CacheZoom and CacheQuote





#### Intel SGX Attack Taxonomy

- Intel's Responsibility
  - Microcode Patches / Hardware mitigation
  - TCB Recovery
  - Hyperthreading is out
    - Remote Attestation Warning
- µarch Side Channel
  - Constant-time Coding
  - Flushing and Isolating buffers
  - Probabilistic
- Deterministic Attacks
  - Page Fault, A/D Bit, etc. (4 kB Granularity)

Van Bulck et al. "Foreshadow: Extracting the keys to the Intel SGX kingdom with transient out-of-order execution." USENIX Security 2018.

Murdock et al. "Plundervolt: Software-based fault injection attacks against Intel SGX." IEEE S&P 2020.


# Can deterministic attacks do better?

Figure: timeline of the enclave's instruction stream (NOP, ADD, XOR, MUL, DIV, ADD, MUL, NOP, NOP) over time, beginning when the enclave execution thread starts.

- Malicious OS controls the interrupt handler
- An interrupt threshold is chosen so that each step executes 1 or 0 instructions
- Filtering zeros out: clear the A bit before each step, check the A bit after
- Deterministic instruction counting
- Counting from start to end is not useful.
  - A secondary oracle is needed
  - Page table attack as a deterministic secondary oracle

- Previous controlled-channel attacks leak page access patterns.
- CopyCat additionally leaks the number of executed instructions per page.
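The counting idea can be sketched as post-processing of single-step observations. In this illustration each interrupt yields a pair (A bit set, code page); the input format and function name are assumptions, and the actual interrupt and page-table manipulation is not modelled.

```python
from itertools import groupby


def count_instructions_per_page(steps):
    """Turn per-interrupt observations into (page, instruction count) runs.

    `steps` is an iterable of (accessed_bit_set, code_page) pairs, one per
    single-step interrupt. Steps with a clear A bit executed 0 instructions
    and are filtered out.
    """
    executed_pages = [page for accessed, page in steps if accessed]
    return [(page, sum(1 for _ in run)) for page, run in groupby(executed_pages)]


observations = [
    (True, 0x1000), (False, 0x1000), (True, 0x1000),   # 2 instructions, 1 zero-step
    (True, 0x2000), (True, 0x2000), (True, 0x2000),    # 3 instructions
    (True, 0x1000),                                    # back on the first page
]
print(count_instructions_per_page(observations))
```

A page-level attack alone would report only the page sequence; the per-run counts are the additional information that distinguishes branches with different instruction counts.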



#### **CopyCat - Leaking Branches**






#### Binary Extended Euclidean Algorithm (BEEA)

- Previous attacks only leak some of the branches, with some noise.
- CopyCat synchronously leaks all the branches without any noise.

| 1:  | **procedure** MODINV($u$, modulus $v$)                                      |
|-----|-----------------------------------------------------------------------------|
| 2:  | $b_i \leftarrow 0,\ d_i \leftarrow 1,\ u_i \leftarrow u,\ v_i \leftarrow v$ |
| 3:  | while $isEven(u_i)$ do                                                      |
| 4:  | $\quad u_i \leftarrow u_i/2$                                                |
| 5:  | $\quad$ if $isOdd(b_i)$ then                                                |
| 6:  | $\quad\quad b_i \leftarrow b_i - u$                                         |
| 7:  | $\quad b_i \leftarrow b_i/2$                                                |
| 8:  | while $isEven(v_i)$ do                                                      |
| 9:  | $\quad v_i \leftarrow v_i/2$                                                |
| 10: | $\quad$ if $isOdd(d_i)$ then                                                |
| 11: | $\quad\quad d_i \leftarrow d_i - u$                                         |
| 12: | $\quad d_i \leftarrow d_i/2$                                                |
| 13: | if $u_i > v_i$ then                                                         |
| 14: | $\quad u_i \leftarrow u_i - v_i,\ b_i \leftarrow b_i - d_i$                 |
| 15: | else                                                                        |
| 16: | $\quad v_i \leftarrow v_i - u_i,\ d_i \leftarrow d_i - b_i$                 |
| 17: |                                                                             |
|     | return $d_i$                                                                |
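To show what a complete branch trace of a binary extended Euclidean inversion looks like, here is a runnable sketch. It follows the common textbook variant for an odd modulus, not the slide's pseudocode line by line, and the trace labels are hypothetical names chosen for illustration.

```python
import math


def modinv_with_trace(a: int, m: int):
    """Binary extended Euclid for odd m; returns (a^-1 mod m, branch trace)."""
    if m < 3 or m % 2 == 0:
        raise ValueError("modulus must be odd and at least 3")
    if math.gcd(a, m) != 1:
        raise ValueError("a must be invertible modulo m")
    u, v = a % m, m
    x1, x2 = 1, 0  # invariants: a*x1 = u (mod m), a*x2 = v (mod m)
    trace = []
    while u != 1 and v != 1:
        while u % 2 == 0:
            trace.append("u-halve")
            u //= 2
            x1 = x1 // 2 if x1 % 2 == 0 else (x1 + m) // 2
        while v % 2 == 0:
            trace.append("v-halve")
            v //= 2
            x2 = x2 // 2 if x2 % 2 == 0 else (x2 + m) // 2
        if u >= v:
            trace.append("u-sub")
            u, x1 = u - v, x1 - x2
        else:
            trace.append("v-sub")
            v, x2 = v - u, x2 - x1
    inverse = x1 % m if u == 1 else x2 % m
    return inverse, trace


inverse, trace = modinv_with_trace(3, 7)
print(inverse, trace)
```

Every loop iteration and every comparison outcome depends on the secret operands, so an attacker who observes the full sequence of trace labels learns a great deal about them; this is the kind of trace the branch-and-prune key recovery consumes.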

- Single-trace attack during RSA key generation: $q_{inv} = q^{-1} \bmod p$
  - We know that $p \cdot q = N$, and $N$ is public
  - Branch and prune algorithm with the help of the recovered trace



#### Benefits of CopyCat compared to Previous Attacks

- Instruction level granularity
  - Imbalanced number of instructions across branches
  - Leak the outcome of branches
- Fully deterministic and reliable
  - Millions of instructions tested
- Easy to scale and replicate
  - No reverse engineering of branches and microarchitectural components
  - Tracking all the branches synchronously



## 5. Physically Isolated Security Elements

#### **Beyond TEEs - Physical Isolation**




## Trusted Platform Module (TPM)

- Security chip for computers?
- Tamper and Side-Channel Resistant
- Cryptographic Co-processor
- Standardized by TCG, it supports
  - hash functions
  - encryption
  - digital signatures




## Physical Threats to TPM

- Our work focuses on timing attacks.





## **High-resolution Timing Test**

- TPM frequency ~= 32-120 MHz
- CPU Frequency is more than 2 GHz



## High-resolution Timing Test - Intel PTT (fTPM)

- Intel Platform Trust Technology (PTT)
  - Integrated firmware-TPM inside the CPU package
- Kernel driver to increase the resolution



- Intel fTPM: 4-bit Window Nonce Length Leakage
  - ECDSA
  - ECSchnorr
  - BN-256 (ECDAA)

ECDSA Sign: $(x_1, y_1) = k_i \times G$, $r_i = x_1 \bmod n$, $s_i = k_i^{-1}(z + r_i d) \bmod n$

#### Nonce

```
0101000100111111...111
0000100100111111...111
1101000100111111...111
000000000111111...111
00000000001111...111
```

Figure: signing time $t$ for the nonces above; nonces with more leading zero bits fall at the lower end of the time axis (ticks 4.67, 4.72, 4.76, 4.8, 4.84).
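The link between nonce length and timing can be sketched by counting how many leading 4-bit windows of a nonce are entirely zero. The 256-bit nonce size and the function names are assumptions for illustration; the actual firmware implementation is not reproduced here.

```python
import math


def leading_zero_windows(nonce: int, order_bits: int = 256, window_bits: int = 4) -> int:
    """Number of all-zero windows at the most significant end of the nonce."""
    if not 0 < nonce < (1 << order_bits):
        raise ValueError("nonce out of range")
    return (order_bits - nonce.bit_length()) // window_bits


def windows_processed(nonce: int, order_bits: int = 256, window_bits: int = 4) -> int:
    """Windows a length-dependent scalar multiplication would iterate over."""
    total = math.ceil(order_bits / window_bits)
    return total - leading_zero_windows(nonce, order_bits, window_bits)


# Three 256-bit nonces with 1, 4, and 9 leading zero bits.
full = int("01010001" + "1" * 248, 2)
short4 = int("00001001" + "1" * 248, 2)
short9 = int("000000000" + "1" * 247, 2)
for k in (full, short4, short9):
    print(leading_zero_windows(k), windows_processed(k))
```

Each skipped window removes a fixed amount of work, which is why signing times cluster into discrete groups that reveal how many leading windows of the nonce are zero.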



#### High-resolution Timing Test - Analysis Of Devices

- RSA and ECDSA timing tests on three dedicated TPMs and Intel fTPM
- Various non-constant-time behavior for both RSA and ECDSA

| Machine             | CPU            | Vendor   | TPM           | Firmware/Bios   |
|---------------------|----------------|----------|---------------|-----------------|
| NUC 8i7HNK          | Core i7-8705G  | Intel    | PTT (fTPM)    | NUC BIOS 0053   |
| NUC 7i3BNK          | Core i3-7100U  | Intel    | PTT (fTPM)    | NUC BIOS 0076   |
| Asus GL502VM        | Core i7-6700HQ | Intel    | PTT (fTPM)    | Latest OEM      |
| Asus K501UW         | Core i7-6500U  | Intel    | PTT (fTPM)    | Latest OEM      |
| Dell XPS 8920       | Core i7-7700   | Intel    | PTT (fTPM)    | Dell BIOS 1.0.4 |
| Dell Precision 5510 | Core i5-6440HQ | Nuvoton  | rls NPCT      | NTC 1.3.2.8     |
| Lenovo T580         | Core i7-8650U  | STMicro  | ST33TPHF2ESPI | STMicro 73.04   |
| NUC 7i7DNKE         | Core i7-8650U  | Infineon | SLB 9670      | NUC BIOS 0062   |

#### **TPM-Fail - Recovering Private ECDSA Key**

- TPM is programmed with an unknown key.
- We already have a template for $t_i$.
- Attack steps:
  1. Collect a list of signatures $(r_i, s_i)$ and timing samples $t_i$.
  2. Filter the signatures based on $t_i$ and keep the $(r_i, s_i)$ with a known bias.
  3. Run a lattice-based attack to recover the private key $d$ from the signatures with biased nonces $k_i$.
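The filtering step can be sketched as a simple threshold on the measured time. The `Sample` type, the threshold value, and the example numbers are hypothetical; in practice the threshold would come from the timing template mentioned above.

```python
from typing import List, NamedTuple, Tuple


class Sample(NamedTuple):
    r: int      # signature component r_i
    s: int      # signature component s_i
    t: float    # measured signing time t_i


def filter_biased(samples: List[Sample], threshold: float) -> List[Tuple[int, int]]:
    """Keep signatures whose timing indicates a short (biased) nonce."""
    return [(x.r, x.s) for x in samples if x.t < threshold]


collected = [
    Sample(r=11, s=21, t=4.84),
    Sample(r=12, s=22, t=4.67),
    Sample(r=13, s=23, t=4.76),
]
print(filter_biased(collected, threshold=4.72))
```

Only the fastest signatures survive, so most collected samples are discarded; this is consistent with the large signature counts needed before the lattice step.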

- $s_i = k_i^{-1}(z + d r_i) \bmod n \;\rightarrow\; k_i - s_i^{-1} r_i d - s_i^{-1} z \equiv 0 \pmod n$
- $A_i = -s_i^{-1} r_i,\; B_i = -s_i^{-1} z \;\rightarrow\; k_i + A_i d + B_i \equiv 0 \pmod n$
- Let $X$ be the upper bound on $k_i$; $(d, k_0, k_1, \dots, k_n)$ are unknown.
- This is the hidden number problem of Boneh and Venkatesan [1].

[1] Boneh D, Venkatesan R. Hardness of computing the most significant bits of secret keys in Diffie-Hellman and related schemes. In Annual International Cryptology Conference 1996 Aug 18 (pp. 129-142). Springer, Berlin, Heidelberg.

- Lattice Construction:
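
One standard way to write a lattice for relations of the form $k_i + A_i d + B_i \equiv 0 \pmod n$ is sketched here as an assumption, not as the exact construction shown on the slide. It uses $t$ signatures, indexed $1, \dots, t$ to avoid clashing with the group order $n$, and the basis rows

$$
M = \begin{pmatrix}
n & & & & \\
& \ddots & & & \\
& & n & & \\
A_1 & \cdots & A_t & X/n & \\
B_1 & \cdots & B_t & 0 & X
\end{pmatrix}
$$

The integer combination with coefficients $(c_1, \dots, c_t, d, 1)$ of these rows is the vector $(-k_1, \dots, -k_t, Xd/n, X)$. Every entry is bounded by $X$ in absolute value, so lattice reduction (LLL or BKZ) can be expected to find it when the nonces are short enough. The Python sketch below only checks this identity on toy values. The modulus, the bound, and all helper names are hypothetical, $r$ is drawn at random instead of being computed from $k \times G$, and no reduction is performed.

```python
from fractions import Fraction
import random

n = 2**61 - 1          # toy prime standing in for the group order (assumption)
X = 2**40              # assumed upper bound on the biased nonces k_i
t = 5                  # number of filtered signatures

rng = random.Random(1)
d = rng.randrange(1, n)                       # private key, unknown to the attacker


def toy_signature(z: int):
    """Return (k, r, s) with s = k^{-1}(z + r*d) mod n and k < X."""
    while True:
        k = rng.randrange(1, X)
        r = rng.randrange(1, n)
        s = pow(k, -1, n) * (z + r * d) % n   # modular inverse needs Python 3.8+
        if s != 0:
            return k, r, s


def hnp_coefficients(r: int, s: int, z: int):
    """A_i = -s^{-1} r and B_i = -s^{-1} z, both reduced mod n."""
    s_inv = pow(s, -1, n)
    return (-s_inv * r) % n, (-s_inv * z) % n


def build_basis(A, B):
    """Rows: n*e_i for each signature, then (A, X/n, 0) and (B, 0, X)."""
    rows = []
    for i in range(len(A)):
        row = [Fraction(0)] * (len(A) + 2)
        row[i] = Fraction(n)
        rows.append(row)
    rows.append([Fraction(a) for a in A] + [Fraction(X, n), Fraction(0)])
    rows.append([Fraction(b) for b in B] + [Fraction(0), Fraction(X)])
    return rows


z = 123456789                                  # toy message hash
sigs = [toy_signature(z) for _ in range(t)]
ks = [k for k, _, _ in sigs]
coeffs = [hnp_coefficients(r, s, z) for _, r, s in sigs]
A = [a for a, _ in coeffs]
B = [b for _, b in coeffs]

# Each signature satisfies k_i + A_i*d + B_i = 0 (mod n).
assert all((k + a * d + b) % n == 0 for k, a, b in zip(ks, A, B))

basis = build_basis(A, B)
# Integer multiples c_i of the n*e_i rows cancel the reduction mod n.
c = [(-k - a * d - b) // n for k, a, b in zip(ks, A, B)]
combo = c + [d, 1]
target = [sum(m * row[j] for m, row in zip(combo, basis)) for j in range(t + 2)]
assert target == [Fraction(-k) for k in ks] + [Fraction(X * d, n), Fraction(X)]
```

In a real attack the vector is not known in advance: the attacker reduces the basis, looks for a short vector of this shape, and reads $d$ off the second-to-last coordinate.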



## **TPM-Fail - Key Recovery Results**

- Intel fTPM
  - ECDSA, ECSchnorr and BN-256 (ECDAA)
  - Three threat models: System, User, and Network
- STMicroelectronics TPM
  - CC EAL4+ Certified

| Threat Model | TPM    | Scheme    | # Signatures | Time    |
|--------------|--------|-----------|--------|---------|
| Local System | ST TPM | ECDSA     | 39,980 | 80 mins |
| Local System | fTPM   | ECDSA     | 1,248  | 4 mins  |
| Local System | fTPM   | ECSchnorr | 1,040  | 3 mins  |
| Local User   | fTPM   | ECDSA     | 15,042 | 18 mins |



#### TPM-Fail Case Study: StrongSwan VPN



- Private key stolen remotely after about 44,000 handshakes (roughly 5 hours)



- Improved understanding of the side-channel attack surface:
  - Software-based side-channel attacks are practical.
  - Future CPUs and cryptographic software are more secure.
- Proper threat modeling is crucial
  - These attacks apply across many different threat models.
  - Vulnerabilities arise when a previous design is ported to a different threat model, e.g., Intel SGX and cryptographic implementations.

- Automated testing for CPU attacks (Transynther)
  - helps us to understand the root cause and impact of these issues better.
  - can be used to verify hardware mitigations.
- Automated testing of software (MicroWalk)
  - helps us to identify vulnerable code at scale
  - reduces analysis effort for software security
- Hardware and software security are not separate problems.
  - This work spans cryptography, computer architecture, and systems security.

## **Summary of Contributed Publications**

- 1) **D Moghimi**, B Sunar, T Eisenbarth, N Heninger. "TPM-Fail: TPM meets Timing and Lattice Attacks" <u>USENIX Security 2020</u>.
- 2) **D Moghimi**, M Lipp, B Sunar, M Schwarz. "Medusa: Microarchitectural Data Leakage via Automated Attack Synthesis" <u>USENIX Security 2020</u>.
- 3) **D Moghimi**, J Van Bulck, N Heninger, F Piessens, B Sunar. "CopyCat: Controlled Instruction-Level Attacks on Enclaves" <u>USENIX Security 2020</u>.
- 4) Z Weissman, T Tiemann, **D Moghimi**, E Custodio, T Eisenbarth, B Sunar. "JackHammer: Efficient Rowhammer on Heterogeneous FPGA-CPU Platforms" <u>TCHES 2020</u>.
- 5) J Van Bulck, **D Moghimi**, M Schwarz, M Lipp, M Minkin, D Genkin, Y Yarom, B Sunar, D Gruss, F Piessens. "LVI: Hijacking Transient Execution through Microarchitectural Load Value Injection" <u>IEEE S&P 2020</u>.
- 6) C Canella, D Genkin, L Giner, D Gruss, M Lipp, M Minkin, **D Moghimi**, F Piessens, M Schwarz, B Sunar, J Van Bulck. "Fallout: Leaking Data on Meltdown-resistant CPUs" <u>CCS 2019</u>.
- 7) M Schwarz, M Lipp, **D Moghimi**, J Van Bulck, J Stecklina, T Prescher, D Gruss. "ZombieLoad: Cross-Privilege-Boundary Data Sampling" <u>CCS 2019</u>.
- 8) S Islam, **A Moghimi**, I Bruhns, M Krebbel, B Gulmezoglu, T Eisenbarth, B Sunar. "SPOILER: Speculative Load Hazards Boost Rowhammer and Cache Attacks" <u>USENIX Security 2019</u>.
- 9) **A Moghimi**, J Wichelmann, T Eisenbarth, B Sunar. "MemJam: A False Dependency Attack against Constant-Time Crypto Implementations" (Extended Version) <u>IJPP 2019</u>.
- 10) J Wichelmann, **A Moghimi**, T Eisenbarth, B Sunar. "MicroWalk: A Framework for Finding Side Channels in Binaries" <u>ACSAC 2018</u>.
- 11) F Dall, G De Micheli, T Eisenbarth, D Genkin, N Heninger, **A Moghimi**, Y Yarom. "CacheQuote: Efficiently Recovering Longterm Secrets of SGX EPID via Cache Attacks" <u>TCHES 2018</u>.
- 12) **A Moghimi**, T Eisenbarth, B Sunar. "MemJam: A False Dependency Attack against Constant-Time Crypto Implementations in SGX" <u>CT-RSA 2018</u>.
- 13) **A Moghimi**, G Irazoqui, T Eisenbarth. "CacheZoom: How SGX Amplifies The Power of Cache Attacks" <u>CHES 2017</u>.

# **Coordinated Disclosure**



#### **Cryptographic Libraries**

- Intel IPP (CVE-2018-12155, CVE-2018-3691)
- WolfSSL (CVE-2019-19960 through CVE-2019-19963)
- OpenSSL and Libgcrypt (no CVE available)



#### **Trusted Platform Modules**

- Intel fTPM (CVE-2019-11090)
- STMicroelectronics (CVE-2019-16863)



#### **Intel CPUs**

- Fallout (CVE-2018-12126)
- SPOILER (CVE-2019-0162)
- MemJam (no CVE)

## Acknowledgements

- Collaborators
- Sponsors



## THANKS

#### Questions?

#### The Register

Don't trust the Trusted Platform Module – it may leak your VPN server's private key (depending on your configuration)

You know what they say: Timing is... everything

Thomas Claburn in San Francisco, Tue 12 Nov 2019 // 19:43 UTC




#### TPM-Fail: What It Means & What to Do About It

Trusted Platform Modules are well-suited to a wide range of applications, but for the strongest security, architect them into "defense-in-depth" designs.




#### Intel ZombieLoad Side-Channel Attack: 10 Takeaways




#### TPM-Fail Attacks Against Cryptographic Coprocessors

Really interesting research: <u>TPM-FAIL: TPM meets Timing and Lattice Attacks</u>, by Daniel Moghimi, Berk Sunar, Thomas Eisenbarth, and Nadia Heninger.

> Abstract: Trusted Platform Module (TPM) serves as a hardware-based root of trust that protects cryptographic keys from privileged system and physical adversaries. In this work, we perform a black-box timing analysis of TPM 2.0 devices deployed on commodity computers. Our analysis reveals that some of these devices feature secret-dependent execution times during signature generation based on elliptic curves. In particular, we discovered timing leakage on an Intel firmware-based TPM as well as a hardware TPM. We show how this information allows an attacker to apply lattice.






#### Forbes

New Intel CPU Vulnerability Bodes Well For AMD

Ken Kam, Mar 5, 2019

Intel processors are vulnerable to an attack, nicknamed Spoiler, to which AMD processors are immune