The CS/ECE 4/599 Course Blog

Flipping Bits in Memory Without Accessing Them

by William Davis (Leader Presenter), Carlos Alvarado-Lopez (Presenter), James Tappert (Scribe), Paul Suvrojyoti (Blogger), Kabir Vidyarthi (Blogger)

Introduction

First, let us start with the physical structure of DRAM. At the lowest level, DRAM is a two-dimensional grid of cells, and each cell stores a single bit, either 0 or 1. Each cell is made of two components: i) a capacitor, which holds the electrical charge that represents the bit, and ii) an access transistor, which acts as a switch that either connects the capacitor to the rest of the circuit or isolates it so the charge stays locked in.

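The Grid:
The cells are arranged in rows and columns. All cells in a row share one wire, the wordline, which controls their access transistors. All cells in a column share another wire, the bitline, through which their charge is read or written. A DRAM chip contains several such grids, called banks, and each bank has its own row buffer.

The Access: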

The processor cannot access a single cell directly: an entire row has to be read first, and the memory controller manages this process. To read a cell, the memory controller issues an ACTIVATE command, which raises the voltage of the target row's wordline. This turns on all the access transistors in that row and connects each capacitor to its corresponding bitline.

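But before we go further, we need to understand what happens during a DRAM read. Each bitline starts at an intermediate voltage. When the wordline is raised, each capacitor shares its charge with its bitline and nudges that voltage slightly up or down. Sense amplifiers detect this small change, amplify it to a full 0 or 1, and latch the value. Because charge sharing drains the capacitor, the amplified value is also written back into the cell, so activating a row restores the charge of every cell in it.

Once the read is done, the row's values are held in the row buffer, and cells in that row can be accessed quickly. To access a different row in the same bank, the controller must first close (precharge) the open row and then activate the new one.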
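To make the row buffer concrete, here is a minimal sketch of one bank, assuming a simplified controller that keeps a row open until a different row in the same bank is requested. The `Bank` class and its `read` method are hypothetical names for illustration; a real controller also enforces timing constraints and may use other row policies.

```python
class Bank:
    """Toy model of one DRAM bank with a single row buffer (hypothetical)."""

    def __init__(self):
        self.open_row = None   # no row is open at the start
        self.activations = 0   # number of ACTIVATE commands issued so far

    def read(self, row, col):
        """Read (row, col) and report how the access was served.

        `col` only selects data inside the row buffer, so in this model it
        does not affect whether an activation is needed.
        """
        if self.open_row == row:
            return "hit"       # row already sits in the row buffer
        # Different row (or none open): close the old row, activate the new one.
        self.open_row = row
        self.activations += 1
        return "activate"


bank = Bank()
print([bank.read(5, c) for c in range(4)])  # ['activate', 'hit', 'hit', 'hit']
print(bank.read(9, 0), bank.read(5, 0))     # activate activate
print(bank.activations)                     # 3
```

Reads that stay within the open row are cheap; switching between rows of the same bank costs a fresh activation every time, which is exactly what the hammering program described later exploits.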

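DRAM Refresh:

Since the cells are capacitors, they leak charge over time. DRAM therefore has to be refreshed periodically: each row is read and its charge is restored to its original level before it can drop below the threshold at which it would be misread. DDR3 requires every row to be refreshed at least once every 64 ms.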
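The effect of refresh can be illustrated with a toy retention model. This is a sketch under made-up assumptions: charge leaks at a constant rate, and the cell is misread once its charge drops below a fixed threshold. The constants `LEAK_PER_MS` and `THRESHOLD` and the helper `survives` are invented for illustration and are not DDR3 parameters; only the 64 ms interval used in the example matches the DDR3 refresh requirement.

```python
FULL_CHARGE = 1.0
THRESHOLD = 0.5        # HYPOTHETICAL level below which the bit is misread
LEAK_PER_MS = 0.005    # HYPOTHETICAL leakage per millisecond (fraction of full charge)


def survives(total_ms, refresh_interval_ms):
    """Simulate one charged cell for total_ms milliseconds.

    Returns True if its charge never falls below THRESHOLD.
    refresh_interval_ms=None means the cell is never refreshed.
    """
    charge = FULL_CHARGE
    for t in range(1, total_ms + 1):
        charge -= LEAK_PER_MS                 # constant leakage each millisecond
        if charge < THRESHOLD:
            return False                      # the stored bit would be misread
        if refresh_interval_ms and t % refresh_interval_ms == 0:
            charge = FULL_CHARGE              # refresh restores the full charge
    return True


print(survives(1000, refresh_interval_ms=64))    # True
print(survives(1000, refresh_interval_ms=None))  # False
```

With these numbers the cell needs about 100 ms to leak down to the threshold, so refreshing every 64 ms always restores it in time. Rowhammer, described next, speeds up the leakage so that the same refresh schedule is no longer enough.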

Row Hammer Mechanism and Error:

Now that we know how a DRAM access happens, let us see what a Rowhammer error is. Memory chips keep shrinking, and more cells are crammed into smaller spaces. Because of this tight packing, neighboring cells interfere with each other electrically (electromagnetic coupling). And since the cells themselves are also smaller to increase density, they hold less charge, so it takes less leakage for a cell to drop below the threshold.

The authors [1] found that accessing a row repeatedly, which toggles its wordline on and off rapidly, creates electrical noise that causes cells in the neighboring rows to leak charge much faster than normal. Those cells can then lose enough charge to drop below the threshold before the DRAM refreshes them, so the stored bit flips and the data is silently corrupted (the paper observes flips in both directions, 1 to 0 and 0 to 1). The repeatedly accessed row is called the aggressor, and its neighbors are the victims. The aggressor itself is safe, because each activation restores its own charge.
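A rough calculation shows why hammering can outrun refresh. The sketch below assumes a row cycle time (tRC, the minimum gap between two activations in the same bank) of 50 ns, a round value chosen for illustration since the exact tRC depends on the module. It uses the 64 ms DDR3 refresh window and the 139K hammering count that also appears in the PARA discussion.

```python
REFRESH_WINDOW_NS = 64_000_000   # 64 ms DDR3 refresh window
T_RC_NS = 50                     # ASSUMED row cycle time: min gap between ACTIVATEs in one bank
HAMMER_COUNT = 139_000           # activation count used in the PARA discussion

# Upper bound on ACTIVATE commands one bank can receive within one refresh window.
bank_activations = REFRESH_WINDOW_NS // T_RC_NS
# Hammering alternates two rows (X and Y) in the same bank, so each gets about half.
per_row = bank_activations // 2

print(bank_activations)                  # 1280000
print(per_row)                           # 640000
print(round(per_row / HAMMER_COUNT, 1))  # 4.6
```

Under these assumptions, a single aggressor row can be activated several times more often than 139K within one refresh window, so the regular refresh schedule alone cannot keep up.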

They wrote a simple user-level program to show that this happens on real Intel and AMD processors. The program induces flips by repeatedly reading two addresses, X and Y, and flushing both from the CPU cache after every pair of reads.

The key trick is evicting the cache lines between accesses, so that each read goes all the way to DRAM instead of being served by the cache. To do this, they used the x86 instruction clflush (Cache Line Flush). Because X and Y map to different rows in the same bank, alternating between them forces the controller into the pattern open row X > close > open row Y > close > repeat, so every read becomes a fresh activation.
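The role of clflush can be seen in a toy model that puts a CPU cache in front of a single bank. The function `count_activations` and the (row, column) address tuples are hypothetical; the model only counts ACTIVATE commands and ignores timing, cache replacement, and the real physical-address mapping. Flushing is applied after each round of reads, mirroring the read X, read Y, flush X, flush Y loop described above.

```python
def count_activations(pattern, iterations, flush):
    """Count ACTIVATE commands for a repeated access pattern (toy model).

    pattern    -- list of (row, col) addresses read in each iteration
    iterations -- how many times the pattern is repeated
    flush      -- if True, evict the pattern's addresses from the cache
                  after each iteration (mimicking clflush)
    """
    cache = set()       # addresses currently held in the CPU cache
    open_row = None     # row currently in the bank's row buffer
    activations = 0
    for _ in range(iterations):
        for addr in pattern:
            if addr in cache:
                continue                # cache hit: DRAM is never touched
            row, _col = addr
            if row != open_row:
                open_row = row          # close the old row, ACTIVATE the new one
                activations += 1
            cache.add(addr)             # the line is now cached
        if flush:
            cache.difference_update(pattern)   # evict every address in the pattern
    return activations


X, Y = (10, 0), (20, 0)    # two addresses in different rows of the same bank
print(count_activations([X, Y], 1000, flush=False))        # 2
print(count_activations([X, Y], 1000, flush=True))         # 2000
print(count_activations([X, (10, 8)], 1000, flush=True))   # 1
```

Without flushing, the cache absorbs every read after the first two. With flushing, every read reopens a row. If both addresses share a row, the row simply stays open, which is why the two addresses must be in different rows of the same bank.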

They tested 129 DDR3 modules from three major manufacturers and found disturbance errors in 110 of them. Every module manufactured in 2012 and 2013 was vulnerable, while the oldest modules tested were not, which suggests that the problem grew worse as DRAM technology scaled to smaller sizes.

Key Findings:

These findings are alarming because memory is assigned to programs in pages, so physically adjacent rows can belong to different programs. One row might belong to a web browser (the aggressor) and the row next to it to the operating system (the victim). By hammering its own memory, a malicious website could flip bits in OS memory and potentially take control of the system.

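Solution:

Among the mitigations the authors propose, the most practical is PARA (Probabilistic Adjacent Row Activation), which works as follows: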

  i) Every time the controller closes a row R, it flips a biased coin that comes up heads with a small probability p (p ≪ 1).

  ii) If the coin comes up heads, the controller also opens, and thereby refreshes, one row adjacent to R: either R-1 or R+1, each chosen with equal probability.

  iii) If it comes up tails, the controller does nothing extra.
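The three steps above can be sketched in Python as follows. This is an illustration, not the authors' hardware implementation: `para_on_close` is a hypothetical helper, and treating rows at the edge of a bank as having a single neighbour is an assumption of this sketch.

```python
import random


def para_on_close(row, p, num_rows, rng=random):
    """Return the neighbour PARA refreshes when `row` is closed, or None.

    With probability p, refresh one adjacent row, picking row-1 or row+1
    with equal probability (so each of two neighbours gets p/2).
    """
    if rng.random() >= p:
        return None                                   # tails: do nothing extra
    neighbours = [r for r in (row - 1, row + 1) if 0 <= r < num_rows]
    if not neighbours:
        return None                                   # single-row bank: nothing to refresh
    return rng.choice(neighbours)                     # heads: refresh one neighbour


# Hammer row 100 and count how often PARA refreshes victim row 101.
rng = random.Random(42)
refreshes = sum(
    para_on_close(100, p=0.001, num_rows=1024, rng=rng) == 101
    for _ in range(139_000)
)
print(refreshes)   # expected value is 139_000 * 0.001 / 2 = 69.5
```

The policy keeps no per-row state: each decision depends only on a fresh random number, so the controller needs no activation counters.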

Why PARA works: p is kept small so that the extra refreshes cost little, but large enough that, under heavy hammering, the chance that a victim row never receives an extra refresh is almost 0. Suppose a program hammers a row 139K times (the smallest activation count the authors observed to cause an error) before the next regular refresh. With that many coin flips, it is practically certain that at least one of them refreshes the victim row, restoring its charge before enough of it has leaked away to flip a bit.
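The argument can be made quantitative with a simplified model. Assume the victim row flips only if it receives no extra refresh during all n closes of the aggressor, and that each close refreshes this particular neighbour with probability p/2. The chance of that is (1 - p/2)^n. The value p = 0.001 is the base probability implied by the Results section, which scales it up five times to 0.005. `miss_probability` is a hypothetical helper.

```python
def miss_probability(p, n, neighbours=2):
    """Chance that one victim row gets NO PARA refresh during n aggressor closes.

    Assumes PARA picks one of `neighbours` adjacent rows uniformly, so this
    victim is refreshed with probability p / neighbours on each close.
    """
    per_close = p / neighbours
    return (1 - per_close) ** n


print(miss_probability(0.001, 139_000))   # roughly 6.4e-31
```

Even with p as small as 0.001, the chance of surviving 139K closes without a single extra refresh is vanishingly small. The model ignores the regular refresh, which only makes the real situation safer.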

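PARA is cheap because it is stateless: the controller does not need to track how many times each row has been activated. Yet it drives the failure probability very low.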

Results:

The resulting failure probabilities are extremely small. The authors also simulated 29 workloads on a modeled system (4 GHz, dual-channel DDR3-1600). To be conservative, they assumed a row can have up to 10 neighbours and therefore increased p five times, to 0.005. Assuming the refreshed neighbour is picked uniformly, this keeps each neighbour's refresh probability at 0.005/10 = 0.0005, the same as 0.001/2 with two neighbours. Even so, the average throughput degradation was only 0.197%, and the worst case was 0.745%. This shows that PARA is reliable while having a very small performance impact.

Class discussion

Conclusion

This work shows that Rowhammer is not just a theoretical error. It is a real hardware-level reliability and security vulnerability that emerges from increasing DRAM density. By explaining how DRAM is accessed, the paper makes clear how repeated row activations can induce charge loss and cause bit flips in neighboring rows. Among the proposed mitigations, PARA stands out for being practical and low-overhead: it both reduces the error rate and makes errors less deterministic, and thus less exploitable. A key takeaway from these findings is that memory can no longer be treated as a perfectly reliable abstraction; future systems must combine hardware and software defenses to ensure both correctness and memory safety.

References

[1] Yoongu Kim, Ross Daly, Jeremie Kim, Chris Fallin, Ji Hye Lee, Donghyuk Lee, Chris Wilkerson, Konrad Lai, and Onur Mutlu. 2014. "Flipping bits in memory without accessing them: an experimental study of DRAM disturbance errors". SIGARCH Comput. Archit. News 42, 3 (June 2014), 361–372. https://doi.org/10.1145/2678373.2665726

AI Disclosure