The CS/ECE 4/599 Course Blog

Whole-System Persistence

by Darren Mai (Blogger), Isaac Lonergan (Presenter), Mykyta "Nick" Syntsia, Sam Shaaban, Shubhangi Pandey, Nat Rurka, Adam Bobich (Scribe)

Introduction

Databases are often stored entirely in memory to achieve high throughput and low latency. When power fails, however, recovery can take minutes for a single server or hours for a full cluster, because the in-memory state has to be rebuilt from durable storage. Whole-System Persistence (WSP) suggests an alternative: keep all of main memory non-volatile, so that there is no distinction between persistent and volatile objects and the entire state can be restored directly from the in-memory objects. This post goes over a 2012 paper by two researchers at Microsoft Research, Cambridge on this concept.

Background

Non-Volatile Main Memory (NVRAM)

Traditional Persistence Models

Before this work, two dominant approaches existed:

Block-Based Persistence

Persistent Heaps

Persistent heaps are more efficient than block-based systems, but they still incur a heavy runtime cost due to synchronous cache flushes.

The Bottleneck

On modern processors, writes first land in volatile CPU caches rather than in memory. To make data durable, systems must explicitly flush the affected cache lines to memory. These flushes are slow and serialize execution, and in persistent heaps they occur on every commit. This is the core performance problem the paper addresses.
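To make the cost concrete, here is a minimal sketch of flush-on-commit using a toy write-back cache. It is a simulation, not code from the paper: the `WriteBackCache` class, the `commit` helper, and the address names are all hypothetical, and a real system would issue hardware cache-line flush instructions instead of moving dictionary entries.

```python
class WriteBackCache:
    """Toy model: stores land in a volatile cache until flushed to durable memory."""

    def __init__(self):
        self.cache = {}    # dirty lines, lost on power failure
        self.durable = {}  # stand-in for non-volatile memory
        self.flushes = 0   # number of synchronous line flushes issued

    def store(self, addr, value):
        # A store only updates the cache; nothing is durable yet.
        self.cache[addr] = value

    def flush(self, addr):
        # Write one dirty line back to durable memory.
        if addr in self.cache:
            self.durable[addr] = self.cache.pop(addr)
            self.flushes += 1

    def power_fail(self):
        # Anything still in the cache is gone.
        self.cache.clear()


def commit(mem, updates):
    """Flush-on-commit: every line the transaction wrote is flushed before returning."""
    for addr, value in updates.items():
        mem.store(addr, value)
    for addr in updates:
        mem.flush(addr)


mem = WriteBackCache()
for txn in range(1000):
    commit(mem, {"balance": txn, "log_head": txn})
mem.store("scratch", 42)  # written but never committed
mem.power_fail()

print(mem.flushes)  # 2000
print(mem.durable)  # {'balance': 999, 'log_head': 999}
```

The committed values survive the failure, but the price is one synchronous flush per written line on every commit, even though only the final values matter after the crash. The uncommitted `scratch` write is lost.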

Whole-System Persistence (WSP)

Core Idea

Flush-on-Commit vs. Flush-on-Fail

Flush-on-Commit

Flush-on-Fail (WSP)

Key Insights

Architecture Considerations

CPU and Memory State

Device State

Failure Model

NVRAM is useful for recovery from crash failures such as power outages. It does not protect against:

Performance Results

Key Findings

What this means

The results show that removing flush-on-commit from the critical path yields large gains: flush-on-fail has better runtime performance because it pays the flush cost only when a failure actually occurs.
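The intuition behind this can be sketched with a toy model that counts flushes under each policy. This is an illustration only: the `run` function, the workload, and the counts are hypothetical and are not measurements from the paper. It also assumes that at failure time the machine has enough residual energy to write back every dirty cache line once.

```python
def run(workload, policy):
    """Return (flushes during normal operation, flushes at failure, recovered state)."""
    cache, nvram = {}, {}
    runtime_flushes = 0
    for updates in workload:
        cache.update(updates)
        if policy == "flush-on-commit":
            # Synchronously write back every line the transaction touched.
            for addr in updates:
                nvram[addr] = cache.pop(addr)
                runtime_flushes += 1

    # Power failure arrives here.
    fail_flushes = 0
    if policy == "flush-on-fail":
        # Assumption: enough residual energy to flush all dirty lines once.
        fail_flushes = len(cache)
        nvram.update(cache)
    cache.clear()  # whatever was not written back is lost
    return runtime_flushes, fail_flushes, nvram


workload = [{"x": i, "y": 2 * i} for i in range(1000)]
commit_result = run(workload, "flush-on-commit")
fail_result = run(workload, "flush-on-fail")

print(commit_result[:2])                   # (2000, 0)
print(fail_result[:2])                     # (0, 2)
print(commit_result[2] == fail_result[2])  # True
```

Both policies recover the same state in this model, but flush-on-commit pays for 2000 flushes while the workload is running, whereas flush-on-fail pays for 2 flushes, and only once the failure happens.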

Strengths

Weaknesses

Class Discussion

AI Disclosure

ChatGPT was used to summarize the paper and produce notes. Formatting for GitHub was also done with ChatGPT.