Lesson 11: Distributed Shared Memory
* Memory Coherence in Shared Virtual Memory Systems
Li and Hudak, PODC '86
* TreadMarks: Distributed Shared Memory on Standard Workstations and Operating Systems
Keleher et al., USENIX Winter TC '94
* Latency-Tolerant Software Distributed Shared Memory
Nelson et al., USENIX ATC '15
* Concordia: Distributed Shared Memory with In-Network Cache Coherence
Wang et al., FAST '21
* GiantVM: A Novel Distributed Hypervisor for Resource Aggregation with DSM-aware Optimizations
Jia et al., ACM TACO '22
On readings: Recommended background readings are marked with (^) above. Optional historical or fun readings are marked with (*). If you feel comfortable with the topic already, you may skip these readings.
Notes
Page granularity sharing
DSM systems typically share data at the granularity of a page, in contrast to the cache-line granularity of hardware coherence. Page-sized transfers amortize the (relatively) high cost of communicating over a loosely coupled network. Those costs argue for large pages, but there is a problem…
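As a concrete illustration, here is a minimal user-level sketch of the trap-and-fetch mechanism such systems build on: the shared region is mapped with no permissions, and the SIGSEGV handler pulls in a whole page before the access is retried. The fetch_page_from_owner() RPC is a hypothetical stand-in for the coherence protocol, not code from any of the papers, and write faults (ownership transfer, invalidation) are omitted.

```c
/* A minimal sketch of page-granularity DSM fault handling, assuming a
 * hypothetical fetch_page_from_owner() RPC. Read faults only; write
 * faults would need a further upgrade step (ownership, invalidation). */
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static void  *shared_base;   /* start of the shared virtual region */
static size_t shared_len;
static long   page_size;

/* Hypothetical RPC: ask the page's current owner for a copy. */
extern void fetch_page_from_owner(void *page_addr, size_t len);

static void fault_handler(int sig, siginfo_t *si, void *ctx) {
    (void)sig; (void)ctx;
    uintptr_t addr = (uintptr_t)si->si_addr;
    if (addr < (uintptr_t)shared_base ||
        addr >= (uintptr_t)shared_base + shared_len)
        _exit(1);                       /* a genuine crash, not a DSM fault */
    uintptr_t page = addr & ~((uintptr_t)page_size - 1);
    /* One round trip fetches a whole page: the large transfer amortizes
     * the per-message cost of the loosely coupled network. */
    fetch_page_from_owner((void *)page, (size_t)page_size);
    mprotect((void *)page, (size_t)page_size, PROT_READ);
}

void dsm_init(size_t len) {
    page_size  = sysconf(_SC_PAGESIZE);
    shared_len = len;
    /* Map the region with no permissions so every first touch traps into
     * fault_handler, a software cache miss at page granularity. */
    shared_base = mmap(NULL, len, PROT_NONE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_flags = SA_SIGINFO;
    sa.sa_sigaction = fault_handler;
    sigaction(SIGSEGV, &sa, NULL);
}
```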
False sharing
We’ve talked about this before: any time data is shared in chunks, two threads/processes may access distinct data that happens to sit in different parts of the same chunk. In a system that must enforce coherence, such false sharing confounds the coherence protocol, which bounces the chunk back and forth between writers and generates unnecessary traffic on the interconnect.
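The sketch below makes this concrete (struct names are illustrative, and a 4 KiB page is assumed): two writers update distinct counters that happen to share a page, so a page-granular protocol ping-pongs the page between them on every write; padding each counter onto its own page trades space for the elimination of false sharing.

```c
/* False sharing at page granularity: the fields are logically unrelated,
 * but they live in the same unit of sharing. Illustrative names only. */
#include <stdint.h>

#define PAGE_SIZE 4096   /* assumed page size */

/* Bad: a and b share one page, so node 0's writes to a keep
 * invalidating node 1's copy of b, and vice versa. */
struct counters_shared {
    uint64_t a;   /* written only by node 0 */
    uint64_t b;   /* written only by node 1 */
};

/* Better: give each counter its own page, so the coherence protocol
 * never sees two writers on the same sharing unit. (GCC/Clang syntax.) */
struct counters_padded {
    uint64_t a;
    char     pad[PAGE_SIZE - sizeof(uint64_t)];
    uint64_t b;
} __attribute__((aligned(PAGE_SIZE)));
```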
Restricting shared memory
Note that in IVY, not all memory is shared across nodes. A portion of the address space (in particular, the low portion) is kept local, ensuring fast access. For example, a process's executable is kept in local memory, while its stack is not. The PCB is likewise kept private.
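A rough sketch of the resulting split, with a hypothetical boundary constant (IVY's actual layout and constants differ):

```c
/* Hypothetical address-space split: everything below the boundary is
 * private and always served from local memory; only addresses above it
 * participate in the coherence protocol. */
#include <stdbool.h>
#include <stdint.h>

#define PRIVATE_TOP 0x00400000UL   /* illustrative boundary, not IVY's */

static inline bool is_shared(uintptr_t addr) {
    return addr >= PRIVATE_TOP;
}
```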
Note on terminology
The authors use the term eventcount to describe a synchronization primitive. You will hopefully recognize this as what we’d today call a counting semaphore.
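For reference, a minimal sketch of counting-semaphore usage, the primitive this note equates eventcounts with, using POSIX semaphores (the paper's actual eventcount interface differs):

```c
/* A minimal counting-semaphore sketch (POSIX): the count tracks how many
 * items are available; waiters block until it is positive. */
#include <semaphore.h>

static sem_t slots;

static void produce_one(void) {
    /* ... make an item available ... */
    sem_post(&slots);   /* increment the count, waking one waiter if any */
}

static void consume_one(void) {
    sem_wait(&slots);   /* block until the count is positive, then decrement */
    /* ... use the item ... */
}

int main(void) {
    sem_init(&slots, 0, 0);   /* not shared between processes; count = 0 */
    /* produce_one()/consume_one() would normally run on separate threads. */
    produce_one();
    consume_one();
    return 0;
}
```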