Lesson 14: Disaggregated Memory
We talked about several forms of disaggregated memory:
- The memory blade approach proposed in the ISCA '09 paper (which can use cache-granularity or page-granularity transfers)
- Simple paging-based far memory (see Infiniswap (NSDI '17) and FastSwap (EuroSys '20)), where the kernel page fault handler fetches pages over the network
- CXL memory pooling
For the second approach, note that a page is not always the right transfer granularity. If an application mostly makes small object accesses, paging leads to I/O amplification: you transfer more data over the network (e.g., a whole page) than you actually need. AIFM (OSDI '20) gets around this by using a lightweight user-space runtime and a special remote data structure library. Our work, TrackFM (ASPLOS '24), expands on this idea but makes it easier on developers by having the compiler transform the code automatically. Mira (SOSP '23) does something similar using profiling.
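To make I/O amplification concrete, here is a minimal sketch that computes the ratio of bytes moved to bytes actually needed. It is not taken from any of the systems above. The 4 KiB page size, the function name `io_amplification`, and the assumption that an object is aligned and never straddles a transfer-unit boundary are all illustrative choices.

```python
PAGE_SIZE = 4096  # bytes; assumes a 4 KiB base page (hypothetical default)


def io_amplification(object_size: int, transfer_size: int = PAGE_SIZE) -> float:
    """Return bytes moved over the network divided by bytes the app needed.

    Assumes the object is aligned to the start of a transfer unit, so it
    never straddles more units than its size requires.
    """
    if object_size <= 0 or transfer_size <= 0:
        raise ValueError("sizes must be positive")
    # Ceiling division: objects larger than one unit need several units.
    units = -(-object_size // transfer_size)
    return units * transfer_size / object_size


if __name__ == "__main__":
    # A 64-byte object fetched at page granularity vs. object granularity.
    print(io_amplification(64))      # page-granularity transfer
    print(io_amplification(64, 64))  # object-granularity transfer
```

Under these assumptions, fetching a 64-byte object by faulting in a full 4 KiB page moves 64 times more data than needed, while an object-granularity transfer moves exactly what was asked for. That gap is what object-level far memory systems try to close.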
We also spent some time talking about interconnect topologies, both for cluster-scale interconnects and for networks-on-chip (NoCs). A good reference text for this is the Dally and Towles book.
On readings: Recommended background readings are marked with (^) above. Optional historical or fun readings are marked with (*). If you already feel comfortable with the topic, you may skip these readings.
You can find my slides here