The CS/ECE 4/599 Course Blog

nuKSM: NUMA-aware Memory De-duplication on Multi-socket Servers

by Allen Lee (Leader / Presenter), Jared Ho (Scribe), Deptmer Ashley (Scribe), John Aebi (Blogger), Brian Castellon Rosales (Blogger)

Introduction

Memory de-duplication is an important part of the Linux kernel's memory management system. The kernel scans pages in main memory, looking for pages with duplicate content. When two pages with identical contents are found, one page is left unchanged and the other is remapped to the same physical address. This frees the extra physical page so it can be allocated for other needs. When two virtual addresses share a physical page, both mappings are marked “copy-on-write”: as soon as a process writes to its virtual address, the kernel remaps that address to a private copy of the page. This feature was implemented to run more virtual machines on a host by sharing memory between users and processes.
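The merge-then-copy-on-write cycle described above can be sketched as a toy model in Python. This is only an illustration and not how the kernel implements it: the `ToyMemory` class, its fields, and its method names are all hypothetical, and real KSM works on page tables and physical frames rather than dictionaries.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMemory:
    """Hypothetical toy model: virtual pages map to physical frames."""
    frames: dict = field(default_factory=dict)      # frame id -> page contents
    page_table: dict = field(default_factory=dict)  # virtual page -> frame id
    cow: set = field(default_factory=set)           # pages marked copy-on-write
    next_frame: int = 0

    def map_page(self, vpage, data):
        """Back a virtual page with a fresh physical frame."""
        self.frames[self.next_frame] = bytes(data)
        self.page_table[vpage] = self.next_frame
        self.next_frame += 1

    def deduplicate(self):
        """Scan every page and merge pages with identical contents."""
        seen = {}  # contents -> (frame id kept, first virtual page using it)
        for vpage, frame in list(self.page_table.items()):
            data = self.frames[frame]
            if data not in seen:
                seen[data] = (frame, vpage)
                continue
            kept_frame, first_vpage = seen[data]
            if kept_frame == frame:
                continue  # this page already shares the kept frame
            # Point the duplicate at the kept frame and mark both sharers COW.
            self.page_table[vpage] = kept_frame
            self.cow.update({vpage, first_vpage})
            # Free the old frame once no virtual page references it.
            if frame not in self.page_table.values():
                del self.frames[frame]

    def write(self, vpage, data):
        """Write to a page, breaking sharing first if the page is COW."""
        frame = self.page_table[vpage]
        if vpage in self.cow:
            self.cow.discard(vpage)
            sharers = [v for v, f in self.page_table.items() if f == frame]
            if len(sharers) > 1:
                self.map_page(vpage, data)  # private copy for the writer
                return
        self.frames[frame] = bytes(data)


mem = ToyMemory()
mem.map_page("vm0:page", b"same")
mem.map_page("vm1:page", b"same")
mem.deduplicate()            # both virtual pages now share one frame
mem.write("vm1:page", b"diff")  # the writer gets its own copy again
```

After `deduplicate()` the two pages share one frame, and the later write gives `vm1:page` a private frame while `vm0:page` keeps the original contents.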

This paper, “nuKSM: NUMA-aware Memory De-duplication on Multi-socket Servers” [1], proposes a new approach to memory de-duplication called “nuKSM”, which is NUMA (non-uniform memory access) aware. The goal of the paper is to create a KSM (Kernel Same-page Merging) implementation that spreads the “NUMA tax” evenly across all NUMA nodes. Instead of making an arbitrary decision about where to consolidate memory, nuKSM chooses the node on which to consolidate the data based on priority.

Background and Motivation

KSM is very good at what it is meant to do: it saves memory through de-duplication in a cost-effective way. Where it struggles is on multi-socket systems such as servers. There, KSM is unaware of which CPUs are close to which memory, and of the relative priority of each CPU in the system. This leads to unfair performance variability and priority subversion.

Overall, this paper details how these problems can be minimized by using a NUMA-aware model, which spreads out where shared memory is placed. Memory accessed more frequently by a certain core is placed closer to that core.
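To make the placement idea concrete, here is a minimal Python sketch of choosing a node for a de-duplicated page by weighing access frequency against priority. It is not the algorithm from the paper: the `choose_node` function, the tuple layout, and the scoring rule (access count multiplied by a priority weight) are assumptions made purely for illustration.

```python
def choose_node(sharers):
    """Pick the NUMA node that should hold a shared page.

    sharers: list of (node, access_count, priority_weight) tuples, one per
    process mapping the page. All three fields are hypothetical inputs.
    """
    if not sharers:
        raise ValueError("a shared page needs at least one sharer")
    score = {}
    for node, accesses, weight in sharers:
        # Assumed scoring rule: accesses scaled by the sharer's priority.
        score[node] = score.get(node, 0) + accesses * weight
    # Highest weighted score wins; ties go to the lowest node id.
    return min(score, key=lambda n: (-score[n], n))


# Equal priority: the page follows the most frequent accessor (node 0).
equal_priority = choose_node([(0, 900, 1), (1, 100, 1)])

# A high-priority sharer on node 1 outweighs raw access counts.
with_priority = choose_node([(0, 900, 1), (1, 200, 5)])
```

A NUMA-unaware policy would, in effect, ignore both inputs and leave the page wherever the first copy happened to be found, which is the arbitrary decision the paper argues against.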

Concepts and Definitions

Implementation

nuKSM is implemented on top of the existing KSM in Linux kernel version 5.4.0 and has three main goals:

Evaluation and Results

The authors ran their evaluation on a dual-socket Intel Xeon Gold 6140 server with 18 cores and 192 GiB of memory per socket. The base frequency of the processor is 2.3 GHz. The system ran Linux v5.4.0, with virtual machines running an Ubuntu 18.04 guest OS. The authors extended the same kernel to compare KSM against nuKSM, and both operate at the same page scan rate (1K pages every 100 ms). Each VM was pinned to a specific node: VM-0 ran on node-0 and executed instance-0 of the application, while VM-1 ran on node-1 and executed instance-1. The authors then ran specific workloads, including memory-intensive micro-benchmarks that are particularly sensitive to NUMA effects, and logged the results.
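To get a feel for what a scan rate of 1K pages every 100 ms means on a machine this size, the short Python calculation below estimates how long one full pass over memory would take. It assumes that “1K” means 1000 pages and that pages are 4 KiB, neither of which is stated above, and it treats all memory as scannable, so the result is an upper bound rather than a measured figure.

```python
PAGES_PER_BATCH = 1000     # "1K pages", assumed to mean 1000
BATCH_INTERVAL_MS = 100    # one batch every 100 ms
PAGE_SIZE_BYTES = 4096     # assumed 4 KiB base pages
MEM_PER_SOCKET_GIB = 192
SOCKETS = 2


def pages_per_second():
    """Pages scanned per second at the configured rate."""
    return PAGES_PER_BATCH * 1000 // BATCH_INTERVAL_MS


def full_scan_seconds(mem_gib):
    """Seconds needed to scan mem_gib GiB of memory once."""
    pages = mem_gib * 2**30 // PAGE_SIZE_BYTES
    return pages / pages_per_second()


one_socket_min = full_scan_seconds(MEM_PER_SOCKET_GIB) / 60
whole_machine_hr = full_scan_seconds(MEM_PER_SOCKET_GIB * SOCKETS) / 3600
print(f"{pages_per_second()} pages/s")
print(f"one socket: {one_socket_min:.1f} min, whole machine: {whole_machine_hr:.1f} h")
```

Under these assumptions the scanner covers 10,000 pages per second, so a single pass over one socket's 192 GiB takes roughly 84 minutes and a pass over the whole machine roughly 2.8 hours. In practice KSM only scans regions that have been marked mergeable, so the memory actually scanned is usually much smaller.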

Strengths and Weaknesses

Strengths:

Weaknesses:

Class Discussion

Conclusion

Overall, nuKSM demonstrates that memory de-duplication and NUMA management cannot be treated as independent systems on modern multi-socket servers. While KSM is effective at reducing memory usage, its NUMA-unaware design leads to unfair performance variability and priority subversion. nuKSM addresses these issues by making de-duplication decisions with access frequency, priority, and scalability in mind, resulting in more balanced performance without sacrificing memory savings. Although real-world adoption may be limited by implementation complexity and changing hardware trends, the paper highlights an important systems lesson: optimizing one kernel subsystem in isolation can create significant side effects elsewhere, and NUMA-awareness is critical for predictable performance on modern architectures.

References

[1] A. Panda, A. Panwar, and A. Basu, “nuKSM: NUMA-aware Memory De-duplication on Multi-socket Servers,” Proceedings of the 30th International Conference on Parallel Architectures and Compilation Techniques (PACT), Oct. 2021, pp. 258–269, doi: 10.1109/PACT52795.2021.00026.

AI Disclosure