The ECE 4/599 Course Blog

Exploring Disaggregated Memory Performance with Firecracker VMM

by Gabriel Rodgers, Sami Aljabery, Rabecka Moffit

Introduction

This project aims to test and compare aggregated and disaggregated memory performance when using Firecracker VMM to run virtual machines. This topic interested the group because, as programs grow larger and memory becomes the bottleneck, disaggregated memory will become more common. Firecracker is a popular tool at the moment because, as a lightweight Virtual Machine Monitor (VMM), it can rapidly deploy micro virtual machines (microVMs). Our research sought to combine the fields of disaggregated memory and virtual machines (VMs). However, due to time and skill constraints, the project became more limited in scope. The revised aim was to compare the memory performance of benchmarks on a local machine versus inside a Firecracker microVM, to ground a discussion of the benefits and drawbacks of virtual machines. We expected the local machine to have better memory performance than Firecracker because the VMM introduces additional overhead. We also provide a proposal for how disaggregated memory could be implemented in Firecracker, for future exploration of this topic.

Implementation

The implementation started with the standard version of Firecracker, which serves as the base case for this project. Firecracker was built on a local machine using the file provided on GitHub here. The local machine used to run the Firecracker VMM is a Lenovo LOQ 15APH8 laptop that contains:

The common setup of Firecracker requires Docker and Bash. We started with a basic config.json provided in the Firecracker GitHub repository. Due to technical issues with KVM access, we used a preconfigured file alongside a provided kernel binary (hello-vmlinux.bin) and root filesystem (hello-rootfs.ext4). These two files represent the kernel image and the root filesystem image, both essential for launching Firecracker.

These files were configured in the my_vm_config.json file, along with other settings such as the machine’s vCPU count and memory size.
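As an illustration, a minimal my_vm_config.json along these lines boots such a microVM; the exact values below are placeholders, not our precise settings:

{
  "boot-source": {
    "kernel_image_path": "hello-vmlinux.bin",
    "boot_args": "console=ttyS0 reboot=k panic=1 pci=off"
  },
  "drives": [
    {
      "drive_id": "rootfs",
      "path_on_host": "hello-rootfs.ext4",
      "is_root_device": true,
      "is_read_only": false
    }
  ],
  "machine-config": {
    "vcpu_count": 1,
    "mem_size_mib": 1024
  }
}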

To launch Firecracker, we used the following command:

sudo firecracker --no-seccomp --config-file my_vm_config.json  

Once running, we gathered the guest’s system specifications.
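Standard utilities suffice for this; for instance, commands along these lines (illustrative, not our exact invocations) report the basics from inside the guest:

uname -a            # kernel version the microVM booted
cat /proc/cpuinfo   # vCPUs exposed to the guest
free -m             # guest memory in MiB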

The next step was to mount the STREAM benchmark executable. The VM boots from a virtual disk image (the rootfs), which contains the init program and the files needed to start the VM. We chose the STREAM benchmark because it is simple: it measures sustainable main memory bandwidth in MB/s and the corresponding computation rate for simple vector kernels. Once the compiled binary was placed in the VM’s filesystem image, the benchmark could be run within the virtual machine to test memory bandwidth. It’s important to note that for the benchmark to work on the Firecracker microVM, it needed to be statically compiled. For comparison, we also ran the benchmark natively on the local machine that hosted Firecracker. The benchmark program was obtained here.
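The workflow looks roughly like the following (paths and flags are illustrative; static linking avoids missing shared libraries inside the minimal guest):

# On the host: statically compile STREAM (add -fopenmp for a threaded run)
gcc -O3 -static stream.c -o stream

# Copy the binary into the root filesystem image via a loop mount
mkdir -p /tmp/rootfs
sudo mount -o loop hello-rootfs.ext4 /tmp/rootfs
sudo cp stream /tmp/rootfs/root/
sudo umount /tmp/rootfs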

Results

Using the STREAM benchmark, we ran 5 trials on each system to understand performance trends. The benchmark evaluates memory bandwidth and computation across four kernels:

- Copy: c[j] = a[j]
- Scale: b[j] = q * c[j]
- Add: c[j] = a[j] + b[j]
- Triad: a[j] = b[j] + q * c[j]

These assess data transfer performance, computational efficiency, parallel memory workloads, and overall memory performance. Figure 1 reports “Best Rate (MB/s)”, Figure 2 “Average Time (ms)”, Figure 3 “Minimum Time (ms)”, and Figure 4 “Maximum Time (ms)”. Across the trials, the local machine outperformed the Firecracker microVM: in the figures below, the local machine takes less time and achieves a higher rate in MB/s.
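For reference, STREAM derives its MB/s figures from the bytes each kernel touches: Copy and Scale move two arrays of 8-byte doubles (16 bytes per element per iteration), while Add and Triad touch three arrays (24 bytes per element), so for an array of N elements the reported best rate is 16N or 24N bytes divided by the fastest trial’s time.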

Figures:

Figure 1: STREAM’s evaluation of Best Rate (MB/s) on both Firecracker and Local Machine.

Figure 2: STREAM’s evaluation of Average Time (ms) on both Firecracker and Local Machine.

Figure 3: STREAM’s evaluation of Minimum Time (ms) on both Firecracker and Local Machine.

Figure 4: STREAM’s evaluation of Maximum Time (ms) on both Firecracker and Local Machine.

Taking the average of all four functions of STREAM, we found:

Issues

We faced many challenges over the course of this project, starting with finding a machine to run Linux. Originally, our team wanted to use an Oregon State University (OSU) owned machine, but its use came with many restrictions. All programs and downloads had to be installed through OSU’s Information Technology (IT) department, which meant week-long turnaround times between a program request and its installation. For security reasons, OSU IT would not install Docker, which made Firecracker unusable there. At that point, our team had to pivot from the OSU-owned machine to a personal machine running a Linux operating system (OS), the only OS Firecracker supports.

Our next challenge was building a custom filesystem for the VM that contained the compiled benchmark files. Although the Firecracker documentation says that Docker is required for this step, we found that we did not actually need it, and we were able to work around this issue.
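As a sketch of the Docker-free route (sizes, paths, and contents are illustrative, not our exact recipe), a minimal ext4 image can be assembled directly on the host:

# Create and format an empty ext4 image
dd if=/dev/zero of=my-rootfs.ext4 bs=1M count=512
mkfs.ext4 my-rootfs.ext4

# Populate it with the benchmark binary
mkdir -p /tmp/my-rootfs
sudo mount -o loop my-rootfs.ext4 /tmp/my-rootfs
sudo cp stream /tmp/my-rootfs/
# ... an init (e.g., a static busybox) must also be added before unmounting ...
sudo umount /tmp/my-rootfs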

Implementing disaggregated memory was difficult, as it required creating virtual nodes to simulate the pooled memory. With the switch to a personal computer, this process became too difficult to set up and was unfortunately left out of the testing. Another issue was that Firecracker is written in Rust, and no project member had any prior experience with the language. Our team incorrectly assumed that Rust would be approachable enough to learn within our time and resources; this was not the case and posed a more significant challenge than we had anticipated.

Next Steps

Our proposal for implementing disaggregated memory support in Firecracker involves modifying several memory-related files in the Firecracker VMM source. These files are shown in the following directory map of Firecracker, where the asterisked files are those that should be modified to support disaggregated memory:

firecracker
├── CHANGELOG.md
├── CHARTER.md
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── CREDITS.md
├── Cargo.lock
├── Cargo.toml
├── DEPRECATED.md
├── FAQ.md
├── LICENSE
├── MAINTAINERS.md
├── NOTICE
├── PGP-KEY.asc
├── README.md
├── SECURITY.md
├── SPECIFICATION.md
├── THIRD-PARTY
├── deny.toml
├── docs
├── pre-commit
├── resources
│   ├── chroot.sh
│   ├── guest_configs
│   ├── overlay
│   ├── rebuild.sh
│   └── seccomp
│       ├── ***aarch64-unknown-linux-musl.json***
│       ├── unimplemented.json
│       └── ***x86_64-unknown-linux-musl.json***
├── rusty-hook.toml
├── src
│   ├── acpi-tables
│   ├── clippy-tracing
│   ├── cpu-template-helper
│   ├── firecracker
│   ├── jailer
│   ├── log-instrument
│   ├── log-instrument-macros
│   ├── rebase-snap
│   ├── seccompiler
│   ├── snapshot-editor
│   ├── utils
│   └── vmm
│       ├── benches
│       ├── Cargo.toml
│       ├── src
│       │   ├── acpi
│       │   ├── arch
│       │   │   ├── aarch64
│       │   │   │   ├── cache_info.rs
│       │   │   │   ├── fdt.rs
│       │   │   │   ├── gic
│       │   │   │   ├── ***layout.rs***
│       │   │   │   ├── mod.rs
│       │   │   │   ├── output_GICv2.dtb
│       │   │   │   ├── output_GICv3.dtb
│       │   │   │   ├── output_initrd_GICv2.dtb
│       │   │   │   ├── output_initrd_GICv3.dtb
│       │   │   │   ├── regs.rs
│       │   │   │   └── vcpu.rs
│       │   │   ├── mod.rs
│       │   │   └── x86_64
│       │   │       ├── cpu_model.rs
│       │   │       ├── gdt.rs
│       │   │       ├── gen
│       │   │       ├── interrupts.rs
│       │   │       ├── ***layout.rs***
│       │   │       ├── mod.rs
│       │   │       ├── mptable.rs
│       │   │       ├── msr.rs
│       │   │       └── regs.rs
│       │   ├── builder.rs
│       │   ├── cpu_config
│       │   ├── device_manager
│       │   ├── devices
│       │   ├── dumbo
│       │   ├── gdb
│       │   ├── io_uring
│       │   ├── lib.rs
│       │   ├── logger
│       │   ├── mmds
│       │   ├── persist.rs
│       │   ├── rate_limiter
│       │   ├── resources.rs
│       │   ├── rpc_interface.rs
│       │   ├── seccomp.rs
│       │   ├── signal_handler.rs
│       │   ├── snapshot
│       │   ├── test_utils
│       │   ├── utils
│       │   ├── vmm_config
│       │   └── vstate
│       └── tests
├── tests
└── tools

Purposes of each file and why they would need to be modified to support disaggregated memory:

- arch/aarch64/layout.rs and arch/x86_64/layout.rs define the guest physical memory layout (where guest RAM and device regions are placed) for each architecture; they would need new regions that distinguish remote, disaggregated memory from local DRAM.
- resources/seccomp/aarch64-unknown-linux-musl.json and x86_64-unknown-linux-musl.json define the seccomp filters listing which syscalls Firecracker may make; they would need to permit the additional syscalls used to reach remote memory (for example, RDMA-related calls).

The above files are the ones we would change to support disaggregated memory in Firecracker. For RDMA, the set of files needing modification would likely be larger than the subset we highlight, because memory accesses via RDMA need completely different interfaces than the regular POSIX-compliant syscalls used for local memory accesses. For CXL, the set would also likely be larger, because CXL memory should be distinguished from local memory to avoid performance hits. It would be beneficial to have distinct labels for local memory and CXL memory, with CXL memory used only when local memory is low. This would require changes in the files that use memory as well as in the layout.rs files we highlighted. The layout and syscall JSON files we mention are likely just a starting point for supporting disaggregated memory in Firecracker.
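To make the layout.rs idea concrete, here is a purely hypothetical sketch of the kind of constants that might be added; the identifiers and addresses are our own invention and do not exist in Firecracker today:

// Hypothetical additions to src/vmm/src/arch/x86_64/layout.rs.
// All identifiers and addresses below are illustrative only.

/// Start of the guest-physical range backed by local DRAM.
pub const LOCAL_MEM_START: u64 = 0x0010_0000; // 1 MiB

/// Start of a guest-physical range backed by disaggregated
/// (CXL- or RDMA-attached) memory, placed well above local DRAM.
pub const REMOTE_MEM_START: u64 = 0x40_0000_0000; // 256 GiB

/// Maximum size of the remote region the VMM will map, letting the
/// allocator fall back to it when local memory runs low.
pub const REMOTE_MEM_MAX_SIZE: u64 = 0x10_0000_0000; // 64 GiB

The guest-memory construction code would then choose between the two ranges when building the address space, which is why files beyond layout.rs would also need changes.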

Once the above files, and possibly others, are modified, we would use Firecracker to launch a virtual machine that supports disaggregated memory. Since we did not have a system with disaggregated memory available, we had planned to use server space through CloudLab. One server would run the Firecracker setup with some local memory, approximating a compute node, while another server would contribute only its memory, approximating a remote memory bank. The test would run memory benchmarks with programs contained entirely on the first server versus with their memory spread across the two servers. Tests where an entire program cannot fit in the first server’s memory would give the best picture of disaggregated memory performance, because the program is then forced to use the remote memory rather than merely permitted to. The benchmarks we had picked were PARSEC, the NAS Parallel Benchmarks, Graph500, Intel Memory Latency Checker, and STREAM.


Conclusion

This project went through several iterations before completion. The first version aimed to modify the Firecracker VMM source code to provide disaggregated memory support and to use performance benchmarks to test this version on CloudLab machines, comparing local aggregated memory performance to disaggregated memory performance. That version was far too ambitious for a single term, so the project’s scope was reduced. The second version aimed to run performance benchmarks both in the Firecracker VMM and natively, and to provide a short proposal on how to support disaggregated memory in Firecracker. We used the STREAM benchmark to test the performance difference between the two because STREAM is relatively easy to compile, and we showed that Firecracker takes a small performance hit (around 10%) compared to native execution. Our proposal states that, for disaggregated memory support, the layout.rs files would need to be modified for CXL (and locally-pinned RDMA) memory; that the seccomp syscall JSON files would need to be modified to allow RDMA syscalls; and that more files would need to change to support switching between disaggregated and local memory for optimal performance.

Key Findings:

- The local machine outperformed the Firecracker microVM on every STREAM kernel, with higher best rates (MB/s) and lower kernel times.
- Averaged across the four kernels, Firecracker’s performance penalty relative to native execution was around 10%.
- STREAM had to be statically compiled to run inside the minimal Firecracker guest.


References