The ECE 4/599 Course Blog

FMI: Fast and Cheap Message Passing for Serverless Functions

by Sami Aljabery (Scribe), Gabriel Rodgers (Blogger), Noah Bean (Leader)

Introduction

This blog post covers FMI: Fast and Cheap Message Passing for Serverless Functions, a research paper submitted on May 15, 2023, and presented in class on January 22, 2025. The paper introduces the FaaS Message Interface (FMI), a high-performance communication framework for serverless computing. Traditional serverless architectures rely on storage-based communication (e.g., AWS S3, Redis, DynamoDB), which introduces significant latency and cost overhead. FMI overcomes these challenges with direct TCP communication enabled by TCP NAT hole punching (TCPunch), reducing latency, cost, and complexity.


Background and Context

Function-as-a-Service (FaaS) is a widely used serverless cloud computing model that offers elastic scaling and fine-grained billing, making it attractive for machine learning, data analytics, and distributed applications. However, serverless functions lack efficient, low-latency communication mechanisms and often fall back on cloud storage services such as AWS S3, Redis, and DynamoDB. These solutions increase latency and cost, making frequent inter-function communication inefficient.

Additionally, serverless functions operate behind Network Address Translation (NAT) gateways, which prevent direct connections between functions. Messages must instead be relayed through cloud-based storage, adding overhead, complexity, and further latency.

To solve these issues, FMI introduces TCP NAT hole punching (TCPunch) to establish direct, low-latency connections between functions.
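
The hole-punching idea can be sketched compactly. The code below is a simplified illustration of the general technique, not the actual TCPunch API: the rendezvous lookup is a placeholder and error handling is omitted. Both functions contact a public pairing server under a shared key, learn each other's NAT-translated address, and then repeatedly attempt outgoing TCP connections until the NAT mappings on both sides line up and one attempt succeeds.

```cpp
// Simplified illustration of TCP NAT hole punching (not the actual TCPunch API).
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdint>
#include <string>

// Placeholder: a real implementation contacts the hole-punching server and
// reads back the peer's NAT-translated public IP and port.
static sockaddr_in lookup_peer_endpoint(const std::string& /*pairing_key*/) {
    sockaddr_in peer{};
    peer.sin_family = AF_INET;
    peer.sin_port = htons(7000);                        // example port
    inet_pton(AF_INET, "203.0.113.7", &peer.sin_addr);  // example address
    return peer;
}

// Returns a connected socket to the peer, or -1 if all attempts fail.
int punch_tcp_connection(const std::string& pairing_key, uint16_t local_port) {
    sockaddr_in peer = lookup_peer_endpoint(pairing_key);

    for (int attempt = 0; attempt < 100; ++attempt) {
        int sock = socket(AF_INET, SOCK_STREAM, 0);
        int yes = 1;
        setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes));

        // Reuse the same local port that was used to contact the pairing
        // server, so the NAT mapping the server observed stays valid.
        sockaddr_in local{};
        local.sin_family = AF_INET;
        local.sin_addr.s_addr = INADDR_ANY;
        local.sin_port = htons(local_port);
        bind(sock, reinterpret_cast<sockaddr*>(&local), sizeof(local));

        // Both peers connect() at roughly the same time; the outgoing SYNs
        // open NAT mappings on both sides until one attempt gets through.
        if (connect(sock, reinterpret_cast<sockaddr*>(&peer), sizeof(peer)) == 0) {
            return sock;  // direct function-to-function TCP connection
        }
        close(sock);
        usleep(100 * 1000);  // back off briefly and retry
    }
    return -1;
}
```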


Keywords

Serverless computing, Function-as-a-Service (FaaS), FaaS Message Interface (FMI), message passing, MPI, TCP NAT hole punching (TCPunch), communication latency, cloud cost


Detailed Summary of the Paper

Summary of Section 1: Introduction

The paper introduces the FaaS Message Interface (FMI), a modular and high-performance communication framework designed to address the inefficiencies of serverless communication. Inspired by the Message Passing Interface (MPI), FMI brings standardized abstractions for point-to-point and collective communication to Function-as-a-Service (FaaS) platforms.

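To make the MPI analogy concrete, here is a hypothetical sketch of what an MPI-style interface for serverless functions can look like. The class and method names are illustrative assumptions, not FMI's actual API: each function joins a communicator with its rank and the group size, then issues point-to-point or collective operations without knowing which channel carries the data.

```cpp
// Hypothetical MPI-style interface for serverless functions
// (illustrative names only; FMI's real API may differ).
#include <cstdint>
#include <string>
#include <vector>

enum class ReduceOp { Sum, Min, Max };

class Communicator {
public:
    // Each function joins the group with a rank in [0, world_size) and a
    // configuration that selects the underlying channel (direct TCP,
    // object storage, key-value store, ...).
    Communicator(uint32_t rank, uint32_t world_size, const std::string& config);

    // Point-to-point communication between two functions.
    void send(const std::vector<double>& data, uint32_t dest_rank);
    void recv(std::vector<double>& data, uint32_t src_rank);

    // Collective operations over the whole group, as in MPI.
    void bcast(std::vector<double>& data, uint32_t root_rank);
    void reduce(std::vector<double>& data, ReduceOp op, uint32_t root_rank);
    void allreduce(std::vector<double>& data, ReduceOp op);
};
```

Keeping the interface close to MPI is also what makes adoption cheap later on: the machine learning case study in Section 4 only has to swap its storage calls for communicator calls.
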
Key contributions include:

  • An MPI-inspired communication library offering point-to-point and collective operations for serverless functions.
  • A direct TCP channel between functions, established with NAT hole punching (TCPunch).
  • An evaluation showing large latency and cost reductions compared with storage-based solutions such as S3, Redis, and DynamoDB.
  • A distributed machine learning case study in which the integration required only a handful of changed lines.

Summary of Section 2: Design of FMI

FMI’s design includes multiple communication channels, each tailored for specific use cases and trade-offs: a direct TCP channel established through hole punching offers the lowest latency, while storage-backed channels are easier to set up but slower and more expensive per message. A rough sketch of this channel abstraction appears at the end of this subsection.

Key features of the design are its modularity and a cloud-agnostic, MPI-like interface, so the same application code can run across different platforms and channel backends.

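As a rough illustration of the channel idea (again with made-up names, and assuming each backend exposes the same small send/receive interface), a design along these lines lets the same application code run over a direct TCP connection or a storage-based fallback, trading latency and per-message cost against ease of setup:

```cpp
// Illustrative sketch of a pluggable channel layer (hypothetical names).
// Every backend implements the same small interface, so the communicator can
// pick a channel based on latency, cost, or setup constraints without the
// application noticing.
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

class Channel {
public:
    virtual ~Channel() = default;
    virtual void send(const std::vector<uint8_t>& msg, uint32_t dest) = 0;
    virtual std::vector<uint8_t> recv(uint32_t src) = 0;
};

// Direct TCP channel: microsecond-level latency, but needs NAT hole punching
// (and a pairing server) to establish the connection.
class DirectTcpChannel : public Channel { /* ... */ };

// Object-storage channel (e.g. S3): nothing to set up, but every message
// becomes a PUT/GET pair, adding latency and per-request cost.
class ObjectStorageChannel : public Channel { /* ... */ };

// Factory selecting a backend by name, e.g. from the deployment configuration.
std::unique_ptr<Channel> make_channel(const std::string& kind);
```
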
Summary of Section 3: Implementation of FMI

The FMI framework is lightweight, implemented in roughly 1,900 lines of C++ code.

Summary of Section 4: Evaluation

FMI is evaluated against storage-based alternatives (AWS S3, DynamoDB, Redis) for latency, bandwidth, cost, and scalability; the most important results are summarized below.

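For intuition about how point-to-point latency is typically measured in this kind of evaluation, the sketch below times a simple ping-pong exchange. A local socketpair stands in for the hole-punched TCP connection between two functions purely to keep the example self-contained; the paper's actual benchmark harness may differ.

```cpp
// Self-contained ping-pong latency sketch: a local socketpair stands in for a
// hole-punched TCP connection between two serverless functions.
#include <sys/socket.h>
#include <unistd.h>
#include <chrono>
#include <cstdio>

int main() {
    int fds[2];
    socketpair(AF_UNIX, SOCK_STREAM, 0, fds);

    if (fork() == 0) {               // "remote" function: echo every byte back
        close(fds[0]);
        char b;
        while (read(fds[1], &b, 1) == 1) write(fds[1], &b, 1);
        _exit(0);
    }
    close(fds[1]);

    const int iters = 10000;
    char b = 'x';
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i) {   // one round trip per iteration
        write(fds[0], &b, 1);
        read(fds[0], &b, 1);
    }
    auto end = std::chrono::steady_clock::now();

    auto usec = std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
    std::printf("average round-trip latency: %.2f us\n",
                static_cast<double>(usec) / iters);

    close(fds[0]);                   // child sees EOF on its end and exits
    return 0;
}
```
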
Important Results

  1. Reduction in Communication Latency

    • Direct TCP achieves microsecond-level latency, up to 162x faster than storage-based methods.
  2. Cost Savings

    • Up to 397x reduction in communication costs. Some ML workloads see costs under $0.02 per 1,000 epochs, versus $7.52 with DynamoDB.
  3. Improved Scalability

    • Efficient scaling to 256 serverless functions while maintaining low latency and high bandwidth.
  4. Bandwidth Performance

    • Superior bandwidth performance across message sizes, stable under high concurrency.
  5. Optimized Collective Operations

    • Implements broadcast, reduce, and allreduce with the lowest latency across all evaluated solutions.
  6. Case Study in Distributed Machine Learning

    • Replacing DynamoDB with FMI yields a 1224x improvement in communication speed, with no significant integration overhead.
  7. Minimal Integration Overhead

    • Only four lines of code were changed to replace DynamoDB with FMI in the machine learning example (a hedged sketch of what such a swap can look like follows this list).
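
As a hypothetical illustration of that swap (the names are made up and the case study's real code differs), the gradient-exchange step of a training loop only needs its storage round trip replaced by a single collective call:

```cpp
// Hypothetical sketch of the gradient-exchange step in a distributed training
// loop, before and after switching from storage-based exchange to an
// FMI-style allreduce (illustrative names, not the case study's actual code).
#include <vector>

class Communicator;  // see the hypothetical interface sketch in the Section 1 summary
void allreduce_sum(Communicator& comm, std::vector<double>& grads);

void exchange_gradients(Communicator& comm, std::vector<double>& grads) {
    // Before (storage-based): every worker wrote its gradients to DynamoDB,
    // polled until the other workers' items appeared, read them back, and
    // averaged locally, at high latency and per-request cost.
    //
    //   put_item(table, my_rank, grads);
    //   wait_for_all_items(table, world_size);
    //   grads = average(read_all_items(table, world_size));
    //
    // After (FMI-style): one collective call sums the gradients across all
    // workers over the direct TCP channel; only a handful of lines change.
    allreduce_sum(comm, grads);
}
```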

Strengths and Weaknesses of the Paper

Strengths

  1. Innovative Solution

    • Introduces a novel approach (TCP NAT hole punching) to solve communication bottlenecks in serverless computing.
  2. Comprehensive Evaluation

    • Benchmarks compare FMI against AWS S3, DynamoDB, and Redis for latency, cost, bandwidth, and scalability.
  3. Scalability

    • Maintains performance up to 256 serverless functions without significant degradation.
  4. Low Cost and High Performance

    • Achieves up to 397x cost savings and 162x faster communication.
  5. Portability and Modularity

    • Cloud-agnostic design compatible with AWS Lambda, Kubernetes, and MPI.
  6. Ease of Integration

    • Minimal code changes required, facilitating adoption in existing systems.

Weaknesses

  1. Reliance on Assumptions

    • Assumes all functions in a communication group are co-located and run concurrently, which may not always hold in practice.
  2. Limited Fault Tolerance

    • Lacks built-in mechanisms for handling individual function failures mid-communication.
  3. Dependency on External Infrastructure

    • Requires an external hole-punching server to exchange the functions’ public endpoints before direct connections can be established.
  4. Limited Real-World Testing

    • Evaluated mainly in controlled benchmarks and a single case study; broader real-world validation is still needed.

Class Discussion

Topics raised during the class discussion included:

  • Clarification on Table 1
  • UDP vs. TCP
  • Open-source clouds
  • Clarification on the figures
  • Use of FMI by OpenAI and other cloud providers
  • RMA (remote memory access)


Sources


Generative AI

AI Tools Used

These tools aided in:

Example Contribution

Limitations


Did you find this post insightful? Share your thoughts below!