SSD Offload#
Overview#
Mooncake Store supports offloading KV cache objects from distributed memory to local SSD. When memory pressure is high, the master instructs clients to persist selected objects to disk. On a cache miss, the client automatically falls back to reading from SSD.
SSD offload requires the Real Client and supports two deployment modes:
Mode A: Embedded Real Client — the Python process embeds the Real Client, and SSD offload runs inside the Python process.
Mode B: Standalone Real Client + DummyClient — a standalone
mooncake_clientprocess runs SSD offload, and the Python process connects via a DummyClient.
In both modes, all SSD reads and writes happen within the Real Client (embedded or standalone).
Startup Steps#
Step 1: Create the SSD storage directory#
mkdir -p /nvme/mooncake_offload
Step 2: Start the master#
mooncake_master \
--rpc_port=50051 \
--enable_offload=true
Step 3A (Mode A): Start the application with embedded Real Client#
Use the --enable_ssd_offload flag to enable SSD offload, and set environment variables to specify the storage path and backend:
from mooncake.store import MooncakeDistributedStore
store = MooncakeDistributedStore()
store.setup(
local_hostname="<machine IP>",
metadata_server="P2PHANDSHAKE",
global_segment_size=4 * 1024 * 1024 * 1024, # 4 GB
local_buffer_size=512 * 1024 * 1024, # 512 MB
protocol="rdma",
rdma_devices="eth0",
master_server_addr="127.0.0.1:50051",
enable_ssd_offload=True,
ssd_offload_path="/nvme/mooncake_offload"
)
Step 3B (Mode B): Start the standalone real client with SSD offload enabled and connect it with the dummy client#
Set the same SSD offload environment variables as in Mode A.
export MOONCAKE_OFFLOAD_FILE_STORAGE_PATH=/nvme/mooncake_offload
export MOONCAKE_OFFLOAD_STORAGE_BACKEND_DESCRIPTOR=bucket_storage_backend
mooncake_client \
--master_server_address=127.0.0.1:50051 \
--host=<machine IP> \
--protocol="rdma" \
--device_names=<NIC name, e.g. eth0> \
--port=50052 \
--global_segment_size="4 GB" \
--enable_offload=true \
--metadata_server="P2PHANDSHAKE"
The application (e.g., SGLang) connects to the standalone real client via the MooncakeDistributedStore Python SDK. The Python process acts as a DummyClient; SSD offload and fallback loading are handled by the standalone real client process.
from mooncake.store import MooncakeDistributedStore
store = MooncakeDistributedStore()
store.setup_dummy(
mem_pool_size=4 * 1024 * 1024 * 1024, # 4 GB
local_buffer_size=512 * 1024 * 1024, # 512 MB
server_address="<machine IP>:50052" # mooncake_client RPC address from Step 3
)
Note: When using
bucket_storage_backendorfile_per_key_storage_backend, the real client scans existing SSD metadata on startup and reports it to the master automatically.offset_allocator_storage_backendis the exception: it truncates its data file during initialization and does not recover previously offloaded objects after a restart.
Real Client Parameters#
Flag |
Default |
Description |
|---|---|---|
|
|
Metadata server connection string |
|
|
Master address |
|
|
This machine’s externally reachable IP |
|
|
Real client RPC listening port |
|
|
NIC name(s), e.g. |
|
|
Transport protocol: |
|
|
Memory pool size allocated for this node |
|
|
Must be set to |
|
|
Number of RPC server threads |
SSD Offload Configuration#
Core settings#
Environment Variable |
Default |
Description |
|---|---|---|
|
|
Absolute path to the SSD storage directory |
|
|
Storage backend type (see below) |
|
|
Client-side staging buffer size |
|
|
Max keys processed per iteration when scanning existing SSD metadata on startup |
|
|
Maximum disk usage |
|
|
Maximum number of objects on disk |
|
|
Interval for offload heartbeat to master (seconds) |
|
|
Interval for reclaiming expired offload buffers; defaults to the heartbeat interval in the current implementation |
|
|
Lease time for buffers returned by |
|
|
Enable io_uring for async file I/O |
Bucket backend settings#
Applies when MOONCAKE_OFFLOAD_STORAGE_BACKEND_DESCRIPTOR=bucket_storage_backend.
Environment Variable |
Default |
Description |
|---|---|---|
|
|
Max size per bucket |
|
|
Max keys per bucket |
|
|
Eviction threshold in bytes. When set to |
|
|
Eviction policy: |
File-per-key backend settings#
Applies when MOONCAKE_OFFLOAD_STORAGE_BACKEND_DESCRIPTOR=file_per_key_storage_backend.
Environment Variable |
Default |
Description |
|---|---|---|
|
|
Subdirectory name created under |
|
|
Enables local-storage eviction logic for this backend |
Storage Backends#
bucket_storage_backend (recommended)#
Groups multiple objects into bucket files. Reduces filesystem overhead, supports efficient batch I/O, and supports FIFO and LRU eviction.
File layout:
/nvme/mooncake_offload/
├── 1710000000000-0.bucket # data file (multiple KV pairs)
├── 1710000000000-0.meta # metadata file
├── 1710000000001-0.bucket
└── ...
Best for: general-purpose use, large-scale deployments.
file_per_key_storage_backend#
Stores each object in an individual file. Simple and easy to inspect, but generates many small files at scale.
Environment Variable |
Default |
Description |
|---|---|---|
|
|
Subdirectory name under |
|
|
Enable disk eviction when the total size exceeds the quota |
Best for: debugging or small-scale deployments.
offset_allocator_storage_backend#
Pre-allocates a single large file and manages offset-based allocation within it. Highest concurrency via 1024-shard metadata.
Warning: This backend does not support metadata recovery on restart. On initialization, the data file is truncated and all in-memory metadata is cleared. Any previously offloaded objects become inaccessible after a process restart.
Capacity: MOONCAKE_OFFLOAD_TOTAL_SIZE_LIMIT_BYTES is used directly as the pre-allocated file size (100%, no safety margin). Unlike bucket_storage_backend, there is no separate quota variable — this is the sole disk usage control. Set it below the physical disk capacity to avoid filling the disk; writes are rejected once usage reaches this limit.
Best for: high-concurrency scenarios with many small objects where restart durability is not required.
Eviction (Bucket Backend Only)#
When MOONCAKE_OFFLOAD_BUCKET_MAX_TOTAL_SIZE is set, the backend automatically evicts buckets before writing new ones if total disk usage would exceed the limit.
Policy |
Behavior |
|---|---|
|
No eviction (default); writes fail when disk is full |
|
Evict the oldest bucket first |
|
Evict the least recently read bucket first |
Eviction is two-phase: the bucket is removed from metadata and master is notified first, then in-flight reads are drained before files are deleted.
Example#
The following example starts a master and a real client on a single machine.
Environment#
Machine IP:
192.168.1.10NIC:
eth0SSD mount point:
/nvmeMemory pool size: 4 GB (smaller than the total data written, to trigger offload)
Start the master#
mooncake_master \
--rpc_port=50051 \
--enable_offload=true
Start the real client (new terminal)#
export MOONCAKE_OFFLOAD_FILE_STORAGE_PATH=/nvme/mooncake_offload
export MOONCAKE_OFFLOAD_STORAGE_BACKEND_DESCRIPTOR=bucket_storage_backend
export MOONCAKE_OFFLOAD_BUCKET_MAX_TOTAL_SIZE=$((200 * 1024 * 1024 * 1024)) # 200 GB
export MOONCAKE_OFFLOAD_BUCKET_EVICTION_POLICY=lru
mooncake_client \
--master_server_address="192.168.1.10:50051" \
--host="192.168.1.10" \
--device_names="eth0" \
--port=50052 \
--protocol="rdma" \
--global_segment_size="4GB" \
--enable_offload="true"
Notes#
MOONCAKE_OFFLOAD_FILE_STORAGE_PATHmust be an absolute path to an existing, writable directory. Symbolic links and paths containing..are rejected.On real client restart,
bucket_storage_backendandfile_per_key_storage_backendscan existing SSD metadata and report it to the master, so previously offloaded objects remain accessible.offset_allocator_storage_backenddoes not support restart recovery.Eviction only notifies the master and deletes local files; objects replicated on other nodes are unaffected.
Each machine requires its own real client process. In multi-node deployments, ensure
--hostand--portare correctly set so nodes can reach each other.
2-node example: suppose Node A (192.168.1.10) runs the master and Node B (192.168.1.11) is a second worker. Both real clients must point to the same master and advertise their own externally reachable IP:
# Node A — runs the master and its own real client
mooncake_master --rpc_port=50051 --enable_offload=true &
export MOONCAKE_OFFLOAD_FILE_STORAGE_PATH=/nvme/mooncake_offload
mooncake_client \
--master_server_address="192.168.1.10:50051" \
--host="192.168.1.10" \ # externally reachable IP of Node A
--device_names="eth0" \
--protocol="rdma" \
--metadata_server="P2PHANDSHAKE" \
--port=50052 \
--global_segment_size="4GB" \
--enable_offload="true"
# Node B — real client only; points to the same master on Node A
export MOONCAKE_OFFLOAD_FILE_STORAGE_PATH=/nvme/mooncake_offload
mooncake_client \
--master_server_address="192.168.1.10:50051" \
--host="192.168.1.11" \ # externally reachable IP of Node B, NOT 127.0.0.1
--device_names="eth0" \
--protocol="rdma" \
--metadata_server="P2PHANDSHAKE" \
--port=50052 \
--global_segment_size="4GB" \
--enable_offload="true"
Troubleshooting#
SSD offload is not triggering#
Confirm
--enable_offload=trueis passed to bothmooncake_clientandmooncake_master. Both binaries define the same gflags boolean; hyphenated aliases may work, but the underscored spelling matches the source.Check that
MOONCAKE_OFFLOAD_FILE_STORAGE_PATHpoints to an existing, writable directory. If the path is invalid, real client setup fails duringFileStorageConfig::Validate().Verify memory pressure is actually high enough for the master to trigger offload. If the memory pool (
--global_segment_size) is large relative to the data written, offload may never activate.
“Permission denied” or “No such file or directory” on the storage path#
Ensure the directory exists before starting the client:
mkdir -p <path>.Confirm the process user has read/write access to the directory.
Symbolic links and paths containing
..are rejected — use an absolute, canonical path.The path must already exist and must be a directory. Passing a regular file path also causes setup to fail.
“Failed to register buffer with UringFile” warning in logs#
This warning appears when MOONCAKE_OFFLOAD_USE_URING=true and the io_uring fixed-buffer registration fails. The most common cause is that MOONCAKE_OFFLOAD_LOCAL_BUFFER_SIZE_BYTES exceeds the process’s locked-memory limit (RLIMIT_MEMLOCK). io_uring requires the registered buffer to be pinned in physical memory, which counts against this limit.
Check the current limit:
ulimit -l # in KB; "unlimited" means no cap
To raise it for the current session:
ulimit -l unlimited
To raise it permanently, add the following to /etc/security/limits.conf:
* soft memlock unlimited
* hard memlock unlimited
Alternatively, reduce MOONCAKE_OFFLOAD_LOCAL_BUFFER_SIZE_BYTES to a value within the existing limit. Note that the warning does not abort startup — the client falls back to non-fixed-buffer I/O — but performance may be lower than expected.