Build Guide#
This document describes how to build Mooncake.
Build From Source#
Recommended Version#
OS: Ubuntu 22.04 LTS+
cmake: 3.20.x
gcc: 9.4+
Default Build#
Install common build dependencies first. A stable Internet connection is required because the script installs system packages, initializes submodules, installs Go, and builds/installs yalantinglibs.
sudo bash dependencies.sh
Then build and install Mooncake:
mkdir build
cd build
cmake ..
make -j
sudo make install
Build with NVMe-oF SSD Pool#
To enable the NVMe-oF SSD pool, install the SPDK dependencies and build
Mooncake with USE_NOF enabled:
sudo bash dependencies.sh --with-spdk
mkdir build
cd build
cmake .. -DUSE_NOF=ON
make -j
sudo make install
-DUSE_NOF=ON builds the NoF registration APIs and deployment tools. Use
-DUSE_NOF=OFF or omit the option when the NVMe-oF SSD pool is not needed.
Hardware Backend Setup#
Run sudo bash dependencies.sh before using any of these backend-specific build
options. The script handles common dependencies; vendor SDKs and runtime
environment setup must be prepared separately.
Hardware / backend |
Build option |
External SDK / setup |
Environment and notes |
|---|---|---|---|
NVIDIA CUDA / GPUDirect |
|
Install CUDA 12.1+ and enable |
Add CUDA libraries to |
NVIDIA Multi-Node NVLink |
|
Requires CUDA. |
Also set |
Moore Threads MUSA |
|
Install MUSA SDK and |
Add |
Cambricon MLU |
|
Install Cambricon Neuware SDK. |
Set |
MetaX MACA |
|
Install MACA SDK. |
Set |
Huawei Ascend Direct |
|
Install Ascend CANN Toolkit and ADXL dependencies. |
Source |
Huawei Ascend UBSHMEM |
|
Install Ascend CANN Toolkit. Requires CANN >= 9.0.0, driver >= 26.0.0, Lingqu >= 1.5. |
Source the CANN |
AMD HIP / ROCm |
|
Install ROCm/HIP SDK. |
Ensure HIP compiler, headers, and runtime libraries are visible to CMake. |
Hygon DCU |
|
Install DTK SDK. |
Set |
Iluvatar CoreX |
|
Install CoreX SDK. |
Set |
GPU-Direct RDMA
Mooncake can use the DMA-BUF path for GPU-Direct RDMA, which does not
require the nvidia-peermem kernel module. Set WITH_NVIDIA_PEERMEM=0 before
starting Mooncake to use DMA-BUF. Set WITH_NVIDIA_PEERMEM=1 to use the legacy
ibv_reg_mr path, which requires nvidia-peermem. See Section 3.7 of
https://docs.nvidia.com/cuda/gpudirect-rdma/ for nvidia-peermem installation
instructions.
Use Mooncake in Docker Containers#
Mooncake supports Docker-based deployment. You can either build the image from
this repository with docker/mooncake.Dockerfile or substitute a published
tag that matches the release you want to run.
For the container to use the host’s network resources, you need to add the --device option when starting the container. The following is an example.
# In host
sudo docker build -f docker/mooncake.Dockerfile -t mooncake:from-source .
sudo docker run --net=host --device=/dev/infiniband/uverbs0 --device=/dev/infiniband/rdma_cm --ulimit memlock=-1 -t -i mooncake:from-source /bin/bash
# Run transfer engine in container
cd /Mooncake-main/build/mooncake-transfer-engine/example
./transfer_engine_bench --device_name=ibp6s0 --metadata_server=10.1.101.3:2379 --mode=target --local_server_name=10.1.100.3
For SGLang HiCache deployments inside Docker, reserve HugeTLB pages on the host before starting the container and pass the allocator settings through the container environment:
python3 scripts/check_hicache_hugepage_requirements.py \
--tp-size 4 \
--hicache-size 64gb \
--global-segment-size 8gb \
--arena-pool-size 56gb \
--available-hugetlb 512gb
sudo sysctl -w vm.nr_hugepages=262144
grep -E 'HugePages_Total|HugePages_Free|Hugepagesize' /proc/meminfo
sudo docker run --gpus all \
--net=host \
--ipc=host \
--ulimit memlock=-1 \
--shm-size=128g \
--device=/dev/infiniband/uverbs0 \
--device=/dev/infiniband/rdma_cm \
-e MC_STORE_USE_HUGEPAGE=1 \
-e MC_STORE_HUGEPAGE_SIZE=2MB \
-e MOONCAKE_GLOBAL_SEGMENT_SIZE=8gb \
-e MC_MMAP_ARENA_POOL_SIZE=56gb \
-t -i mooncake:from-source /bin/bash
The 64gb / 56gb values above are tuned examples for large HiCache deployments, not defaults. The arena remains disabled unless you explicitly enable it, and if you enable it via gflag without an env override the default pool size is 8gb. On smaller hosts, start with 8gb or 16gb and size upward with the helper. When you want the baseline direct-mmap() path instead of the arena, set MC_DISABLE_MMAP_ARENA=1 (also accepts true, yes, or on) and omit MC_MMAP_ARENA_POOL_SIZE. Set it before the first Mooncake mmap-buffer allocation in the process. If you build the image from source with docker/mooncake.Dockerfile, that source-built image also installs the helper as mooncake-hicache-sizing.
Without MC_STORE_USE_HUGEPAGE=1, the arena may opportunistically try hugepages and then retry on regular pages if HugeTLB is unavailable. When MC_STORE_USE_HUGEPAGE=1 is set, both the arena path and the direct-mmap() fallback path require HugeTLB pages. Mooncake will not silently degrade that explicit hugepage request to regular pages.
Advanced Compile Options#
The following options can be passed to cmake ...
Accelerator and Hardware Options#
Option |
Default |
Description |
|---|---|---|
|
|
Enable GPU memory support, including GPUDirect RDMA, NVMe-oF, and GPU-aware TCP transport. Required when transferring GPU memory, even when using TCP. |
|
|
Enable Multi-Node NVLink transport. Requires |
|
|
Enable Moore Threads GPU support via MUSA. |
|
|
Enable MetaX (Muxi) GPU support via MACA. |
|
|
Enable AMD GPU support via HIP/ROCm. |
|
|
Enable Hygon DCU support via DTK SDK. Uses a CUDA-compatible runtime. |
|
|
Enable Iluvatar CoreX GPU support. Uses a CUDA-compatible runtime. |
|
|
Enable Cambricon MLU memory support via Neuware, including memory detection, topology discovery, and RDMA registration. |
|
|
Enable Ascend Direct transport and HCCS support via the ADXL engine. Recommended for Ascend builds. |
|
|
Enable Huawei Ascend NPU shared memory transport via CANN VMM APIs. |
|
|
Enable intranode NVLink transport. |
|
|
Enable CXL support. |
Vendor SDK Path Overrides#
Option |
Applies when |
Description |
|---|---|---|
|
|
Override the MACA SDK root. |
|
|
Override the MACA include directory. |
|
|
Override the MACA library directory. |
|
|
Override MACA runtime libraries linked by |
|
|
Override the DTK SDK root. |
|
|
Override the DTK include directory. |
|
|
Override the DTK library directory. |
|
|
Override the CoreX SDK root. |
|
|
Override the CoreX include directory. |
|
|
Override the CoreX library directory. |
|
|
Override the Neuware SDK root. |
|
|
Override the Neuware include directory. |
|
|
Override the Neuware library directory. |
Transport and Metadata Options#
Option |
Default |
Description |
|---|---|---|
|
|
Enable AWS Elastic Fabric Adapter transport via libfabric. See EFA Transport. |
|
|
Build Mooncake Store with NVMe-oF SSD pool support. Use |
|
|
Enable Redis-based metadata service for Transfer Engine. Requires hiredis. |
|
|
Enable HTTP-based metadata service. |
|
|
Enable etcd-based metadata service. Requires Go 1.23+. |
|
|
Enable etcd-based failover for Mooncake Store. Independent from |
|
|
Enable Redis-based failover for Mooncake Store. Independent from |
|
|
Enable Kubernetes Lease-based failover for Mooncake Store. Cannot be enabled with |
Component Options#
Option |
Default |
Description |
|---|---|---|
|
|
Build the Mooncake Transfer Engine component and sample code. |
|
|
Build the Mooncake Store component. |
|
|
Build Go bindings for Mooncake Store when |
|
|
Enable Golang support and build the P2P Store component. Requires Go 1.23+. |
|
|
Build the Transfer Engine Rust interface and sample code. |
|
|
Build Mooncake Store Rust bindings and CMake Rust targets. |
|
|
Build the EP and PG Python extensions for CUDA. Requires CUDA toolkit and PyTorch. Use |
Build Behavior Options#
Option |
Default |
Description |
|---|---|---|
|
|
Build Transfer Engine as a shared library. |
|
|
Build unit tests. |
|
|
Build examples. |
|
|
Use jemalloc in the Mooncake Store master. |