Build Guide#
This document describes how to build Mooncake.
PyPI Package#
Install the Mooncake Transfer Engine package from PyPI, which includes both Mooncake Transfer Engine and Mooncake Store Python bindings:
For CUDA-enabled systems:
pip install mooncake-transfer-engine
📦 Package Details: https://pypi.org/project/mooncake-transfer-engine/
For non-CUDA systems:
pip install mooncake-transfer-engine-non-cuda
📦 Package Details: https://pypi.org/project/mooncake-transfer-engine-non-cuda/
Note: The CUDA version includes Mooncake-EP and GPU topology detection, requiring CUDA 12.1+. The non-CUDA version is for environments without CUDA dependencies. Note: MLU support is currently source-build only. If you need Cambricon MLU memory support, install Neuware and build with
-DUSE_MLU=ON.
Automatic#
Recommended Version#
OS: Ubuntu 22.04 LTS+
cmake: 3.20.x
gcc: 9.4+
Steps#
Install dependencies, stable Internet connection is required:
bash dependencies.shIn the root directory of this project, run the following commands:
mkdir build cd build cmake .. make -j
Install Mooncake python package and mooncake_master executable
sudo make install
Manual#
Recommended Version#
cmake: 3.22.x
boost-devel: 1.66.x
googletest: 1.12.x
gcc: 10.2.1
go: 1.22+
hiredis
curl
Steps#
Install dependencies from system software repository:
# For debian/ubuntu apt-get install -y build-essential \ cmake \ libibverbs-dev \ libgoogle-glog-dev \ libgtest-dev \ libjsoncpp-dev \ libnuma-dev \ libunwind-dev \ libpython3-dev \ libboost-all-dev \ libssl-dev \ pybind11-dev \ libcurl4-openssl-dev \ libhiredis-dev \ pkg-config \ patchelf # For centos/alibaba linux os yum install cmake \ gflags-devel \ glog-devel \ libibverbs-devel \ numactl-devel \ gtest \ gtest-devel \ boost-devel \ openssl-devel \ hiredis-devel \ libcurl-devel
NOTE: You may need to install gtest, glog, gflags from source code:
git clone https://github.com/gflags/gflags git clone https://github.com/google/glog git clone https://github.com/abseil/googletest.git
If you want to compile the GPUDirect support module, first follow the instructions in https://docs.nvidia.com/cuda/cuda-installation-guide-linux/ to install CUDA (ensure to enable
nvidia-fsfor propercuFilemodule compilation). After that:Follow Section 3.7 in https://docs.nvidia.com/cuda/gpudirect-rdma/ to install
nvidia-peermemfor enabling GPU-Direct RDMAConfigure
LIBRARY_PATHandLD_LIBRARY_PATHto ensure linking ofcuFile,cudart, and other libraries during compilation:
export LIBRARY_PATH=$LIBRARY_PATH:/usr/local/cuda/lib64 export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64
If you want to compile the Moore Mthreads GPUDirect support module, first follow the instructions in https://docs.mthreads.com/musa-sdk/musa-sdk-doc-online/install_guide to install MUSA. After that:
Install
mthreads-peermemfor enabling GPU-Direct RDMAConfigure
LIBRARY_PATHandLD_LIBRARY_PATHto ensure linking ofmusart, and other libraries during compilation:
export LIBRARY_PATH=$LIBRARY_PATH:/usr/local/musa/lib export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/musa/lib
If you want to compile Cambricon MLU support, first install the Cambricon Neuware SDK. After that:
Export
NEUWARE_HOMEor pass-DNEUWARE_ROOT=/path/to/neuwareto CMakeConfigure
LIBRARY_PATHandLD_LIBRARY_PATHto ensure linking ofcnrt,cndrv, and other Neuware libraries during compilation:
export NEUWARE_HOME=/usr/local/neuware export LIBRARY_PATH=$LIBRARY_PATH:${NEUWARE_HOME}/lib64 export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${NEUWARE_HOME}/lib64
If your Neuware installation lives outside the default include/library layout, you can also pass:
cmake .. -DUSE_MLU=ON \ -DMLU_INCLUDE_DIR=/path/to/neuware/include \ -DMLU_LIB_DIR=/path/to/neuware/lib64
For Cambricon MLU builds, enable the MLU backend explicitly:
cmake .. -DUSE_MLU=ON -DNEUWARE_ROOT=${NEUWARE_HOME:-/usr/local/neuware} make -j
If you want to compile MetaX (Muxi) MACA support (e.g. C500), install the MACA SDK so headers and libraries are available under
MACA_ROOT(defaults toMACA_HOMEenv var if set, otherwise/opt/maca). SDK layouts vary; include bothlibandlib64in runtime paths when needed:export MACA_HOME=/opt/maca export LIBRARY_PATH=$LIBRARY_PATH:${MACA_HOME}/lib:${MACA_HOME}/lib64 export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${MACA_HOME}/lib:${MACA_HOME}/lib64
Build with
-DUSE_MACA=ON. Optional overrides:-DMACA_ROOT=/path/to/maca-DMACA_INCLUDE_DIR=/path/to/maca/include-DMACA_LIB_DIR=/path/to/maca/lib64-DMACA_RUNTIME_LIBS="mcruntime;mxc-runtime64;rt"(semicolon-separated CMake list)
Install yalantinglibs
git clone https://github.com/alibaba/yalantinglibs.git cd yalantinglibs mkdir build && cd build cmake .. -DBUILD_EXAMPLES=OFF -DBUILD_BENCHMARK=OFF -DBUILD_UNIT_TESTS=OFF make -j$(nproc) make install
In the root directory of this project, run the following commands:
mkdir build cd build cmake .. make -j
Install Mooncake python package and mooncake_master executable
make install
Use Mooncake in Docker Containers#
Mooncake supports Docker-based deployment. You can either build the image from
this repository with docker/mooncake.Dockerfile or substitute a published
tag that matches the release you want to run.
For the container to use the host’s network resources, you need to add the --device option when starting the container. The following is an example.
# In host
sudo docker build -f docker/mooncake.Dockerfile -t mooncake:from-source .
sudo docker run --net=host --device=/dev/infiniband/uverbs0 --device=/dev/infiniband/rdma_cm --ulimit memlock=-1 -t -i mooncake:from-source /bin/bash
# Run transfer engine in container
cd /Mooncake-main/build/mooncake-transfer-engine/example
./transfer_engine_bench --device_name=ibp6s0 --metadata_server=10.1.101.3:2379 --mode=target --local_server_name=10.1.100.3
For SGLang HiCache deployments inside Docker, reserve HugeTLB pages on the host before starting the container and pass the allocator settings through the container environment:
python3 scripts/check_hicache_hugepage_requirements.py \
--tp-size 4 \
--hicache-size 64gb \
--global-segment-size 8gb \
--arena-pool-size 56gb \
--available-hugetlb 512gb
sudo sysctl -w vm.nr_hugepages=262144
grep -E 'HugePages_Total|HugePages_Free|Hugepagesize' /proc/meminfo
sudo docker run --gpus all \
--net=host \
--ipc=host \
--ulimit memlock=-1 \
--shm-size=128g \
--device=/dev/infiniband/uverbs0 \
--device=/dev/infiniband/rdma_cm \
-e MC_STORE_USE_HUGEPAGE=1 \
-e MC_STORE_HUGEPAGE_SIZE=2MB \
-e MOONCAKE_GLOBAL_SEGMENT_SIZE=8gb \
-e MC_MMAP_ARENA_POOL_SIZE=56gb \
-t -i mooncake:from-source /bin/bash
The 64gb / 56gb values above are tuned examples for large HiCache deployments, not defaults. The arena remains disabled unless you explicitly enable it, and if you enable it via gflag without an env override the default pool size is 8gb. On smaller hosts, start with 8gb or 16gb and size upward with the helper. When you want the baseline direct-mmap() path instead of the arena, set MC_DISABLE_MMAP_ARENA=1 (also accepts true, yes, or on) and omit MC_MMAP_ARENA_POOL_SIZE. Set it before the first Mooncake mmap-buffer allocation in the process. If you build the image from source with docker/mooncake.Dockerfile, that source-built image also installs the helper as mooncake-hicache-sizing.
Without MC_STORE_USE_HUGEPAGE=1, the arena may opportunistically try hugepages and then retry on regular pages if HugeTLB is unavailable. When MC_STORE_USE_HUGEPAGE=1 is set, both the arena path and the direct-mmap() fallback path require HugeTLB pages. Mooncake will not silently degrade that explicit hugepage request to regular pages.
Advanced Compile Options#
The following options can be used during cmake .. to specify whether to compile certain components of Mooncake.
-DUSE_CUDA=[ON|OFF]: Enable GPU memory support (GPUDirect RDMA, NVMe-oF, and GPU-aware TCP transport). Default: OFF. Required when transferring GPU memory (e.g., KV cache in vLLM disaggregated serving), even when using TCP protocol.-DUSE_MNNVL=[ON|OFF]: Enable Multi-Node NVLink transport support, default is OFF. Note:-DUSE_CUDAis required when-DUSE_MNNVLis on (not used when building with-DUSE_MUSA=ON,-DUSE_HIP=ON, or-DUSE_MACA=ON).-DUSE_MUSA=[ON|OFF]: Enable Moore Threads GPU support via MUSA-DUSE_MACA=[ON|OFF]: Enable MetaX (Muxi) GPU support via MACA.-DMACA_ROOT=/path/to/maca: Override the MACA SDK root (MACA_HOMEenv var is also honored; default/opt/maca).-DMACA_INCLUDE_DIR=/path/to/include: Override MACA include directory when-DUSE_MACA=ON.-DMACA_LIB_DIR=/path/to/lib64: Override MACA library directory when-DUSE_MACA=ON.-DMACA_RUNTIME_LIBS="mcruntime;mxc-runtime64;rt": Override MACA runtime libraries linked bytransfer_engine.-DUSE_HIP=[ON|OFF]: Enable AMD GPU support via HIP/ROCm-DUSE_HYGON=[ON|OFF]: Enable Hygon DCU support via DTK SDK. Default: OFF. Uses CUDA-compatible runtime.-DDTK_ROOT=/path/to/dtk: Override the default DTK SDK root used when-DUSE_HYGON=ON. If unset, Mooncake usesDTK_HOMEor/opt/dtk.-DDTK_INCLUDE_DIR=/path/to/include: Override the DTK include directory when-DUSE_HYGON=ON.-DDTK_LIB_DIR=/path/to/lib64: Override the DTK library directory when-DUSE_HYGON=ON.-DUSE_COREX=[ON|OFF]: Enable Iluvatar CoreX GPU support. Default: OFF. Uses CUDA-compatible runtime.-DCOREX_ROOT=/path/to/corex: Override the default CoreX SDK root used when-DUSE_COREX=ON. If unset, Mooncake usesCOREX_HOMEor/usr/local/corex.-DCOREX_INCLUDE_DIR=/path/to/include: Override the CoreX include directory when-DUSE_COREX=ON.-DCOREX_LIB_DIR=/path/to/lib: Override the CoreX library directory when-DUSE_COREX=ON.-DUSE_MLU=[ON|OFF]: Enable Cambricon MLU memory support via Neuware. Default: OFF. Supports MLU memory detection, topology discovery, and RDMA registration for Transfer Engine.-DNEUWARE_ROOT=/path/to/neuware: Override the default Neuware SDK root used when-DUSE_MLU=ON. If unset, Mooncake usesNEUWARE_HOMEor/usr/local/neuware.-DMLU_INCLUDE_DIR=/path/to/include: Override the Neuware include directory when-DUSE_MLU=ON.-DMLU_LIB_DIR=/path/to/lib64: Override the Neuware library directory when-DUSE_MLU=ON.-DUSE_EFA=[ON|OFF]: Enable AWS Elastic Fabric Adapter transport via libfabric. Default: OFF. See EFA Transport for details.-DUSE_INTRA_NVLINK=[ON|OFF]: Enable intranode nvlink transport-DUSE_CXL=[ON|OFF]: Enable CXL support-DWITH_STORE=[ON|OFF]: Build Mooncake Store component-DWITH_P2P_STORE=[ON|OFF]: Enable Golang support and build P2P Store component, require go 1.23+-DWITH_RUST_EXAMPLE=[ON|OFF]: Build the Transfer Engine Rust interface and sample code. Default: OFF.-DWITH_STORE_RUST=[ON|OFF]: Build Mooncake Store Rust bindings and CMake Rust targets. Default: ON.-DWITH_EP=[ON|OFF]: Build the EP (Expert Parallelism) and PG Python extensions for CUDA. Requires CUDA toolkit and PyTorch. Use-DEP_TORCH_VERSIONS="2.9.1"(semicolon-separated) to build for specific PyTorch versions, or leave empty to use the currently-installed torch. The CUDA version is detected automatically. Default: OFF.-DUSE_REDIS=[ON|OFF]: Enable Redis-based metadata service-DUSE_HTTP=[ON|OFF]: Enable Http-based metadata service-DUSE_ETCD=[ON|OFF]: Enable etcd-based metadata service, require go 1.23+-DSTORE_USE_ETCD=[ON|OFF]: Enable etcd-based failover for Mooncake Store, require go 1.23+. Note:-DUSE_ETCDand-DSTORE_USE_ETCDare two independent options. Enabling-DSTORE_USE_ETCDdoes not depend on-DUSE_ETCD-DBUILD_SHARED_LIBS=[ON|OFF]: Build Transfer Engine as shared library, default is OFF-DBUILD_UNIT_TESTS=[ON|OFF]: Build unit tests, default is ON-DBUILD_EXAMPLES=[ON|OFF]: Build examples, default is ON