Welcome to Mooncake

A KVCache-centric Disaggregated Architecture for LLM Serving.
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI. Both the Transfer Engine and Mooncake Store are now open-sourced! This repository also hosts the Mooncake technical report and the open-sourced traces.
🔄 Updates
May 9, 2025: NIXL officially supports Mooncake Transfer Engine as a backend plugin.
May 8, 2025: Mooncake x LMCache unite to pioneer KVCache-centric LLM serving system.
May 5, 2025: Supported by Mooncake Team, SGLang release guidance to deploy DeepSeek with PD Disaggregation on 96 H100 GPUs.
Apr 22, 2025: LMCache officially supports Mooncake Store as a remote connector.
Apr 10, 2025: SGLang officially supports Mooncake Transfer Engine for disaggregated prefilling and KV cache transfer.
Mar 7, 2025: We open-sourced Mooncake Store, a distributed KVCache built on the Transfer Engine. vLLM’s xPyD disaggregated prefilling and decoding based on Mooncake Store will be released soon.
Feb 25, 2025: Mooncake receives the Best Paper Award at FAST 2025!
Feb 21, 2025: The updated traces used in our FAST’25 paper have been released.
Dec 16, 2024: vLLM officially supports Mooncake Transfer Engine for disaggregated prefilling and KV cache transfer.
Nov 28, 2024: We open-sourced the Transfer Engine, the central component of Mooncake. We also provide two demonstrations of the Transfer Engine: a P2P Store and a vLLM integration.
July 9, 2024: We open-sourced the trace as a JSONL file.
June 27, 2024: We published a series of Chinese-language blog posts with further discussion on Zhihu: 1, 2, 3, 4.
June 26, 2024: Initial technical report release.
Documentation
Getting Started
Performance
Design Documents
Troubleshooting