Welcome to Mooncake



A KVCache-centric Disaggregated Architecture for LLM Serving.


Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI. Both the Transfer Engine and Mooncake Store are now open-sourced! This repository also hosts the technical report and the open-sourced traces.

🔄 Updates

  • May 9, 2025: NIXL officially supports Mooncake Transfer Engine as a backend plugin.

  • May 8, 2025: Mooncake and LMCache unite to pioneer a KVCache-centric LLM serving system.

  • May 5, 2025: Supported by the Mooncake team, SGLang released guidance for deploying DeepSeek with PD disaggregation on 96 H100 GPUs.

  • Apr 22, 2025: LMCache officially supports Mooncake Store as a remote connector.

  • Apr 10, 2025: SGLang officially supports Mooncake Transfer Engine for disaggregated prefilling and KV cache transfer.

  • Mar 7, 2025: We open-sourced the Mooncake Store, a distributed KVCache based on the Transfer Engine. vLLM’s xPyD disaggregated prefilling and decoding based on Mooncake Store will be released soon.

  • Feb 25, 2025: Mooncake received the Best Paper Award at FAST 2025!

  • Feb 21, 2025: The updated traces used in our FAST’25 paper have been released.

  • Dec 16, 2024: vLLM officially supports Mooncake Transfer Engine for disaggregated prefilling and KV cache transfer.

  • Nov 28, 2024: We open-sourced the Transfer Engine, the central component of Mooncake. We also provide two demonstrations of the Transfer Engine: a P2P Store and a vLLM integration.

  • July 9, 2024: We open-sourced the trace as a JSONL file.

  • June 27, 2024: We published a series of Chinese-language blog posts with further discussion on Zhihu: 1, 2, 3, 4.

  • June 26, 2024: Initial technical report release.

Documentation