Mooncake Store Deployment & Operations Guide#

This page summarizes useful flags, environment variables, and HTTP endpoints to help advanced users tune Mooncake Master and observe metrics.

Master Startup Flags (with defaults)#

  • RPC Related

    • --rpc_port (int, default 50051): RPC listen port.

    • --rpc_thread_num (int, default min(4, CPU cores)): RPC worker threads. If not set, uses --max_threads (default 4) capped by CPU cores.

    • --rpc_address (str, default 0.0.0.0): RPC bind address.

    • --rpc_conn_timeout_seconds (int, default 0): RPC idle connection timeout; 0 disables.

    • --rpc_enable_tcp_no_delay (bool, default true): Enable TCP_NODELAY.

  • Metrics

    • --enable_metric_reporting (bool, default true): Periodically log master metrics to INFO.

    • --metrics_port (int, default 9003): HTTP port for /metrics endpoints.

  • HTTP Metadata Server For Mooncake Transfer Engine

    • --enable_http_metadata_server (bool, default false): Enable embedded HTTP metadata server.

    • --http_metadata_server_host (str, default 0.0.0.0): Metadata bind host.

    • --http_metadata_server_port (int, default 8080): Metadata TCP port.

  • Eviction and TTLs

    • --default_kv_lease_ttl (uint64, default 5000 ms): Default lease TTL for KV objects.

    • --default_kv_soft_pin_ttl (uint64, default 1800000 ms): Soft pin TTL (30 minutes).

    • --allow_evict_soft_pinned_objects (bool, default true): Allow evicting soft-pinned objects.

    • --eviction_ratio (double, default 0.05): Fraction evicted when hitting high watermark.

    • --eviction_high_watermark_ratio (double, default 0.95): Usage ratio to trigger eviction.

  • High Availability (optional)

    • --enable_ha (bool, default false): Enable HA (requires etcd).

    • --etcd_endpoints (str, default empty unless HA config): etcd endpoints, semicolon separated.

    • --client_ttl (int64, default 10 s): Client alive TTL after last ping (HA mode).

    • --cluster_id (str, default mooncake_cluster): Cluster ID for persistence in HA mode.

Example (enable embedded HTTP metadata and metrics):

mooncake_master \
  --enable_http_metadata_server=true \
  --http_metadata_server_host=0.0.0.0 \
  --http_metadata_server_port=8080 \
  --rpc_thread_num=64 \
  --metrics_port=9003 \
  --enable_metric_reporting=true

Metrics Endpoints#

The master exposes Prometheus-style metrics over HTTP on --metrics_port:

  • GET /metrics — Prometheus format (text/plain; version=0.0.4).

  • GET /metrics/summary — Human-readable summary.

Examples:

curl -s http://<master_host>:9003/metrics
curl -s http://<master_host>:9003/metrics/summary

Client/Engine Tuning (Env Vars, with defaults)#

  • Topology discovery (Store Client → Transfer Engine)

    • MC_MS_AUTO_DISC (default 1): Auto-discover NIC/GPU topology. Set 0 to disable and provide rdma_devices manually.

    • MC_MS_FILTERS (default empty): Optional comma-separated NIC whitelist when auto-discovery is enabled (e.g., mlx5_0,mlx5_2).

    • If MC_MS_AUTO_DISC=0, pass rdma_devices (comma-separated) to the Python setup(...) call.

  • Transfer Engine metrics (disabled by default)

    • MC_TE_METRIC (default 0/unset): Set to 1 to enable periodic engine metrics logging.

    • MC_TE_METRIC_INTERVAL_SECONDS (default 5): Positive integer seconds between reports (effective only if metrics enabled).

  • Client metrics (enabled by default)

    • MC_STORE_CLIENT_METRIC (default 1): Client-side metrics on by default; set 0 to disable entirely.

    • MC_STORE_CLIENT_METRIC_INTERVAL (default 0): Reporting interval in seconds; 0 collects but does not periodically report.

  • Local memcpy optimization (Store transfer path)

    • MC_STORE_MEMCPY (default 0/false): Set to 1 to prefer local memcpy when source/destination are on the same client.

Quick Tips#

  • Scale --rpc_thread_num with available CPU cores and workload.

  • Start with default eviction settings; adjust --eviction_high_watermark_ratio and --eviction_ratio based on memory pressure and object churn.

  • Use /metrics/summary during bring-up; integrate /metrics with Prometheus/Grafana for production.