Supported Communication Protocols#

Mooncake Transfer Engine supports multiple communication protocols for data transfer between nodes in a cluster. The protocol selection depends on your hardware capabilities and performance requirements.

Quick Reference#

| Protocol | Hardware Required | Use Case | Python API Support |
|---|---|---|---|
| tcp | Standard network | General purpose, works everywhere | ✅ Primary |
| rdma | RDMA-capable NIC | High-performance, low-latency | ✅ Primary |
| nvmeof | NVMe-oF capable storage | Direct NVMe storage access | ⚠️ Advanced |
| nvlink | NVIDIA MNNVL | Inter-node GPU communication | ⚠️ Advanced |
| nvlink_intra | NVIDIA NVLink | Intra-node GPU communication | ⚠️ Advanced |
| hip | AMD ROCm/HIP | AMD GPU communication | ⚠️ Advanced |
| barex | RDMA-capable NIC | Bare-metal RDMA extension | ⚠️ Advanced |
| cxl | CXL-capable hardware | Memory pooling and sharing | ⚠️ Advanced |
| ascend | Huawei Ascend NPU | Ascend NPU communication | ⚠️ Advanced |
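
The quick reference above can also be kept as data, which is handy when a script needs to validate a protocol name before passing it to the engine. The sketch below is illustrative only: `PROTOCOLS` and `python_api_level` are hypothetical names, not part of Mooncake, and the entries simply mirror the table.

```python
# Hypothetical lookup table mirroring the quick reference above.
# Keys are protocol strings; values are (hardware required, Python API support).
PROTOCOLS = {
    "tcp": ("Standard network", "primary"),
    "rdma": ("RDMA-capable NIC", "primary"),
    "nvmeof": ("NVMe-oF capable storage", "advanced"),
    "nvlink": ("NVIDIA MNNVL", "advanced"),
    "nvlink_intra": ("NVIDIA NVLink", "advanced"),
    "hip": ("AMD ROCm/HIP", "advanced"),
    "barex": ("RDMA-capable NIC", "advanced"),
    "cxl": ("CXL-capable hardware", "advanced"),
    "ascend": ("Huawei Ascend NPU", "advanced"),
}


def python_api_level(protocol: str) -> str:
    """Return "primary" or "advanced" for a protocol name from the table."""
    try:
        return PROTOCOLS[protocol][1]
    except KeyError:
        raise ValueError(f"unknown protocol: {protocol!r}") from None
```

Rejecting unknown names early gives a clearer error than a failed engine initialization later on.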

Commonly Used Protocols (Python API)#

TCP (Default)#

Description: Standard TCP/IP network protocol.

Use When:

  • No special hardware is available

  • Testing or development environments

  • Compatibility is more important than performance

Configuration:

# Python API
engine.initialize(
    hostname="localhost",
    metadata_server="P2PHANDSHAKE",
    protocol="tcp",
    device_name="",  # TCP needs no device name; pass an empty string
)

# Environment variables (shell)
export MOONCAKE_PROTOCOL="tcp"

Advantages:

  • Works in all environments

  • No special hardware required

  • Simple setup

Limitations:

  • Lower throughput than RDMA

  • Higher CPU overhead than RDMA

  • Higher latency than RDMA

Advanced Protocols (C++ Transfer Engine)#

The following protocols are available at the C++ Transfer Engine level for specialized use cases. They are not commonly used through the Python API.

NVMe over Fabric (nvmeof)#

Description: Direct data transfer between NVMe storage and DRAM/VRAM using GPUDirect Storage, bypassing the CPU for zero-copy operations.

Use When:

  • Direct NVMe storage access is needed

  • Implementing multi-tier storage (DRAM/VRAM/NVMe)

  • Working with large datasets that don’t fit in memory

Requirements:

  • NVMe-oF capable storage

  • Properly mounted remote storage nodes

HIP Transport (hip)#

Description: AMD ROCm/HIP transport for GPU communication using IPC handles or Shareable handles.

Use When:

  • Working with AMD GPUs

  • Intra-node GPU communication on AMD hardware is needed

Requirements:

  • AMD ROCm/HIP runtime

  • AMD GPUs

Barex Transport (barex)#

Description: Bare-metal RDMA extension protocol for specialized RDMA configurations.

Use When:

  • Advanced RDMA features are required

  • Custom RDMA configurations

Requirements:

  • RDMA-capable hardware

  • Specialized configuration

CXL Transport (cxl)#

Description: Compute Express Link for memory pooling and sharing across devices.

Use When:

  • CXL memory pooling is available

  • Memory disaggregation is needed

Requirements:

  • CXL-capable hardware

Ascend Transport (ascend)#

Description: Huawei Ascend NPU communication using HCCL (Huawei Collective Communication Library) or direct transport.

Use When:

  • Working with Huawei Ascend NPUs

  • Distributed inference on Ascend hardware

Requirements:

  • Huawei Ascend NPU hardware

  • HCCL runtime

Configuration Examples#

Configuration File (JSON)#

TCP Configuration:

{
    "local_hostname": "localhost",
    "metadata_server": "localhost:8080",
    "protocol": "tcp",
    "device_name": "",
    "master_server_address": "localhost:8081"
}

RDMA Configuration:

{
    "local_hostname": "node1",
    "metadata_server": "etcd://10.0.0.1:2379",
    "global_segment_size": "3GB",
    "local_buffer_size": "1GB",
    "protocol": "rdma",
    "device_name": "mlx5_0",
    "master_server_address": "10.0.0.1:8081"
}
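
A configuration file like the ones above can be sanity-checked before use. The following is a minimal sketch, not Mooncake's own loader: `load_config` and `REQUIRED_KEYS` are hypothetical names, and treating the five keys shared by both examples as mandatory is an assumption made for illustration.

```python
import json

# Assumption: the keys common to both examples above are treated as required.
REQUIRED_KEYS = (
    "local_hostname",
    "metadata_server",
    "protocol",
    "device_name",
    "master_server_address",
)


def load_config(text: str) -> dict:
    """Parse a JSON configuration string and check that the shared keys exist."""
    config = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError("missing configuration keys: " + ", ".join(missing))
    return config


# The TCP example from above, embedded as a string for demonstration.
TCP_EXAMPLE = """
{
    "local_hostname": "localhost",
    "metadata_server": "localhost:8080",
    "protocol": "tcp",
    "device_name": "",
    "master_server_address": "localhost:8081"
}
"""

config = load_config(TCP_EXAMPLE)
print(config["protocol"])
```

Optional keys such as `global_segment_size` and `local_buffer_size` pass through unchanged, so the RDMA example loads the same way.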

Environment Variables#

# TCP (Default)
export MOONCAKE_PROTOCOL="tcp"

# RDMA with specific device
export MOONCAKE_PROTOCOL="rdma"
export MOONCAKE_DEVICE="mlx5_0"

# RDMA with auto-discovery
export MOONCAKE_PROTOCOL="rdma"
export MOONCAKE_DEVICE="auto-discovery"

# Other configuration
export MOONCAKE_MASTER="10.0.0.1:50051"
export MOONCAKE_TE_META_DATA_SERVER="P2PHANDSHAKE"
export MOONCAKE_LOCAL_HOSTNAME="node1"
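
To see which values a process would pick up from these variables, they can be collected into a dictionary. This is a sketch for inspection and debugging only: `settings_from_env` is a hypothetical helper, the mapping of each variable to a configuration-file key is an assumption, and the empty-string default for the device is likewise assumed. Only the `tcp` default follows the "TCP (Default)" label above.

```python
import os
from typing import Mapping, Optional


def settings_from_env(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect the MOONCAKE_* variables listed above into one dictionary."""
    env = os.environ if env is None else env
    return {
        # "tcp" is described as the default protocol in this document.
        "protocol": env.get("MOONCAKE_PROTOCOL", "tcp"),
        # Assumption: an unset device is represented by an empty string.
        "device_name": env.get("MOONCAKE_DEVICE", ""),
        # Assumption: these variables correspond to the JSON keys of similar name.
        "master_server_address": env.get("MOONCAKE_MASTER"),
        "metadata_server": env.get("MOONCAKE_TE_META_DATA_SERVER"),
        "local_hostname": env.get("MOONCAKE_LOCAL_HOSTNAME"),
    }


print(settings_from_env({"MOONCAKE_PROTOCOL": "rdma", "MOONCAKE_DEVICE": "mlx5_0"}))
```

Passing a plain dictionary instead of `os.environ` keeps the helper easy to test without touching the real environment.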

Choosing the Right Protocol#

| Scenario | Recommended Protocol | Notes |
|---|---|---|
| Development/Testing | tcp | Simple setup, no special hardware |
| Production Inference | rdma | Best performance and latency |
| Cloud Environments | tcp or rdma (if available) | Check cloud provider support |
| Multi-tier Storage | rdma + nvmeof | Combine protocols for different layers |
| AMD GPU Clusters | rdma + hip | Use HIP for local GPU communication |
| Ascend NPU Clusters | rdma + ascend | Use Ascend for NPU-specific operations |
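
The recommendations above can be expressed as a small lookup for deployment scripts. This is a sketch under stated assumptions: `recommend` and `RECOMMENDED` are hypothetical names, and the `rdma_available` flag stands in for whatever check your cloud provider requires.

```python
# Hypothetical mapping of the scenarios above to their recommended protocols.
RECOMMENDED = {
    "development/testing": ["tcp"],
    "production inference": ["rdma"],
    "multi-tier storage": ["rdma", "nvmeof"],
    "amd gpu clusters": ["rdma", "hip"],
    "ascend npu clusters": ["rdma", "ascend"],
}


def recommend(scenario: str, rdma_available: bool = False) -> list:
    """Return the recommended protocol list for a scenario name from the table."""
    key = scenario.strip().lower()
    if key == "cloud environments":
        # The table recommends rdma only where the provider supports it.
        return ["rdma"] if rdma_available else ["tcp"]
    if key not in RECOMMENDED:
        raise ValueError(f"unknown scenario: {scenario!r}")
    return list(RECOMMENDED[key])
```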

Troubleshooting#

RDMA Connection Issues#

  1. Check RDMA devices:

    ibv_devices
    ibv_devinfo
    
  2. Verify network connectivity:

    # Test RDMA connectivity (requires rdma-core tools)
    rping -s  # On server
    rping -c -a <server_ip> -v  # On client
    
  3. Check permissions:

    • RDMA may require elevated permissions

    • Run with sudo if necessary

    • Configure proper udev rules for non-root access

  4. Firewall configuration:

    • Ensure RDMA traffic is not blocked (RoCE v2, for example, runs over UDP port 4791)

    • On InfiniBand fabrics, check that a subnet manager (such as opensm) is running
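
The device check in step 1 can also be scripted. On Linux, the kernel lists RDMA devices under `/sys/class/infiniband`, so a short helper can report whether any are visible to the current user. This is a rough sketch, not a replacement for `ibv_devinfo`; `list_rdma_devices` is a hypothetical name.

```python
import os
from typing import List


def list_rdma_devices(sysfs_dir: str = "/sys/class/infiniband") -> List[str]:
    """Return RDMA device names found in sysfs, or an empty list if none."""
    try:
        return sorted(os.listdir(sysfs_dir))
    except OSError:
        # Directory missing or unreadable: no RDMA devices are visible.
        return []


devices = list_rdma_devices()
if devices:
    print("RDMA devices:", ", ".join(devices))
else:
    print("No RDMA devices found; consider protocol='tcp'")
```

A name printed here (for example `mlx5_0`) is the kind of value the RDMA examples above pass as the device name.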

Protocol Selection#

If a protocol fails to initialize:

  1. Verify hardware support

  2. Check that required drivers are installed

  3. Ensure compile-time flags are set correctly (for C++ protocols)

  4. Fall back to TCP for basic functionality
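
The fallback in step 4 can be automated by trying protocols in order of preference. The sketch below is an illustration under assumptions, not Mooncake code: `initialize_with_fallback` is a hypothetical helper, and `init` is any callable you supply that attempts initialization for one protocol and returns 0 on success (an assumed convention) or raises on failure.

```python
from typing import Callable, Sequence


def initialize_with_fallback(
    init: Callable[[str], int],
    protocols: Sequence[str] = ("rdma", "tcp"),
) -> str:
    """Try each protocol in order and return the first that initializes."""
    errors = []
    for protocol in protocols:
        try:
            ret = init(protocol)
        except Exception as exc:
            errors.append(f"{protocol}: {exc}")
            continue
        # Assumption: a return value of 0 signals success.
        if ret == 0:
            return protocol
        errors.append(f"{protocol}: returned {ret}")
    raise RuntimeError("no protocol could be initialized: " + "; ".join(errors))


# Demonstration with a stand-in initializer that only accepts TCP.
def fake_init(protocol: str) -> int:
    return 0 if protocol == "tcp" else -1


print(initialize_with_fallback(fake_init))
```

In real use, `init` would wrap the `engine.initialize(...)` call shown earlier. Whether an engine object can be reused after a failed attempt is not covered here, so creating a fresh engine per attempt is the cautious choice.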

See Also#