Benchmark performance on NVIDIA A10
Here are some preliminary MooncakeStore benchmark results on NVIDIA A10 with the “Qwen/Qwen2.5-7B-Instruct-GPTQ-Int4” model.
Varying PD ratio (input length = 1024, qps = 2, output length = 6, number of requests = 200)
| Configuration | Backend | Duration (s) | Output Token Throughput (tok/s) | Total Token Throughput (tok/s) | Mean TTFT (ms) | Median TTFT (ms) | P99 TTFT (ms) | Mean TPOT (ms) | Median TPOT (ms) | P99 TPOT (ms) | Mean ITL (ms) | Median ITL (ms) | P99 ITL (ms) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2P2D tp = 1 | Redis | 99.47 | 12.06 | 2042.75 | 844.28 | 666.84 | 2270.91 | 16.88 | 11.57 | 104.83 | 16.84 | 11.56 | 239.67 |
| 2P2D tp = 1 | MooncakeStore (TCP) | 99.44 | 12.07 | 2043.30 | 817.43 | 639.48 | 1969.89 | 12.49 | 11.55 | 45.52 | 12.46 | 11.55 | 15.31 |
| 2P2D tp = 1 | MooncakeStore (RDMA) | 99.33 | 12.08 | 2045.57 | 763.58 | 604.22 | 2030.34 | 12.43 | 11.53 | 43.02 | 12.39 | 11.52 | 15.40 |
| 2P2D tp = 2 | Redis | 98.92 | 12.13 | 2054.12 | 397.20 | 352.37 | 782.44 | 9.00 | 8.05 | 36.06 | 8.97 | 8.03 | 13.94 |
| 2P2D tp = 2 | MooncakeStore (TCP) | 98.81 | 12.14 | 2056.43 | 327.91 | 309.38 | 573.36 | 8.33 | 8.04 | 17.62 | 8.30 | 8.03 | 11.23 |
| 2P2D tp = 2 | MooncakeStore (RDMA) | 98.74 | 12.15 | 2057.79 | 271.25 | 250.11 | 532.00 | 8.34 | 8.12 | 14.70 | 8.31 | 8.10 | 11.12 |
| 3P3D (1 remote P, 1 remote D) tp = 2 qps = 2 | Redis | 98.89 | 12.13 | 2054.80 | 382.73 | 358.18 | 659.31 | 8.07 | 8.02 | 8.86 | 8.04 | 7.99 | 10.14 |
| 3P3D (1 remote P, 1 remote D) tp = 2 qps = 2 | MooncakeStore (TCP) | 98.71 | 12.16 | 2058.47 | 298.71 | 302.74 | 512.84 | 8.06 | 8.04 | 8.54 | 8.03 | 8.02 | 8.88 |
| 3P3D (1 remote P, 1 remote D) tp = 2 qps = 2 | MooncakeStore (RDMA) | 98.69 | 12.16 | 2058.88 | 269.73 | 252.96 | 543.38 | 8.13 | 8.04 | 10.49 | 8.10 | 8.02 | 11.29 |
| 4P2D (2 remote P) tp = 2 | Redis | 98.85 | 12.14 | 2055.66 | 350.39 | 339.15 | 506.78 | 8.54 | 8.01 | 26.42 | 8.51 | 7.99 | 11.43 |
| 4P2D (2 remote P) tp = 2 | MooncakeStore (TCP) | 98.76 | 12.15 | 2057.56 | 312.32 | 307.50 | 475.79 | 8.29 | 8.03 | 19.87 | 8.25 | 8.01 | 9.51 |
| 4P2D (2 remote P) tp = 2 | MooncakeStore (RDMA) | 98.71 | 12.16 | 2058.59 | 259.87 | 251.23 | 461.96 | 8.20 | 8.05 | 10.20 | 8.17 | 8.03 | 11.50 |
| 2P4D (2 remote D) tp = 2 | Redis | 98.88 | 12.14 | 2054.90 | 381.91 | 338.25 | 722.00 | 8.07 | 8.05 | 8.55 | 8.04 | 8.02 | 9.15 |
| 2P4D (2 remote D) tp = 2 | MooncakeStore (TCP) | 98.78 | 12.15 | 2057.11 | 317.42 | 304.53 | 521.66 | 8.07 | 8.03 | 8.75 | 8.04 | 8.02 | 9.62 |
| 2P4D (2 remote D) tp = 2 | MooncakeStore (RDMA) | 98.73 | 12.15 | 2058.02 | 275.13 | 251.57 | 487.43 | 8.18 | 8.06 | 9.19 | 8.15 | 8.05 | 10.53 |
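
The raw latencies above are easier to compare as relative changes against the Redis baseline. The following is a minimal sketch of that arithmetic, not part of the benchmark tooling: the dictionary `MEAN_TTFT_MS` and the helpers `reduction_vs_baseline` and `ttft_reductions` are hypothetical names introduced here, and the numbers are copied by hand from the Mean TTFT (ms) column of the table.

```python
# Mean TTFT (ms) per configuration and backend, copied from the table above.
# The structure and names are illustrative assumptions, not benchmark output.
MEAN_TTFT_MS = {
    "2P2D tp=1": {
        "Redis": 844.28,
        "MooncakeStore (TCP)": 817.43,
        "MooncakeStore (RDMA)": 763.58,
    },
    "2P2D tp=2": {
        "Redis": 397.20,
        "MooncakeStore (TCP)": 327.91,
        "MooncakeStore (RDMA)": 271.25,
    },
    "3P3D tp=2": {
        "Redis": 382.73,
        "MooncakeStore (TCP)": 298.71,
        "MooncakeStore (RDMA)": 269.73,
    },
    "4P2D tp=2": {
        "Redis": 350.39,
        "MooncakeStore (TCP)": 312.32,
        "MooncakeStore (RDMA)": 259.87,
    },
    "2P4D tp=2": {
        "Redis": 381.91,
        "MooncakeStore (TCP)": 317.42,
        "MooncakeStore (RDMA)": 275.13,
    },
}


def reduction_vs_baseline(baseline_ms: float, value_ms: float) -> float:
    """Return how much lower value_ms is than baseline_ms, in percent."""
    return (baseline_ms - value_ms) / baseline_ms * 100.0


def ttft_reductions(table: dict, baseline: str = "Redis") -> dict:
    """For each configuration, compute the mean-TTFT reduction of every
    non-baseline backend relative to the baseline backend."""
    result = {}
    for config, backends in table.items():
        base = backends[baseline]
        result[config] = {
            name: round(reduction_vs_baseline(base, ms), 1)
            for name, ms in backends.items()
            if name != baseline
        }
    return result


if __name__ == "__main__":
    for config, row in ttft_reductions(MEAN_TTFT_MS).items():
        for backend, pct in row.items():
            print(f"{config:10s} {backend:22s} {pct:5.1f}% lower mean TTFT than Redis")
```

With these inputs, the RDMA backend shows the larger mean-TTFT reduction in every configuration, for example about 31.7% for 2P2D tp = 2, while TPOT and ITL medians stay close across backends. The same helper can be pointed at any other column by swapping in that column's values.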