Benchmark performance on NVIDIA A10
Here are some preliminary Mooncake benchmark results on A10 with up to 2 RDMA NICs. We are currently having trouble benchmarking PyNcclConnector: for reasons we have not yet identified, it crashes frequently in inter-node disaggregated scenarios, so these results do not include PyNcclConnector yet.
In addition, we are coordinating resources to add machines with more RDMA NICs and more advanced GPUs. The official benchmark results will be released in due time.
Varying tp (input length = 1024, qps = 2, output length = 6)
| Setting | num_rdma_nic | Successful Requests | Duration (s) | Total Input Tokens | Total Generated Tokens | Req Throughput (req/s) | Output Token Throughput (tok/s) | Total Token Throughput (tok/s) | Mean TTFT (ms) | Median TTFT (ms) | P99 TTFT (ms) | Mean TPOT (ms) | Median TPOT (ms) | P99 TPOT (ms) | Mean ITL (ms) | Median ITL (ms) | P99 ITL (ms) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| tp = 1 | 2 | 200 | 99.47 | 201995 | 1200 | 2.01 | 12.06 | 2042.74 | 1056.76 | 635.00 | 4006.59 | 97.08 | 26.94 | 781.91 | 97.01 | 14.05 | 2205.51 |
| tp = 2 | 2 | 200 | 98.98 | 201995 | 1200 | 2.02 | 12.12 | 2052.95 | 314.87 | 231.20 | 949.40 | 25.65 | 15.56 | 129.60 | 25.62 | 15.48 | 288.06 |
| tp = 4 | 2 | 200 | 98.76 | 201995 | 1200 | 2.03 | 12.15 | 2057.44 | 198.10 | 160.03 | 461.61 | 23.52 | 18.93 | 94.38 | 23.50 | 18.01 | 187.79 |
| tp = 1 | 1 | 200 | 99.44 | 201995 | 1200 | 2.01 | 12.07 | 2043.39 | 1071.12 | 631.56 | 4361.02 | 83.93 | 26.93 | 794.75 | 83.86 | 14.13 | 1932.66 |
| tp = 2 | 1 | 200 | 98.96 | 201995 | 1200 | 2.02 | 12.13 | 2053.35 | 335.26 | 258.30 | 997.93 | 28.84 | 15.56 | 144.82 | 28.80 | 15.42 | 397.56 |
| tp = 4 | 1 | 200 | 98.78 | 201995 | 1200 | 2.02 | 12.15 | 2057.03 | 201.68 | 162.85 | 456.33 | 22.31 | 16.74 | 94.76 | 22.29 | 16.73 | 189.13 |
| tp = 1 | TCP | 200 | 99.55 | 201995 | 1200 | 2.01 | 12.05 | 2041.13 | 1414.05 | 766.23 | 6035.36 | 155.01 | 35.28 | 1191.24 | 154.91 | 14.32 | 3148.99 |
| tp = 2 | TCP | 200 | 98.97 | 201995 | 1200 | 2.02 | 12.12 | 2053.03 | 333.74 | 251.32 | 954.63 | 28.74 | 15.49 | 161.24 | 28.70 | 15.35 | 393.52 |
| tp = 4 | TCP | 200 | 98.78 | 201995 | 1200 | 2.02 | 12.15 | 2056.94 | 205.37 | 162.92 | 463.70 | 21.54 | 16.51 | 94.04 | 21.51 | 16.56 | 170.54 |
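The three throughput columns appear to follow directly from the raw counts and the run duration. The sketch below recomputes them for the tp = 1 row with 2 RDMA NICs; the function name and the formulas are assumptions based on the usual definitions of these metrics, not taken from the benchmark script. Small differences in the last digits are expected because the published duration is rounded to two decimals.

```python
def derived_throughput(successful_requests, duration_s, input_tokens, generated_tokens):
    """Recompute the throughput columns from raw counts (assumed definitions)."""
    return {
        "req_per_s": successful_requests / duration_s,
        "output_tok_per_s": generated_tokens / duration_s,
        "total_tok_per_s": (input_tokens + generated_tokens) / duration_s,
    }


# Values copied from the tp = 1, num_rdma_nic = 2 row of the table above.
row = derived_throughput(
    successful_requests=200,
    duration_s=99.47,
    input_tokens=201995,
    generated_tokens=1200,
)

for name, value in row.items():
    print(f"{name}: {value:.2f}")
```

The same check can be applied to any other row by substituting its Duration, Total Input Tokens, and Total Generated Tokens.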
Varying qps (input length = 1024, tp = 4, output length = 6)
| Setting | num_rdma_nic | Successful Requests | Duration (s) | Total Input Tokens | Total Generated Tokens | Req Throughput (req/s) | Output Token Throughput (tok/s) | Total Token Throughput (tok/s) | Mean TTFT (ms) | Median TTFT (ms) | P99 TTFT (ms) | Mean TPOT (ms) | Median TPOT (ms) | P99 TPOT (ms) | Mean ITL (ms) | Median ITL (ms) | P99 ITL (ms) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| qps = 2 | 2 | 200 | 98.77 | 201995 | 1200 | 2.02 | 12.15 | 2057.33 | 200.64 | 156.62 | 478.22 | 22.63 | 17.35 | 99.61 | 22.60 | 17.08 | 186.25 |
| qps = 4 | 2 | 200 | 49.75 | 201995 | 1200 | 4.02 | 24.12 | 4084.03 | 341.88 | 240.68 | 1430.54 | 38.36 | 18.39 | 313.45 | 38.31 | 17.17 | 588.80 |
| qps = 6 | 2 | 200 | 33.44 | 201995 | 1200 | 5.98 | 35.88 | 6075.54 | 851.15 | 501.59 | 3239.89 | 102.51 | 47.67 | 606.77 | 102.34 | 18.35 | 1704.79 |
| qps = 8 | 2 | 200 | 27.16 | 201995 | 1200 | 7.36 | 44.19 | 7482.52 | 4835.08 | 5733.45 | 8846.27 | 1276.59 | 1150.11 | 4401.23 | 1274.43 | 48.34 | 20682.35 |
| qps = 2 | 1 | 200 | 98.77 | 201995 | 1200 | 2.02 | 12.15 | 2057.31 | 201.77 | 161.53 | 473.44 | 22.13 | 16.52 | 96.18 | 22.11 | 16.51 | 190.40 |
| qps = 4 | 1 | 200 | 49.76 | 201995 | 1200 | 4.02 | 24.12 | 4083.83 | 337.31 | 243.38 | 1395.85 | 39.95 | 17.61 | 325.39 | 39.88 | 17.06 | 838.68 |
| qps = 6 | 1 | 200 | 33.44 | 201995 | 1200 | 5.98 | 35.88 | 6075.99 | 820.53 | 458.84 | 3169.52 | 83.92 | 30.50 | 663.07 | 83.78 | 17.85 | 1306.32 |
| qps = 8 | 1 | 200 | 27.19 | 201995 | 1200 | 7.36 | 44.14 | 7473.44 | 5291.91 | 6160.55 | 9596.56 | 1190.36 | 1040.63 | 4418.66 | 1188.33 | 47.61 | 20815.23 |
| qps = 2 | TCP | 200 | 98.76 | 201995 | 1200 | 2.03 | 12.15 | 2057.42 | 207.22 | 160.81 | 511.01 | 22.17 | 16.59 | 94.96 | 22.15 | 16.59 | 181.82 |
| qps = 4 | TCP | 200 | 49.79 | 201995 | 1200 | 4.02 | 24.10 | 4081.06 | 355.43 | 252.63 | 1554.91 | 40.15 | 16.92 | 314.28 | 40.09 | 16.66 | 708.50 |
| qps = 6 | TCP | 200 | 33.49 | 201995 | 1200 | 5.97 | 35.83 | 6067.71 | 907.74 | 514.85 | 3253.93 | 122.75 | 45.51 | 648.40 | 122.56 | 18.09 | 2282.92 |
| qps = 8 | TCP | 200 | 28.39 | 201995 | 1200 | 7.04 | 42.26 | 7156.09 | 6714.57 | 7885.09 | 11787.51 | 1116.06 | 408.32 | 4645.25 | 1114.29 | 46.87 | 21898.03 |
Varying input length (tp = 4, qps = 2, output length = 6)
| Setting | num_rdma_nic | Successful Requests | Duration (s) | Total Input Tokens | Total Generated Tokens | Req Throughput (req/s) | Output Token Throughput (tok/s) | Total Token Throughput (tok/s) | Mean TTFT (ms) | Median TTFT (ms) | P99 TTFT (ms) | Mean TPOT (ms) | Median TPOT (ms) | P99 TPOT (ms) | Mean ITL (ms) | Median ITL (ms) | P99 ITL (ms) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1024 | 2 | 200 | 98.77 | 201995 | 1200 | 2.02 | 12.15 | 2057.32 | 195.47 | 151.55 | 482.84 | 22.83 | 19.27 | 96.55 | 22.81 | 18.12 | 158.16 |
| 2048 | 2 | 200 | 99.22 | 406707 | 1200 | 2.02 | 12.09 | 4110.95 | 723.76 | 488.67 | 2941.96 | 67.25 | 18.93 | 632.73 | 67.20 | 17.49 | 1209.54 |
| 4096 | 2 | 200 | 117.42 | 818415 | 1200 | 1.70 | 10.22 | 6979.90 | 14616.48 | 18323.82 | 23191.04 | 8042.84 | 7593.16 | 19851.11 | 8040.02 | 65.43 | 93511.26 |
| 8192 | 2 | 200 | 247.77 | 1636065 | 1200 | 0.81 | 4.84 | 6608.10 | 75783.36 | 79331.60 | 147544.42 | 16961.27 | 15140.11 | 39278.98 | 16958.32 | 90.01 | 186151.61 |
| 1024 | 1 | 200 | 98.77 | 201995 | 1200 | 2.02 | 12.15 | 2057.31 | 201.77 | 161.53 | 473.44 | 22.13 | 16.52 | 96.18 | 22.11 | 16.51 | 190.40 |
| 2048 | 1 | 200 | 99.25 | 406707 | 1200 | 2.02 | 12.09 | 4109.96 | 719.43 | 482.02 | 3208.13 | 61.92 | 17.64 | 681.26 | 61.86 | 16.83 | 978.90 |
| 4096 | 1 | 200 | 111.88 | 818415 | 1200 | 1.79 | 10.73 | 7326.16 | 20362.10 | 22807.05 | 31853.55 | 5915.16 | 4521.51 | 18739.12 | 5913.18 | 67.03 | 81600.29 |
| 8192 | 1 | 200 | 270.01 | 1636065 | 1200 | 0.74 | 4.44 | 6063.79 | 103355.40 | 106546.65 | 172025.11 | 12894.35 | 11027.66 | 35110.13 | 12892.85 | 64.84 | 151774.68 |
| 1024 | TCP | 200 | 98.81 | 201995 | 1200 | 2.02 | 12.14 | 2056.44 | 203.32 | 160.83 | 460.90 | 21.81 | 16.96 | 95.27 | 21.78 | 16.91 | 171.80 |
| 2048 | TCP | 200 | 99.27 | 406707 | 1200 | 2.01 | 12.09 | 4108.98 | 731.60 | 484.78 | 3213.69 | 68.55 | 17.88 | 639.93 | 68.49 | 17.33 | 1257.45 |
| 4096 | TCP | 200 | 118.37 | 818415 | 1200 | 1.69 | 10.14 | 6923.89 | 23735.69 | 27101.97 | 36573.47 | 6386.62 | 5102.00 | 20032.26 | 6384.71 | 69.57 | 92811.27 |
| 8192 | TCP | 200 | 278.12 | 1636065 | 1200 | 0.72 | 4.31 | 5886.95 | 106873.23 | 109941.33 | 179781.64 | 13360.87 | 12155.24 | 36022.96 | 13359.20 | 68.01 | 156716.38 |
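One way to read the transport comparison is as a relative change in mean TTFT against the TCP baseline. The sketch below does this for the 8192-token rows above; the helper name `relative_reduction` is hypothetical, and since these are single preview runs the result should be treated as indicative only.

```python
def relative_reduction(baseline_ms, candidate_ms):
    """Fraction by which candidate_ms is lower than baseline_ms."""
    return (baseline_ms - candidate_ms) / baseline_ms


# Mean TTFT (ms) copied from the input length = 8192 rows of the table above.
tcp_mean_ttft_ms = 106873.23
rdma_mean_ttft_ms = {
    "2 RDMA NICs": 75783.36,
    "1 RDMA NIC": 103355.40,
}

reductions = {
    setting: relative_reduction(tcp_mean_ttft_ms, ttft_ms)
    for setting, ttft_ms in rdma_mean_ttft_ms.items()
}

for setting, reduction in reductions.items():
    print(f"{setting}: {reduction:.1%} lower mean TTFT than TCP")
```

For these rows the mean TTFT comes out roughly 29% lower than TCP with 2 RDMA NICs and roughly 3% lower with 1 RDMA NIC. At shorter input lengths the three transports are much closer, so the same calculation yields small or even negative values there.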