Allocator Performance#
We evaluated the performance of OffsetAllocator, the default memory allocator in Mooncake Store. This allocator is responsible for allocating memory from mounted segments to store the KV cache.
In this context, the most important metric is memory utilization, defined as the ratio between the amount of memory that can be successfully allocated and the total available memory. A higher utilization means that more KV tensors can be cached, thereby accelerating LLM tasks. However, due to memory fragmentation, allocation may fail even when the allocated memory is well below the total available capacity.
For the same allocator, memory utilization can vary significantly under different workloads. Therefore, we evaluated the allocator’s efficiency across a range of workloads.
In particular, in the LLM inference scenario, once the model is fixed, the size of each KV vector is also fixed. This means memory utilization under uniform allocation sizes becomes especially important. However, the original OffsetAllocator has a limitation: when the allocation size is uniform but does not match any of OffsetAllocator’s predefined bin sizes, memory utilization can be suboptimal.
To address this, we introduced targeted optimizations for uniform-size workloads on top of the original OffsetAllocator. As shown in our test results, the optimized version achieves significant performance improvements in such scenarios.
Execution#
./mooncake-store/benchmarks/allocator_bench
Result#
alloc size: The size of each object
utilization ratio: The total allocated size / total space
time: time in nanoseconds for each object allocation
OffsetAllocator optimization: round up the allocated size to a bin size.
Uniform size, size equals power of 2#
OffsetAllocator (After Optimization)
Alloc Size |
Min Util Ratio |
Avg Util Ratio |
Time (ns) |
---|---|---|---|
32 |
1 |
1 |
544 |
128 |
1 |
1 |
417 |
512 |
1 |
1 |
174 |
2048 |
1 |
1 |
406 |
8192 |
1 |
1 |
180 |
32768 |
1 |
1 |
133 |
131072 |
1 |
1 |
109 |
524288 |
1 |
1 |
100 |
2097152 |
1 |
1 |
99 |
8388608 |
1 |
1 |
99 |
33554432 |
1 |
1 |
98 |
OffsetAllocator (Before Optimization)
Alloc Size |
Min Util Ratio |
Avg Util Ratio |
Time (ns) |
---|---|---|---|
32 |
1 |
1 |
539 |
128 |
1 |
1 |
419 |
512 |
1 |
1 |
217 |
2048 |
1 |
1 |
408 |
8192 |
1 |
1 |
175 |
32768 |
1 |
1 |
130 |
131072 |
1 |
1 |
107 |
524288 |
1 |
1 |
99 |
2097152 |
1 |
1 |
100 |
8388608 |
1 |
1 |
98 |
33554432 |
1 |
1 |
98 |
Uniform size, size equals power of 2 +/- 17#
OffsetAllocator (After Optimization)
Alloc Size |
Min Util Ratio |
Avg Util Ratio |
Time (ns) |
---|---|---|---|
15 |
1 |
1 |
568 |
111 |
0.991071 |
0.991071 |
441 |
495 |
0.966797 |
0.966797 |
178 |
2031 |
0.991699 |
0.991699 |
418 |
8175 |
0.997925 |
0.997925 |
170 |
32751 |
0.999481 |
0.999481 |
133 |
131055 |
0.99987 |
0.99987 |
109 |
524271 |
0.999968 |
0.999968 |
100 |
2097135 |
0.999992 |
0.999992 |
99 |
8388591 |
0.999998 |
0.999998 |
98 |
33554415 |
0.999999 |
0.999999 |
99 |
49 |
0.942308 |
0.942308 |
508 |
145 |
0.906249 |
0.906249 |
372 |
529 |
0.918399 |
0.918399 |
172 |
2065 |
0.896267 |
0.896267 |
403 |
8209 |
0.89073 |
0.89073 |
174 |
32785 |
0.889347 |
0.889347 |
131 |
131089 |
0.88897 |
0.88897 |
105 |
524305 |
0.888701 |
0.888701 |
102 |
2097169 |
0.888679 |
0.888679 |
100 |
8388625 |
0.886721 |
0.886721 |
100 |
33554449 |
0.875 |
0.875 |
100 |
OffsetAllocator (Before Optimization)
Alloc Size |
Min Util Ratio |
Avg Util Ratio |
Time (ns) |
---|---|---|---|
15 |
1 |
1 |
566 |
111 |
0.669866 |
0.710845 |
703 |
495 |
0.665779 |
0.676874 |
238 |
2031 |
0.668333 |
0.705411 |
637 |
8175 |
0.666175 |
0.676474 |
242 |
32751 |
0.664435 |
0.669078 |
168 |
131055 |
0.66062 |
0.667341 |
124 |
524271 |
0.653055 |
0.666993 |
118 |
2097135 |
0.64062 |
0.666873 |
116 |
8388591 |
0.605468 |
0.667812 |
115 |
33554415 |
0.5625 |
0.670944 |
116 |
49 |
0.692229 |
0.753062 |
1122 |
145 |
0.667789 |
0.700907 |
572 |
529 |
0.66577 |
0.676238 |
238 |
2065 |
0.667926 |
0.704884 |
632 |
8209 |
0.665708 |
0.676372 |
239 |
32785 |
0.664224 |
0.669058 |
168 |
131089 |
0.659631 |
0.667287 |
129 |
524305 |
0.652609 |
0.666884 |
122 |
2097169 |
0.638677 |
0.666516 |
120 |
8388625 |
0.60547 |
0.665131 |
121 |
33554449 |
0.546875 |
0.660917 |
120 |
Uniform size, size equals power of 2 multiply 0.9 or 1.1#
OffsetAllocator (After Optimization)
Alloc Size |
Min Util Ratio |
Avg Util Ratio |
Time (ns) |
---|---|---|---|
28 |
1 |
1 |
543 |
115 |
0.958333 |
0.958333 |
418 |
460 |
0.958332 |
0.958332 |
189 |
1843 |
0.959896 |
0.959896 |
418 |
7372 |
0.959895 |
0.959895 |
197 |
29491 |
0.959993 |
0.959993 |
135 |
117964 |
0.959979 |
0.959979 |
111 |
471859 |
0.959985 |
0.959985 |
100 |
1887436 |
0.959765 |
0.959765 |
99 |
7549747 |
0.959766 |
0.959766 |
99 |
30198988 |
0.95625 |
0.95625 |
99 |
35 |
0.972222 |
0.972222 |
531 |
140 |
0.972222 |
0.972222 |
397 |
563 |
0.977427 |
0.977427 |
180 |
2252 |
0.97743 |
0.97743 |
389 |
9011 |
0.977752 |
0.977752 |
183 |
36044 |
0.977752 |
0.977752 |
133 |
144179 |
0.977739 |
0.977739 |
106 |
576716 |
0.977538 |
0.977538 |
103 |
2306867 |
0.977539 |
0.977539 |
99 |
9227468 |
0.975391 |
0.975391 |
99 |
36909875 |
0.9625 |
0.9625 |
100 |
OffsetAllocator (Before Optimization)
Alloc Size |
Min Util Ratio |
Avg Util Ratio |
Time (ns) |
---|---|---|---|
28 |
1 |
1 |
539 |
115 |
0.669299 |
0.709245 |
701 |
460 |
0.665825 |
0.677532 |
255 |
1843 |
0.669352 |
0.709202 |
691 |
7372 |
0.66619 |
0.677401 |
260 |
29491 |
0.664311 |
0.669511 |
172 |
117964 |
0.661812 |
0.667356 |
133 |
471859 |
0.654345 |
0.667048 |
123 |
1887436 |
0.640722 |
0.666447 |
121 |
7549747 |
0.611719 |
0.666847 |
119 |
30198988 |
0.548437 |
0.669799 |
125 |
35 |
0.7098 |
0.774162 |
1306 |
140 |
0.667934 |
0.702151 |
599 |
563 |
0.665599 |
0.675548 |
239 |
2252 |
0.667371 |
0.701623 |
601 |
9011 |
0.665485 |
0.675528 |
244 |
36044 |
0.663248 |
0.668912 |
170 |
144179 |
0.660308 |
0.666934 |
127 |
576716 |
0.654467 |
0.66679 |
122 |
2306867 |
0.633789 |
0.666159 |
121 |
9227468 |
0.597266 |
0.666037 |
118 |
36909875 |
0.55 |
0.669564 |
121 |
Random Size#
OffsetAllocator (After Optimization)
util ratio (min / p99 / p90 / p50 / max / avg):
0.544250 / 0.713338 / 0.779739 / 0.847867 / 0.952591 / 0.841576
avg alloc time: 145.575738 ns/op
OffsetAllocator (Before Optimization)
util ratio (min / p99 / p90 / p50 / max / avg):
0.569255 / 0.712076 / 0.781224 / 0.855046 / 0.976057 / 0.848873
avg alloc time: 142.508508 ns/op