Abstract
Multithreaded high-performance computing (HPC) applications access memory frequently, which leads to a high rate of memory-block allocations and deallocations at runtime. With small memory pages, memory accesses are costly because they require multiple levels of address translation, and acquiring or releasing virtual memory requires a large number of kernel calls. It is therefore necessary to leverage hardware huge pages, both to offset these costs and to help reach performance beyond the exaflops scale. To meet the performance requirements of such applications, the heap manager in the runtime system must split huge pages into smaller pages so that threads are not blocked by huge-page allocations and multiple threads can share memory blocks. To this end, this paper presents HuMalloc, which supports such applications by managing the heap through three components: a back-end, a central free-list, and a front-end, responsible respectively for interacting with the kernel, sharing memory blocks between threads, and caching each thread's released memory blocks. HuMalloc also removes unnecessary locks at the back-end and efficiently splits huge pages into fixed-size memory blocks. Experiments show that HuMalloc improves performance on the Larson benchmark by up to 26%, 145%, and 166% compared to TcMalloc, MiMalloc, and RpMalloc, respectively. In addition, on SchedSim, a SimGrid-based simulation of HPC applications, HuMalloc achieved up to 24% higher efficiency than TcMalloc.
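The three-tier heap organization described above can be illustrated with a minimal C sketch. It is not HuMalloc's implementation: the 2 MiB region size, the 64-byte block size, the batch size, and the names `sketch_alloc`/`sketch_free` are hypothetical, and only one fixed block size is supported. The back-end maps a huge-page-sized region with `mmap` (adding an optional `madvise(MADV_HUGEPAGE)` hint on Linux) and splits it into fixed-size blocks; the central free-list, guarded by a single mutex, shares blocks between threads; and a per-thread front-end cache serves allocations and releases without taking any lock. As an assumed way of avoiding a separate back-end lock, the back-end here runs only while the central free-list lock is already held.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS and madvise on glibc */
#endif
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical parameters chosen for illustration only. */
#define HUGE_PAGE_SIZE (2u * 1024u * 1024u) /* 2 MiB, a common x86-64 huge page size */
#define BLOCK_SIZE 64u                      /* the single fixed block size */
#define BATCH 32u                           /* blocks moved per refill or flush */

typedef struct block { struct block *next; } block_t;

/* Central free-list: shared by all threads, guarded by one mutex. */
static block_t *central_head = NULL;
static pthread_mutex_t central_lock = PTHREAD_MUTEX_INITIALIZER;

/* Front-end: per-thread cache, accessed without locks. */
static _Thread_local block_t *cache_head = NULL;
static _Thread_local size_t cache_count = 0;

/* Back-end: map one huge-page-sized region and split it into fixed-size
   blocks pushed onto the central list. Called only while central_lock is
   held, so it takes no lock of its own. Returns 0 if mmap fails. */
static int backend_refill_locked(void) {
    void *region = mmap(NULL, HUGE_PAGE_SIZE, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED) return 0;
#ifdef MADV_HUGEPAGE
    madvise(region, HUGE_PAGE_SIZE, MADV_HUGEPAGE); /* only a hint; may be ignored */
#endif
    char *p = region;
    for (size_t off = 0; off + BLOCK_SIZE <= HUGE_PAGE_SIZE; off += BLOCK_SIZE) {
        block_t *b = (block_t *)(p + off);
        b->next = central_head;
        central_head = b;
    }
    return 1;
}

/* Move up to BATCH blocks from the central list into this thread's cache,
   asking the back-end for a new region if the central list is empty. */
static void cache_refill(void) {
    pthread_mutex_lock(&central_lock);
    if (central_head == NULL && !backend_refill_locked()) {
        pthread_mutex_unlock(&central_lock);
        return;
    }
    for (size_t i = 0; i < BATCH && central_head != NULL; i++) {
        block_t *b = central_head;
        central_head = b->next;
        b->next = cache_head;
        cache_head = b;
        cache_count++;
    }
    pthread_mutex_unlock(&central_lock);
}

/* Return BATCH blocks from this thread's cache to the central list so that
   other threads can reuse them. */
static void cache_flush(void) {
    pthread_mutex_lock(&central_lock);
    for (size_t i = 0; i < BATCH && cache_head != NULL; i++) {
        block_t *b = cache_head;
        cache_head = b->next;
        cache_count--;
        b->next = central_head;
        central_head = b;
    }
    pthread_mutex_unlock(&central_lock);
}

/* Allocate one BLOCK_SIZE block; NULL only if the back-end cannot map memory. */
void *sketch_alloc(void) {
    if (cache_head == NULL) cache_refill();
    if (cache_head == NULL) return NULL;
    block_t *b = cache_head;
    cache_head = b->next;
    cache_count--;
    return b;
}

/* Release a block into the calling thread's cache, flushing a batch to the
   central list once the cache holds more than 2 * BATCH blocks. */
void sketch_free(void *ptr) {
    if (ptr == NULL) return;
    block_t *b = ptr;
    b->next = cache_head;
    cache_head = b;
    if (++cache_count > 2 * BATCH) cache_flush();
}
```

Moving blocks in batches keeps most operations inside the per-thread cache, so the central mutex is taken roughly once per `BATCH` operations rather than once per allocation. For brevity, the sketch never returns mapped regions to the kernel.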