Optimized Merge Sort on Modern Commodity Multi-core CPUs
Abstract: Sorting is a kind of
widely used basic algorithms. As the high performance computing devices are increasingly
common, more and more modern commodity machines have the capability of parallel
concurrent computing. A new implementation of sorting algorithms is proposed to
harness the power of newer SIMD operations and multi-core computing provided by
modern CPUs. The algorithm is hybrid byoptimized bitonic sorting network and
multi-way merge. New SIMD instructions provided by modern CPUs are used in the
bitonic network implementation, which adopted a different method to arrange
data so that the number of SIMD operations is reduced. Balanced binary trees
are used in multi-way merge, which is also different with former implementations.
Efforts are also paid on minimizing data moving in memory since merge sort is a
kind of memory-bound application. The performance evaluation shows that the proposed
algorithm is twice as fast as the sort function in C++ standard library when
only single thread is used. It also outperforms radix sort implemented in Boost
library.
Author: Ming Xu, Xianbin Xu,
MengJia Yin, Fang Zheng
Journal Code: jptkomputergg160215