IEEE SBAC-PAD '20Thomas, Samuel & Hayne, Roxana & Pulaj, Jonad & Mendes, Hammurabi. (2020). Using Skip Graphs for Increased NUMA Locality. 157-166. 10.1109/SBAC-PAD49847.2020.00031.
Abstract. High-performance simulations and parallel frameworks often rely on highly scalable, concurrent data structures for system scalability. With an increased availability of NUMA architectures, we present a technique to promote NUMA-aware data parallelism inside a concurrent data structure, bringing significant quantitative and qualitative improvements on NUMA locality, as well as reduced contention for synchronized memory accesses. Our architecture is based on a data-partitioned, concurrent skip graph indexed by thread-local sequential maps. We implemented maps and relaxed priority queues using such technique. Maps show up to 6x higher CAS locality, up to a 68.6% reduction on the number of remote CAS operations, and an increase from 88.3% to 99% on the CAS success rate compared to a control implementation (subject to the same optimizations, and implementation practices). Remote memory accesses are not only reduced in number, but the larger the NUMA distance between threads, the larger the reduction is. Relaxed priority queues implemented using our technique show similar scalability improvements, with provable reduction in contention and decrease in relaxation in one of our implementations.
ACM PODC '19, short paperSamuel Thomas and Hammurabi Mendes. 2019. Layering Data Structures over Skip Graphs for Increased NUMA Locality. In Proceedings of the 2019 ACM Symposium on Principles of Distributed Computing (PODC '19). Association for Computing Machinery, New York, NY, USA, 422–424. DOI:https://doi.org/10.1145/3293611.3331576
Abstract. We present a lock-free, linearizable, and NUMA-aware data structure that implements sets, maps, and priority queue abstract data types (ADTs), based on using thread-local, sequential maps that are used to "jump" to suitable positions in a lock-free, linearizable variant of a skip graph. Our skip graph is suitably constrained in height and subjected to a data partition scheme that reduces contention and increases NUMA locality. We developed an additional skip graph variant, which we call sparse skip graph, that causes our thread-local maps as well as our shared structure to become more sparse. Compared to using regular skip graphs, sparse skip graphs show increased performance in workloads dominated by "insert" or "remove" operations, and comparable performance in workloads dominated by "contains" operations.