Programmable Network Interface Cards (NICs) have been leveraged to support efficient collective communication and synchronization. Previous studies have exploited NIC programmability to support efficient barrier, broadcast, and reduce operations. This paper explores the design of NIC-based allgather with different algorithms. Along with these algorithms, several strategies are used to provide scalable topology management, global buffer management, efficient communication processing, flow control, and reliability. The resulting allgather algorithms have been incorporated into a NIC-based collective protocol over Myrinet/GM. Compared to the host-based allgather operation, the NIC-based allgather operations improve allgather performance by a factor of up to 3.01 on 16 nodes. Furthermore, the NIC-based allgather operations scale better to large systems and incur very little host CPU utilization. To the best of the authors' knowledge, this paper is the first in the literature to report efficient NIC-based allgather algorithms over Myrinet/GM.