Many modern networks used to interconnect nodes in cluster-based computing systems provide network interface cards (NICs) with programmable processors. Much research has focused on offloading processing from the host to the NIC processor. However, this work has largely offloaded ad-hoc features to the NIC, chiefly to optimize common collective and synchronization-based communication. In this paper, we describe the design and implementation of a new framework, based on MPICH-GM, that supports the dynamic NIC-based offload of user-defined modules on Myrinet clusters. We evaluate our implementation on a 16-node cluster using a NIC-based version of the common broadcast operation and observe a factor of improvement of up to nearly 1.3 over the standard host-based implementation of broadcast. In addition, this factor of improvement increases with system size, indicating that our implementation scales better than the default host-based approach. To the best of our knowledge, this is the first such study for Myrinet clusters.