Amazon SageMaker now supports Elastic Fabric Adapter for distributed training
Amazon SageMaker now supports Elastic Fabric Adapter (EFA) for training machine learning models. EFA is a network interface for Amazon EC2 instances that enables customers to run applications requiring high levels of inter-node communication at scale on AWS. EFA can significantly speed up distributed training on SageMaker at no additional cost. For example, we trained the BERT natural language processing model with SageMaker’s distributed data parallel library on 32 ml.p4d.24xlarge instances. Training was up to 130% faster with EFA than with Elastic Network Adapter (ENA).
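To put the "up to 130% faster" figure in concrete terms, the short Python sketch below converts a "percent faster" number into a throughput multiplier and a relative training time. It assumes that "130% faster" means training throughput rose by 130% over the ENA baseline; the announcement does not spell out the exact metric, so treat this reading and the helper names `speedup_factor` and `relative_time` as illustrative assumptions rather than AWS's definition.

```python
def speedup_factor(percent_faster: float) -> float:
    """Convert 'N% faster' into a throughput multiplier.

    Assumption: 'N% faster' means throughput increased by N% over the baseline.
    """
    return 1.0 + percent_faster / 100.0


def relative_time(percent_faster: float) -> float:
    """Fraction of the baseline wall-clock time needed for the same work."""
    return 1.0 / speedup_factor(percent_faster)


if __name__ == "__main__":
    factor = speedup_factor(130.0)    # upper bound quoted in the announcement
    fraction = relative_time(130.0)
    print(f"{factor:.2f}x throughput, {fraction:.1%} of the ENA wall-clock time")
```

Under this reading, the best case quoted above corresponds to a bit more than double the ENA throughput, or somewhat under half the training time for the same job.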