Abstract:
Convolutional neural networks (CNNs) are among the most important methods in deep learning and have achieved state-of-the-art performance in challenging areas such as natural language processing and computer vision. However, these powerful networks train slowly because of their massive number of parameters, and the primary challenge is to reduce training time on large volumes of data. CNNs have relatively few parameters in their convolutional layers and many parameters in their fully connected layers, so training time can be reduced by applying a form of parallelism suited to the characteristics of each layer type. This paper presents an optimized parallelization algorithm with a communication strategy for CNN training on distributed graphics processing units (GPUs). We use a butterfly reduction communication strategy and apply data parallelism at convolutional layers and model parallelism at fully connected layers. The model is partitioned across distributed GPUs, and each partition is processed according to the characteristics of its CNN layers. This hybrid parallel approach is preferable to earlier alternatives, such as data parallelism alone or model parallelism alone, that have been applied to modern CNNs. Experimental results show that the hybrid parallel approach with the butterfly communication strategy improves accuracy and reduces training time.
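To make the butterfly reduction concrete, the sketch below simulates a butterfly (recursive-doubling) all-reduce of per-worker gradients in plain NumPy. This is an illustrative single-process simulation under assumed conventions (a power-of-two worker count, sum as the reduction operator), not the paper's actual multi-GPU implementation; the function name `butterfly_allreduce` is hypothetical.

```python
import numpy as np

def butterfly_allreduce(grads):
    """Simulate a butterfly all-reduce over P workers (P a power of two).

    grads: list of equal-shape arrays, one gradient buffer per worker.
    Returns the per-worker buffers after log2(P) exchange stages; each
    worker ends up holding the global sum, as every GPU would after the
    butterfly communication pattern.
    """
    p = len(grads)
    assert p & (p - 1) == 0, "butterfly exchange assumes a power-of-two worker count"
    bufs = [g.copy() for g in grads]
    step = 1
    while step < p:
        # In stage s, worker i exchanges its buffer with partner i XOR 2^s
        # and both keep the sum; after log2(P) stages all buffers agree.
        bufs = [bufs[i] + bufs[i ^ step] for i in range(p)]
        step <<= 1
    return bufs
```

After the reduction, every simulated worker holds the same summed gradient, which is why the butterfly pattern suits data-parallel training: no single root node gathers and redistributes the result.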