Time-frequency distributions (TFDs) support heart sound characterisation and classification in early cardiac screening. However, despite the frequent use of TFDs in signal analysis, no comprehensive study has compared their performance as inputs to deep learning models for automatic diagnosis. This study is the first to investigate and compare the use of single and combined TFDs for heart sound classification with deep learning. Its main contribution is to provide practical insights into the selection of TFDs as convolutional neural network (CNN) inputs and the design of CNN architectures for heart sound classification. The results show that: 1) Transforming the heart sound signal into the TF domain achieves higher classification performance than using the raw signal as input. Overall, the performance differences among the applied TFDs were slight for all evaluated CNNs in terms of MAcc (the average of sensitivity and specificity). However, the continuous wavelet transform (CWT) and the Chirplet transform (CT) outperformed the other TFDs in MAcc. 2) An appropriate increase in CNN capacity, together with architecture optimisation, can improve performance, but the network architecture should not be overly complicated. In the results for ResNet and SEResNet, increasing the number of parameters and the depth of the network did not noticeably improve performance. 3) Combining TFDs as CNN inputs did not significantly improve the classification results. These findings offer guidance to researchers and practitioners working on deep-learning-based automatic diagnosis of heart sounds.
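For readers unfamiliar with the MAcc metric, the following is a minimal sketch of how it can be computed from binary predictions, based only on the definition given above (the average of sensitivity and specificity). The function name `macc`, the label encoding (1 = abnormal, 0 = normal), and the toy labels are assumptions made for illustration and are not taken from the study.

```python
def macc(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity, MAcc) for binary labels.

    Hypothetical helper: `positive` marks the abnormal class (an assumption).
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # abnormal recording correctly flagged
            else:
                fn += 1  # abnormal recording missed
        else:
            if pred == positive:
                fp += 1  # normal recording wrongly flagged
            else:
                tn += 1  # normal recording correctly passed
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # MAcc is the unweighted mean of the two rates.
    return sensitivity, specificity, (sensitivity + specificity) / 2


# Toy labels, invented purely for illustration.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
sens, spec, score = macc(y_true, y_pred)
```

Because the two rates are weighted equally, MAcc is not dominated by the majority class, which is presumably why it suits screening data with unequal numbers of normal and abnormal recordings.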