Background: Cine Displacement Encoding with Stimulated Echoes (DENSE) facilitates the quantification of myocardial deformation, by encoding tissue displacements in the cardiovascular magnetic resonance (CMR) image phase, from which myocardial strain can be estimated with high accuracy and reproducibility. Current methods for analyzing DENSE images still heavily rely on user input, making this process time-consuming and subject to inter-observer variability. The present study sought to develop a spatio-temporal deep learning model for segmentation of the left-ventricular (LV) myocardium, as spatial networks often fail due to contrast-related properties of DENSE images. Methods: 2D + time nnU-Net-based models have been trained to segment the LV myocardium from DENSE magnitude data in short- and long-axis images. A dataset of 360 short-axis and 124 long-axis slices was used to train the networks, from a combination of healthy subjects and patients with various conditions (hypertrophic and dilated cardiomyopathy, myocardial infarction, myocarditis). Segmentation performance was evaluated using ground-truth manual labels, and a strain analysis using conventional methods was performed to assess strain agreement with manual segmentation. Additional validation was performed using an externally acquired dataset to compare the inter- and intra-scanner reproducibility with respect to conventional methods. Results: Spatio-temporal models gave consistent segmentation performance throughout the cine sequence, while 2D architectures often failed to segment end-diastolic frames due to the limited blood-to-myocardium contrast. Our models achieved a DICE score of 0.83 ± 0.05 and a Hausdorff distance of 4.0 ± 1.1 mm for short-axis segmentation, and 0.82 ± 0.03 and 7.9 ± 3.9 mm respectively for long-axis segmentations. Strain measurements obtained from automatically estimated myocardial contours showed good to excellent agreement with manual pipelines, and remained within the limits of inter-user variability estimated in previous studies. Conclusion: Spatio-temporal deep learning shows increased robustness for the segmentation of cine DENSE images. It provides excellent agreement with manual segmentation for strain extraction. Deep learning will facilitate the analysis of DENSE data, bringing it one step closer to clinical routine.