Deep learning boosts the imaging speed of photoacoustic endomicroscopy

High-speed photoacoustic (PA) endomicroscopy imaging is desirable for real-time guidance of minimally invasive surgery. However, the imaging speed of wavefront shaping-based endomicroscopy has been limited by the speed of spatial light modulators. In this work, a deep convolutional neural network was used to improve the imaging speed of a newly developed PA endomicroscopy system by enhancing sparsely sampled PA images. With a carbon fibre phantom, this method increased the imaging speed 16-fold without significantly affecting image quality. With further validation on more complex datasets, this approach is a promising route to real-time PA endomicroscopy imaging via wavefront shaping.


INTRODUCTION
Forward-viewing photoacoustic (PA) endoscopy has attracted intensive interest due to its ability to provide 3D molecular and structural information about internal tissues with minimal tissue damage, which promises to be useful for guiding minimally invasive procedures such as tumour biopsy and fetal surgery 1,2 . Early studies raster-scanned a focused laser beam through a fibre bundle to excite ultrasound from tissue in front of its distal end 3 . More recently, wavefront shaping techniques have enabled photoacoustic endomicroscopy through multimode fibres (MMFs) with higher lateral resolution and at lower cost. However, imaging through an MMF requires a spatial light modulator to shape the incident optical wavefront, so the speed of raster-scanned imaging is limited by the modulator's refresh rate. The highest speed reported in the literature was ~3 frames per second (FPS), with the modulator operating at 22.7 kHz 4 . Each frame consisted of 7850 pixels, covering an area 100 µm in diameter. Higher imaging speeds are needed for real-time imaging in clinical applications.
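The speed limit above follows from simple arithmetic: if each scan position needs one modulator pattern, the frame rate cannot exceed the modulator rate divided by the number of pixels per frame. The Python sketch below assumes exactly one pattern per pixel and ignores settling, acquisition and processing overheads, so it gives an upper bound rather than a measured rate. The function name is hypothetical.

```python
def raster_frame_rate(modulator_rate_hz: float, pixels_per_frame: int) -> float:
    """Upper-bound frame rate when every scan position needs one modulator pattern.

    Assumption: one pattern per pixel; overheads are ignored.
    """
    return modulator_rate_hz / pixels_per_frame


# Figures quoted in the text for the system of ref. 4.
full_scan_fps = raster_frame_rate(22.7e3, 7850)
print(f"full scan: {full_scan_fps:.2f} FPS")  # prints: full scan: 2.89 FPS
```

The result, about 2.9 FPS, matches the ~3 FPS reported, which suggests the modulator rate is indeed the dominant bottleneck.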
In recent years, deep learning (DL) has been studied as a way to improve the imaging speed of PA microscopy by enhancing the quality of sparsely scanned images. This can be done with an unsupervised DL model applied to a single image 5 or with supervised training of deep convolutional neural networks (CNNs) on image pairs 6,7 . The former proved effective at reconstructing under-sampled PA microscopy images of mouse vasculature acquired with a benchtop system. However, its performance was limited on PA endomicroscopy images, which feature fewer scanning points and a smaller size to achieve faster imaging 8 . In this work, we developed a deep CNN model named PAE-EDSR (PA endomicroscopy enhanced deep super-resolution), trained via supervised learning, to further enhance the quality of sparse PA endomicroscopy images from a high-speed MMF-based PA endomicroscopy system. The proposed model was based on EDSR, a state-of-the-art image super-resolution model. A spatial attention (SA) residual block was employed within the ResBlock modules to retain high-frequency features 9 . On sparse carbon fibre images, the trained model outperformed the original EDSR and classical bicubic interpolation, achieving a peak signal-to-noise ratio (PSNR) of 28.68 dB using 6.25% effective pixels for recovery.
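A spatial attention residual block of the kind described here can be sketched as a forward pass in NumPy. This is a hypothetical illustration, not the authors' PAE-EDSR code. The layer sizes, the random weights, and the attention branch (one 3 × 3 convolution followed by a sigmoid) are all assumptions. The sketch keeps the EDSR ResBlock body (conv, ReLU, conv, no batch normalisation) and multiplies the residual by a per-pixel mask in (0, 1) before the skip connection. A trainable version would be written in a deep learning framework.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def conv2d(x, w, b):
    """Zero-padded 'same' 2D cross-correlation.

    x: (C_in, H, W) features; w: (C_out, C_in, k, k) kernel with odd k; b: (C_out,) bias.
    """
    k = w.shape[-1]
    pad = k // 2
    h, wd = x.shape[1:]
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((w.shape[0], h, wd))
    for i in range(k):
        for j in range(k):
            # Kernel tap (i, j): mix input channels over the shifted window.
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + h, j:j + wd])
    return out + b[:, None, None]


def _init(rng, *shape):
    # Small random weights; a trained model would learn these.
    return rng.normal(0.0, 0.1, shape)


class SAResBlock:
    """Hypothetical EDSR-style residual block with spatial attention (forward pass only)."""

    def __init__(self, channels, rng, res_scale=1.0):
        self.w1, self.b1 = _init(rng, channels, channels, 3, 3), np.zeros(channels)
        self.w2, self.b2 = _init(rng, channels, channels, 3, 3), np.zeros(channels)
        # Attention branch: conv -> sigmoid yields one weight in (0, 1) per pixel.
        self.wa, self.ba = _init(rng, 1, channels, 3, 3), np.zeros(1)
        self.res_scale = res_scale

    def residual_and_mask(self, x):
        # EDSR ResBlock body: conv -> ReLU -> conv, without batch normalisation.
        r = conv2d(relu(conv2d(x, self.w1, self.b1)), self.w2, self.b2)
        mask = sigmoid(conv2d(r, self.wa, self.ba))  # shape (1, H, W)
        return r, mask

    def __call__(self, x):
        r, mask = self.residual_and_mask(x)
        # The mask re-weights the residual pixel by pixel before the skip connection.
        return x + self.res_scale * r * mask
```

For example, `SAResBlock(8, np.random.default_rng(0))(x)` on an array `x` of shape (8, 32, 32) returns an array of the same shape. Pixels where the mask is near 0 pass through almost unchanged, while pixels where it is near 1 receive the full residual correction, which is how the attention can emphasise informative, high-frequency regions.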

Photoacoustic endomicroscopy system
The all-optical photoacoustic endomicroscopy probe was described in our previous study 4 . Briefly, it comprised two adjacent optical fibres placed in the cannula of a 20-gauge needle. A high-speed wavefront shaping algorithm developed by the authors' group, the real-valued intensity transmission matrix 10 , was used to characterise a multimode fibre so that a focused laser beam could be raster-scanned across the distal fibre tip. The ultrasound generated by the imaging targets was detected by a fibre-optic ultrasound sensor based on a plano-concave microresonator at the tip of a single-mode fibre 11 . The peak-to-peak amplitude of the ultrasound signal at each scanning position was used to form maximum intensity projection photoacoustic microscopy images.
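Forming an image from the peak-to-peak amplitudes can be sketched in a few lines of NumPy. The function name, the row-major raster order and the rectangular scan grid are assumptions for illustration; a circular field of view would additionally need a mask for positions outside the disc. The toy pulse and absorber map are synthetic, not measured data.

```python
import numpy as np


def peak_to_peak_image(signals, scan_shape):
    """Map raster-scanned ultrasound traces to a 2D image of peak-to-peak amplitudes.

    signals: (n_positions, n_samples) array, one trace per scan position,
             assumed to be stored in row-major raster order.
    scan_shape: (rows, cols) of the assumed rectangular scan grid.
    """
    signals = np.asarray(signals, dtype=float)
    p2p = signals.max(axis=1) - signals.min(axis=1)  # peak-to-peak per trace
    return p2p.reshape(scan_shape)


# Toy data: a bipolar pulse scaled by a made-up absorber map.
t = np.linspace(-1.0, 1.0, 200)
pulse = -t * np.exp(-(t / 0.1) ** 2)
absorbers = np.array([[0.0, 1.0], [2.0, 0.5]])
signals = absorbers.reshape(-1, 1) * pulse  # shape (4, 200)
image = peak_to_peak_image(signals, absorbers.shape)
```

Because every toy trace is a non-negative multiple of the same pulse, `image` is proportional to `absorbers`; with real data the peak-to-peak value also depends on pulse shape and noise.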

Network architecture and implementation
The proposed supervised model, PAE-EDSR, was adapted from the original EDSR 12 . As shown in Fig. 1, the number of ResBlock units and the size of the convolutional filters in each layer were reduced to suit the dataset. SA was integrated into the residual block to modulate the residual features, using spatial attention masks generated by convolution and activation operations. A total of 360 fully sampled PA endomicroscopy (PAE) images (180 images of carbon fibres, 200 × 200 pixels, and 180 images of mouse red blood cells, 100 × 100 pixels) were prepared for model training. The fully sampled PAE images were acquired by imaging carbon fibre phantoms and smears of mouse red blood cells. The corresponding under-sampled PAE images were generated by pixel-wise multiplication of binary under-sampling masks with the fully sampled images. Good consistency was observed between these generated under-sampled images and images acquired by sparse scanning. For quantitative evaluation, the structural similarity index measure (SSIM) and PSNR were computed between each fully sampled image and the corresponding under-sampled image enhanced by bicubic interpolation, by the baseline model (the original EDSR), and by PAE-EDSR.
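The under-sampling and PSNR evaluation steps can be sketched as follows. The text does not specify the mask pattern, so the regular grid that keeps every `step`-th pixel along both axes (hypothetical `grid_mask`) is an assumption; with `step=4` it keeps 6.25% of the pixels. The PSNR helper uses the reference image's intensity span as its data range, which is also an assumption, since PSNR values depend on that choice. SSIM is more involved and is best taken from an established library rather than re-implemented.

```python
import numpy as np


def grid_mask(shape, step):
    """Binary mask keeping every `step`-th pixel along both axes (hypothetical pattern)."""
    mask = np.zeros(shape, dtype=bool)
    mask[::step, ::step] = True
    return mask


def undersample(full, mask):
    # Pixel-wise product: unsampled positions become zero.
    return full * mask


def psnr(reference, estimate, data_range=None):
    """Peak signal-to-noise ratio in dB; data_range defaults to the reference's span."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    if data_range is None:
        data_range = reference.max() - reference.min()
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(data_range ** 2 / mse)


rng = np.random.default_rng(0)
full = rng.random((200, 200))  # stand-in for a fully sampled 200 x 200 image
mask = grid_mask(full.shape, step=4)
sparse = undersample(full, mask)
print(mask.mean())  # fraction of effective pixels
```

In a training pipeline, `(sparse, full)` would form one input/target pair, and `psnr(full, reconstruction)` would score a model's output against the fully sampled image.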

Figs. 2 and 3 show representative results of recovering under-sampled PAE images of carbon fibres and mouse red blood cells using bicubic interpolation, the baseline model, and PAE-EDSR. Compared with bicubic interpolation and the original EDSR, the proposed model achieved the best or comparable performance at all three sampling rates. At the highest under-sampling factor, as shown in the zoomed-in images, the under-sampled carbon fibre images contained distorted line structures such as ragged edges. Bicubic interpolation restored smooth edges but introduced severe blurring that degraded the image quality. In comparison, the DL-based methods generated realistic boundaries with less blurring, closer to the fully sampled results, as also indicated by the improved SSIM and PSNR values (0.86 and 28.68 dB, respectively). The proposed method also provided reasonable enhancement of under-sampled PAE images of mouse red blood cells. Because these fully sampled images were centre-cropped to remove streak artefacts at the edges, only a 2× sampling rate was applied. Visually, the DL-based methods resolved the biconcave structure of the red blood cells with natural boundaries, as shown in the zoomed-in images and individual cells (denoted by green and white boxes). However, compared with bicubic interpolation, the SSIM and PSNR values showed little improvement or even slight degradation. The best recovery was achieved by the baseline model, with an SSIM of 0.59 and a PSNR of 13.33 dB. Note that the signal-to-noise ratio of the fully sampled images remained suboptimal even after denoising. High-frequency components, e.g. random background noise, were barely recovered by either bicubic interpolation or the DL-based methods, which could account for the suboptimal reconstruction accuracy.

DISCUSSION
A deep convolutional neural network, PAE-EDSR, was proposed for enhancing sparse PA endomicroscopy images acquired with a newly developed PA endomicroscopy system. The proposed supervised model can recover the line structure of carbon fibre phantoms and the biconcave structure of mouse red blood cells with high fidelity using only 6.25% effective pixels, leading to a 16-fold increase in imaging speed.
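The link between the fraction of effective pixels and the speed-up is a simple ratio, assuming scan time is proportional to the number of scanned positions and excluding network inference time. The helper names and the regular-grid reading of the under-sampling step are hypothetical.

```python
def effective_fraction(step):
    """Fraction of pixels kept by a regular grid with the given step along both axes."""
    return 1.0 / step ** 2


def scan_speedup(fraction):
    # Assumption: scan time scales linearly with the number of scanned positions;
    # network inference time is not included.
    return 1.0 / fraction


for step in (2, 4):
    f = effective_fraction(step)
    print(f"step {step}: {f:.2%} of pixels, {scan_speedup(f):.0f}x faster scan")
```

With 6.25% effective pixels the scan needs 1/16 of the positions, giving the 16-fold figure; any reconstruction time added per frame reduces the end-to-end gain.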
PAE-EDSR was trained on dedicated PA endomicroscopy datasets of carbon fibres and mouse red blood cells, respectively. Validation results on the carbon fibre patterns indicated that DL-based models recover low-frequency features with less blurring and discontinuity than classical interpolation methods such as bicubic interpolation. Furthermore, with the spatial attention module, PAE-EDSR retained most of the high-frequency features associated with tissue vibrations in the highly sparse data, which has proven challenging for unsupervised DL models 8 . In terms of time efficiency, PAE-EDSR took around 0.8 s to reconstruct a 200 × 200 image on a Tesla T10 with 32 GB of RAM, which is promising for real-time applications. However, PAE-EDSR showed suboptimal performance in recovering the sparse mouse red blood cell images, which could be explained by the degraded quality of the fully sampled images after denoising. In future work, different denoisers could be applied, followed by standard data augmentation to further increase data diversity. Real under-sampled PA endomicroscopy images acquired with different scanning steps will also be incorporated into the training set to improve the robustness of the DL-based methods.

CONCLUSION
In this work, a supervised DL model, PAE-EDSR, was introduced to enhance under-sampled PA endomicroscopy images and thereby further improve the imaging speed. A spatial attention module was incorporated into the residual block to help capture informative features. PAE-EDSR outperformed classical bicubic interpolation, with better visual quality and higher reconstruction accuracy. Experimental results indicated that PAE-EDSR can reconstruct sparse PA endomicroscopy images of carbon fibres and mouse red blood cells with as few as 6.25% effective pixels. Therefore, fewer scanning points are required to acquire high-fidelity images, improving the frame rate from 1 FPS to around 25 FPS, which could benefit the clinical use of the PA endomicroscopy system.