CAMANet: Class Activation Map Guided Attention Network for Radiology Report Generation

Jun Wang, Abhir Bhalerao, Terry Yin, Simon See, Yulan He

Research output: Working paper/Preprint > Preprint

Abstract

Radiology report generation (RRG) has gained increasing research attention because of its potential to mitigate medical resource shortages and to support radiologists in disease decision making. Recent advances in RRG are largely driven by improving models' ability to encode single-modal feature representations, while few studies explicitly explore the cross-modal alignment between image regions and words. Radiologists typically focus first on abnormal image regions before composing the corresponding text descriptions, so cross-modal alignment is important for learning an abnormality-aware RRG model. Motivated by this, we propose a Class Activation Map guided Attention Network (CAMANet), which explicitly promotes cross-modal alignment by using aggregated class activation maps to supervise cross-modal attention learning, and simultaneously enriches the discriminative information. Experimental results demonstrate that CAMANet outperforms previous state-of-the-art (SOTA) methods on two commonly used RRG benchmarks.
Original language: English
Publication status: Published - 2 Nov 2022

Keywords

  • cs.CV
