TY - CHAP
T1 - Language Models Meet Anomaly Detection for Better Interpretability and Generalizability
AU - Li, Jun
AU - Kim, Su Hwan
AU - Müller, Philip
AU - Felsner, Lina
AU - Rueckert, Daniel
AU - Wiestler, Benedikt
AU - Schnabel, Julia A.
AU - Bercea, Cosmin I.
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
PY - 2025
Y1 - 2025
N2 - This research explores the integration of language models and unsupervised anomaly detection in medical imaging, addressing two key questions: (1) Can language models enhance the interpretability of anomaly detection maps? and (2) Can anomaly maps improve the generalizability of language models in open-set anomaly detection tasks? To investigate these questions, we introduce a new dataset for multi-image visual question-answering on brain magnetic resonance images encompassing multiple conditions. We propose KQ-Former (Knowledge Querying Transformer), which is designed to optimally align visual and textual information in limited-sample contexts. Our model achieves a 60.81% accuracy on closed questions, covering disease classification and severity across 15 different classes. For open questions, KQ-Former demonstrates a 70% improvement over the baseline with a BLEU-4 score of 0.41, and achieves the highest entailment ratios (up to 71.9%) and lowest contradiction ratios (down to 10.0%) among various natural language inference models. Furthermore, integrating anomaly maps results in an 18% accuracy increase in detecting open-set anomalies, thereby enhancing the language model’s generalizability to previously unseen medical conditions. The code and dataset are available at: https://github.com/compai-lab/miccai-2024-junli?tab=readme-ov-file.
KW - Multimodal Learning
KW - Vision-Language Models
KW - VQA
UR - http://www.scopus.com/inward/record.url?scp=105003862554&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-84525-3_10
DO - 10.1007/978-3-031-84525-3_10
M3 - Conference paper
AN - SCOPUS:105003862554
SN - 9783031845246
T3 - Lecture Notes in Computer Science
SP - 113
EP - 123
BT - Medical Image Computing and Computer Assisted Intervention – MICCAI 2024 Workshops - LDTM 2024, MMMI/ML4MHD 2024, ML-CDS 2024, Held in Conjunction with MICCAI 2024, Proceedings
A2 - Schroder, Anna
A2 - Li, Xiang
A2 - Syeda-Mahmood, Tanveer
A2 - Oxtoby, Neil P.
A2 - Young, Alexandra
A2 - Hering, Alessa
A2 - Mathai, Tejas S.
A2 - Mukherjee, Pritam
A2 - Kuckertz, Sven
A2 - He, Tiantian
A2 - Llorente-Saguer, Isaac
A2 - Maier, Andreas
A2 - Kashyap, Satyananda
A2 - Greenspan, Hayit
A2 - Madabhushi, Anant
PB - Springer Science and Business Media Deutschland GmbH
T2 - Workshop on Longitudinal Disease Tracking and Modeling with Medical Images and Data, LDTM 2024, 5th International Workshop on Multiscale Multimodal Medical Imaging, MMMI 2024, 1st Workshop on Machine Learning for Multimodal/-sensor Healthcare Data, ML4MHD2024 and Workshop on Multimodal Learning and Fusion Across Scales for Clinical Decision Support, ML-CDS 2024 held in conjunction with the 27th International Conference on Medical Image Computing and Computer Assisted Intervention, MICCAI 2024
Y2 - 6 October 2024 through 10 October 2024
ER -