Language Models Meet Anomaly Detection for Better Interpretability and Generalizability

Jun Li, Su Hwan Kim, Philip Müller, Lina Felsner, Daniel Rueckert, Benedikt Wiestler, Julia A. Schnabel*, Cosmin I. Bercea*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference paper › peer-review

Abstract

This research explores the integration of language models and unsupervised anomaly detection in medical imaging, addressing two key questions: (1) Can language models enhance the interpretability of anomaly detection maps? and (2) Can anomaly maps improve the generalizability of language models in open-set anomaly detection tasks? To investigate these questions, we introduce a new dataset for multi-image visual question-answering on brain magnetic resonance images encompassing multiple conditions. We propose KQ-Former (Knowledge Querying Transformer), which is designed to optimally align visual and textual information in limited-sample contexts. Our model achieves a 60.81% accuracy on closed questions, covering disease classification and severity across 15 different classes. For open questions, KQ-Former demonstrates a 70% improvement over the baseline with a BLEU-4 score of 0.41, and achieves the highest entailment ratios (up to 71.9%) and lowest contradiction ratios (down to 10.0%) among various natural language inference models. Furthermore, integrating anomaly maps results in an 18% accuracy increase in detecting open-set anomalies, thereby enhancing the language model’s generalizability to previously unseen medical conditions. The code and dataset are available at: https://github.com/compai-lab/miccai-2024-junli?tab=readme-ov-file.
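The abstract reports a BLEU-4 score of 0.41 on open questions. For readers unfamiliar with the metric, the sketch below computes a minimal single-reference BLEU-4 in plain Python: the geometric mean of modified 1- to 4-gram precisions times a brevity penalty. The add-one smoothing is an assumption for illustration; the paper's exact evaluation script may differ.

```python
from collections import Counter
import math

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu4(candidate, reference):
    """Minimal single-reference BLEU-4 with add-one smoothing."""
    precisions = []
    for n in range(1, 5):
        cand = Counter(ngrams(candidate, n))
        ref = Counter(ngrams(reference, n))
        # clipped (modified) n-gram counts
        overlap = sum(min(c, ref[g]) for g, c in cand.items())
        total = max(sum(cand.values()), 1)
        # add-one smoothing so one empty n-gram order does not zero the score
        precisions.append((overlap + 1) / (total + 1))
    # brevity penalty punishes candidates shorter than the reference
    bp = min(1.0, math.exp(1 - len(reference) / max(len(candidate), 1)))
    return bp * math.exp(sum(math.log(p) for p in precisions) / 4)
```

An identical candidate and reference scores 1.0; partial overlaps fall between 0 and 1.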

Original language: English
Title of host publication: Medical Image Computing and Computer Assisted Intervention – MICCAI 2024 Workshops - LDTM 2024, MMMI/ML4MHD 2024, ML-CDS 2024, Held in Conjunction with MICCAI 2024, Proceedings
Editors: Anna Schroder, Xiang Li, Tanveer Syeda-Mahmood, Neil P. Oxtoby, Alexandra Young, Alessa Hering, Tejas S. Mathai, Pritam Mukherjee, Sven Kuckertz, Tiantian He, Isaac Llorente-Saguer, Andreas Maier, Satyananda Kashyap, Hayit Greenspan, Anant Madabhushi
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 113-123
Number of pages: 11
ISBN (Print): 9783031845246
Publication status: Published - 2025
Event: Workshop on Longitudinal Disease Tracking and Modeling with Medical Images and Data, LDTM 2024, 5th International Workshop on Multiscale Multimodal Medical Imaging, MMMI 2024, 1st Workshop on Machine Learning for Multimodal/-sensor Healthcare Data, ML4MHD 2024 and Workshop on Multimodal Learning and Fusion Across Scales for Clinical Decision Support, ML-CDS 2024, held in conjunction with the 27th International Conference on Medical Image Computing and Computer Assisted Intervention, MICCAI 2024 - Marrakesh, Morocco
Duration: 6 Oct 2024 – 10 Oct 2024

Publication series

Name: Lecture Notes in Computer Science
Volume: 15401 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: Workshop on Longitudinal Disease Tracking and Modeling with Medical Images and Data, LDTM 2024, 5th International Workshop on Multiscale Multimodal Medical Imaging, MMMI 2024, 1st Workshop on Machine Learning for Multimodal/-sensor Healthcare Data, ML4MHD 2024 and Workshop on Multimodal Learning and Fusion Across Scales for Clinical Decision Support, ML-CDS 2024, held in conjunction with the 27th International Conference on Medical Image Computing and Computer Assisted Intervention, MICCAI 2024
Country/Territory: Morocco
City: Marrakesh
Period: 6/10/2024 – 10/10/2024

Keywords

  • Multimodal Learning
  • Vision-Language Models
  • VQA
