Abstract
Fine-grained text classification with many similar labels is a challenge in practical applications, and interpreting predictions in this context is particularly difficult. To address this, we propose a simple framework that disentangles feature importance into more fine-grained input-label links. We demonstrate our framework on the task of intent recognition, which is widely used in real-life applications where trustworthiness is important, for state-of-the-art Transformer language models using their attention mechanism. Our human and semi-automated evaluations show that our approach explains fine-grained input-label relations better than the popular feature importance estimation methods LIME and Integrated Gradients, and that it allows faithful interpretation through simple rules, especially when model confidence is high.
| Original language | English |
|---|---|
| Title of host publication | ECAI 2023 |
| Subtitle of host publication | 3rd International Workshop on Explainable and Interpretable Machine Learning (XI-ML) |
| DOIs | |
| Publication status | Published - 2023 |