The Multiscale Surface Vision Transformer

Simon Dahan, Logan Williams, Daniel Rueckert, Emma Robinson*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference paper › peer-review

1 Citation (Scopus)

Abstract

Surface meshes are a favoured domain for representing structural and functional information on the human cortex, but their complex topology and geometry pose significant challenges for deep learning analysis. While Transformers have excelled as domain-agnostic architectures for sequence-to-sequence learning, the quadratic cost of the self-attention operation remains an obstacle for many dense prediction tasks. Inspired by some of the latest advances in hierarchical modelling with vision transformers, we introduce the Multiscale Surface Vision Transformer (MS-SiT) as a backbone architecture for surface deep learning. The self-attention mechanism is applied within local mesh windows to allow for high-resolution sampling of the underlying data, while a shifted-window strategy improves the sharing of information between windows. Neighbouring patches are successively merged, allowing the MS-SiT to learn hierarchical representations suitable for any prediction task. Results demonstrate that the MS-SiT outperforms existing surface deep learning methods for neonatal phenotyping prediction tasks using the Developing Human Connectome Project (dHCP) dataset. Furthermore, building the MS-SiT backbone into a U-shaped architecture for surface segmentation demonstrates competitive results on cortical parcellation using the UK Biobank (UKB) and manually-annotated MindBoggle datasets. Code and trained models are publicly available at https://github.com/metrics-lab/surface-vision-transformers.
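The abstract describes three mechanisms: self-attention restricted to local mesh windows, a shifted-window pass to exchange information across window boundaries, and patch merging to build a hierarchy. The sketch below illustrates these ideas in PyTorch under simplifying assumptions: patches are treated as a 1-D token sequence (the actual model operates on icospheric mesh patches), and all class names, dimensions, and layer choices are illustrative, not the authors' released implementation (see the GitHub link above for that).

```python
# Illustrative sketch of the MS-SiT building blocks described in the abstract.
# Assumption: mesh patches are flattened into a 1-D sequence whose length is
# divisible by the window size; the real model works on icospheric meshes.
import torch
import torch.nn as nn


class WindowAttention(nn.Module):
    """Multi-head self-attention applied independently within each window."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_windows, window_size, dim)
        b, w, v, d = x.shape
        x = x.reshape(b * w, v, d)           # tokens attend within their window only
        out, _ = self.attn(x, x, x)
        return out.reshape(b, w, v, d)


class ShiftedWindowBlock(nn.Module):
    """Regular windowed attention followed by attention over windows shifted
    by half a window, so information can cross window boundaries."""

    def __init__(self, dim: int, window_size: int, num_heads: int = 4):
        super().__init__()
        self.window_size = window_size
        self.attn1 = WindowAttention(dim, num_heads)
        self.attn2 = WindowAttention(dim, num_heads)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim), seq_len divisible by window_size
        b, n, d = x.shape
        w = self.window_size
        x = x + self.attn1(x.view(b, n // w, w, d)).view(b, n, d)
        shifted = torch.roll(x, shifts=w // 2, dims=1)       # shift half a window
        shifted = shifted + self.attn2(shifted.view(b, n // w, w, d)).view(b, n, d)
        return torch.roll(shifted, shifts=-(w // 2), dims=1) # undo the shift


class PatchMerging(nn.Module):
    """Merge neighbouring patches (here: adjacent pairs in the sequence) and
    project to a wider channel dimension, halving the token count."""

    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(2 * dim, 2 * dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, d = x.shape
        x = x.reshape(b, n // 2, 2 * d)      # concatenate neighbouring patch features
        return self.proj(x)


# Usage: one hierarchical stage on dummy patch tokens.
x = torch.randn(2, 320, 64)                       # 320 patches, 64 channels
x = ShiftedWindowBlock(64, window_size=16)(x)     # -> (2, 320, 64)
x = PatchMerging(64)(x)                           # -> (2, 160, 128): coarser, wider
```

Stacking such stages yields the multiscale feature pyramid the abstract refers to; for segmentation, a U-shaped decoder would upsample these features back to full mesh resolution.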
Original language: English
Title of host publication: Proceedings of Machine Learning Research
Pages: 1-17
Volume: 106
Publication status: Published - 2024

