Hatevolution: What Static Benchmarks Don’t Tell Us

Research output: Chapter in Book/Report/Conference proceeding › Conference paper › peer-review


Abstract

Language changes over time, including in the hate speech domain, which evolves quickly following social dynamics and cultural shifts. While NLP research has investigated the impact of language evolution on model training and has proposed several solutions for it, its impact on model benchmarking remains underexplored. Yet, hate speech benchmarks play a crucial role in ensuring model safety. In this paper, we empirically evaluate the robustness of 20 language models across two evolving hate speech experiments, and we show the temporal misalignment between static and time-sensitive evaluations. Our findings call for time-sensitive linguistic benchmarks in order to correctly and reliably evaluate language models in the hate speech domain.
Original language: English
Title of host publication: Proceedings of the 2025 Annual Meeting of the Association for Computational Linguistics (ACL), Findings
Publication status: Published - Jul 2025

Keywords

  • Large Language Models (LLMs)
  • Hate Speech Detection
  • Benchmark
  • Evaluation
  • Temporal
  • Language change
