Topic Taxonomy Construction from ESG Reports

Saif Alnajjar, Xinyu Wang, Yulan He

Research output: Chapter in Book/Report/Conference proceeding › Conference paper › peer-review

Abstract

The surge in Environmental, Social, and Governance (ESG) reports, essential for corporate transparency and modern investments, presents a challenge for investors due to their varying lengths and sheer volume. We present a novel methodology, called MultiTaxoGen, for creating topic taxonomies designed specifically for analysing ESG reports. Topic taxonomies illustrate the topics covered in a corpus of ESG reports while also highlighting the hierarchical relationships between them. Unfortunately, current state-of-the-art approaches for constructing topic taxonomies are designed for more general datasets, resulting in ambiguous topics and the omission of many latent topics present in ESG-focused corpora. This makes them unsuitable for the specificity required by investors. Our method instead adapts topic modelling techniques by applying them recursively to each topic’s local neighbourhood, that is, the subcorpus of documents assigned to that topic. This recursive approach allows us to identify each topic's child topics and offers a better, fine-grained understanding of topic hierarchies. Our findings reveal that our method captures more latent topics in our ESG report corpus than the leading method and provides more coherent topics with comparable relational accuracy.
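The recursive idea described in the abstract, fitting a topic model on a corpus and then refitting it on each topic's subcorpus to find child topics, can be sketched roughly as below. This is only an illustrative sketch, not the authors' MultiTaxoGen implementation: the names `TopicNode`, `build_taxonomy`, `min_docs`, and `max_depth` are hypothetical, and `toy_topic_model` is a deliberately simple stand-in (grouping documents by their most widespread word) for whatever topic modelling technique the paper actually uses.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List


@dataclass
class TopicNode:
    """One topic in the taxonomy, with the documents assigned to it."""
    label: str
    docs: List[str]
    children: List["TopicNode"] = field(default_factory=list)


# A topic model here is any function mapping a subcorpus to {topic label: docs}.
TopicModel = Callable[[List[str], FrozenSet[str]], Dict[str, List[str]]]


def toy_topic_model(docs: List[str], used: FrozenSet[str]) -> Dict[str, List[str]]:
    """Hypothetical stand-in for a real topic model (assumption, not from the paper).

    Each document is grouped under the word it contains that appears in the
    most documents of this subcorpus, ignoring labels already used by ancestors.
    """
    doc_freq: Counter = Counter()
    for doc in docs:
        doc_freq.update(set(doc.lower().split()) - used)

    groups: Dict[str, List[str]] = {}
    for doc in docs:
        words = set(doc.lower().split()) - used
        if not words:
            continue  # nothing left to label this document with
        # sorted() makes ties resolve alphabetically, so results are deterministic
        label = max(sorted(words), key=lambda w: doc_freq[w])
        groups.setdefault(label, []).append(doc)
    return groups


def build_taxonomy(
    label: str,
    docs: List[str],
    model: TopicModel = toy_topic_model,
    used: FrozenSet[str] = frozenset(),
    min_docs: int = 3,
    max_depth: int = 3,
) -> TopicNode:
    """Recursively apply `model` to each topic's subcorpus to find child topics."""
    node = TopicNode(label, docs)
    if max_depth == 0 or len(docs) < min_docs:
        return node  # subcorpus too small or tree deep enough: leaf topic

    used = used | {label}
    groups = model(docs, used)
    if len(groups) < 2:
        return node  # the model found no meaningful split

    for child_label, child_docs in groups.items():
        node.children.append(
            build_taxonomy(child_label, child_docs, model, used, min_docs, max_depth - 1)
        )
    return node


# Tiny made-up corpus, for illustration only.
docs = [
    "emissions carbon scope",
    "emissions carbon offsets",
    "emissions methane leaks",
    "board diversity gender",
    "board pay executive",
]
root = build_taxonomy("esg", docs)
for child in root.children:
    print(child.label, [grandchild.label for grandchild in child.children])
```

Because the topic model is passed in as a plain function, the toy grouping could be swapped for a real topic model without changing the recursion; the stopping rules (`min_docs`, `max_depth`, and requiring at least two child topics) are design assumptions made for this sketch.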

Original language: English
Title of host publication: Joint Workshop of the 7th Financial Technology and Natural Language Processing, the 5th Knowledge Discovery from Unstructured Data in Financial Services and the 4th Economics and Natural Language Processing, FinNLP-KDF-ECONLP 2024 at LREC-COLING 2024 - Workshop Proceedings
Editors: Chung-Chi Chen, Zhiqiang Ma, Udo Hahn
Publisher: European Language Resources Association (ELRA)
Pages: 178-187
Number of pages: 10
ISBN (Electronic): 9782493814197
Publication status: Published - 2024
Event: Joint Workshop of the 7th Financial Technology and Natural Language Processing, 5th Knowledge Discovery from Unstructured Data in Financial Services and 4th Economics and Natural Language Processing, FinNLP-KDF-ECONLP 2024 - Torino, Italy
Duration: 20 May 2024 → …

Publication series

Name: Joint Workshop of the 7th Financial Technology and Natural Language Processing, the 5th Knowledge Discovery from Unstructured Data in Financial Services and the 4th Economics and Natural Language Processing, FinNLP-KDF-ECONLP 2024 at LREC-COLING 2024 - Workshop Proceedings

Conference

Conference: Joint Workshop of the 7th Financial Technology and Natural Language Processing, 5th Knowledge Discovery from Unstructured Data in Financial Services and 4th Economics and Natural Language Processing, FinNLP-KDF-ECONLP 2024
Country/Territory: Italy
City: Torino
Period: 20/05/2024 → …

Keywords

  • Document Classification
  • Knowledge Discovery/Representation
  • Text Analytics
  • Text categorisation
  • Text Mining
  • Topic Detection
  • Tracking
