King's College London

Integrating Provenance Capture and UML with UML2PROV: Principles and Experience

Research output: Contribution to journal (Article)

Carlos Sáenz Adán, Beatriz Pérez Valle, Francisco José García Izquierdo, Luc Alice Victor Moreau

Original language: English
Early online date: 28 Feb 2020
Publication status: E-pub ahead of print - 28 Feb 2020


  • Integrating_Provenance_Capture_and_SAENZ_ADAN_Acc11Feb2020Epub28Feb2020_GREEN_AAM.pdf, 6.91 MB, application/pdf (accepted author manuscript)

    © 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works


In response to increasing calls for algorithmic accountability, UML2PROV is a novel approach that addresses the gap between application design, where models are described by UML diagrams, and provenance design, where generated provenance is meant to describe an application’s flows of data, processes and responsibility, enabling greater accountability of that application. The originality of UML2PROV is that designers can follow their preferred software engineering methodology to create the UML diagrams for their application, while UML2PROV takes those diagrams as a starting point to automatically generate: (1) the design of the provenance to be generated (expressed as PROV templates); and (2) the software library for collecting runtime values of interest (encoded as variable-value associations known as bindings), which can be deployed in the application without developer intervention. At runtime, the PROV templates are combined with the bindings to generate high-quality provenance suitable for subsequent consumption. UML2PROV is rigorously defined by an extensive set of 17 patterns mapping UML diagrams to provenance templates, and is accompanied by a reference implementation based on Model Driven Development techniques. A systematic evaluation of UML2PROV uses quantitative data and qualitative arguments to show the benefits and trade-offs of applying UML2PROV for software engineers seeking to make applications provenance-aware. In particular, because the UML design drives both the design and the capture of provenance, we discuss how the level of detail in UML designs affects aspects such as provenance design generation, application instrumentation, provenance capability maintenance, storage and run-time overhead, and quality of the generated provenance.
Key lessons include the following: starting from a non-tailored UML design leads to the capture of more provenance than is needed to satisfy provenance requirements, and therefore increases overhead unnecessarily; alternatively, if the UML design is tailored to address provenance requirements, only relevant provenance is collected, resulting in lower overheads.
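
As a rough illustration of how templates and bindings might combine at runtime, the Python sketch below expands a toy template by substituting bound values for variables. Everything in it is an assumption made for illustration: the tuple-based statement format, the `var:` prefix, the `expand` function, and the `ex:` identifiers are hypothetical, and the sketch does not reproduce the PROV-Template expansion algorithm or the library that UML2PROV generates.

```python
# Toy illustration only: NOT the PROV-Template algorithm or UML2PROV's generated code.
from typing import Dict, List, Tuple

Statement = Tuple[str, ...]

# Hypothetical template: each statement is (kind, arg, ...); arguments
# prefixed with "var:" are placeholders to be filled in at runtime.
TEMPLATE: List[Statement] = [
    ("entity", "var:input"),
    ("activity", "var:operation"),
    ("entity", "var:output"),
    ("used", "var:operation", "var:input"),
    ("wasGeneratedBy", "var:output", "var:operation"),
]


def expand(template: List[Statement], bindings: Dict[str, str]) -> List[Statement]:
    """Replace every 'var:' placeholder with its bound value.

    Raises KeyError if the template mentions a variable with no binding.
    """
    expanded: List[Statement] = []
    for kind, *args in template:
        resolved = [
            bindings[arg[len("var:"):]] if arg.startswith("var:") else arg
            for arg in args
        ]
        expanded.append((kind, *resolved))
    return expanded


# Hypothetical bindings, standing in for values collected while the application runs.
bindings = {
    "input": "ex:order_1",
    "operation": "ex:checkout_42",
    "output": "ex:invoice_7",
}

provenance = expand(TEMPLATE, bindings)
for statement in provenance:
    print(statement)
```

Here the template plays the role of the provenance design, while the bindings carry the runtime values; a different set of bindings would yield a different provenance fragment from the same template.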
