TY - JOUR
T1 - Provenance Network Analytics
T2 - An approach to data analytics using data provenance
AU - Huynh, Trung Dong
AU - Ebden, Mark
AU - Fischer, Joel
AU - Roberts, Stephen
AU - Moreau, Luc
PY - 2018/5/30
Y1 - 2018/5/30
AB - Provenance network analytics is a novel data analytics approach that helps infer properties of data, such as quality or importance, from their provenance. Instead of analysing application data, which are typically domain-dependent, it analyses the data's provenance as represented using the World Wide Web Consortium's domain-agnostic PROV data model. Specifically, the approach proposes a number of network metrics for provenance data and applies established machine learning techniques over such metrics to build predictive models for some key properties of data. Applying this method to the provenance of real-world data from three different applications, we show that it can successfully identify the owners of provenance documents, assess the quality of crowdsourced data, and identify instructions from chat messages in an alternate-reality game with high levels of accuracy. By so doing, we demonstrate the different ways the proposed provenance network metrics can be used in analysing data, providing the foundation for provenance-based data analytics.
KW - data provenance
KW - data analytics
KW - network metrics
KW - graph classification
UR - http://www.scopus.com/inward/record.url?scp=85042114156&partnerID=8YFLogxK
U2 - 10.1007/s10618-017-0549-3
DO - 10.1007/s10618-017-0549-3
M3 - Article
SN - 1384-5810
VL - 32
SP - 708
EP - 735
JO - Data Mining and Knowledge Discovery
JF - Data Mining and Knowledge Discovery
ER -