TY - JOUR
T1 - A comparison of dataset search behaviour of internal versus search engine referred sessions
AU - Ibáñez, Luis-Daniel
AU - Simperl, Elena
N1 - Funding Information:
Acknowledgements. Research was supported by the data.europa.eu portal, an initiative funded by the European Union. This work was supported by the UK Engineering and Physical Sciences Research Council (EPSRC) grant Data Stories (EP/P025676/2).
Publisher Copyright:
© 2022 ACM.
PY - 2022/3/14
Y1 - 2022/3/14
AB - Dataset discovery is a first step for data-centric tasks, from data storytelling to labelling for supervised machine learning. Previous qualitative research suggests that people use two types of search affordances to find the data they need: they either go to a data portal that probably contains the data and search there, or they start on a regular web search engine, which sometimes returns results that are datasets. For the first type of search, prior work has analysed logs from different data portals to understand basic tenets of search behaviour such as query length or topics. In this paper, we advance the state of the art in dataset search behaviour with a comprehensive transaction log analysis study (n = 236441 sessions) of an international open data portal, in which we compare sessions that start directly on the data portal (internal searches) against sessions that land on a dataset or SERP (search engine result page) through a referral from a web search engine (external). Using dataset downloads as a proxy for successful searches, we find a statistically significant though weak relationship between the use of keyword search and session type, and a statistically significant, moderate relationship between the use of search facets and session type. We also discover and discuss behavioural patterns and user profiles across session types.
UR - http://www.scopus.com/inward/record.url?scp=85127387149&partnerID=8YFLogxK
U2 - 10.1145/3498366.3505821
DO - 10.1145/3498366.3505821
M3 - Article
SP - 158
EP - 168
JO - ACM SIGIR Conference on Human Information Interaction and Retrieval
JF - ACM SIGIR Conference on Human Information Interaction and Retrieval
ER -