Assessing and Measuring the Privacy Practices of Voice Assistant Applications

Student thesis: Doctoral Thesis (Doctor of Philosophy)


Smart Personal Voice Assistants (SPA) are fast becoming popular with the widespread introduction of desktop, phone and home assistants. Over a hundred million users now use SPA like Alexa, Siri, Google Assistant, Bixby and Cortana every day, and SPA devices have been sold in massive numbers. However, recent security and privacy incidents, such as Alexa recording a private conversation and sending it to a random contact, have increased users’ concerns about the security and privacy of these assistants. This thesis studies the security and privacy issues of SPA, in particular the risks associated with the skills (voice applications) they leverage to extend and expand their functionality. Firstly, we present a classification of SPA security and privacy issues and use it to systematically map current attacks and countermeasures to the different architectural elements of SPA. We show that these elements expose SPA to various risks stemming from the complexity of the architecture, its AI features, the wide range of underlying technologies, and the open nature of the voice channel they use.

We then conduct a systematic study of SPA third-party skills, as these are one of the architectural elements offering a large attack surface. In particular, we study the permission model SPA providers offer to developers and investigate how third-party skills use it to collect personal data. We further design a methodology that systematically identifies potential privacy issues in third-party skills by analysing the traceability between the permissions they request and the data practices stated by their developers. In addition, we propose a highly accurate system to automate this traceability analysis at scale. Furthermore, we perform a longitudinal measurement study of Amazon Alexa skills across all marketplaces over three years to demystify developers’ data practices and present an overview of the third-party skill ecosystem. Finally, we present an open tool that enables proactive auditing of data collection practices in emerging technologies like SPA. The overall study resulted in two new datasets for evaluating the privacy assessment of smart assistants: the traceability-by-policy dataset (TBPD) and the permission-by-sentence dataset (PBSD). All of these contributions aim to support the collective effort towards establishing secure, privacy-aware assistants.
Date of Award: 1 Oct 2022
Original language: English
Awarding Institution
  • King's College London
Supervisors: Jose Such (Supervisor) & Guillermo Suarez de Tangil Rotaeche (Supervisor)