Abstract
With the rise of social media, a vast amount of new primary research material has become available to social scientists, but its sheer volume and variety make it difficult to access through traditional approaches, namely the close reading and nuanced interpretation involved in manual qualitative coding and analysis. This paper sets out to bridge the gap by developing semi-automated replacements for manual coding through a mixture of crowdsourcing and machine learning, seeded by the development of a careful manual coding scheme from a small sample of data. To show the promise of this approach, we attempt to create a nuanced categorisation of responses on Twitter to several recent high-profile deaths by suicide. Through these, we show that it is possible to code automatically across a large dataset to a high degree of accuracy (71%), and discuss the broader possibilities and pitfalls of using Big Data methods for Social Science.
Original language | English |
---|---|
Pages (from-to) | 33-43 |
Number of pages | 11 |
Journal | Online Social Networks and Media |
Volume | 1 |
Early online date | 17 Apr 2017 |
Publication status | Published - Jun 2017 |
Keywords
- Social media
- Crowd-sourcing
- Crowdflower
- Natural language processing
- Social science
- Emotional distress
- High-profile suicides
- Public empathy