A Privacy Assessment of Social Media Aggregators

Social Media Aggregator (SMA) applications let users manage multiple Social Networking Sites (SNS) from one convenient application. This gives SMAs access to a unique concentration of data from several SNS accounts, in addition to the data on the user's mobile phone. In this paper, we provide a detailed privacy assessment of 13 popular SMAs from 3 app stores using a three-step methodology: inspecting the mobile data and social media data accessed by these applications; checking for privacy policies and their compliance with distributors' vetting policies; and performing a qualitative assessment of traceability between privacy policies and the actual transparency and control mechanisms offered to users by the apps' interfaces. Our results demonstrate a variation in the data accessed by the individual applications, an absence of privacy policies for 5 of the SMAs evaluated, and a lack of traceability between privacy policies and the transparency and control offered by interface operations.


Introduction
It is evident that our engagement with Social Networking Sites (SNS) is becoming ever more ingrained in our daily lives. This has been, in part, facilitated by the spectacular growth of mobile social networking, which has a worldwide penetration of 23% (1.7 billion). This proliferation of mobile devices has enabled users to access social media accounts with more ease and convenience. This is demonstrated by the huge surge in usage of social applications on mobile platforms, to the extent that an estimated 80% of time spent on social media is spent in mobile applications.
This shift towards the mobile platform for social media activity has led to the development of Social Media Aggregators (SMAs), which enable users to access all of their social media accounts from a single application. This is partly driven by the fact that users often have accounts on multiple Social Networking Sites (SNSs). Using an SMA, a single application for all social media accounts, can be more attractive to users than installing a separate application for each social media site. Replacing all social media applications with a single SMA also makes better use of the often limited resources (RAM, CPU power and battery) of the mobile phone itself. Indeed, many SMAs clearly convey this to potential customers as an advantage and a selling point.
While it is clear that SMAs can be beneficial for users, they also potentially introduce severe privacy risks. SMAs are meant to combine multiple social media accounts, so all of the activity is routed through a single application. This is different from using separate applications for different social media accounts: a user's Facebook application, for example, cannot access their Twitter activity unless an explicit link is made by the user. Such a link between various social media profiles is implicit in the case of SMAs. Moreover, this information about social media activity is augmented with mobile device data such as GPS location, contact lists, camera, etc. Given this potential threat to the privacy of social media users, it is essential to take a closer look at the transparency and control mechanisms offered by these applications. This understanding will support further in-depth analysis of the gaps in policy and technology that must be overcome in order to safeguard user privacy and enable appropriate usage of SMAs.
In this paper, we employ a three-step methodology to perform, to the best of our knowledge, the first detailed privacy assessment of SMAs. We begin by looking at the Data Permissions requested by SMAs. This includes both the mobile data and the social media data of the user. We then check whether the SMAs have relevant Privacy Policies or other related documentation which explain the collection, usage and purpose of the user data being collected by them. Finally, we qualitatively analyze the privacy policies and perform a Traceability Analysis, in which we evaluate whether the interfaces provided to users are congruent with the documented policies, in order to assess how transparent data collection is and whether users have control over the amount and nature of data being collected.
We report the results we obtained for 13 popular SMAs from 3 app stores, showing: a variety in the data accessed, especially when it comes to mobile data; a partial lack of privacy policies (5 out of the 13 SMAs do not have privacy policies); and that a substantial proportion (45%) of SMAs show Broken traceability between policy documentation and interface operation whereas Complete traceability is observed in only 19% of the cases.

Methodology
We begin by listing the various SMAs we have considered in our research along with their sources. We surveyed 13 popular SMAs for this research. We studied the 6 most popular SMAs (in terms of reviews and installs) from each of Google Play Store and iTunes. Additionally, we included a Cydia SMA to account for the variation between SMAs with different levels of adoption as well as between app stores that have different vetting procedures or policies (e.g., Cydia only works on rooted (jailbroken) iOS devices and does not have a vetting process in place). The SMAs are listed in Table 1 with their platform, the number of times they have been rated and the number of times they have been downloaded (wherever available; see footnote 1). Note that the number of reviews was not available for Social Butter and Social hub as there were not enough reviews for iTunes to publish the number. The iOS version on the phone used to install these apps was 10.2, while the Android version used was 5.0. It is important to note, however, that the findings reported in this paper are independent of the OS version and the versions of the individual apps.

Examining Data Permissions
The first step of our analysis requires us to identify exactly which SMAs request permissions to access personal data from the user. All mobile applications are required to request permission for the data they access on the user's phone. We compare the permissions requested by the 13 SMAs included in our analysis. It is important to note here that an application requesting permission for some data does not mean it is actually accessing that data. However, it does mean that the data is available to the application with the consent of the user (demonstrated by granting the access permission while using the application).
Most applications have a "permissions screen" which is shown to the user to communicate the list of mobile data access permissions requested by the application (refer to Fig. 1). However, for the analysis, in addition to the permissions screens, we also looked at the phone settings section for the individual permissions the applications were using. Both Android and iOS display the data access permissions for each application installed on the mobile phone. We also checked the permissions granted to individual SMAs by using the "Permissions Manager" application on Android devices. Only permissions which were specified explicitly in either the permission screen or the phone settings (or seen using "Permissions Manager" for Android SMAs) were included in our results.
We examine the social network data (such as profile information, communication, lists, etc.) accessed by the SMAs separately. This helps us to understand exactly what information each SMA will try to access for each of the SNSs the user associates with the SMA. To look at this, we created social media accounts and then authorized the individual SMAs. We then checked the social media accounts for the permissions granted to each SMA. The permissions can also be checked by the user when the SMA is used to log in to a particular social network account for the first time.
1. These figures were found from the respective app stores and are accurate as of 9th February, 2017.
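The comparison described above can be pictured as a permission matrix with one row per SMA and one column per permission. The following is a minimal sketch under stated assumptions: the app names, source names and permission labels are hypothetical placeholders (not the entries of Table 2), and the helper functions are illustrative rather than the tooling used in this study.

```python
# Hypothetical observations: app -> inspected source -> permissions listed there.
# All names below are placeholders, not results from this paper.
observations = {
    "ExampleSMA-A": {
        "permission_screen": {"location", "photos/media"},
        "phone_settings": {"location", "contacts"},
    },
    "ExampleSMA-B": {
        "permission_screen": {"photos/media"},
        "phone_settings": set(),
    },
}


def explicit_permissions(sources):
    """Union of the permissions stated explicitly in any inspected source."""
    merged = set()
    for perms in sources.values():
        merged |= perms
    return merged


def permission_matrix(obs):
    """Return (sorted permission columns, {app: [True/False per column]})."""
    per_app = {app: explicit_permissions(src) for app, src in obs.items()}
    columns = sorted(set().union(*per_app.values()))
    rows = {app: [c in perms for c in columns] for app, perms in per_app.items()}
    return columns, rows


columns, rows = permission_matrix(observations)
print(columns)
for app, row in rows.items():
    print(app, row)
```

Taking the union over sources mirrors the rule that a permission is counted if it appears explicitly in any of the inspected places; a permission that appears nowhere is simply absent from the row.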
Beyond the data permissions that can be checked using the mobile phone (for mobile data) or the SNS (for social media data), there may be other data (e.g. traffic data) which can be collected. This is examined later in this paper as part of the traceability analysis.

Privacy Policies
The next step in our analysis was to examine the privacy policies of the individual SMAs. In some cases, the relevant document was titled differently (such as "Terms of Service"), but we refer to all privacy-related documentation as privacy policies for simplicity. The aim of this evaluation was to check for compliance with distributor vetting policies. The 3 app stores included in our research are:
1) Cydia: It does not have an official vetting process for its applications.
2) iTunes Store: It has a vetting process which reviews all applications. Personally identifiable information may not be collected or used without the user's consent. More generally, privacy policies are required if an application stores, shares or uses personal data.
3) Google Play Store: It has a vetting process which looks at app permissions and outlines the application provider agreement to protect the privacy and legal rights of users (see footnote 3). If an application accesses registration or personal information, users must be made aware of this, and an adequate privacy policy must be provided in compliance with the law.

Mapping Traceability
Finally, we performed a qualitative analysis of the privacy-related documentation to facilitate the traceability analysis with transparency and control interface operations. Previous research has identified a methodology for analysing software requirements from privacy policies [1]. Concepts, categorized as a commitment, privilege or right, are attained from statements by identifying helping verbs, and used to produce a set of software requirements. Similarly, we use content analysis to identify action statements through verbs that we then categorize into privacy implications, which are split into categories by way of answering the following questions:
1) What information is collected by the application?
2) What is the purpose of collection?
3) Who can access this information?
4) How long is information retained?
These privacy implications help us in contextualizing the traceability analysis. In particular, we map the extent to which application features and controls match expectations set out to users as data actions in privacy policies or application interfaces. By measuring the traceability of privacy policy implications in application content, we can assess the extent to which data transparency and control are delivered to the user.
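To make the idea of sorting action statements by verb more concrete, the sketch below tags a policy sentence with the questions it appears to answer. It is only an illustration: the content analysis in this paper was qualitative and manual, and the verb lists, the dictionary name `QUESTION_VERBS` and the function `categorize` are assumptions introduced here.

```python
import re

# Hypothetical verb forms per privacy question; a real codebook would be
# derived from the policies themselves rather than fixed in advance.
QUESTION_VERBS = {
    "collection": {"collect", "collects", "collected", "gather", "gathers", "gathered"},
    "purpose": {"use", "uses", "used", "improve", "improves", "advertise"},
    "access": {"share", "shares", "shared", "disclose", "discloses", "disclosed", "sell", "sold"},
    "retention": {"retain", "retains", "retained", "store", "stores", "stored", "delete", "deleted"},
}


def categorize(statement):
    """Return the privacy questions whose verbs occur in the statement."""
    words = set(re.findall(r"[a-z]+", statement.lower()))
    return [question for question, verbs in QUESTION_VERBS.items() if words & verbs]


# Statement quoted from iSocial's terms and conditions later in this paper.
example = ("Any site registration information is used only by the website "
           "and is not sold or given out to others")
print(categorize(example))
```

Whole-word matching is used instead of prefix matching so that "user" is not mistaken for the verb "use". A keyword pass like this can only suggest candidate statements; deciding what each statement actually implies still needs a human reader.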
For those applications with privacy policies, the information provided in these documents presents a means of gathering expectations for this analysis. A method for traceability analysis of SNSs is presented by Anthonysamy et al. [2], where action statements identified in privacy policies are mapped to those in interface operations by way of assessing the extent to which data actions are controllable by users. We applied a similar methodology to SMAs and extended it to consider mobile phone data and the transparency of interface operations. In Anthonysamy's methodology, privacy implications found in policies are matched to corresponding operations available through interfaces during installation and use of the application. We define the actions of privacy policies as privacy implications, and the features and controls of an application as its operations. Extending Anthonysamy's methodology, our study also aims to identify the traceability of data privacy implications through interface awareness mechanisms. Therefore we assess the transparency of data actions through interface operations, as well as controls.
3. https://play.google.com/about/developer-distribution-agreement.html
For SMAs with privacy policies, transparency of data usage is analyzed by mapping the information provided in the privacy policy to that presented through application operations. Traceability between data actions and the extent to which we control each privacy implication is the second aspect of the analysis. In this way we map privacy implications to data transparency and control operations for SMA applications with privacy policies, by carrying out the following steps.
For each privacy implication identified:
1) Identify a corresponding interface operation by matching terminology of data actions.
2) Assess the transparency of data actions made visible to the user through interface operations, contrasting data actions in privacy policies.
3) Assess the extent of user control on data actions through operations, mapping data visible in the previous step (2) with control operations.
We measure the extent to which privacy implications are transparent and controllable through user interfaces against three main categories: complete, partial and broken, in a similar way to Anthonysamy et al. [2], but specifying the categories for both transparency and control.
Complete mappings signify complete transparency of information presented to the user, through both transparency and control operations. Information presented to users is unambiguous, with unmistakable meaning and appropriate detail. For transparency, complete traceability can be achieved by providing accurate information to the user through the user interface. An example is when a user is accurately informed about all data being accessed by an app through the permission screen. The control operation is mapped as complete when the user can regulate this list and can choose to withhold certain items of information.
Partial mappings involve ambiguous information provided in privacy documentation or data operations. For example, vague terms like 'personal information', which are not explicitly defined, make mapping data operations difficult. Access permissions are partial data operations because they do not inform users of all data collected. Hootsuite collects location and traffic data, much like most other applications. Although we are prompted for permission regarding location access, the application does not provide the user with any information on traffic data collection. Control over a privacy implication is found to be partial when it is incomplete: some control is provided, but not all data collected have associated controls. Taking Everypost as another example, we find that partial control operations are evident for the traffic data collected. Everypost's privacy policy states that cookies used by third parties may be opted out of, as is apparent through interface operations. However, collection of traffic data for internal usage such as analytics does not match any control operations.
Broken mappings occur when there is a disconnect between privacy implication expectations and application operations. Control operation mappings are broken when documented expectations and/or data transparency operations do not have a matching control. Detachment from policy expectations is apparent among privacy implications such as advertising and aggregation. These purposes for data collection are expressed in privacy policies, but no corresponding information is provided through application data or control operations. Likewise, implications of age restriction with regard to data retention are expressed in policies but are disconnected from interface operations.
There are many cases in which there is an absence of a clear traceability mapping between privacy implications and interface operations. We have classified these cases as Unknown and represented them in our analysis.
Apart from the above 4 classifications, there are some cases where the privacy implication was not applicable to a particular SMA. In such cases, we have represented this as N/A in our analysis. The detailed results of our analysis are presented in section 4.3.
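The five labels can be summarized as a small decision rule. The sketch below is a deliberate simplification and an assumption on our part: it reduces each privacy implication to the set of data items stated in the policy and the set of items the interface makes visible (for transparency) or controllable (for control). The mapping in this paper was a qualitative judgement; the names `Mapping` and `classify` are illustrative.

```python
from enum import Enum
from typing import Optional, Set


class Mapping(Enum):
    COMPLETE = "complete"
    PARTIAL = "partial"
    BROKEN = "broken"
    UNKNOWN = "unknown"
    NOT_APPLICABLE = "n/a"


def classify(stated: Optional[Set[str]], exposed: Set[str],
             applicable: bool = True) -> Mapping:
    """Label one privacy implication for one kind of operation.

    stated:  data items the policy says are affected (None or empty if
             there is no policy statement to map against).
    exposed: items the interface makes visible or controllable.
    """
    if not applicable:
        return Mapping.NOT_APPLICABLE
    if not stated:
        return Mapping.UNKNOWN       # nothing documented to trace from
    covered = stated & exposed
    if covered == stated:
        return Mapping.COMPLETE      # every stated item has an operation
    if covered:
        return Mapping.PARTIAL       # some, but not all, items are covered
    return Mapping.BROKEN            # no matching operation at all


# Mirrors the Hootsuite example above: location is prompted for,
# traffic data collection is not surfaced to the user.
print(classify({"location", "traffic data"}, {"location"}))
```

Calling the function twice per implication, once with the visible items and once with the controllable items, gives separate transparency and control labels.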

Data Access Permissions
3.1.1. Mobile Data Access Permissions. As can be seen from the results in Table 2, most applications require access to photos/media, location, identity (which refers to any user accounts on the phone accessed by the application) and network access. In addition, many applications require access to the USB storage as well. These findings confirm that personal data of the user is accessed by most of the applications that were analyzed. An interesting observation is that permissions seem consistent for the same SMA developers across app stores. However, for different SMAs we observe a wide variety in the mobile data being accessed. While this could be attributed to different functionality being provided, it may also be a sign of some SMAs asking for more permissions than required [3], as arguably one of the most mature and widely used SMAs (Hootsuite, which has 100,000-500,000 installs and 80,760 reviews on Google Play, see Table 1) seems to use a relatively smaller set of permissions when compared to other SMAs. An interesting case is that of Social Media all in one, which seems to access everything except Identity (which can be obtained from SNSs anyway).

Social Media Data Access Permissions.
SMAs are different from other mobile applications as they can access a user's social media data as well. We have summarized the data permissions requested by SMAs while a user logs into their social media accounts in Table 3. We have used general terms such as "Activity" and "Lists" in this table to simply convey the meaning as each social media site uses different names for such features. For example, "posts" on Facebook and "tweets" on Twitter as well as inbox messages are classified under "Activity". Similarly, "Lists" refers to groups or lists that the user might have created (or used by default) to organize their contacts on various social media sites.
Table 3 shows that 5 SMAs, namely iSocial, Social Networking All in One, Social Media all in one, Social Media and Social Media Vault, are marked with a ' * ' sign and are shown to access all social media data. This is to highlight the fact that these applications do not disclose what social media data they access in order to function, as they just provide an interface either to the social media apps (such as Facebook, Twitter) already installed on the user's phone or to the web link of the social network via the web browser. As all the social media activity goes via these applications, they have the potential to access all communication. Moreover, these applications do not need to be authorized by the user with their Facebook account, so the user cannot regulate the permissions by logging into their Facebook account as is possible with other Facebook applications. For the other SMAs, we find that many of them access almost all social media activity: they can post on walls or tweet, access the friend or contact lists, update the profile and post on the user's behalf, and access inbox messages or the email ID which was used to create the account. Needless to say, all this information may be classified as personal and sensitive to the user, and we find that most applications that disclose their permissions access this information.

Application Privacy Policies
Applications that collect personally identifiable information are required to produce a privacy policy in order to comply with the previously discussed distributor vetting policies. Table 4 shows that 8 out of the 13 SMAs that we evaluated were found to include this documentation. The lack of privacy policies among the other 5 SMAs seems to suggest a violation of the distributor vetting policies which mandate such documentation for all applications which process personal data from users. We did find in Table 2 that the SMAs without a privacy policy do not access "Identity", so technically they may argue they do not access personally identifying information. However, they are found to be able to access most of the social media data, photos, location, etc., which can be classified as personal information.

Traceability for Transparency and Control
Common data actions have been categorized to form 14 privacy implications, seen in the left column of Table 5. Privacy implications fall under further categories by way of answering our privacy questions set out in section 3.3: collection, purpose, access and retention of data. Operations refer to features provided by SMA providers or distributors which inform us of data collection and use, as well as providing us with control over data actions. Each symbol in the table provides a mapping to the degree of traceability offered by transparency and control operations respectively. Data operations refer to the extent to which transparency of data actions is presented to the user through interfaces; these include access permission prompts and other mechanisms which detail privacy implications. Control operations refer to features and mechanisms presented through interfaces which enable control over some data action; these include device settings, accept/decline button options, etc. If the same degree is found for both the transparency and control operations assessed, then only one symbol is shown. If a different degree of traceability is found, the first symbol in the particular cell of the table corresponds to transparency operations and the second symbol corresponds to control operations. In the resulting table, content refers to the social media data collected, as shown in Table 3. Other privacy implications and results will be further explained and justified in the following subsections.
3.3.1. Complete. All SMAs provide control over some data collection through access permissions. iSocial does not specify any such method of informing the user of data collected through the requirement to accept access permissions. iSocial's terms and conditions specify privacy implications: "Any site registration information is used only by the website and is not sold or given out to others"; likewise, users may provide an email address for the service provider to provide support. Complete transparency for collection can be found when an SMA communicates the data it is going to [...] Sharing information intentionally with an SNS involves users sharing it with these third parties, so the transparency of third-party access is completely apparent to the user in this case. Some applications offer settings which give the user a level of control over who accesses information posted to SNSs, and the restriction of data access to particular accounts. The controls offered are those found on common SNSs: share with only friends or with everyone. Asset transfer refers to personally identifiable information being transferred as businesses buy and sell assets.
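The cell convention described above (one symbol when transparency and control agree, two symbols otherwise, transparency first) can be written down in a few lines. This is a sketch only: the single-character symbols in `SYMBOLS` are hypothetical stand-ins, since the actual glyphs used in Table 5 are not reproduced here, and `render_cell` is an illustrative name.

```python
# Hypothetical symbols standing in for the glyphs used in Table 5.
SYMBOLS = {
    "complete": "C",
    "partial": "P",
    "broken": "B",
    "unknown": "?",
    "n/a": "-",
}


def render_cell(transparency, control):
    """Render one table cell from a transparency label and a control label.

    A single symbol is shown when both degrees match; otherwise the
    transparency symbol comes first and the control symbol second.
    """
    t, c = SYMBOLS[transparency], SYMBOLS[control]
    return t if t == c else f"{t} {c}"


print(render_cell("partial", "partial"))
print(render_cell("complete", "broken"))
```

Keeping the order fixed (transparency, then control) is what lets a two-symbol cell be read unambiguously.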

3.3.2. Partial. The transparency of privacy implications through access permissions maps only partially to the expectations provided by SMA privacy policies. An example is the partial content collection made visible and controllable to the user. SMAs with privacy policies commonly state their right to collect all information provided to the site, including that shared with associated SNSs. Google Play's Hootsuite provides a 'Send usage data' setting; the user is informed that anonymous data is collected and used to help improve Hootsuite. Partial transparency and control over internal use are apparent, with an ambiguous description of collection and purpose, along with control over 'anonymous data' but no matching control for all data collected as specified in the privacy policy, such as content posted.

3.3.3. Broken. Internal use of data includes analytics used to improve or better understand services. It is common for servers to automatically collect usage information: "Server logs may include such information as a mobile device identification number and device identifier, web requests, IP address, browser type, browser language, referring/exit pages and URLs, platform type, number of clicks, domain names, search terms, landing pages ...". This type of collected information is referred to as the traffic data privacy implication, and may be shared with third parties on an aggregate basis for advertising and analytic purposes. We can see that both transparency and control for this example are broken in most SMAs, leaving users unaware, in their normal use of the interface, that this data is collected and without any way of controlling it.

3.3.4. Unknown. Our traceability mapping of data use as specified in privacy policies has shown that we should not expect applications to inform users about the passive collection of non-identifiable information. We are aware that providers are likely to use and share traffic or aggregate data with third parties, for the purpose of analytics and advertising. We are unable to determine whether an application without a privacy policy passively collects such non-identifiable information. Therefore, for some SMAs, data disclosure to third parties by the provider is shown as unknown.
3.3.5. Summary. Table 6 summarizes our results, presenting rounded percentages of privacy implications found to be complete, partial, broken, unknown or not applicable. We provide a breakdown for each of the 3 app stores. The overall traceability of transparency and control is also provided.
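A summary of this kind amounts to counting labels per app store and converting the counts to rounded percentages. The sketch below shows one way to do that; the store names and labels in `records` are made-up placeholders rather than the data behind Table 6, and the function names are assumptions.

```python
from collections import Counter

CATEGORIES = ("complete", "partial", "broken", "unknown", "n/a")


def rounded_percentages(labels):
    """Share of each category among the labels, rounded to whole percent."""
    total = len(labels)
    if total == 0:
        return {c: 0 for c in CATEGORIES}
    counts = Counter(labels)
    return {c: round(100 * counts[c] / total) for c in CATEGORIES}


def summarize(records):
    """records: list of (store, label) pairs. Adds an 'overall' row."""
    by_store = {}
    for store, label in records:
        by_store.setdefault(store, []).append(label)
    summary = {s: rounded_percentages(ls) for s, ls in by_store.items()}
    summary["overall"] = rounded_percentages([label for _, label in records])
    return summary


# Placeholder data, not results from this paper.
records = [
    ("StoreA", "complete"), ("StoreA", "broken"),
    ("StoreB", "partial"), ("StoreB", "broken"),
    ("StoreB", "broken"), ("StoreB", "unknown"),
]
for row, shares in summarize(records).items():
    print(row, shares)
```

Because each share is rounded independently, a row does not necessarily sum to exactly 100; the overall row in this example sums to 101.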

Discussion
In this section we analyse and discuss the main findings according to the results detailed in the previous section organized into the four areas described below.

Divergence in Data Accessed
SMAs are different from other mobile applications as they represent a unique set of circumstances in which a user's mobile data (such as call records, contact list, location, camera, etc.) as well as their social media activity can be collected. This very nature of SMAs makes them critical from a privacy perspective. We have seen from our results that most of the SMAs analyzed in this study access a lot of personal data from the mobile phone as well as the social media data (such as activity, lists, etc.) of the user. This combination of mobile phone data and social media data makes SMAs an important threat to user privacy. It is important to acknowledge this threat and attempt to reduce the privacy risks for users of such applications, enabling them to enjoy the benefits of aggregating their social media accounts through SMAs.
We largely found that permissions were similar for the same developers across different app stores, so vetting processes do not appear to have much effect. We also found a wide variety in the number of permissions required by the different SMAs. This may be because many SMAs are asking for more permissions than required, which reinforces well-documented concerns [3]. This is also suggested by the fact that one of the most mature SMAs, Hootsuite, requires fewer data permissions to function than many others.

Partial Lack of Privacy Policies
Analysis of distributor policies found few measures that attempt to preserve user privacy, namely the requirement for providers to produce a privacy policy and gain consent from users when collecting personal information. Across three popular application stores, only eight free SMA applications could be identified which present a privacy policy. Results do not indicate that providers are breaking these rules, but rather that mobile applications commonly circumvent the need to inform users of data actions performed on what is perceived as non-personal information. We did find in Table 2 that the SMAs without a privacy policy do not access "Identity", so technically they do not access personal information. However, they are found to be able to access most of the social media data, photos, location, etc., which can be classified as personal information.
A possible reason for the failure to provide privacy policies may be the effort and expertise required to produce such documents. A possible mitigation can be found in automated solutions like "AutoPPG" which is an automatic privacy policy generator for Android applications [4]. It simply identifies the important privacy issues emanating from the usage of the application by conducting a static analysis of the application's source code. Automated solutions such as these may encourage SMA and other application developers to include privacy policies without putting in much effort.

Lack of Transparency and Control
We find a general lack of transparency across SMAs, with 45 percent of SMAs revealing broken transparency mappings. Privacy implications offering complete transparency of data involve the collection of personal information being made visible to the user in some way (e.g. by showing the access permissions required). In order to consider current guidelines for user privacy adequate, we must rule out any mismatch between users' expectations and the reality of how SMAs treat their information, by making users aware, either through privacy policies or through other awareness mechanisms, of any data collected, how it will be used, whom it will be shared with, and how long it will be retained.
We also find that users have a lack of control as less than a quarter of the results indicated complete control over privacy implications. In order to give more control to users, developers could work to increase application functionality while restricting access to data. Settings should enable control over all data collected, including information perceived as non-identifiable. Research has shown that pragmatic approaches of providing privacy related intervention, where users are shown the effect of exposures of their data, work well [5].

Analysis of Mobile Data Access Permissions
Mobile applications are generally explicit in disclosing to users the data access permissions they require. There is generally a screen shown to the user at the time of installation which tells them the data that the particular application will be allowed to access. The major issue is the "all or nothing" nature of mobile applications [6], [7]. The user is required to grant the requested permissions to the application in order to use it. This is a problem, as it has been shown that mobile applications often introduce risk vectors by asking for more permissions than required [3], [8]. The applications themselves are somewhat hamstrung in this regard and have to request any permissions that they envisage using at any time during execution. Some solutions have been put forth to detect and possibly prevent malicious mobile applications by using anomaly detection to identify applications that deviate from normally expected behavior [9]. The idea is to use static analysis to create profiles of applications' expected behavior and detect anomalies at runtime to secure mobile applications. This is similar to the work of Hussain et al., which looks at detecting malicious database applications [10]. Another proposed approach, "PrivacyGuard", uses the VPN service of Android devices to intercept the network traffic of mobile applications to detect information leakage [11]. It also provides mechanisms for tricking malicious applications by manipulating the leaked information. Awareness mechanisms such as privacy "nudges" have also been found to be reasonably successful as a deterrent for some users [12]. Recommending mobile apps to users by providing information about their security and privacy aspects has also been suggested [13]. However, we found that most of the previous work in this area only looks at leakage of mobile data and not the social media data which SMAs have access to as well.

Analysis of Privacy Policy Traceability
There is previous work which shows that control over data disclosure can affect decisions made by users [14]. Greater transparency about data being shared often acts as a mitigating factor against erroneous decisions being made. Privacy policies are often employed to inform the user about the information that is being collected and accessed, and are hence an instrument of transparency from the users' perspective. However, the readability of these privacy policies (or related documentation) has often been found to be inadequate [15], [16]. Moreover, studies have also demonstrated a lack of usability and correctness in the privacy controls of SNSs, which makes it extremely difficult for the average user to configure them appropriately [17], [18]. Given that separate analyses of transparency and control have found significant problems, our current work looks at the traceability of transparency and control by examining the interface operations and how closely they match the privacy policies. Qualitative analysis of documented policies and of their traceability to interface features has been explored earlier in software engineering by looking at the compliance of documented software requirements with legal texts or described privacy policies [1], [19], [20]. More recently, such analysis has been used to analyze whether the privacy policies outlined by SNSs are congruent with the interface controls provided to the users. Anthonysamy et al. demonstrated that SNSs themselves suffer from a lack of traceability between data actions defined in privacy policies and the corresponding data operations apparent to users through interfaces [2], [21]. Our work extends this methodology to perform a privacy analysis for SMAs by analyzing the mobile phone data and social media data accessed by the SMAs, in addition to a traceability mapping which considers the transparency of interface operations and the control provided to the user.

Conclusions
In this paper, we employed a three-step methodology to provide the first-of-its-kind privacy assessment of SMAs by examining the data permissions (both mobile and social media) requested by them, checking whether they provide the user with privacy-related documentation, and analyzing traceability between the privacy implications identified in the privacy policy and the interface operations provided to the user. We evaluated 13 popular Social Media Aggregators (SMAs) from 3 app stores and found that the majority of the SMAs we evaluated accessed users' personal information, including their social media activity. However, we also found that 5 of the 13 SMAs did not provide any privacy-related documentation, which is in clear conflict with the vetting policies of the app stores. Our results show that 45% of SMAs show Broken traceability between privacy documentation and interface operations, while Complete traceability is observed in only 19% of the cases. These results highlight the need for major improvements to ensure that the usage of SMAs does not compromise user privacy.
Future research in this area would benefit from considering the main motivators of data privacy mechanisms, by way of seeking incentives for providers to improve the traceability of data implications. Efforts to reuse the methodology used in this paper may find it beneficial, but also challenging, to automate the traceability analysis, which is the costliest part in terms of time and effort. Research could also seek to identify improved regulations for data privacy, with particular concern for non-identifiable personal information. Finally, SMAs are different from other mobile applications as they have an inherent link to the social media activity of users. In this paper, our main focus was on highlighting the absence of traceability between privacy policies and interface operations. However, research on online SNSs shows that users struggle with the social aspects of privacy on these platforms due to the complex nature of their networks and interpersonal relationships [22]. A similar analysis of privacy mechanisms for SMAs, particularly from a social standpoint, may be of particular interest in order to consider both so-called institutional and social privacy [23].