Measuring Americans’ Comfort With Research Uses of Their Social Media Data

How to cite this study

Gilbert, S., Vitak, J. and Shilton, K. 2021. Measuring Americans’ comfort with research uses of their social media data. Social Media + Society 7(3): 1-13.


This study evaluates American perspectives on how social media data is used by researchers. A scenario-based survey with American Facebook users demonstrated that factors such as the type of researcher, content, purpose of data use, and awareness of data collection influence their level of comfort or concern regarding data use. This study provides recommendations for researchers and ethics review boards to ensure responsible data handling.


This study is relevant to those interested in using social media data or other novel data sources, such as cell phone data, for research. Though social media data provides access to large datasets, there are ethical concerns that should be considered when using this data. Previous studies have found that most Twitter users believe that researchers should ask permission before using their tweets in research (Fiesler and Proferes 2018). The recommendations provided in this paper suggest that researchers should increase user awareness through consent and notifications, identify the main principles in their data collection, and mitigate participants’ concerns through confidentiality or anonymity.


This is a study of American Facebook users’ perceptions of researchers’ use of social media data.



This study aims to identify elements associated with comfort and concern in users’ attitudes about data collection through Facebook. This study was funded by the National Science Foundation.


  • Most respondents were long-term Facebook users, with 81.8% reporting that they had used Facebook for more than 4 years. 76% were female. The average age was 35.
  • On a scale of 0=strongly disagree to 100=strongly agree, the average response to “I am concerned that online companies are collecting too much information about me” was 52.35, and “In general, I trust websites” was 45.17. 
  • Respondents rated, on a sliding scale, how far they agreed that the use of their data in each scenario is “appropriate” and “would concern me.” Men rated scenarios as significantly less “appropriate” than women; however, no significant difference was found in “concern” ratings between male and female respondents.
  • Those who used Facebook multiple times a day rated data use as more appropriate than all other groups. Data use within Computer Science, Gender Studies, and Psychology was rated as significantly less appropriate than Health Science, which was rated as the most appropriate of all domains. 
  • Using social media data for research purposes was viewed as generally more concerning to users than using data for social media platform improvements. 
  • The factor with the largest impact on comfort was respondents’ awareness of data use. Respondents rated scenarios in which researchers gained consent before the study as less concerning and more appropriate than scenarios in which the study details were disclosed only after the research was complete. Research without notification was viewed with the most discomfort.


The survey combined traditional survey questions with factorial vignettes. The traditional questions covered demographics, levels of Facebook use, and respondents’ attitudes toward privacy and data collection. Factorial vignettes are short scenarios that systematically vary contextual factors, prompting participants to assess what is acceptable across dozens of scenarios. This method reveals user norms and highlights how these norms can be influenced. For each scenario, respondents indicated their comfort by rating whether the use of their data is “appropriate” and “would concern me” on a sliding scale between 0 (strongly agree) and 100 (strongly disagree).

The scenarios were developed from a review of the relevant literature on privacy in social media platforms. They varied four factors: “role,” such as computer science, journalism, or health science; “data,” such as status updates, photos or videos, comments, sexual habits, and behaviors; “purpose,” such as assessing mental health, improving user experience, combating online harassment, advertising, or fighting terrorism; and “condition,” such as whether the user was aware of the data collection and whether the data was analyzed by humans or computers. The survey was designed using Qualtrics. Respondents were users who used Facebook more than once a month. Data from 350 respondents was collected between May and June 2019. Linear mixed models were used to analyze the results.
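
To make the factorial vignette approach concrete, the sketch below shows one way scenarios could be generated by crossing factor levels and then sampling a subset for each respondent. It is an illustration only, not the authors' instrument: the factor levels are a subset of those named in this summary, and the sentence template, the wording of the two "condition" levels, and the function names `all_vignettes` and `sample_vignettes` are assumptions made for the example.

```python
import itertools
import random

# Factor levels drawn from the summary above (a subset, for illustration).
# The wording of the "condition" levels is an assumption, not the study's.
FACTORS = {
    "role": ["computer science", "journalism", "health science"],
    "data": ["status updates", "photos or videos", "comments"],
    "purpose": [
        "assessing mental health",
        "improving user experience",
        "combating online harassment",
    ],
    "condition": [
        "you are informed before the study begins",
        "you are never informed",
    ],
}

# Hypothetical sentence template; the study's actual vignette text may differ.
TEMPLATE = (
    "A researcher in {role} collects your {data} from Facebook "
    "for the purpose of {purpose}, and {condition}."
)


def all_vignettes(factors=FACTORS):
    """Return every combination of factor levels (the full factorial design)."""
    names = list(factors)
    return [
        dict(zip(names, combo))
        for combo in itertools.product(*factors.values())
    ]


def sample_vignettes(k, seed=None):
    """Draw k distinct vignettes at random and render them as sentences."""
    rng = random.Random(seed)
    chosen = rng.sample(all_vignettes(), k)
    return [TEMPLATE.format(**levels) for levels in chosen]


if __name__ == "__main__":
    print(len(all_vignettes()), "possible scenarios")
    for text in sample_vignettes(3, seed=1):
        print("-", text)
```

Because each respondent rates several scenarios, the ratings are not independent of one another, which is consistent with the use of linear mixed models for the analysis.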

Added to library on November 27, 2023