Next-generation visitation models using social media to estimate recreation on public lands

How to cite this study

Wood, S.A., Winder, S.G., Lia, E.H., White, E.M., Crowley, C.S.L. and Milnor, A.A. 2020. Next-generation visitation models using social media to estimate recreation on public lands. Scientific Reports, 10, 15419.

Overview

This study evaluates how well social media data predicts visitation across multiple recreation sites. Two geographically distinct regions were chosen to test how well visitation models generalize from one area to another. Adding social media data to a model was found to improve visitation estimates at unmonitored sites, even when the model was parameterized with data from another region.

Relevance

This study is relevant to those interested in how models developed for one geographic region can be used to predict visitation in other regions. It is also relevant to those interested in using social media data to improve predictions for visitation patterns.

However, there are caveats when applying the results from social media-based visitation models. For example, social media may not precisely predict visitation levels, as the choice of whether or not to post can be influenced by the characteristics of the recreation site such as the topography, unique natural features like a scenic overlook, or how people relate to these features.

Location

This study examines recreation areas in the Jemez Mountains in rural northern New Mexico and in the Mt. Baker-Snoqualmie National Forest in western Washington. These sites were chosen because of the variety in geography and ecology, which likely shapes patterns of visitation.

Trail Type

Trails at both study locations were used year-round. Out-and-back visitation was more common in Washington because of its geography and terrain, while travel in New Mexico was more dispersed across trails.

Purpose

The purpose of this study was to test how visitation models parameterized for Mt. Baker-Snoqualmie National Forest performed with data from rural New Mexico, and to determine whether adding social media data improved the estimates. Funding was provided by the U.S. Department of the Interior Office of Policy Analysis, Mt. Baker-Snoqualmie National Forest, and the USDA Forest Service Pacific Northwest Research Station.

Findings

  • Relatively strong correlations were observed between weekly visitation and Instagram user-days in both regions. The correlations between observed visitation and Twitter user-days and Flickr user-days were weaker, but still positive.
  • The simplest model relies only on generic predictors such as weather, holidays, and seasonality, and was fit to data from western Washington. When tested on New Mexico data that was not used to train it, this model performed poorly and captured only 6% of the variability in visitation.
  • Model 2, which added social media predictors from Washington, captured 45% of the variability in visitation across all 13 study sites in New Mexico. It captured 79% of the variability in visitation at the seven New Mexico sites that had social media posts.
  • Visitation models are most effective when some count data is gathered on-site. Models that included limited count data from other sites in the area (a random one-third subset of the New Mexico on-site counts) explained 91% of the variability in actual use at sites in New Mexico, compared to up to 79% when only social media, precipitation, and calendar data were included.
  • The authors note that visitation models that include predictors from social media will likely be more effective than traditional models for estimating use at popular sites or when longer time series are available. However, at locations with low or no social media activity, it may be more useful to collect on-site data such as vehicle and pedestrian counts.
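
The percentages in the findings above describe how much of the variability in observed visitation a model captures. One common way to express this is the coefficient of determination (R²). The sketch below assumes that standard definition, which may differ from the exact statistic the authors report. The function name and the weekly values are hypothetical illustrations, not data from the study.

```python
def r_squared(observed, predicted):
    """Share of the variance in `observed` explained by `predicted`.

    Uses the standard definition 1 - SS_res / SS_tot (an assumption here;
    the study's exact statistic is not given in this summary).
    """
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical weekly visits at one site and a model's predictions for them.
observed = [120, 150, 90, 200, 170]
predicted = [110, 160, 100, 190, 165]

share_explained = r_squared(observed, predicted)
print(f"Variability captured: {share_explained:.0%}")
```

A value near 0.06 would correspond to the "6%" reported for the simplest model, and a value near 0.91 to the model that included some local count data.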

Methods

At 42 distinct sites (13 in New Mexico and 29 in Washington), on-site visitation data was collected in national forests, national preserves, or national monuments. Visitation was monitored between August 2016 and November 2018 in Washington and between April and October 2018 at most sites in New Mexico. At each site, a passive infrared pedestrian counter (38 sites) or a time-lapse video camera (4 sites) was installed at the first narrow point on a trail and hidden to capture normal visitor behavior. The counts were divided by two to account for visitors passing the trail counter twice on their round trip. 
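
The halving step can be illustrated with a minimal sketch that converts raw counter passes into estimated visits and sums them by week, since all data in the study was aggregated to a weekly scale. Grouping by ISO calendar week is an assumption made for illustration, because this summary does not say how weeks were defined. The function name and the counts are hypothetical.

```python
from collections import defaultdict
from datetime import date


def weekly_visits(daily_passes):
    """Convert raw daily counter passes into estimated weekly visits.

    daily_passes: dict mapping a date to the number of passes recorded.
    Returns a dict mapping (iso_year, iso_week) to estimated visits.
    """
    weekly = defaultdict(float)
    for day, passes in daily_passes.items():
        iso = day.isocalendar()  # (ISO year, ISO week number, ISO weekday)
        # Each visitor is assumed to pass the counter twice on a round trip.
        weekly[(iso[0], iso[1])] += passes / 2
    return dict(weekly)


# Hypothetical raw counts for three days in April 2018.
raw = {
    date(2018, 4, 2): 40,  # Monday, ISO week 14
    date(2018, 4, 8): 60,  # Sunday, ISO week 14
    date(2018, 4, 9): 30,  # Monday, ISO week 15
}
visits = weekly_visits(raw)
print(visits)
```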

Social media data was collected from Flickr, Instagram, and Twitter. The number of user-days (unique social media users who posted each day) was calculated based on the date and location where the content was created. Data included all publicly available posts that were tagged to a location within the site boundary. The visitation estimates from the on-site counters were modeled as a function of social media and several additional variables controlling for weather and social factors that influence visitation. All data was aggregated to a weekly scale. Five linear models were built to test how incorporating social media affects visitation predictions. Each was parameterized using a different combination of the data to represent a different scenario of data availability.
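
As a rough sketch of these two steps, the code below counts weekly user-days from a list of posts and then fits a straight line relating visits to user-days. It is far simpler than the five models in the study: it uses one predictor, no weather or calendar variables, and no transformations. The function names, the ISO-week grouping, and all values are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from datetime import date


def weekly_user_days(posts):
    """Count user-days per week for posts tagged inside one site.

    posts: iterable of (user_id, date) pairs.
    A user who posts several times on one day counts once for that day.
    """
    unique_user_days = {(user, day) for user, day in posts}
    weekly = defaultdict(int)
    for _user, day in unique_user_days:
        iso = day.isocalendar()  # (ISO year, ISO week number, ISO weekday)
        weekly[(iso[0], iso[1])] += 1
    return dict(weekly)


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical posts at one site.
posts = [
    ("ana", date(2018, 4, 2)),
    ("ana", date(2018, 4, 2)),  # same user, same day: counted once
    ("ben", date(2018, 4, 2)),
    ("ana", date(2018, 4, 3)),
    ("cy", date(2018, 4, 9)),
]
user_days = weekly_user_days(posts)

# Hypothetical weekly user-days and on-site visit estimates for four weeks.
weekly_ud = [1, 3, 5, 2]
weekly_counts = [20, 50, 80, 35]
intercept, slope = fit_line(weekly_ud, weekly_counts)
print(user_days, intercept, slope)
```

Once fitted on monitored sites, a line like this could be applied to user-days from an unmonitored site, which is the kind of transfer the study evaluates with richer models.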


Added to library on November 27, 2023