Data: The Happiest States
As I’m approaching my senior year and eventually graduation, I’ve been thinking about where in the country I’d like to settle down to work and live. I’ve floated around – I grew up in suburban Maryland, go to school in Philly, worked in NYC and SF, and traveled to various states over the years. But how much do I really know about how happy citizens are across the country?
I embarked on this project to find out which U.S. states are the happiest (and the least happy) and to identify some of the factors that contribute to this.
This summary of my results gives a general overview of happiness across 52 entries (the 50 states, D.C., and the U.S. average), as well as the social media sentiment surrounding them.
I set out to answer the following questions about satisfaction and mental health in the United States:
Which states are happiest? Which are least happy?
What are the most popular themes/words discussed at both extremes?
How does social media tweet sentiment compare with surveyed mental health data?
Getting the Data
Every year, the Centers for Disease Control and Prevention (CDC) administers the Behavioral Risk Factor Surveillance System (BRFSS), a national telephone survey that asks citizens about their health-related risk behaviors, chronic health conditions, and use of preventive services. It completes over 400,000 adult interviews each year, making it the largest continuously conducted health survey system in the world. I got a clean version of the 2013-2017 survey results through the Kaiser Family Foundation (a nonprofit focused on national health issues) and extracted the “% Adults Reporting Poor Mental Health Status” figure for each state as a .csv file. I used this statistic as an inverse proxy for happiness: the lower the percentage, the happier I consider the state. I also gathered 1,000 tweets for each state (totaling ~52,000 tweets) with the Twitter API and the twitteR package in R.
Prepping the Data
I used Excel to prep the mental health data and isolate the statistic representing “percentage of adults reporting poor mental health” for each state. At this stage, I removed Guam, Puerto Rico, and the Virgin Islands but kept D.C. and the national average. For the Twitter data, I used the tm package to turn the tweets into a corpus, clean it, and apply text transformations.
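To make the cleaning step concrete, here is a minimal sketch in Python (my analysis was done in R with tm, so this is an illustration and not the code I ran). The function name `clean_tweet`, the tiny stop-word list, and the exact set of transformations are all assumptions; they mirror the usual routine of lowercasing, stripping links, mentions, and punctuation, and dropping stop words.

```python
import re

# Hypothetical, deliberately tiny stop-word list; real lists are much longer.
STOPWORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "is", "it", "for", "on", "rt"}


def clean_tweet(text):
    """Return the lowercase word tokens of a tweet, minus noise and stop words."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop links
    text = re.sub(r"@\w+", " ", text)          # drop @mentions
    text = text.replace("#", " ")              # keep the hashtag word, drop the symbol
    text = re.sub(r"[^a-z\s]", " ", text)      # drop punctuation, digits, emoji
    return [tok for tok in text.split() if tok not in STOPWORDS]


if __name__ == "__main__":
    print(clean_tweet("RT @user: Hiking in the Black Hills! https://t.co/abc #SouthDakota"))
```

Keeping the hashtag word while dropping the `#` symbol is a judgment call: it lets tags like "southdakota" show up as ordinary terms in later frequency counts.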
Analysis & Visualizations
I used the syuzhet package in R for sentiment analysis on the 52,000 tweets, the ggplot2 package for bar plots, and the wordcloud package for displaying frequent words.
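For readers unfamiliar with lexicon-based sentiment analysis, the general idea can be sketched in a few lines of Python. This is not syuzhet's implementation or its lexicon: the word scores in `TOY_LEXICON` are invented for illustration, and the function names are hypothetical.

```python
# Hypothetical toy lexicon with made-up scores; real lexicons hold thousands of words.
TOY_LEXICON = {
    "love": 1.0, "beautiful": 1.0, "happy": 1.0, "great": 0.5,
    "sad": -1.0, "shooting": -1.0, "tragedy": -1.0, "worst": -1.0,
}


def tweet_sentiment(tokens, lexicon=TOY_LEXICON):
    """Sum the lexicon scores of a tweet's tokens; unknown words count as 0."""
    return sum(lexicon.get(tok, 0.0) for tok in tokens)


def mean_state_sentiment(tweets, lexicon=TOY_LEXICON):
    """Average the per-tweet scores over a list of tokenized tweets."""
    if not tweets:
        return 0.0
    return sum(tweet_sentiment(t, lexicon) for t in tweets) / len(tweets)


if __name__ == "__main__":
    sample = [["love", "beautiful", "hills"], ["tragedy", "shooting"]]
    print(mean_state_sentiment(sample))
```

Averaging per-tweet scores gives one number per state, which is the form needed to compare tweet sentiment against a single survey percentage.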
1) Which states are happiest? Which are least happy?
After taking the mental health data from the Centers for Disease Control and Prevention (CDC) and plotting the percentage of adults reporting poor mental health by state, we get the following bar plot:
The 5 happiest states are
South Dakota (SD)
The 5 least happy states are
Now that we know where every state falls on the spectrum, let’s look into what contributes to their mental health environments.
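The ranking behind this section boils down to sorting the states by the survey percentage. Here is a small Python sketch of that step (my own work was done in Excel and R). The column names `Location` and `PoorMentalHealthPct` are assumptions, and the rows in `SAMPLE_CSV` are placeholders with invented values, not the real survey figures.

```python
import csv
import io

# Placeholder rows with invented values, NOT the real survey figures.
SAMPLE_CSV = """Location,PoorMentalHealthPct
State A,30.1
State B,41.5
State C,35.0
State D,28.4
"""


def rank_states(csv_text, n=5):
    """Return (happiest, least_happy): the n lowest and n highest percentages."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # A lower share of adults reporting poor mental health counts as "happier".
    ranked = sorted(rows, key=lambda r: float(r["PoorMentalHealthPct"]))
    names = [r["Location"] for r in ranked]
    return names[:n], names[-n:][::-1]


if __name__ == "__main__":
    happiest, least_happy = rank_states(SAMPLE_CSV, n=2)
    print("Happiest:", happiest)
    print("Least happy:", least_happy)
```

The second list is reversed so that the least happy state comes first, matching how a "bottom 5" is usually read.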
2) What are the most popular themes/words discussed at both extremes?
Focusing on the two states at the extremes – South Dakota for “happiest state” and Oregon for “unhappiest state” – I wanted to dive into some tweets to see what citizens in those states, as well as other people, were discussing. I hoped it would shed some light on why the surveyed citizens feel the way they do.
Using the Twitter API and the twitteR package, I retrieved the 1,000 latest tweets tagged with #southdakota and #oregon and created word clouds for each.
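A word cloud is a picture of a term-frequency table, so the counting step can be sketched on its own. The Python below is an illustration and not the R code I used; `top_terms` is a hypothetical helper that expects tweets already cleaned and split into tokens.

```python
from collections import Counter


def top_terms(tokenized_tweets, n=50):
    """Count every token across all tweets and return the n most frequent."""
    counts = Counter(tok for tweet in tokenized_tweets for tok in tweet)
    return counts.most_common(n)


if __name__ == "__main__":
    # Invented example tokens, only to show the shape of the output.
    tweets = [["hills", "hiking"], ["hills", "badlands"], ["hills", "hiking"]]
    for term, count in top_terms(tweets, n=2):
        print(term, count)
```

Each (term, count) pair maps directly onto a word and its size in the cloud.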
50 Most Frequent Terms Of #southdakota Tweets
South Dakota seems to love its nature! Just what you’d expect from the beautiful Midwest. Maybe your happiness is determined by how much nature you’re around. After all, research has shown that a feeling of connectedness with nature can lead to increased happiness and life satisfaction. If you’re looking for some zen, your best bet is to move somewhere with a lot of nice scenery for an occasional weekend hike. But nature isn’t the whole story – South Dakota is also home to local legislative efforts to improve mental health. Last year, the state received an $8.7M federal grant for mental health training in schools, and this summer, mental health is a top issue for lawmakers, who created 5 task forces for their summer study session. Things aren’t perfect yet, but the hard work seems to be paying off.
50 Most Frequent Terms Of #oregon Tweets
Although Oregon also has beautiful natural views, those seem to be overshadowed by more sinister things: political unrest, violence, and tragedy in the region. Another study in 2017 by Mental Health America ranked Oregon the worst in the country due to having some of the worst rates for homelessness, high school graduation, and child abuse. It seems like the state population struggles with addiction and co-occurring mental disorders. In addition, half of the high school students in a report by Oregon Student Voice cite waiting for appointments with a school counselor as negatively affecting their mental health. After a bit of additional research into the frequent terms from the word cloud, I discovered that open carry of firearms is legal statewide without a permit (this may explain the “shooting” keyword). For some peace of mind, look for places to settle down with stricter gun laws and well-funded, high-quality mental health programs.
3) How does social media tweet sentiment compare with surveyed mental health data?
When I performed sentiment analysis on each state’s most recent 1,000 tweets, I found no apparent correlation between tweet sentiment and the surveyed mental health statistics. This could be due to outsiders who are not local citizens tweeting about the state because of some recent event, people not publicizing their real feelings even if they do live there (especially on a platform known for its memes and humor like Twitter), or other unknown factors. I’m not sure how credible tweet sentiment is for explaining mental health across states based on this data, so I’d take this with a grain of salt.
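One common way to check a relationship like this is the Pearson correlation coefficient between each state's average tweet sentiment and its survey percentage. The Python sketch below is an assumption about how such a check could be done, not a record of my exact method, and the numbers in the usage example are invented.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (spread_x * spread_y)


if __name__ == "__main__":
    # Invented values: one average sentiment and one survey percentage per state.
    sentiment = [0.42, 0.10, 0.35, 0.27]
    poor_mental_health_pct = [30.1, 41.5, 35.0, 28.4]
    print(round(pearson_r(sentiment, poor_mental_health_pct), 3))
```

A value near 0 means no linear relationship, while values near -1 or 1 indicate a strong one. If tweets tracked the survey, the coefficient would be clearly negative, since a higher percentage means worse reported mental health.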
Conclusions & Limitations
After analyzing the data and researching the happiest and unhappiest states, we’ve learned that if you have a choice of where to move, you should look for states with more nature, gun regulation, and well-funded mental health programs for adults and adolescents.
Some limitations of my analysis include:
The CDC data on the percentage of adults reporting poor mental health is derived from a self-reported answer on the telephone survey. This is subjective, and people may not have an accurate assessment of their own mental condition compared to the conditions of citizens from other states.
For the sake of computational efficiency, I only scraped the latest 1,000 tweets for each state (which oftentimes only went back a day or two). That means if a recent tragedy or natural disaster happened, that would skew the sentiment analysis negatively for that area. Although this approach is imperfect, I still ended up with a sizable 52,000 tweets to analyze.
To attribute tweets to certain states, I searched by their hashtags (for example, #maryland for Maryland tweets) because I couldn’t search by location from which the tweet was sent. This means a lot of visitors or outsiders could be discussing something regarding the state but not be local citizens, which interfered with the sentiment analysis. This is probably a large reason why the sentiment analysis did not correlate with the surveyed mental health data.
Thanks for reading! I hope you learned a bit about how the states differ in happiness and will think about where you’d like to settle down in the future.
Carmen Lau is a junior at The Wharton School invested in advancing mental health issues. This project was done for the class OIDD 245: Analytics in the Digital Economy.
Illustration credit: JONES&CO