REDDIT, NOVEMBER 5, ELECTION ANALYSIS – Data 73000 / Graduate Student Portfolio

Reddit is one of the most-visited websites globally, with approximately 52 million daily active users. It is a social media platform where users share content, ask questions, and participate in discussions within niche communities called subreddits. Subreddits are user-created boards focused on specific topics, each with its own rules governing what can be shared. Users can post various types of content, including links, text, images, and videos. In this blog post, I set out to analyze whether sentiment toward Vice President Kamala Harris shifted over the course of Election Day. I used Python for data processing and Tableau for visualization to interpret and display the results.

I extracted posts from November 5, 2024, in Python through the Reddit API (accessed via a library such as the Python Reddit API Wrapper, PRAW), using the keywords “Harris,” “Kamala,” “Kamala Harris,” “Vice President,” “Win,” “Lost,” “Victory,” and “Defeat.” Once the data was collected, I processed it for Natural Language Processing (NLP) using the Natural Language Toolkit (NLTK). For each time range (morning, afternoon, and evening), I analyzed the top 4 subreddits by number of posts mentioning these keywords. The data includes post sentiment (positive, neutral, or negative), the most frequent words, and metadata such as timestamps and subreddit names. Finally, I organized the data in a CSV file to make it easier to visualize the results in Tableau.
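The processing steps above could be sketched roughly as follows. This is a minimal illustration, not my exact code: it assumes the posts have already been downloaded into dictionaries with `subreddit`, `title`, and `created_utc` fields (the names mirror PRAW's submission attributes), and the time zone, the hour cut-offs for each period, and the sample posts are all assumptions made for the example.

```python
import csv
import io
import re
from collections import Counter, defaultdict
from datetime import datetime, timedelta, timezone

KEYWORDS = ["harris", "kamala", "kamala harris", "vice president",
            "win", "lost", "victory", "defeat"]
# Whole-word match, so "win" does not match "winter".
KEYWORD_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in KEYWORDS) + r")\b",
    re.IGNORECASE,
)

# Assumption: Eastern Standard Time (UTC-5) and the hour boundaries below.
EST = timezone(timedelta(hours=-5))


def time_of_day(created_utc):
    """Bucket a UTC epoch timestamp into morning, afternoon, or evening."""
    hour = datetime.fromtimestamp(created_utc, tz=EST).hour
    if 5 <= hour < 12:
        return "morning"
    if 12 <= hour < 17:
        return "afternoon"
    return "evening"


def top_subreddits(posts, n=4):
    """Return {period: [(subreddit, count), ...]} for keyword-matching posts."""
    counts = defaultdict(Counter)
    for post in posts:
        if KEYWORD_RE.search(post["title"]):
            counts[time_of_day(post["created_utc"])][post["subreddit"]] += 1
    return {period: c.most_common(n) for period, c in counts.items()}


def write_csv(ranked, handle):
    """Write one row per (period, subreddit) pair for use in Tableau."""
    writer = csv.writer(handle)
    writer.writerow(["time_of_day", "subreddit", "post_count"])
    for period, rows in ranked.items():
        for subreddit, count in rows:
            writer.writerow([period, subreddit, count])


def ts(hour):
    """Helper for the example: epoch seconds for a given hour on Nov 5, 2024."""
    return datetime(2024, 11, 5, hour, tzinfo=EST).timestamp()


# Made-up posts for illustration only.
sample_posts = [
    {"subreddit": "politics", "title": "Harris forecast to win", "created_utc": ts(13)},
    {"subreddit": "texas", "title": "Kamala rally this morning", "created_utc": ts(9)},
    {"subreddit": "pics", "title": "My dog in winter", "created_utc": ts(20)},
]

ranked = top_subreddits(sample_posts)
buffer = io.StringIO()  # stand-in for a real file opened with newline=""
write_csv(ranked, buffer)
print(buffer.getvalue())
```

Matching on whole words matters here because short keywords such as “win” would otherwise match unrelated words and inflate the post counts.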

Multidimensional Scaling

Multidimensional Scaling (MDS) plots are graphs that help us see how things, like words or ideas, are related. They reduce complicated information to a simple two-dimensional (2D) picture in which items that are more similar sit closer together. I used Python to create the MDS plot of the most frequent words used on November 5, 2024.
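Finding the most frequent words can be sketched with the standard library alone. The stop-word list here is a tiny hand-written stand-in (NLTK ships a fuller one), and the titles are invented for the example.

```python
import re
from collections import Counter

# Assumption: a very small stop-word list, just enough for the example.
STOP_WORDS = {"the", "a", "an", "is", "to", "of", "and", "in", "for", "on"}


def tokenize(text):
    """Lower-case the text and keep alphabetic words that are not stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def most_frequent(texts, n=10):
    """Return the n most common words across all texts as (word, count) pairs."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts.most_common(n)


# Made-up titles for illustration only.
titles = [
    "Kamala Harris is the favorite to win",
    "Forecast: Harris to win",
    "A big problem for the election",
]
print(most_frequent(titles, n=3))
```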

In this analysis, a token is the unit of data being plotted: a word or phrase extracted from posts in the subreddits. For example, in a Reddit post, each word (such as “election,” “Kamala,” or “win”) would be considered a token. The dimensions (MDS Dimension 1 and MDS Dimension 2) are abstract axes used to position the tokens so that the distances between them capture how similar or different they are in the underlying dataset. In the plot, the darker a word appears, the more often it was used.
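To make the idea of the two dimensions concrete, here is a small sketch of classical (Torgerson) MDS using NumPy. It is an illustration under stated assumptions rather than the code behind my plot: I assume the distance between two tokens is one minus the Jaccard similarity of the posts they appear in, and the example posts are invented. Library implementations such as scikit-learn's `MDS` class use a different, iterative algorithm, so their coordinates will not match these exactly.

```python
import numpy as np


def jaccard_distance_matrix(tokens, docs):
    """Distance between two tokens = 1 - (posts with both / posts with either)."""
    sets = [{i for i, doc in enumerate(docs) if tok in doc} for tok in tokens]
    n = len(tokens)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            union = sets[i] | sets[j]
            sim = len(sets[i] & sets[j]) / len(union) if union else 0.0
            dist[i, j] = dist[j, i] = 1.0 - sim
    return dist


def classical_mds(dist, k=2):
    """Embed n items in k dimensions so pairwise distances approximate dist."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centre the squared distances to get a Gram (inner-product) matrix.
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]        # indices of the k largest
    # Negative eigenvalues (non-Euclidean distances) are clipped to zero.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


# Made-up posts, each reduced to its set of tokens, for illustration only.
docs = [
    {"kamala", "harris", "win", "forecast"},
    {"harris", "favorite", "win"},
    {"trump", "big", "problem"},
    {"trump", "problem", "country"},
]
tokens = ["harris", "win", "trump", "problem"]

coords = classical_mds(jaccard_distance_matrix(tokens, docs))
for token, (x, y) in zip(tokens, coords):
    print(f"{token:8s} {x:+.2f} {y:+.2f}")
```

In this toy example, “harris” and “win” always appear in the same posts, so they land on the same point, while “trump” and “problem” form a second point one unit away. Only the distances are meaningful; the axes themselves have no fixed interpretation, and their signs can flip between runs or libraries.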

The MDS plot Tokens_November 05, 2024, shows that discussions mainly revolved around Kamala Harris and the election, with words like “win,” “forecast,” and “favorite” reflecting positive and hopeful sentiments about election outcomes. On the other hand, tokens like “problem,” “big,” and “trump” form a cluster of criticism or negative sentiment, possibly indicating dissatisfaction with that candidate. Meanwhile, words like “tim,” “family,” and “casting” appear less connected to the election contest itself and instead point to other themes. The next section looks at how these tokens are distributed across the day.

Sentiment Throughout the Day

During the morning, the subreddit r/clevercomebacks had the most posts, featuring words such as “big,” “ag,” and “problem,” all of which reflected predominantly neutral to negative sentiment. Similarly, the r/texas subreddit included words like “trump,” “himself,” “kamala,” and “country,” also characterized by neutral to negative sentiment. This suggests that positive discussion was scarce in the morning.

In the afternoon, all analyzed posts originated from the r/politics subreddit and reflected a significant shift to positive sentiment. Words such as “kamala,” “harris,” “suddenly,” “favorite,” “win,” and “forecast” suggest favorable discussions, particularly surrounding Kamala Harris and the election’s prospects. The repeated mentions of “win” and “favorite” align with enthusiasm about election outcomes.

By the evening, the tone shifted once again, this time to neutral, with all posts coming from the r/pics subreddit. Words like “tim,” “walz,” “family,” “casting,” “votes,” and “election” suggest content focused on other aspects of the election, such as images of people voting, rather than on the contest itself. The entirely neutral sentiment may indicate an emphasis on reflective content rather than charged political discussion.

Use the “Time of Day” filter above to view the most commonly used words and sentiments.

Overall, sentiment varied significantly throughout the day. The morning began with critical or negative discussions, the afternoon transitioned to optimistic conversations, and the evening reflected a neutral, less emotionally charged tone. This pattern suggests that public discourse evolved over the course of the day, likely influenced by unfolding election events and projections, and that sentiment differed across both time periods and subreddits.
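The labeling and tallying behind a summary like this can be sketched as follows. I am assuming a VADER-style compound score between -1 and 1 for each post (for example, from NLTK's `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]`) and the conventional cut-off of 0.05 on either side of zero; the scores in the example are invented and are not my project data.

```python
from collections import Counter, defaultdict


def label(compound, threshold=0.05):
    """Map a compound score in [-1, 1] to positive, negative, or neutral."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def dominant_sentiment(scored_posts):
    """Return {period: most common label} from (period, compound) pairs."""
    tallies = defaultdict(Counter)
    for period, compound in scored_posts:
        tallies[period][label(compound)] += 1
    return {period: c.most_common(1)[0][0] for period, c in tallies.items()}


# Made-up scores for illustration only.
scored = [
    ("morning", -0.42), ("morning", 0.01), ("morning", -0.30),
    ("afternoon", 0.61), ("afternoon", 0.35),
    ("evening", 0.00), ("evening", 0.02),
]
print(dominant_sentiment(scored))
```

With only a handful of posts per period, the dominant label can change with a single post, so a summary like this is more reliable when the counts behind each label are reported alongside it.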

Limitations

This analysis faced several limitations that affected what I was able to achieve. One major issue was the limited number of subreddits used in the analysis. While expanding the range of subreddits could have provided a broader view of discussions, technical difficulties with the Python code restricted me to only a few subreddits.

My limited experience with Python meant I struggled with specific technical issues, such as debugging code errors and implementing more complex data manipulation techniques. This slowed the progress of the project and limited the scope of what I could achieve. My lack of experience made it harder to adjust the code to align with my goals.

Lastly, given my current skill level, my ambitions for this project may have been too high. I aimed to analyze various factors, such as sentiment trends, token relationships, and subreddit dynamics, but this required advanced coding techniques and a deep understanding of data analysis. While I was able to produce some meaningful insights, the limitations in the dataset, combined with my inexperience, meant that I couldn’t dive as deeply as I had hoped. For future projects, I would consider simplifying my approach or collaborating with someone with more technical expertise to achieve more comprehensive results.