Reddit Sentiment Analysis

An ETL pipeline that collects Reddit posts, applies sentiment analysis, and publishes results on Slack

In this project, I built an ETL (Extract, Transform, Load) pipeline consisting of three steps:

  1. Collect posts from Reddit: A script retrieves posts from the Reddit API and inserts them into a MongoDB database (see directory reddit_collector).
  2. Transform Reddit posts: An ETL job extracts the posts from MongoDB, transforms them (including sentiment analysis), and loads the results into a PostgreSQL database (see directory etl_job).
  3. Publish selected posts in Slack: A Slack bot loads the post data, including the sentiment analysis results, and sends it as Slack messages (see directory slack_bot).
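The transform and publish steps can be sketched roughly as follows. This is a minimal illustration, not the code in etl_job or slack_bot. The field names (`id`, `title`, `selftext`), the row layout, and the helper names `transform_post` and `format_slack_message` are hypothetical. The sentiment scorer is passed in as a plain function so the sketch runs without extra dependencies.

```python
from typing import Any, Callable, Dict

# Hypothetical shape of a post document as stored in MongoDB by the collector.
Post = Dict[str, Any]


def transform_post(post: Post, score: Callable[[str], float]) -> Dict[str, Any]:
    """Turn one raw post document into a flat row for PostgreSQL.

    `score` is any function mapping text to a sentiment score in [-1, 1],
    for example a wrapper around the VADER compound score.
    """
    # Combine title and body; a missing field is treated as empty text.
    text = f"{post.get('title', '')} {post.get('selftext', '')}".strip()
    return {
        "post_id": post["id"],
        "text": text,
        "sentiment": score(text),
    }


def format_slack_message(row: Dict[str, Any]) -> str:
    """Build the text of a Slack message for one transformed post."""
    # Show the score with an explicit sign and two decimals.
    return f"{row['text']} (sentiment: {row['sentiment']:+.2f})"
```

Injecting the scorer keeps the transform logic testable in isolation. In a pipeline like this one, `score` would wrap the analyzer (e.g. `lambda t: analyzer.polarity_scores(t)["compound"]`), and the returned rows would be written to PostgreSQL before the bot formats them.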

The whole pipeline runs in Docker containers orchestrated with Docker Compose. For sentiment analysis, I used the SentimentIntensityAnalyzer from the vaderSentiment library.
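`SentimentIntensityAnalyzer.polarity_scores(text)` returns a dict with `neg`, `neu`, and `pos` proportions plus a normalized `compound` score in [-1, 1]. A common convention from the VADER documentation labels a text positive when `compound >= 0.05`, negative when `compound <= -0.05`, and neutral otherwise. Whether this project uses these thresholds is an assumption. The sketch below only shows that mapping; the function name `label_sentiment` is hypothetical, and it takes the scores dict as input so it runs without vaderSentiment installed.

```python
from typing import Dict

# Thresholds suggested in the VADER documentation (assumed, not taken from this project).
POSITIVE_THRESHOLD = 0.05
NEGATIVE_THRESHOLD = -0.05


def label_sentiment(scores: Dict[str, float]) -> str:
    """Map a VADER-style scores dict to a coarse label using the compound score."""
    compound = scores["compound"]
    if compound >= POSITIVE_THRESHOLD:
        return "positive"
    if compound <= NEGATIVE_THRESHOLD:
        return "negative"
    return "neutral"
```

With the library installed, import the analyzer from `vaderSentiment.vaderSentiment` and call `label_sentiment(SentimentIntensityAnalyzer().polarity_scores(text))`. Labeling on `compound` alone is a simplification; the individual `neg`/`neu`/`pos` values could also be stored if finer detail is needed.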