An ETL pipeline that collects Reddit posts, applies sentiment analysis, and publishes the results to Slack
In this project, I built an ETL (Extract, Transform, Load) pipeline with multiple steps:
- Collect posts from Reddit: A script fetches posts from the Reddit API and inserts them into MongoDB (see directory reddit_collector).
- Transform Reddit posts: An ETL job extracts the posts from MongoDB, transforms them (including sentiment analysis), and loads the results into a PostgreSQL database (see directory etl_job).
- Publish selected posts in Slack: In the last step, the posts, together with their sentiment analysis results, are loaded and sent as Slack messages (see directory slack_bot).
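
The three steps above can be sketched in a few lines of Python. This is only an illustration of the data flow, not the code in this repository: plain lists stand in for MongoDB and PostgreSQL, and the function names, the post fields (`id`, `title`), and the `toy_score` scorer are all hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, List

Post = Dict[str, Any]


def extract(source: Iterable[Post]) -> List[Post]:
    """Read raw posts (a list stands in for a MongoDB query here)."""
    return [dict(post) for post in source]


def transform(posts: List[Post], score: Callable[[str], float]) -> List[Post]:
    """Keep the fields of interest and attach a sentiment score to each post."""
    return [
        {"id": p["id"], "title": p["title"], "sentiment": score(p["title"])}
        for p in posts
    ]


def load(rows: List[Post], sink: List[Post]) -> None:
    """Store transformed rows (a list stands in for a PostgreSQL table here)."""
    sink.extend(rows)


def format_slack_message(row: Post) -> str:
    """Build the text that the Slack step would send for one post."""
    return f"{row['title']} (sentiment: {row['sentiment']:+.2f})"


def toy_score(text: str) -> float:
    """Hypothetical placeholder scorer, used only for this sketch (not VADER)."""
    return 1.0 if "great" in text.lower() else 0.0
```

A run over two made-up posts would look like `load(transform(extract(raw_posts), toy_score), table)`, after which each row in `table` can be passed to `format_slack_message`. Passing the scorer in as a function keeps the transform step independent of the sentiment library.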
The whole pipeline runs with Docker and Docker Compose. For sentiment analysis, I used the SentimentIntensityAnalyzer from the vaderSentiment library.
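
As a rough sketch of how the analyzer can be used: `polarity_scores` returns a dictionary with `neg`, `neu`, `pos`, and `compound` entries, where `compound` ranges from -1 to 1. The helper names `score_text` and `label_from_compound` are hypothetical, and the ±0.05 cut-offs are the thresholds commonly suggested in the vaderSentiment documentation, not necessarily the ones used in etl_job.

```python
def score_text(text: str) -> float:
    """Return VADER's compound score for a piece of text."""
    # Imported here so the labeling helper below works without the package installed.
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    return analyzer.polarity_scores(text)["compound"]


def label_from_compound(compound: float) -> str:
    """Map a compound score to a coarse label (assumed thresholds: +/-0.05)."""
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"
```

For example, `label_from_compound(score_text(post_title))` would give a label for a post title. In a long-running job it is cheaper to create the analyzer once and reuse it instead of constructing it on every call.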