Projects

Here are some of my personal data science projects!

"Show me what you can do; don't tell me what you can do."

-John Wooden

DengAI - Predicting Disease Spread

Built a time-series framework to forecast dengue cases with multiple regressors via Seasonal-ARIMA and FB Prophet.

Engineered new features and applied dimensionality reduction (PCA) along with feature selection (scikit-learn's SelectKBest with the chi-squared and f_regression scoring functions) to find the best features for the time-series forecast.
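To illustrate the univariate selection idea, here is a minimal sketch in plain Python rather than the project's actual code. It ranks features by squared Pearson correlation with the target, which gives the same ordering as the f_regression F-statistic for a single feature. The feature names and numbers are made up for illustration and do not come from the DengAI data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def select_k_best(features, target, k):
    """Return the names of the k features with the highest squared correlation."""
    scores = {name: pearson_r(col, target) ** 2 for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data (assumption): three candidate regressors and a case count.
toy_features = {
    "humidity": [1.0, 2.0, 3.0, 4.0, 5.0],
    "temperature": [2.0, 1.0, 4.0, 3.0, 5.0],
    "noise": [3.0, 1.0, 3.0, 1.0, 3.0],
}
toy_cases = [10.0, 20.0, 30.0, 40.0, 50.0]

selected = select_k_best(toy_features, toy_cases, k=2)
print(selected)
```

In practice the scikit-learn selector does the same ranking on the full feature matrix and also reports p-values, so this sketch is only meant to show what "best" means in that step.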

Home Credit Default Risk (HCDR) Prediction

Combined 7 datasets with 20+ million rows and 200+ columns, and performed extensive feature engineering and exploratory data analysis to prepare the data for modelling.

Implemented a full machine learning pipeline to build a classification model that predicts credit default risk using ensemble learning methods.


Twitter Football Data Pipeline and Analysis

Designed and implemented a real-time data pipeline to process unstructured data from Twitter on the AWS platform.

Used EC2 for scheduling, Glue for ETL, SageMaker for modelling, and QuickSight to visualize the real-time analysis.

Exploratory Data Analysis - Gapminder (R)

Using the gapminder data set, I investigated the link between per capita GDP and life expectancy between 1952 and 2007. I found that both wealth and time have significant apparent effects on life expectancy, although the details differ considerably between continents. All of the reasoning is supported by statistical methods and plots.

Stroke Prediction

Working with this well-known dataset by WHO, I built several visualizations using Plotly, seaborn and matplotlib. I concluded the notebook by predicting an individual's chance of stroke from 11 features using XGBoost, AdaBoost and PyTorch.

Movie Rating Collection
Web scraping | BeautifulSoup

Created a beginner's guide to web scraping by extracting data from IMDB webpages. I used BeautifulSoup to extract the information and collate the movie ratings so that they can be used for further analysis.
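The extract-and-collate step can be sketched roughly as follows. To keep the example dependency-free, it uses Python's built-in html.parser instead of BeautifulSoup, and it parses an inline HTML snippet rather than a live page. The markup, the class names "title" and "rating", and the movie names are all hypothetical and do not reflect IMDB's real page structure.

```python
from html.parser import HTMLParser


class RatingParser(HTMLParser):
    """Collect (title, rating) pairs from spans with hypothetical class names."""

    def __init__(self):
        super().__init__()
        self.current_field = None  # which span we are inside, if any
        self.current_title = None  # most recently seen movie title
        self.ratings = []          # collated (title, rating) pairs

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "span" and css_class in ("title", "rating"):
            self.current_field = css_class

    def handle_data(self, data):
        if self.current_field == "title":
            self.current_title = data.strip()
        elif self.current_field == "rating":
            self.ratings.append((self.current_title, float(data)))

    def handle_endtag(self, tag):
        if tag == "span":
            self.current_field = None


# Hypothetical markup (assumption), standing in for a downloaded page.
SAMPLE_HTML = """
<ul>
  <li><span class="title">Movie A</span> <span class="rating">8.1</span></li>
  <li><span class="title">Movie B</span> <span class="rating">7.4</span></li>
</ul>
"""

parser = RatingParser()
parser.feed(SAMPLE_HTML)
movie_ratings = parser.ratings
print(movie_ratings)
```

With BeautifulSoup the same lookup is usually a couple of find calls on the parsed tree, which is why it suits a beginner's guide; the hand-written parser here only shows what that lookup does underneath.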

Wine Quality Prediction

Used the k-nearest neighbors (KNN) algorithm to predict wine quality on the well-known UCI ML wine dataset, and identified which variable correlates most strongly with wine quality.
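For readers new to KNN, here is a minimal sketch of the idea in plain Python, not the notebook's code: a query point takes the majority label of its k closest training points. The two features, the "low"/"high" labels, and the values are invented for illustration and are not rows from the UCI dataset.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict a label by majority vote among the k nearest training points."""
    # Pair each training label with its Euclidean distance to the query.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data (assumption): (alcohol, volatile acidity) per wine.
train_points = [
    (9.0, 0.70), (9.4, 0.66), (9.8, 0.60),
    (11.5, 0.30), (12.0, 0.28), (12.5, 0.25),
]
train_labels = ["low", "low", "low", "high", "high", "high"]

prediction_a = knn_predict(train_points, train_labels, (12.1, 0.27))
prediction_b = knn_predict(train_points, train_labels, (9.2, 0.68))
print(prediction_a, prediction_b)
```

Because KNN relies on raw distances, features on very different scales (as here) are normally standardized first so that no single feature dominates the vote.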
