Projects
Here are some of my personal data science projects!
"Show me what you can do; don't tell me what you can do."
-John Wooden
Built a time-series framework to forecast dengue cases with multiple regressors using seasonal ARIMA (SARIMA) and FB Prophet.
Engineered new features, then applied dimensionality reduction (PCA) and feature selection (SelectKBest with chi-squared and F-regression scoring) to find the best features for the time-series forecast.
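The feature selection step can be illustrated with a minimal sketch of univariate F-score ranking, the idea behind SelectKBest with F-regression scoring. This is a standard-library reimplementation for illustration, not the project's actual code; the feature names and all values below are made up.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def f_score(xs, ys):
    """Univariate regression F statistic: r^2 / (1 - r^2) * (n - 2)."""
    r = pearson_r(xs, ys)
    return (r * r) / (1.0 - r * r) * (len(xs) - 2)


def select_k_best(features, target, k):
    """Return the names of the k highest-scoring features, plus all scores."""
    scores = {name: f_score(col, target) for name, col in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical toy data: these columns and numbers are invented for the example.
features = {
    "rainfall":    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "temperature": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0],
    "noise":       [2.0, 7.0, 1.0, 8.0, 2.0, 8.0],
}
cases = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]

best, scores = select_k_best(features, cases, k=2)
print(best)
```

Each feature is scored on its own against the target, so this kind of filter is cheap but ignores interactions between features.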
Combined 7 datasets with 20+ million rows and 200+ columns, then performed extensive feature engineering and exploratory data analysis to prepare the data for modelling.
Implemented a full machine learning pipeline to build a classification model that predicts credit default risk using ensemble learning methods.
Designed and implemented a real-time data pipeline on AWS to process unstructured data from Twitter.
Used EC2 for scheduling, Glue for ETL, SageMaker for modelling, and QuickSight to visualize the real-time analysis.
Using the Gapminder dataset, I investigated the link between per capita GDP and life expectancy between 1952 and 2007. I found that both wealth and time have significant apparent effects on life expectancy, although the details differ considerably between continents. All of the reasoning is supported by statistical methods and plots.
Working on this well-known dataset by WHO, I implemented several visualizations using Plotly, seaborn and Matplotlib. The notebook concludes by predicting the likelihood of stroke in an individual from 11 features using XGBoost, AdaBoost and PyTorch.
Created a beginner's guide to web scraping by extracting data from IMDb webpages. I used BeautifulSoup to extract the information and collate the movie ratings so that they can be used for further analysis.
Used the KNN algorithm to predict wine quality on the well-known UCI ML wine dataset, and identified which variable correlates most strongly with wine quality.
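For readers new to KNN, here is a minimal standard-library sketch of the idea: classify a sample by majority vote among its k nearest training points. It is not the project's code; the two features (alcohol, volatile acidity), the labels and every value are invented for illustration.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, query, k=3):
    """Predict a label by majority vote among the k nearest training points."""
    # Sort training samples by Euclidean distance to the query, keep the k closest.
    neighbours = sorted(
        zip(train_X, train_y),
        key=lambda pair: dist(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical toy samples: (alcohol %, volatile acidity) with a made-up quality label.
train_X = [
    (9.4, 0.70), (9.8, 0.88), (9.5, 0.66),
    (12.0, 0.30), (12.5, 0.28), (11.8, 0.35),
]
train_y = ["low", "low", "low", "high", "high", "high"]

print(knn_predict(train_X, train_y, (12.1, 0.32)))
print(knn_predict(train_X, train_y, (9.6, 0.75)))
```

Because KNN relies on raw distances, features on very different scales are usually standardized first; the sketch skips that step for brevity.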