Unsupervised Machine Learning Projects
This repository showcases my unsupervised machine learning projects. They apply clustering and dimensionality-reduction techniques and evaluate clustering quality with metrics such as silhouette scores.
Clustering Algorithms Explored
- K-Means
- K-Medoids
- Hierarchical Clustering
- Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
- Gaussian Mixture Model (GMM)
Skills Demonstrated
- Data Preprocessing: Handling and cleaning large datasets.
- Exploratory Data Analysis (EDA): Gaining insights from raw data and preparing it for clustering.
- Clustering Evaluation: Using metrics such as silhouette scores to assess the performance of different algorithms.
- Outlier Handling: Identifying and mitigating the impact of outliers on clustering results.
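- Principal Component Analysis (PCA): Reducing the dimensionality of data so it can be visualized in 2D or 3D.
- t-Distributed Stochastic Neighbor Embedding (t-SNE): Reducing the dimensionality of data to extract insights from its structure.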
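As a rough illustration of the silhouette metric listed under Clustering Evaluation, here is a minimal pure-Python sketch. It is not code from the projects: the function name `silhouette_score`, the Euclidean distance choice, and the toy points are all assumptions made for this example. For each point, `a` is the mean distance to the rest of its own cluster, `b` is the smallest mean distance to any other cluster, and the point's score is `(b - a) / max(a, b)`.

```python
from math import dist


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    # Group point indices by cluster label.
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            # Convention: a singleton cluster contributes a score of 0.
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Toy data, made up for illustration: two tight, well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
good = silhouette_score(points, [0, 0, 0, 1, 1, 1])
bad = silhouette_score(points, [0, 1, 0, 1, 0, 1])
print(f"labels matching the groups: {good:.3f}")
print(f"scrambled labels:           {bad:.3f}")
```

Scores close to 1 suggest compact, well-separated clusters; scores near 0 or below suggest overlapping or misassigned points. In practice a library implementation would likely be used instead of hand-rolled code.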
Projects Overview
1. Auto MPG
The first project in this repository comes from MIT’s Data Science and Machine Learning Program. It uses t-SNE and PCA to reduce the dimensionality of the data and extract insights.
Details of this project can be found in the Auto_MPG_Project folder.
2. Market Segmentation with K-Medoids (MIT Capstone)
The second project in this repository is my capstone project from MIT’s Data Science and Machine Learning Program. After evaluating multiple clustering algorithms, I selected K-Medoids as the most appropriate for the dataset due to its:
- High silhouette score, indicating compact and well-separated clusters.
- Robustness against outliers, which was crucial for this dataset.
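To illustrate the robustness point, the sketch below compares a cluster's mean (the center K-Means uses) with its medoid (the center K-Medoids uses) when one outlier is present. This is not the capstone's code: the helper names `centroid` and `medoid` and the toy cluster are assumptions made purely for this example.

```python
from math import dist


def centroid(points):
    """Coordinate-wise mean of the points (the K-Means style center)."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(len(points[0])))


def medoid(points):
    """The actual data point with the smallest total distance to all others."""
    return min(points, key=lambda c: sum(dist(c, p) for p in points))


# Toy cluster, made up for illustration: four nearby points and one outlier.
cluster = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (2.0, 2.0), (50.0, 50.0)]

mean_center = centroid(cluster)
medoid_center = medoid(cluster)
print("mean:  ", mean_center)
print("medoid:", medoid_center)
```

The mean is dragged far toward the outlier, while the medoid stays on one of the four nearby points, because a medoid must be an actual observation and is chosen by total distance rather than by averaging coordinates.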
The project includes:
- Raw data
- Preprocessed and analyzed data
- Code for clustering and evaluation
- Final presentation delivered on December 13, 2023
Details of this project can be found in the Market_Segmentation_With_K-Medoids folder.
Future Additions
I plan to expand this repository with more clustering projects, exploring real-world datasets and advanced evaluation techniques.
Notes
- This repository highlights my technical skills in unsupervised learning and the ability to communicate results effectively to both technical and non-technical audiences.
- Each subfolder contains a README.md with specific details about the corresponding project.
How to Use This Repository
- Navigate to the folders listed above for each project.
- Explore the notebooks, datasets, and visualizations within each subfolder.