Machine Learning Model Implementation

(2022)

As practice for my degree and future job opportunities, I have implemented several machine learning models from scratch using only Pandas and NumPy (no scikit-learn, TensorFlow, PyTorch, etc.).

The preprocessing/ directory contains several utilities for loading datasets from CSV files and transforming them to be fed into an ML model:

Read dataset from CSV
Split dataset into training and testing sets
Discretize continuous features with buckets of equal width or equal frequency
Encode categorical features with one-hot encoding
Perform z-score normalization or min-max scaling on numerical features
Impute missing values with feature mean or mode
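
To illustrate the kind of transformations listed above, here is a minimal sketch using only Pandas and NumPy. The function names (z_score, min_max, equal_width_bins, impute_mean, one_hot) are hypothetical and chosen for this example; they are not taken from the project's code, and the actual utilities may handle edge cases differently.

```python
import numpy as np
import pandas as pd


def z_score(col: pd.Series) -> pd.Series:
    """Standardize to zero mean and unit (population) standard deviation."""
    std = col.std(ddof=0)
    if std == 0:  # constant column: avoid division by zero
        return col - col.mean()
    return (col - col.mean()) / std


def min_max(col: pd.Series) -> pd.Series:
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = col.min(), col.max()
    if hi == lo:  # constant column: map everything to 0
        return col * 0.0
    return (col - lo) / (hi - lo)


def equal_width_bins(col: pd.Series, n_bins: int) -> pd.Series:
    """Discretize into n_bins buckets of equal width, labeled 0..n_bins-1."""
    edges = np.linspace(col.min(), col.max(), n_bins + 1)
    # Use interior edges only, so the maximum value lands in the last bucket.
    return pd.Series(np.digitize(col.to_numpy(), edges[1:-1]), index=col.index)


def impute_mean(col: pd.Series) -> pd.Series:
    """Replace missing values with the mean of the observed values."""
    return col.fillna(col.mean())


def one_hot(col: pd.Series) -> pd.DataFrame:
    """Encode a categorical column as one 0/1 indicator column per category."""
    categories = sorted(col.dropna().unique())
    return pd.DataFrame(
        {f"{col.name}_{c}": (col == c).astype(int) for c in categories},
        index=col.index,
    )
```

Each function takes a single column and returns a new object rather than modifying its input, which keeps the steps easy to chain and test in isolation.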

The utilities/ directory also contains functions for performing k-fold or (k x 2)-fold cross-validation and computing several evaluation metrics.
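
As a rough sketch of how k-fold cross-validation can be done with NumPy alone, the generator below shuffles the sample indices, splits them into k folds, and yields one train/test split per fold. The names k_fold_indices and accuracy are hypothetical, and accuracy is just one example of an evaluation metric; the project's own functions may be organized differently.

```python
import numpy as np


def k_fold_indices(n_samples: int, k: int, seed: int = 0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    # Shuffle once, then cut into k folds of near-equal size.
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx


def accuracy(y_true, y_pred) -> float:
    """Fraction of predictions that match the true labels."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))
```

Every sample appears in exactly one test fold, so averaging a metric over the k splits uses each observation for evaluation once. One common reading of the k x 2 variant is to repeat a 2-fold split k times with different shuffles, which under that assumption could be built by calling k_fold_indices(n_samples, 2, seed=s) for k different seeds.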

The models/ directory contains the implementations of the models themselves:

Decision Tree
Random Forest
K-Nearest Neighbors
Neural Network with Backpropagation and Adam optimizer
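
To give a sense of what "from scratch" looks like for the simplest of these models, here is a minimal K-Nearest Neighbors classifier in NumPy. It is an illustrative sketch, not the project's code: the function name knn_predict, the use of squared Euclidean distance, and the majority-vote tie-breaking (lowest label wins) are all assumptions made for this example.

```python
import numpy as np


def knn_predict(X_train, y_train, X_query, k: int = 3) -> np.ndarray:
    """Classify each query point by majority vote among its k nearest neighbors."""
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    y_train = np.asarray(y_train)

    # Pairwise squared Euclidean distances, shape (n_query, n_train).
    diffs = X_query[:, None, :] - X_train[None, :, :]
    dists = np.sum(diffs ** 2, axis=2)

    # Indices of the k closest training points for each query point.
    nearest = np.argsort(dists, axis=1)[:, :k]

    preds = []
    for row in nearest:
        labels, counts = np.unique(y_train[row], return_counts=True)
        # argmax returns the first maximum, so ties go to the lowest label.
        preds.append(labels[np.argmax(counts)])
    return np.array(preds)
```

For example, with two well-separated clusters labeled 0 and 1, a query point near the first cluster is assigned label 0 and one near the second is assigned label 1. Broadcasting builds the full distance matrix in memory, which is fine for small datasets but would need batching for large ones.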
