Kevin Beaulieu

Machine Learning Model Implementation

(2022)

View Project on GitHub

As practice for my degree and future job opportunities, I implemented several machine learning models from scratch using only Pandas and NumPy (no scikit-learn, TensorFlow, PyTorch, etc.).

The preprocessing/ directory contains several utilities for loading datasets from CSV files and transforming them to be fed into an ML model:

  • Read dataset from CSV
  • Split dataset into training and testing sets
  • Discretize continuous features with buckets of equal width or equal frequency
  • Encode categorical features with one-hot encoding
  • Perform z-score normalization or min-max scaling on numerical features
  • Impute missing values with feature mean or mode
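To illustrate the kind of transformations listed above, here is a minimal sketch using only Pandas and NumPy. It is not the project's actual code: the function names (`z_score`, `min_max`, `equal_width_bins`, `one_hot`, `impute_mean`) and their signatures are hypothetical, chosen for illustration only.

```python
import numpy as np
import pandas as pd

# All function names below are hypothetical; they only illustrate the listed steps.


def z_score(column: pd.Series) -> pd.Series:
    """Shift a numeric column to mean 0 and scale by its population standard deviation."""
    std = column.std(ddof=0)
    if std == 0:
        # A constant column has no spread; return zeros instead of dividing by zero.
        return column - column.mean()
    return (column - column.mean()) / std


def min_max(column: pd.Series) -> pd.Series:
    """Rescale a numeric column linearly into the range [0, 1]."""
    span = column.max() - column.min()
    if span == 0:
        return column * 0.0
    return (column - column.min()) / span


def equal_width_bins(column: pd.Series, n_bins: int) -> pd.Series:
    """Replace each value with the index (0..n_bins-1) of its equal-width bucket."""
    edges = np.linspace(column.min(), column.max(), n_bins + 1)
    # Only the interior edges are passed, so the maximum lands in the last bucket.
    codes = np.digitize(column.to_numpy(), edges[1:-1])
    return pd.Series(codes, index=column.index, name=column.name)


def one_hot(column: pd.Series) -> pd.DataFrame:
    """Expand a categorical column into one 0/1 indicator column per category."""
    categories = sorted(column.dropna().unique())
    return pd.DataFrame(
        {f"{column.name}_{c}": (column == c).astype(int) for c in categories},
        index=column.index,
    )


def impute_mean(column: pd.Series) -> pd.Series:
    """Fill missing values with the mean of the observed values."""
    return column.fillna(column.mean())


ages = pd.Series([10.0, 14.0, 26.0, 40.0], name="age")
colors = pd.Series(["red", "blue", "red"], name="color")
print(equal_width_bins(ages, 3).tolist())
print(one_hot(colors))
```

In this sketch the scaling statistics are computed from the column passed in. When evaluating a model, one would typically compute them on the training set only and reuse them on the test set.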

The utilities/ directory also contains functions for performing k-fold or (k × 2)-fold cross-validation and computing several evaluation metrics.
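As a rough sketch of how such splitting could work, the NumPy-only example below generates index pairs for k-fold cross-validation and for k repetitions of a 2-fold split, plus a simple accuracy metric. The names `k_fold_indices`, `k_by_2_indices`, and `accuracy` are hypothetical and are not taken from the project; the (k × 2) scheme is interpreted here as k independent shuffles, each split into two halves that swap roles.

```python
import numpy as np

# Hypothetical helper names; a sketch, not the project's implementation.


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs; each fold serves as the test set once."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, folds[i]


def k_by_2_indices(n_samples, k, seed=0):
    """Yield 2k (train_idx, test_idx) pairs: k reshuffles, each halved and swapped."""
    rng = np.random.default_rng(seed)
    half = n_samples // 2
    for _ in range(k):
        shuffled = rng.permutation(n_samples)
        first, second = shuffled[:half], shuffled[half:]
        yield first, second
        yield second, first


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))


for train_idx, test_idx in k_fold_indices(n_samples=10, k=5):
    print(len(train_idx), len(test_idx))
```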

The models/ directory contains the model implementations themselves:

  • Decision Tree
  • Random Forest
  • K-Nearest Neighbors
  • Neural Network with Backpropagation and Adam optimizer
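To give a sense of what "from scratch" means for the simplest of these models, here is a minimal K-Nearest Neighbors classifier in NumPy. It is a sketch under my own assumptions (Euclidean distance, majority vote, ties broken in favor of the smallest label), and `knn_predict` is a hypothetical name, not the project's API.

```python
import numpy as np


def knn_predict(X_train, y_train, X_query, k=3):
    """Classify each query point by majority vote among its k nearest training points.

    Hypothetical function: assumes Euclidean distance and breaks vote ties
    in favor of the smallest label value.
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    X_query = np.asarray(X_query, dtype=float)

    # Pairwise squared Euclidean distances, shape (n_query, n_train).
    diffs = X_query[:, None, :] - X_train[None, :, :]
    dists = np.sum(diffs ** 2, axis=2)

    # Indices of the k closest training points for each query row.
    nearest = np.argsort(dists, axis=1)[:, :k]

    predictions = []
    for row in nearest:
        labels, counts = np.unique(y_train[row], return_counts=True)
        predictions.append(labels[np.argmax(counts)])
    return np.array(predictions)


X_train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_train = [0, 0, 0, 1, 1, 1]
print(knn_predict(X_train, y_train, [[0.2, 0.1], [5.5, 5.2]], k=3))
```

Computing all pairwise distances at once keeps the code short, but it uses memory proportional to the number of queries times the number of training points, so larger datasets would need batching.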

My Work

Check out my open-source contributions on GitHub.