December 2024

S&P 500 Predictor using Machine Learning

Machine Learning, Jupyter Notebook, Statistics, Financial Modeling, Risk Analysis

Built a Random Forest classifier to predict next-day S&P 500 direction using historical price/volume features and technical indicators, evaluated with precision-focused metrics.

Overview

This project explores next-day direction prediction for the S&P 500 using a supervised machine learning pipeline. I built a feature set from historical market data (price and volume history plus common technical indicators) and trained a Random Forest classifier to predict whether the index would move up or down the following day. The emphasis was on building a realistic evaluation loop rather than just fitting a model. I used time-aware splits to avoid lookahead bias, compared model performance against simple baselines, and evaluated results with precision-centric metrics, since in a trading-style decision a false positive (a predicted up day that does not materialize) is the costly error. The notebook is structured as an end-to-end workflow: data ingestion and cleaning, feature engineering, model training and tuning, and performance analysis with plots and diagnostics.
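To make the evaluation loop concrete, here is a minimal, dependency-free sketch of time-aware (walk-forward) splitting and a precision comparison against an "always predict up" baseline. The function names, the window sizes, and the toy labels are illustrative assumptions, not code or results from the notebook.

```python
from typing import Iterator, Sequence, Tuple


def walk_forward_splits(
    n_samples: int, initial_train: int, step: int
) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges; every test index comes after every train index."""
    start = initial_train
    while start < n_samples:
        stop = min(start + step, n_samples)
        # Expanding window: train on everything before `start`, test on the next chunk.
        yield range(0, start), range(start, stop)
        start = stop


def precision(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of predicted 'up' days (label 1) that were actually up; 0.0 if none predicted."""
    hits = [t for t, p in zip(y_true, y_pred) if p == 1]
    if not hits:
        return 0.0
    return sum(hits) / len(hits)


# Toy labels for illustration only (1 = index closed higher the next day).
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
y_pred = [1, 0, 0, 1, 1, 1, 0, 0, 1, 0]

splits = list(walk_forward_splits(len(y_true), initial_train=4, step=3))
model_precision = precision(y_true, y_pred)
# Baseline: predict "up" every day, so precision equals the share of up days.
baseline_precision = precision(y_true, [1] * len(y_true))
```

Because each test range starts where the training range ends, no future observation can leak into training. A model is only interesting if its precision clears the baseline, which on real data is simply the historical fraction of up days.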

Highlights

  • End-to-end ML pipeline in Jupyter (data ingestion → feature engineering → training → evaluation)
  • Engineered technical indicator features from historical price/volume data (trend, momentum, and volatility signals)
  • Trained and tuned a Random Forest classifier for next-day direction prediction with time-aware validation
  • Evaluated performance using precision-focused metrics and baseline comparisons to assess practical signal quality
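As a rough illustration of the feature engineering step, the sketch below derives one trend, one momentum, and one volatility feature from a list of closing prices, along with the next-day direction label. The specific indicator definitions, the `window` length, and the `build_features` helper are assumptions chosen for illustration; the project does not state which indicators it uses.

```python
from statistics import mean, pstdev
from typing import Dict, List, Optional


def build_features(closes: List[float], window: int = 3) -> List[Dict[str, Optional[float]]]:
    """Build per-day features from data up to that day, plus a next-day target label."""
    # Daily simple returns; the first day has no prior close.
    returns: List[Optional[float]] = [None] + [
        closes[i] / closes[i - 1] - 1 for i in range(1, len(closes))
    ]
    rows = []
    for i in range(len(closes)):
        row: Dict[str, Optional[float]] = {
            "trend": None, "momentum": None, "volatility": None, "target": None,
        }
        if i >= window:
            recent = closes[i - window + 1 : i + 1]  # last `window` closes, ending today
            row["trend"] = closes[i] / mean(recent)           # close vs. moving average
            row["momentum"] = closes[i] / closes[i - window] - 1  # `window`-day return
            row["volatility"] = pstdev(returns[i - window + 1 : i + 1])  # std of recent returns
        if i + 1 < len(closes):
            # Target looks one day ahead; features above never do.
            row["target"] = int(closes[i + 1] > closes[i])
        rows.append(row)
    return rows


# Hypothetical closing prices for illustration only.
closes = [100.0, 102.0, 101.0, 104.0, 103.0, 106.0]
rows = build_features(closes, window=3)
targets = [r["target"] for r in rows]
```

Rows with `None` values (the warm-up period and the final day, which has no next-day label) would be dropped before training. Keeping the target as the only forward-looking column is what makes the time-aware validation meaningful.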