Projects

Selected Projects

A collection of technical projects in machine learning, quantitative research, data engineering, and software development, highlighting problem-solving approaches, methodologies, and measurable outcomes.

Rail Break Prediction Project
University Industry Placement

Predicting rail breaks is a rare-event machine learning problem. This project focused on handling class imbalance, selecting meaningful sensor features, building Databricks-based ML pipelines, and evaluating performance with metrics suited to imbalanced data, such as F1 and PR-AUC.

Databricks, PySpark, SQL, Machine Learning, XGBoost, Random Forest, Feature Engineering, Feature Selection, Imbalanced Learning, EDA, PR-AUC, F1 Score

Problem Statement

The goal was to predict whether a rail section would break within the next 30 days using operational and sensor data. Because the dataset was highly imbalanced, standard classifiers could easily become biased toward the majority (non-break) class.
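A minimal sketch of how the 30-day target could be built, assuming each observation is a (section_id, observation_date) pair and break events are available as a list of dates per section. The function and variable names are hypothetical, the toy data is invented, and the project's actual labelling in Databricks may differ; here a break on the observation date itself is not counted, so the label only reflects future events.

```python
from datetime import date, timedelta

HORIZON_DAYS = 30  # prediction window from the problem statement


def label_break_within_horizon(observations, break_dates, horizon_days=HORIZON_DAYS):
    """Return a 0/1 label for each (section_id, obs_date) observation.

    A row is labelled 1 if the same section has a recorded break in the
    window (obs_date, obs_date + horizon_days], otherwise 0.
    """
    window = timedelta(days=horizon_days)
    labels = []
    for section_id, obs_date in observations:
        breaks = break_dates.get(section_id, [])
        # Strictly after obs_date, so a break already known at prediction time is not counted.
        hit = any(obs_date < b <= obs_date + window for b in breaks)
        labels.append(1 if hit else 0)
    return labels


# Hypothetical toy data: section S1 breaks on 2023-03-20, S2 never breaks.
observations = [
    ("S1", date(2023, 2, 1)),  # break is 47 days ahead -> 0
    ("S1", date(2023, 3, 1)),  # break is 19 days ahead -> 1
    ("S2", date(2023, 3, 1)),  # no recorded breaks     -> 0
]
break_dates = {"S1": [date(2023, 3, 20)]}
print(label_break_within_horizon(observations, break_dates))  # [0, 1, 0]
```

In a Spark pipeline the same rule would typically be expressed as a join between observations and break events on section ID with a date-range condition, rather than a Python loop.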

Methodology
  • Performed EDA on class distribution, location-level imbalance, and temporal break trends.
  • Researched feature selection methods, including correlation filtering, variance thresholding, ANOVA F-test, Lasso logistic regression, and Random Forest feature importance.
  • Built feature engineering and model training workflows using Databricks notebooks, PySpark, SQL, and project pipelines.
  • Evaluated models using accuracy, F1 score, and PR-AUC, and interpreted results in practical terms for stakeholder reporting.
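To illustrate why accuracy alone can mislead on imbalanced data, the sketch below compares a predictor that always outputs the majority class with a simple thresholded scorer on invented labels. It uses plain Python rather than the project's libraries; `average_precision` is a common step-wise estimate of PR-AUC (it matches scikit-learn's `average_precision_score` when scores have no ties), and all names, thresholds, and data are illustrative assumptions.

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, false positives, false negatives and true negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, _, _, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def f1(y_true, y_pred):
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    if tp == 0:
        return 0.0  # no true positives, so precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_precision(y_true, scores):
    """Step-wise PR-AUC: mean of the precision at the rank of each positive.

    Samples are ranked by descending score; ties keep their input order.
    """
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(y_true)
    tp, precision_sum = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            tp += 1
            precision_sum += tp / rank
    return precision_sum / n_pos if n_pos else 0.0


# Hypothetical data: 2 positives out of 10 (20% positive rate).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.15, 0.3, 0.05, 0.4, 0.2, 0.6, 0.7, 0.35]
majority = [0] * len(y_true)                     # always predicts "no break"
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # thresholded scorer

print(accuracy(y_true, majority), f1(y_true, majority))  # 0.8 0.0
print(accuracy(y_true, y_pred), f1(y_true, y_pred))      # 0.8 0.5
print(average_precision(y_true, scores))                 # 0.75
```

Both predictors reach 80% accuracy, but only the scorer has a non-zero F1, and its average precision (0.75) can be read against the 0.2 positive rate, which is roughly what a random ranking would achieve.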
Results
  • Improved model accuracy from 38% to 82%.
  • Increased F1 score from 0% to 60%.
  • Improved PR-AUC from 10% to 55%.
  • Reduced pipeline runtime by approximately 40% through PySpark and SQL optimisation.
  • Presented EDA insights and model findings to engineering and operations stakeholders.
Visualisations
Target Distribution

The dataset showed a strong class imbalance, making F1 and PR-AUC more useful than accuracy alone.
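The page does not say which imbalance-handling technique the project used; one common option is class re-weighting. This sketch, on hypothetical labels, computes the positive rate, the negatives-to-positives ratio that XGBoost's documentation suggests as a typical value to consider for its `scale_pos_weight` parameter, and "balanced" per-class weights (n_samples / (n_classes * class_count)). The function name is illustrative.

```python
from collections import Counter


def imbalance_summary(labels):
    """Summarise a binary 0/1 target and derive common re-weighting values."""
    counts = Counter(labels)
    n_neg, n_pos = counts.get(0, 0), counts.get(1, 0)
    if n_neg == 0 or n_pos == 0:
        raise ValueError("both classes must be present to re-weight")
    n = len(labels)
    return {
        "positive_rate": n_pos / n,
        # XGBoost's docs suggest sum(negatives) / sum(positives) as a
        # typical value to consider for `scale_pos_weight`.
        "scale_pos_weight": n_neg / n_pos,
        # "Balanced" weights: n_samples / (n_classes * class_count).
        "class_weight": {0: n / (2 * n_neg), 1: n / (2 * n_pos)},
    }


# Hypothetical target: 5 breaks among 100 observations.
summary = imbalance_summary([0] * 95 + [1] * 5)
print(summary["positive_rate"])     # 0.05
print(summary["scale_pos_weight"])  # 19.0
print(summary["class_weight"][1])   # 10.0
```

Re-weighting changes how much each error costs during training but not the data itself; resampling approaches (undersampling the majority class or oversampling breaks) are the usual alternative.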

Imbalance by Location

Failure likelihood varied by location, showing that risk was not evenly distributed across rail sections.

Random Forest Feature Importance

Tree-based feature importance helped identify meaningful sensor and operational predictors.

Correlation Heatmap

Correlation analysis was used to identify redundant features and reduce multicollinearity.
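Correlation filtering and variance thresholding, two of the methods listed in the methodology, can be sketched as a greedy filter: drop near-constant features, then drop one feature from each pair whose absolute Pearson correlation exceeds a threshold. The thresholds (1e-8 and 0.9), the keep-the-first rule, and the sensor column names are assumptions for illustration, not the project's settings.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def variance(x):
    m = sum(x) / len(x)
    return sum((a - m) ** 2 for a in x) / len(x)


def select_features(columns, var_threshold=1e-8, corr_threshold=0.9):
    """Drop near-constant features, then one feature of each highly correlated pair.

    `columns` maps feature name -> list of values. For each pair with
    |r| > corr_threshold, the later feature (in dict order) is dropped.
    """
    # Variance filter first, which also guarantees non-zero std dev in pearson().
    kept = [name for name, vals in columns.items() if variance(vals) > var_threshold]
    dropped = set()
    for a, b in combinations(kept, 2):
        if a in dropped or b in dropped:
            continue
        if abs(pearson(columns[a], columns[b])) > corr_threshold:
            dropped.add(b)
    return [name for name in kept if name not in dropped]


# Hypothetical sensor features.
columns = {
    "rail_temp": [10, 12, 15, 20, 25],
    "ambient_temp": [9, 11, 14, 19, 24],  # rail_temp shifted by 1, r = 1
    "tonnage": [5, 3, 8, 1, 7],
    "constant_flag": [1, 1, 1, 1, 1],     # zero variance
}
print(select_features(columns))  # ['rail_temp', 'tonnage']
```

Keeping the first feature of a correlated pair is arbitrary; a common refinement is to keep whichever feature correlates more strongly with the target.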

Quant Trading Research
Research Project

This project explored automated cryptocurrency trading systems built with Python and Freqtrade on Binance, covering backtesting, risk management, and cloud deployment. The research focused on understanding market behaviour, testing strategy logic across different market regimes, and improving automated trade supervision. The core strategy logic is not publicly disclosed; this portfolio summary focuses on methodology, risk evaluation, backtesting workflow, and aggregate performance metrics.

Python, Freqtrade, Binance, Algorithmic Trading, Backtesting, Risk Management, REST API, Vultr, Sharpe Ratio, Drawdown Analysis

Problem Statement

The goal was to evaluate whether a systematic trading bot could maintain profitability and controlled drawdown across volatile crypto market conditions. The project tested strategy behaviour across approximately two years of historical data, including a period in which the overall market declined by around 50%.

Methodology
  • Started from public Freqtrade strategies to understand trading bot structure, entry/exit signals, and market behaviour.
  • Backtested strategy variants on 2022–2024 Binance spot-market data across multiple crypto pairs.
  • Compared stop-loss configurations, including -14% and -13%, to evaluate the trade-off between return, win rate, and drawdown.
  • Implemented trailing stops, profit-based and adaptive exit logic, and REST API-based trade supervision.
  • Deployed the trading bot on a Vultr cloud server to support continuous 24/7 execution and monitoring.
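The stop-loss trade-off can be illustrated with a simplified single-trade simulator. This is not Freqtrade's implementation: it evaluates one price per step (no intrabar highs or lows, fees, or slippage), exits at the price that crossed the stop, and models a trailing stop as a fixed fraction below the running peak. The -13% and -14% levels come from the comparison above; the function name, the 5% trailing offset, and the price paths are hypothetical.

```python
def simulate_exit(entry_price, prices, stoploss=-0.13, trailing_offset=None):
    """Return (exit_index, trade_return, reason) for one long trade.

    stoploss: fractional loss from entry that triggers an exit (e.g. -0.13).
    trailing_offset: if set, the stop trails the highest price seen so far by
        this fraction (0.05 = 5% below the peak), but never sits below the
        fixed stop-loss level.
    If no stop is hit, the trade closes on the last price.
    """
    fixed_stop = entry_price * (1 + stoploss)
    peak = entry_price
    for i, price in enumerate(prices):
        peak = max(peak, price)
        stop, reason = fixed_stop, "stop_loss"
        if trailing_offset is not None:
            trail = peak * (1 - trailing_offset)
            if trail > stop:
                stop, reason = trail, "trailing_stop"
        if price <= stop:
            # Simplification: exit at the observed price, not the stop level.
            return i, price / entry_price - 1, reason
    return len(prices) - 1, prices[-1] / entry_price - 1, "end_of_data"


# Hypothetical path: a rally followed by a sell-off.
prices = [105, 112, 118, 110, 104, 95, 85]
print(simulate_exit(100, prices, stoploss=-0.13))                        # exits at index 6 via stop_loss
print(simulate_exit(100, prices, stoploss=-0.13, trailing_offset=0.05))  # exits at index 3 via trailing_stop

# Hypothetical path: a sharp dip that recovers.
path = [95, 86.5, 100]
print(simulate_exit(100, path, stoploss=-0.13))  # stopped out at 86.5
print(simulate_exit(100, path, stoploss=-0.14))  # survives the dip, closes at 100
```

On the dip-and-recover path the tighter stop realises a loss that the looser one avoids, while on a path that keeps falling the looser stop loses more; comparing -13% and -14% across a full backtest is a way of measuring which effect dominates.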
Results
  • Achieved +18.9% ROI with an 83% win rate and less than 10% drawdown during one optimised live trading period.
  • Implemented trailing stops and adaptive exits to improve trade management.
  • Built analytics for Sharpe ratio, expectancy, return distributions, and volatility metrics.
  • Automated trade supervision and risk controls using Python and Freqtrade REST APIs.
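Minimal versions of some of the analytics listed above, written from standard textbook definitions: an annualised Sharpe ratio from per-period returns (assuming 365 periods per year for a 24/7 crypto market, a zero risk-free rate by default, and a simple per-period split of that rate), expectancy as the average return per trade, and maximum drawdown of an equity curve. The function names and inputs are invented for illustration, and both the project's analytics and Freqtrade's own backtest report may use different conventions.

```python
from math import sqrt
from statistics import mean, stdev


def sharpe_ratio(returns, periods_per_year=365, risk_free=0.0):
    """Annualised Sharpe ratio from per-period returns (sample std dev)."""
    excess = [r - risk_free / periods_per_year for r in returns]
    return mean(excess) / stdev(excess) * sqrt(periods_per_year)


def expectancy(trade_returns):
    """Average return per trade: win_rate * avg_win - loss_rate * avg_loss."""
    wins = [r for r in trade_returns if r > 0]
    losses = [-r for r in trade_returns if r <= 0]
    n = len(trade_returns)
    avg_win = mean(wins) if wins else 0.0
    avg_loss = mean(losses) if losses else 0.0
    return len(wins) / n * avg_win - len(losses) / n * avg_loss


def max_drawdown(equity):
    """Largest peak-to-trough decline of an equity curve, as a fraction of the peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


# Hypothetical inputs.
print(round(expectancy([0.03, 0.02, -0.05, 0.04]), 4))                 # 0.01
print(max_drawdown([100, 110, 99, 105, 120, 108]))                     # 0.1
print(round(sharpe_ratio([0.02, 0.01, 0.03], periods_per_year=1), 6))  # 2.0
```

Written this way, expectancy equals the arithmetic mean of trade returns; the win/loss decomposition is kept because it shows how a high win rate can still be offset by a few large losses.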
Visualisations
Representative Trading Performance

A simplified summary of one optimised trading period, highlighting ROI, win rate, and drawdown.

Risk and Robustness Metrics

Backtest metrics showing win rate, drawdown, and Sharpe ratio for a representative configuration.

Configuration Comparison

Comparison of win rate, drawdown, and Sharpe ratio across different stop-loss configurations.

Exit Reason Contribution

Breakdown of how different exit mechanisms contributed to strategy performance.
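A breakdown like this can be computed by grouping closed trades by exit reason. In this sketch the reason labels are modelled on Freqtrade's exit-reason names, but the trades and profits are hypothetical and the function name is illustrative; it is not the code behind the chart.

```python
from collections import defaultdict


def exit_reason_breakdown(trades):
    """Aggregate trade count, total profit and share of net profit by exit reason.

    `trades` is a list of (exit_reason, profit_abs) tuples.
    """
    count = defaultdict(int)
    profit = defaultdict(float)
    for reason, pnl in trades:
        count[reason] += 1
        profit[reason] += pnl
    net = sum(profit.values())
    # Rows are ordered by total profit, highest first.
    return {
        reason: {
            "trades": count[reason],
            "profit": profit[reason],
            "share_of_net": profit[reason] / net if net else float("nan"),
        }
        for reason in sorted(profit, key=profit.get, reverse=True)
    }


# Hypothetical closed trades: (exit reason, absolute profit).
trades = [
    ("roi", 12.0),
    ("roi", 8.0),
    ("trailing_stop_loss", 6.0),
    ("stop_loss", -9.0),
    ("exit_signal", 3.0),
]
for reason, row in exit_reason_breakdown(trades).items():
    print(reason, row["trades"], row["profit"], round(row["share_of_net"], 2))
```

Because losing exit types contribute negative profit, individual shares can be negative or exceed 100% of net profit; reporting absolute profit alongside the share keeps the breakdown readable.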