Eamon Yuan | Data Scientist & Quant Trader

Rail Break Prediction Project

University Industry Placement

Predicting rail breaks is a rare-event machine learning problem. This project focused on handling class imbalance, selecting meaningful sensor features, building Databricks-based ML pipelines, and evaluating performance with metrics suited to imbalanced data such as F1 and PR-AUC.

Databricks, PySpark, SQL, Machine Learning, XGBoost, Random Forest, Feature Engineering, Feature Selection, Imbalanced Learning, EDA, PR-AUC, F1 Score

Problem Statement

The goal was to predict whether a rail section may break within the next 30 days using operational and sensor data. The dataset was highly imbalanced, meaning standard classifiers could easily bias toward the majority non-break class.
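One way to turn "break within the next 30 days" into a binary target is sketched below. This is a minimal illustration, not the project's actual labelling code: the function name `label_break_within`, its parameters, and the choice to count only breaks strictly after the observation date are all assumptions.

```python
from datetime import date, timedelta


def label_break_within(observation_date, break_dates, horizon_days=30):
    """Return 1 if any break falls in (observation_date, observation_date + horizon], else 0.

    Hypothetical helper: the half-open window (a break on the observation
    date itself is not counted) is an assumption made for this sketch.
    """
    horizon_end = observation_date + timedelta(days=horizon_days)
    return int(any(observation_date < b <= horizon_end for b in break_dates))


# Toy example: one rail section observed on 1 Jan 2023.
obs = date(2023, 1, 1)
print(label_break_within(obs, [date(2023, 1, 20)]))  # break inside the window
print(label_break_within(obs, [date(2023, 3, 1)]))   # break outside the window
```

Because most observation windows contain no break, labels built this way are dominated by zeros, which is the imbalance described above.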

Methodology

• Performed EDA on class distribution, location-level imbalance, and temporal break trends.
• Researched feature selection methods including correlation filtering, variance thresholding, ANOVA F-test, Lasso logistic regression, and Random Forest feature importance.
• Built feature engineering and model training workflows using Databricks notebooks, PySpark, SQL, and project pipelines.
• Evaluated models using accuracy, F1 score, PR-AUC, and practical model interpretation for stakeholder reporting.
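Two of the feature selection methods listed above, variance thresholding and the ANOVA F-test, can be sketched in plain Python. This is an illustrative toy rather than the project's pipeline (which ran on Databricks and PySpark); the feature names, values, and helper functions are hypothetical.

```python
from statistics import mean, pvariance


def variance_threshold(columns, threshold=0.0):
    """Keep the names of features whose population variance exceeds the threshold."""
    return [name for name, values in columns.items() if pvariance(values) > threshold]


def anova_f(values, labels):
    """One-way ANOVA F-statistic of a single feature across class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        return 0.0  # not enough groups or samples to compare
    grand = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical toy data: 4 non-break rows, 2 break rows.
labels = [0, 0, 0, 0, 1, 1]
columns = {
    "constant_flag": [1, 1, 1, 1, 1, 1],            # zero variance, carries no signal
    "vibration_rms": [0.1, 0.2, 0.1, 0.2, 0.9, 1.0],  # separates the classes
    "noise_channel": [0.5, 0.1, 0.4, 0.2, 0.3, 0.3],  # same mean in both classes
}

kept = variance_threshold(columns)
f_scores = {name: anova_f(columns[name], labels) for name in kept}
for name, score in sorted(f_scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: F = {score:.2f}")
```

The constant column is dropped by the variance filter, and the F-statistic then ranks the feature whose class means differ well above the one whose class means coincide.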

Results

• Improved model accuracy from 38% to 82%.
• Increased F1 score from 0% to 60%.
• Improved PR-AUC from 10% to 55%.
• Reduced pipeline runtime by approximately 40% through PySpark and SQL optimisation.
• Presented EDA insights and model findings to engineering and operations stakeholders.

Visualisations

Target Distribution

The dataset showed a strong class imbalance, making F1 and PR-AUC more useful than accuracy alone.
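A small worked example shows why accuracy alone misleads on a rare-event target. The sketch below uses made-up labels (2 breaks in 100 rows) and plain-Python metric helpers. PR-AUC is approximated here as average precision, and tied scores are not given any special treatment, which is a simplification.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, _, _, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def f1_score(y_true, y_pred):
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def average_precision(y_true, scores):
    """Mean of the precision values measured at the rank of each true positive."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    total_pos = sum(y_true)
    if total_pos == 0:
        return 0.0
    hits, ap = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            ap += hits / rank
    return ap / total_pos


# Toy data: 2 breaks among 100 rail sections.
y_true = [1, 1] + [0] * 98
always_negative = [0] * 100                # a model that never predicts a break
scores = [0.9, 0.4] + [0.5] + [0.1] * 97   # hypothetical model scores

print("accuracy:", accuracy(y_true, always_negative))
print("F1:", f1_score(y_true, always_negative))
print("average precision:", round(average_precision(y_true, scores), 4))
```

The never-predict-a-break model scores very high accuracy while finding no breaks at all, so its F1 is zero. Average precision, by contrast, depends only on how highly the true breaks are ranked.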

Imbalance by Location

Failure likelihood varied by location, showing that risk was not evenly distributed across rail sections.

Random Forest Feature Importance

Tree-based feature importance helped identify meaningful sensor and operational predictors.

Correlation Heatmap

Correlation analysis was used to identify redundant features and reduce multicollinearity.
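A simple form of correlation filtering can be sketched as follows: walk through the features in order and drop any whose absolute Pearson correlation with an already-kept feature exceeds a threshold. The 0.9 threshold, the helper names, and the toy columns are assumptions for illustration only.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def drop_correlated(columns, threshold=0.9):
    """Greedy filter: keep a feature only if |r| < threshold against every kept feature."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical columns: the same temperature in two units, plus an unrelated load reading.
columns = {
    "temp_c": [10, 12, 15, 20, 25],
    "temp_f": [50, 53.6, 59, 68, 77],
    "load_tonnes": [5, 1, 4, 2, 3],
}
print(drop_correlated(columns))
```

The greedy order matters: whichever of two redundant features appears first is the one retained, so in practice the columns might be pre-sorted by some relevance score.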

Quant Trading Research

Research Project

This project explored automated cryptocurrency trading systems using Python, Freqtrade, Binance, backtesting, risk management, and cloud deployment. The research focused on understanding market behaviour, testing strategy logic across different market regimes, and improving automated trade supervision. The core strategy logic is not publicly disclosed; this portfolio summary focuses on methodology, risk evaluation, backtesting workflow, and aggregate performance metrics.

Python, Freqtrade, Binance, Algorithmic Trading, Backtesting, Risk Management, REST API, Vultr, Sharpe Ratio, Drawdown Analysis

Problem Statement

The goal was to evaluate whether a systematic trading bot could maintain profitability and controlled drawdown across volatile crypto market conditions. The project tested strategy behaviour across approximately two years of historical data, including a period where the overall market declined by around 50%.

Methodology

• Started from public Freqtrade strategies to understand trading bot structure, entry/exit signals, and market behaviour.
• Backtested strategy variants across 2022–2024 Binance spot market data using multiple crypto pairs.
• Compared stop-loss configurations, including -14% and -13%, to evaluate the trade-off between return, win rate, and drawdown.
• Implemented trailing-stop, profit-based exit logic, adaptive exits, and REST API-based trade supervision.
• Deployed the trading bot on a Vultr cloud server to support continuous 24/7 execution and monitoring.
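The difference between a fixed stop-loss and a trailing stop can be illustrated with a toy simulation in plain Python. This is not the undisclosed strategy and not Freqtrade's implementation; the function `simulate_exit`, the 5% trailing offset, and the price paths are hypothetical.

```python
def simulate_exit(prices, stop_loss=-0.14, trailing_offset=None):
    """Simulate one long trade entered at prices[0].

    stop_loss is a negative fraction of the entry price (e.g. -0.14).
    trailing_offset, if set, exits once price falls that fraction below
    the highest price seen since entry (only after the trade is in profit).
    Returns (exit_index, trade_return).
    """
    entry = prices[0]
    peak = entry
    for i, price in enumerate(prices[1:], start=1):
        peak = max(peak, price)
        if price <= entry * (1 + stop_loss):
            return i, price / entry - 1  # fixed stop hit
        if trailing_offset is not None and peak > entry and price <= peak * (1 - trailing_offset):
            return i, price / entry - 1  # trailing stop hit
    return len(prices) - 1, prices[-1] / entry - 1  # held to the end of the series


# Path 1: a rally followed by a sell-off.
rally_then_drop = [100, 110, 120, 112, 105, 90, 80]
print(simulate_exit(rally_then_drop, stop_loss=-0.14))
print(simulate_exit(rally_then_drop, stop_loss=-0.14, trailing_offset=0.05))

# Path 2: a dip to -13.5% that recovers; the tighter stop exits, the looser one holds.
dip_then_recover = [100, 95, 86.5, 95]
print(simulate_exit(dip_then_recover, stop_loss=-0.13))
print(simulate_exit(dip_then_recover, stop_loss=-0.14))
```

On the first path the trailing stop locks in part of the rally, while the fixed stop alone only exits after the full decline. The second path shows how a one-point change in stop-loss can flip a trade between a realised loss and a smaller open drawdown, which is why such settings are compared over many trades rather than judged on one.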

Results

• Achieved +18.9% ROI with an 83% win rate and less than 10% drawdown during one optimised live trading period.
• Implemented trailing-stop and adaptive exits to improve trade management.
• Built analytics for Sharpe ratio, expectancy, return distributions, and volatility metrics.
• Automated trade supervision and risk controls using Python and Freqtrade REST APIs.
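The analytics named above have standard textbook definitions, sketched here in plain Python. The helper names, the 365-period annualisation (crypto markets trade every day), and the sample numbers are assumptions, not figures from the project.

```python
from math import sqrt
from statistics import mean, stdev


def sharpe_ratio(returns, periods_per_year=365, risk_free_per_period=0.0):
    """Annualised Sharpe ratio from per-period returns (needs at least 2 values)."""
    excess = [r - risk_free_per_period for r in returns]
    sd = stdev(excess)
    if sd == 0:
        return 0.0
    return mean(excess) / sd * sqrt(periods_per_year)


def expectancy(trade_returns):
    """Expected return per trade: win_rate * avg_win + loss_rate * avg_loss."""
    wins = [r for r in trade_returns if r > 0]
    losses = [r for r in trade_returns if r <= 0]
    win_rate = len(wins) / len(trade_returns)
    avg_win = mean(wins) if wins else 0.0
    avg_loss = mean(losses) if losses else 0.0
    return win_rate * avg_win + (1 - win_rate) * avg_loss


def max_drawdown(equity):
    """Largest peak-to-trough decline of an equity curve, as a positive fraction."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


# Hypothetical numbers for illustration only.
daily_returns = [0.01, -0.01, 0.02, 0.0]
trades = [0.02, 0.02, 0.02, 0.02, -0.10]   # 80% win rate, one large loss
equity_curve = [100, 120, 90, 130]

print("Sharpe:", round(sharpe_ratio(daily_returns), 2))
print("Expectancy:", round(expectancy(trades), 4))
print("Max drawdown:", max_drawdown(equity_curve))
```

The toy trade list has an 80% win rate yet a negative expectancy, because a single large loss outweighs four small wins. This is why win rate is usually read alongside expectancy and drawdown rather than on its own.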

Visualisations