Homelessness Risk Predictor

A predictive model built using Python and scikit-learn to analyze census data and determine key factors contributing to homelessness risk.

Overview

This project predicts individual risk of homelessness using several years of U.S. Census microdata. By applying machine learning to socio-economic and demographic factors, the goal is to help organizations and policymakers shift from reactive support to proactive prevention. Risk scores and key feature explanations are visualized interactively for decision-makers.

Approach

The model was built with Python, pandas, scikit-learn, and imbalanced-learn. I trained a logistic regression classifier on American Community Survey (ACS) data from 2013 to 2022 and tested it on unseen 2023 data. Feature engineering included numeric scaling, categorical encoding, and binary variable mapping. Class imbalance was addressed with SMOTE (Synthetic Minority Over-sampling Technique). A full preprocessing and modeling pipeline was developed, and results were exported to Tableau-ready files.
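The preprocessing and temporal split described above could be sketched roughly as follows. This is a minimal illustration on synthetic data, not the project's actual pipeline: the column names (`year`, `age`, `income`, `education`, `has_insurance`, `at_risk`) and the label rule are hypothetical, and the SMOTE step is left out so the sketch depends only on pandas and scikit-learn.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the ACS extract; every column name is hypothetical.
rng = np.random.default_rng(0)
n = 2200
df = pd.DataFrame({
    "year": rng.integers(2013, 2024, size=n),  # 2013..2023 inclusive
    "age": rng.integers(18, 80, size=n),
    "income": rng.normal(40_000, 15_000, size=n),
    "education": rng.choice(["grade_11", "hs_diploma", "bachelors"], size=n),
    "has_insurance": rng.choice(["yes", "no"], size=n),
})

# Binary variable mapping: yes/no -> 1/0.
df["has_insurance"] = df["has_insurance"].map({"yes": 1, "no": 0})

# Made-up label so the example runs; the real "at risk" definition differs.
df["at_risk"] = ((df["has_insurance"] == 0) & (df["income"] < 35_000)).astype(int)

# Temporal split: fit on 2013-2022, hold out 2023 entirely.
features = ["age", "income", "education", "has_insurance"]
train = df[df["year"] <= 2022]
test = df[df["year"] == 2023]

preprocess = ColumnTransformer(
    [
        ("num", StandardScaler(), ["age", "income"]),                    # numeric scaling
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["education"]),  # categorical encoding
    ],
    remainder="passthrough",  # has_insurance is already 0/1 and passes through
)

model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(train[features], train["at_risk"])

# Predicted probability of the at-risk class for each 2023 record.
test_scores = model.predict_proba(test[features])[:, 1]
```

Keeping the scaler and encoder inside the pipeline means they are fitted on the training years only, so no statistics from 2023 leak into training.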

Challenges

Major challenges included aligning variable formats across different years, defining a robust 'at risk' label, and handling strong class imbalance. I resolved these by creating consistent preprocessing rules and testing various label thresholds. SMOTE oversampling helped prevent the model from ignoring the minority class, and testing on the held-out 2023 data indicated that the model generalizes to a later year than those it was trained on.
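To make the oversampling idea concrete, here is a simplified NumPy illustration of the interpolation behind SMOTE. It is not imbalanced-learn's implementation (the project used that library's `SMOTE` class), and the function name `smote_like_oversample` and its parameters are invented for this sketch. Each synthetic row is placed at a random point on the line segment between a minority row and one of its nearest minority neighbours.

```python
import numpy as np

def smote_like_oversample(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority rows by interpolating between a
    sampled minority row and one of its k nearest minority neighbours.
    Hypothetical helper for illustration; needs at least two minority rows."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    k = min(k, len(X_min) - 1)

    # Pairwise Euclidean distances within the minority class.
    diffs = X_min[:, None, :] - X_min[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    np.fill_diagonal(dists, np.inf)  # a row is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]

    base = rng.integers(0, len(X_min), size=n_new)            # rows to start from
    picked = neighbours[base, rng.integers(0, k, size=n_new)]  # one neighbour each
    gap = rng.random((n_new, 1))                               # position along the segment
    return X_min[base] + gap * (X_min[picked] - X_min[base])

# Toy imbalanced data: 950 majority rows, 50 minority rows.
rng = np.random.default_rng(1)
X_major = rng.normal(0, 1, size=(950, 2))
X_minor = rng.normal(3, 1, size=(50, 2))

X_new = smote_like_oversample(X_minor, n_new=len(X_major) - len(X_minor))
X_balanced_minor = np.vstack([X_minor, X_new])
```

Oversampling of this kind should be applied to the training years only; the held-out year keeps its natural class balance so that the evaluation reflects real conditions.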

Results

The final model achieves 88% accuracy and 98% recall for at-risk individuals, with an AUC score of 0.97. Key risk factors included lack of health insurance, age (18–25), and educational attainment (especially grade 11 dropouts). Tableau dashboards display model performance, risk distribution, feature importance, and occupational risk insights in an accessible, narrative format.
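For readers unfamiliar with how accuracy, recall, and AUC relate, the small sketch below computes all three with scikit-learn on a hand-made example. The labels, scores, and the `evaluate` helper are invented for illustration and are unrelated to the project's data; the decision threshold the project used is not stated, so the two thresholds here are assumptions.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score, roc_auc_score

def evaluate(y_true, y_score, threshold=0.5):
    """Return accuracy, recall for the positive class, and ROC AUC.
    Hypothetical helper; the threshold only affects accuracy and recall."""
    y_pred = (np.asarray(y_score) >= threshold).astype(int)
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "auc": roc_auc_score(y_true, y_score),  # uses scores, not thresholded labels
    }

# Toy data: 6 not-at-risk (0) and 4 at-risk (1) records with model scores.
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
y_score = np.array([0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.35, 0.8, 0.9, 0.65])

default = evaluate(y_true, y_score, threshold=0.5)
# accuracy 0.70, recall 0.75

lenient = evaluate(y_true, y_score, threshold=0.3)
# accuracy 0.60, recall 1.00; AUC is unchanged because it ignores the threshold
```

Lowering the threshold catches every at-risk record in this toy example at the cost of more false alarms, which is one way a model can show higher recall than accuracy.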

Key Takeaways

This project sharpened my skills in model training, pipeline architecture, data visualization, and interpretability. It also emphasized the importance of transparent analytics in high-impact areas like housing policy. Ultimately, it shows how structured data and thoughtful modeling can tell stories that matter and potentially guide real-world intervention strategies.

Explore the Story

The interactive Tableau story below walks through the homelessness risk model step by step. Use the navigation arrows at the top of the visualization to progress through the key drivers, demographic breakdowns, and model performance insights.