Homelessness Risk Predictor
A predictive model built with Python and scikit-learn that analyzes 15 years of American Community Survey (ACS) data to identify key factors contributing to homelessness risk.
Overview
This project identifies socio-economic risk factors for homelessness and models how well they predict that risk. Using 15 years of data from the U.S. Census Bureau's American Community Survey (ACS), I built a predictive model to uncover patterns and indicators that could help policymakers, researchers, and support organizations address homelessness proactively.
Approach
I built the model in Python with scikit-learn and trained it on 15 years of ACS data. The dataset covered socio-economic factors such as income levels, employment status, housing costs, and educational attainment. Preprocessing included handling missing values and scaling and normalizing features. I trained logistic regression and random forest classifiers and used them to determine which features contributed most to homelessness risk. I then built visualizations in Tableau to present the findings in an interactive, digestible format.
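The workflow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the feature names are hypothetical placeholders, the data is synthetic (generated with make_classification, not taken from the ACS), and median imputation and standard scaling are assumed choices for the preprocessing steps.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical feature names; the real ACS column names are not given here.
FEATURES = ["median_income", "unemployment_rate", "housing_cost_burden", "pct_bachelors"]

# Synthetic stand-in data (NOT real ACS data), with about 5% of values blanked out
# to mimic missing survey responses.
X_raw, y = make_classification(
    n_samples=500, n_features=4, n_informative=3, n_redundant=1, random_state=0
)
X = pd.DataFrame(X_raw, columns=FEATURES)
rng = np.random.default_rng(0)
X = X.mask(rng.random(X.shape) < 0.05)


def build_pipeline(estimator):
    """Impute missing values, scale features, then fit the given classifier."""
    return Pipeline([
        ("impute", SimpleImputer(strategy="median")),  # assumed imputation strategy
        ("scale", StandardScaler()),
        ("model", estimator),
    ])


log_reg = build_pipeline(LogisticRegression(max_iter=1000)).fit(X, y)
forest = build_pipeline(RandomForestClassifier(n_estimators=100, random_state=0)).fit(X, y)

# Compare the two views of feature influence: signed coefficients on scaled
# features (logistic regression) and impurity-based importances (random forest).
coef = pd.Series(log_reg.named_steps["model"].coef_[0], index=FEATURES)
importance = pd.Series(forest.named_steps["model"].feature_importances_, index=FEATURES)
ranking = pd.DataFrame(
    {"logreg_coef": coef, "forest_importance": importance}
).sort_values("forest_importance", ascending=False)
print(ranking)
```

Wrapping preprocessing and the classifier in one Pipeline keeps the imputation and scaling statistics tied to the data the model was fitted on, which matters once cross-validation is introduced.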
Challenges
One of the primary challenges was working with such a large and diverse dataset. Integrating data across 15 years required careful handling of changing variable definitions and missing data. In addition, homelessness outcomes were imbalanced in the data, so getting the model to generalize across regions and demographic groups meant addressing class imbalance. I tackled this by applying oversampling and using stratified cross-validation during model training.
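One way to combine oversampling with stratified cross-validation is sketched below. It rests on several assumptions: the data is synthetic with a roughly 90/10 class split, the oversampling is plain random duplication of minority-class rows via sklearn.utils.resample (the specific oversampling technique used in the project is not stated), and oversample_minority is a hypothetical helper written for this example. The oversampling is applied only to each training fold, so duplicated rows never leak into the held-out fold.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import StratifiedKFold
from sklearn.utils import resample

# Synthetic, imbalanced stand-in data (NOT real ACS data): about 10% positives.
X, y = make_classification(
    n_samples=1000, n_features=6, weights=[0.9, 0.1], random_state=0
)


def oversample_minority(X_train, y_train, random_state=0):
    """Randomly duplicate minority-class rows until both classes are the same size."""
    classes, counts = np.unique(y_train, return_counts=True)
    if counts.max() == counts.min():
        return X_train, y_train  # already balanced, nothing to add
    minority = classes[np.argmin(counts)]
    is_minority = y_train == minority
    X_extra, y_extra = resample(
        X_train[is_minority],
        y_train[is_minority],
        replace=True,
        n_samples=counts.max() - counts.min(),
        random_state=random_state,
    )
    return np.vstack([X_train, X_extra]), np.concatenate([y_train, y_extra])


# Stratified folds keep the class ratio of the full dataset in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
recalls = []
for train_idx, test_idx in cv.split(X, y):
    # Balance the training fold only; the test fold keeps its natural imbalance.
    X_bal, y_bal = oversample_minority(X[train_idx], y[train_idx])
    model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_bal, y_bal)
    recalls.append(recall_score(y[test_idx], model.predict(X[test_idx])))

print("Minority-class recall per fold:", np.round(recalls, 3))
```

Recall on the minority class is used here because overall accuracy can look high on imbalanced data even when the model rarely flags the at-risk class.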
Results
The model identified key risk factors associated with homelessness, including unemployment rates, household income, housing cost burdens, and educational attainment. The interactive Tableau dashboard lets users explore these factors dynamically and highlights the regions and demographic groups at higher risk. This tool can help government agencies and non-profits target their interventions and allocate resources more effectively.
Key Takeaways
This project strengthened my data science skills, particularly in handling large datasets and training predictive models with scikit-learn. I gained valuable experience in feature engineering, model validation, and visual storytelling with Tableau. Going forward, I plan to expand the model by incorporating additional datasets and exploring neural networks to improve predictive accuracy.