Model
Introduction
This report presents a linear regression model designed to estimate California house values using four predictors:
- average_household_income
- average_house_age
- total_bedrooms
- ocean_proximity
We use a cleaned dataset that excludes outliers valued above $2 million and keeps only rows whose numeric predictors are positive. Although simple, the model produces an estimate in line with a professional valuation tool (Redfin) for the example home below.
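The cleaning rule can be sketched as a plain row filter. This is a minimal sketch under several assumptions: each row is a dict keyed by the predictor names listed above, the target column is called `house_value` (a hypothetical name, since the report does not name it), "positive" applies only to the numeric predictors because `ocean_proximity` is categorical, and a home valued at exactly $2 million is kept.

```python
from typing import Any

# Numeric predictors named in the report; ocean_proximity is categorical,
# so it is not checked for positivity.
NUMERIC_PREDICTORS = ("average_household_income", "average_house_age", "total_bedrooms")
TARGET = "house_value"   # hypothetical target column name (assumption)
MAX_VALUE = 2_000_000    # drop homes valued above $2 million


def clean(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep rows at or below the value cap whose numeric predictors are all positive."""
    kept = []
    for row in rows:
        if row[TARGET] > MAX_VALUE:
            continue  # outlier above the $2M cap
        if any(row[col] <= 0 for col in NUMERIC_PREDICTORS):
            continue  # at least one non-positive numeric predictor
        kept.append(row)
    return kept
```

With pandas, the same rule would usually be written as a boolean mask over the DataFrame, but the logic is identical.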
Final Regression Equation
The fitted linear model is:
\[ \text{HouseValue} = -487{,}732.48 + 7.39 \cdot \text{Income} + 12{,}295.14 \cdot \text{Age} + 145.09 \cdot \text{Bedrooms} - 67{,}571.69 \cdot \text{Inland} + 120{,}789.33 \cdot \text{NearBay} + 24{,}533.52 \cdot \text{NearOcean} \]
Where:
- Income = average household income in dollars
- Age = house age in years (2025 - year built)
- Bedrooms = total number of bedrooms
- Inland, NearBay, NearOcean = ocean proximity indicators, which are binary (1 if the house belongs to the group, else 0)
- <1H OCEAN is the baseline category, used when all indicators are 0
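The fitted equation can be evaluated directly in code, with the ocean-proximity category turned into at most one active indicator. A minimal sketch: the function name `predict_house_value` is hypothetical, and the labels `"INLAND"` and `"NEAR OCEAN"` are assumed (the report only spells out `<1H OCEAN` and `NEAR BAY`). Any other category, such as `"ISLAND"`, has no coefficient in this model and is rejected.

```python
# Coefficients copied from the fitted equation above.
INTERCEPT = -487_732.48
COEF_INCOME = 7.39
COEF_AGE = 12_295.14
COEF_BEDROOMS = 145.09
# One coefficient per non-baseline category (label strings are assumptions).
OCEAN_COEFS = {
    "INLAND": -67_571.69,
    "NEAR BAY": 120_789.33,
    "NEAR OCEAN": 24_533.52,
}
BASELINE = "<1H OCEAN"  # all indicators are 0, so it adds nothing


def predict_house_value(income: float, age: float, bedrooms: float, ocean_proximity: str) -> float:
    """Evaluate the fitted linear equation for a single house."""
    if ocean_proximity != BASELINE and ocean_proximity not in OCEAN_COEFS:
        raise ValueError(f"unknown ocean_proximity category: {ocean_proximity!r}")
    # Dummy coding: the matching indicator contributes its coefficient,
    # and the baseline category contributes 0.
    ocean_term = OCEAN_COEFS.get(ocean_proximity, 0.0)
    return (INTERCEPT
            + COEF_INCOME * income
            + COEF_AGE * age
            + COEF_BEDROOMS * bedrooms
            + ocean_term)


# Milpitas example: built in 1989, so age = 2025 - 1989 = 36 years.
print(round(predict_house_value(275_000, 2025 - 1989, 4, "NEAR BAY"), 2))  # prints 2108512.25
```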
Example Prediction: Milpitas Home
Property Info (from Redfin):
- Location: 1570 Hidden Creek Ln, Milpitas, CA
- Age: 2025 - 1989 = 36 years
- Bedrooms: 4
- Ocean Proximity: NEAR BAY
- Estimated Income: $275,000
Prediction:
\[ \text{HouseValue} = -487{,}732.48 + (7.39 \cdot 275{,}000) + (12{,}295.14 \cdot 36) + (145.09 \cdot 4) + 120{,}789.33 = 2{,}108{,}512.25 \]
📍 Estimated Value: $2.11 million
🏠 Redfin Estimate: $2M – $2.3M
✅ The estimate falls inside Redfin's range, about 2% below its $2.15M midpoint and within ~8% of its $2.3M upper end.
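The size of the "~8%" gap depends on which point of Redfin's range is used as the reference. The short check below uses the model estimate rounded to whole dollars and the Redfin range quoted above. It measures each gap as a percentage of the Redfin reference value, which is our assumption about how the figure was computed.

```python
# Figures from the example above; the prediction is rounded to whole dollars.
predicted = 2_108_512
redfin_low, redfin_high = 2_000_000, 2_300_000
redfin_mid = (redfin_low + redfin_high) / 2  # 2,150,000


def pct_gap(estimate: float, reference: float) -> float:
    """Absolute difference as a percentage of the reference value."""
    return abs(estimate - reference) / reference * 100


print("inside Redfin range:", redfin_low <= predicted <= redfin_high)
for label, ref in [("midpoint", redfin_mid), ("lower end", redfin_low), ("upper end", redfin_high)]:
    print(f"gap vs {label}: {pct_gap(predicted, ref):.1f}%")
```

With these inputs the gap is about 1.9% against the midpoint, 5.4% against the lower end, and 8.3% against the upper end. The "~8%" figure is therefore the largest gap to any point in the range.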
Model Accuracy
- R² ≈ 0.56 → model explains ~56% of variance in house prices
- RMSE ≈ $209K → typical size of a prediction error (root-mean-square, so large misses count more than in a plain average)
- A reasonable fit for a model with only four predictors
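The two accuracy figures follow standard definitions: R² = 1 − SS_res / SS_tot and RMSE = √(mean squared residual). A minimal pure-Python sketch of both is shown below. The report does not say which tool produced its numbers, so these helpers are illustrative and are not its actual code. Libraries such as scikit-learn provide equivalent functions.

```python
import math
from typing import Sequence


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))  # unexplained variation
    ss_tot = sum((y - mean) ** 2 for y in actual)                  # total variation around the mean
    return 1 - ss_res / ss_tot


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-square error, in the target's units (dollars here)."""
    n = len(actual)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(actual, predicted)) / n)


# Toy values, not the report's data.
actual = [300_000, 500_000, 700_000]
predicted = [350_000, 450_000, 700_000]
print(r_squared(actual, predicted))  # prints 0.9375
print(round(rmse(actual, predicted)))
```

Because the errors are squared before averaging, RMSE is always at least as large as the mean absolute error. This is why it reads as a "typical" error rather than a plain average.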
3D Visualization
Conclusion
Even with just four features, this model's estimate for the Milpitas home ($2.11M) falls inside Redfin's $2M to $2.3M range. It shows how much a plain linear regression can capture. The model can be extended with more granular property features, such as square footage, location coordinates, or renovation status, for better accuracy.
This project can serve as a foundation for building real-estate pricing tools, dashboards, or ML-powered decision support systems.