Model


Call:
lm(formula = median_house_value ~ average_household_income + 
    average_house_age + ocean_proximity, data = housing_clean)

Residuals:
    Min      1Q  Median      3Q     Max 
-721614 -139275  -34274   92332 1458432 

Coefficients:
                            Estimate Std. Error t value Pr(>|t|)    
(Intercept)               -3.344e+05  9.244e+03 -36.180  < 2e-16 ***
average_household_income   6.998e+00  5.602e-02 124.919  < 2e-16 ***
average_house_age          1.066e+04  2.141e+02  49.818  < 2e-16 ***
ocean_proximityINLAND     -6.458e+04  3.782e+03 -17.076  < 2e-16 ***
ocean_proximityNEAR BAY    1.284e+05  6.397e+03  20.080  < 2e-16 ***
ocean_proximityNEAR OCEAN  3.286e+04  4.977e+03   6.603 4.13e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 221800 on 21429 degrees of freedom
Multiple R-squared:  0.5083,    Adjusted R-squared:  0.5082 
F-statistic:  4431 on 5 and 21429 DF,  p-value: < 2.2e-16

Final Regression Equation

The fitted linear model is:

\[ \text{HouseValue} = -334{,}429.74 + 7.00 \cdot \text{Income} + 10{,}664.29 \cdot \text{Age} - 64{,}580.69 \cdot \text{Inland} + 128{,}444.13 \cdot \text{NearBay} + 32{,}864.60 \cdot \text{NearOcean} \]

Where:

- Income = average household income (in dollars)
- Age = average house age (in years)
- Inland, NearBay, and NearOcean are binary indicator variables, each equal to 1 if the house is in that ocean_proximity category and 0 otherwise
- The baseline category is <1H OCEAN, for which all three indicators equal 0
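As a minimal sketch, the equation above can be written as a small R function. The names `coefs` and `predict_house_value` are hypothetical and introduced only for illustration, and the coefficients are the rounded values from the equation, not the full-precision estimates stored in the fitted model.

```r
# Coefficients as rounded in the fitted equation (assumption: the rounded
# values are precise enough for illustration).
coefs <- c(
  intercept  = -334429.74,
  income     = 7.00,
  age        = 10664.29,
  inland     = -64580.69,
  near_bay   = 128444.13,
  near_ocean = 32864.60
)

# Hypothetical helper: evaluates the fitted equation for given inputs.
predict_house_value <- function(income, age, ocean_proximity) {
  stopifnot(ocean_proximity %in% c("<1H OCEAN", "INLAND", "NEAR BAY", "NEAR OCEAN"))

  # Dummy coding: the baseline "<1H OCEAN" leaves all three indicators at 0.
  inland     <- as.numeric(ocean_proximity == "INLAND")
  near_bay   <- as.numeric(ocean_proximity == "NEAR BAY")
  near_ocean <- as.numeric(ocean_proximity == "NEAR OCEAN")

  unname(
    coefs["intercept"] +
      coefs["income"] * income +
      coefs["age"] * age +
      coefs["inland"] * inland +
      coefs["near_bay"] * near_bay +
      coefs["near_ocean"] * near_ocean
  )
}

# Usage: an inland house with $60,000 income and a 20-year average age.
predict_house_value(income = 60000, age = 20, ocean_proximity = "INLAND")
```

When the fitted model object is available, calling `predict()` on it with a `newdata` data frame is usually preferable, since it uses the unrounded estimates and handles the factor coding itself.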


Example Prediction

Suppose a house has:

- Income = $90,000
- Age = 30 years
- Ocean proximity = “NEAR BAY” (i.e., NearBay = 1, Inland = 0, NearOcean = 0)

Then:

\[ \text{HouseValue} = -334{,}429.74 + (7.00 \cdot 90{,}000) + (10{,}664.29 \cdot 30) + 128{,}444.13 = -334{,}429.74 + 630{,}000.00 + 319{,}928.70 + 128{,}444.13 = 743{,}943.09 \]


Interpretation

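  • Every $1 increase in income → about +$7.00 in predicted house value, holding age and ocean proximity fixed
  • Every additional year of age → +$10,664.29 in predicted value, holding income and ocean proximity fixed
  • Compared to <1H OCEAN (the baseline), with income and age held fixed:
    • INLAND homes are worth ~$64.6K less
    • NEAR BAY homes are worth ~$128.4K more
    • NEAR OCEAN homes are worth ~$32.9K more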
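
Because each location coefficient is measured against <1H OCEAN, the predicted gap between any two categories is the difference of their coefficients. The sketch below illustrates this using the rounded coefficients from the fitted equation. The names `proximity_effect` and `proximity_gap` are hypothetical and exist only for this example.

```r
# Location effects relative to the baseline; "<1H OCEAN" is 0 by construction.
proximity_effect <- c(
  "<1H OCEAN"  = 0,
  "INLAND"     = -64580.69,
  "NEAR BAY"   = 128444.13,
  "NEAR OCEAN" = 32864.60
)

# Hypothetical helper: predicted difference in house value between
# category a and category b, with income and age held equal.
proximity_gap <- function(a, b) {
  unname(proximity_effect[a] - proximity_effect[b])
}

proximity_gap("NEAR BAY", "INLAND")      # NEAR BAY relative to INLAND
proximity_gap("NEAR OCEAN", "NEAR BAY")  # NEAR OCEAN relative to NEAR BAY
```

These gaps are point estimates only. Their standard errors are not the ones shown in the coefficient table, since that table reports uncertainty relative to the baseline category.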