Results
This section discusses the results of the analysis performed in the previous section. As discussed earlier, a dataset with information on single-family homes in Denver, CO was used to fit three models for predicting home values: a linear regression, the lasso and a random forest.
The random forest model was the best performer. While the R-squared of the linear regression and the lasso were .87 and .86, respectively, the R-squared of the random forest regression was 0.91.
Further, it is noteworthy that the home value estimate based on zestimates of comparable homes from the training dataset (“zestCompVal”) was the most important feature in explaining home values, as can be seen from the feature importances table below. The second most important feature was the home price at the time of the last sale adjusted for housing price appreciation between the date of the last sale and present time (“lastSaleAmount”). These results suggest that recent prices of comparable homes and the home’s own price history are the most important factors in forecasting the value of a home.
index featureImportances
0 0.681634 zestCompVal
1 0.230274 lastSaleAmount
2 0.041095 squareFootage
3 0.016680 lotSize
4 0.011324 lastSaleAmountAfter2012
5 0.009555 bathrooms
6 0.003367 priorSaleAmountAfter2012
7 0.003084 rebuiltDummy
8 0.001615 80206
9 0.001373 80209
Previous step: Analysis.