Most of us have heard some variation of the phrase “correlation does not imply causation.” To borrow a popular example, almost every time it rains, you see umbrellas. Do umbrellas cause the rain to fall? Or does the rain cause umbrellas to open? The answer is neither. But that doesn’t mean that there isn’t a strong and direct correlation between rain and umbrellas. There is. But even data science has its limitations. We cannot say with statistical certainty that one phenomenon causes the other.
It is possible, even encouraged, to use statistics to describe both the nature and strength of the relationship between two variables. Take, for example, the percentage of closed sales that are either foreclosures or short sales (i.e., the Distressed Sales Rate or “DSR”) and home prices (i.e., the Median Sales Price or “MSP”).
On the surface, one might expect a relationship. As foreclosures and short sales make up a larger share of activity, more lower-priced product is selling, so one could reasonably expect prices to weaken. Indeed, this dynamic exerts downward pressure on home prices. In other words, a high DSR should correspond to, or correlate with, a low MSP. That means the relationship is inverse, or negative: MSP decreases as DSR increases. That’s the nature of the relationship, but what about the strength?
The (X,Y) scatterplot above shows the linear, inverse relationship between DSR and MSP: MSP values are high when DSR values are low, and low when DSR values are high. The relationship can be described by the equation Y = -149,595X + 234,195, where X is DSR expressed as a fraction (so 20% is 0.20). That equation tells us that the Y-intercept, the value of Y (MSP) when X (DSR) is zero, is 234,195. In other words, when the percentage of sales that are distressed approaches (or hypothetically reaches) zero, home prices should be about $234,195, based on the 10 years of data used. The slope of the line, given by the coefficient of X, tells us that every 10 percentage point increase in DSR (say, from 20% to 30%) corresponds to a roughly $15,000 decrease in MSP. Thus, every one percentage point increase in DSR corresponds to a roughly $1,500 decrease in MSP. Sure enough, as our local DSR went from about 0% to 60%, home prices went from about $235,000 to $145,000. In case you missed it, 60 x 1,500 (or 6 x 15,000) = 90,000, and $235,000 – $90,000 = $145,000. The linear model seems to describe the relationship between DSR and MSP fairly well. If only there were a way to measure that quantitatively.
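For readers who want to check the arithmetic, here is a minimal Python sketch of the fitted line. The slope and intercept come from the equation quoted in the text; the function name `predicted_msp` is made up for illustration, and the sketch assumes DSR is entered as a fraction (0.20 for 20%), which is what the quoted slope implies.

```python
# Coefficients taken from the fitted line quoted in the article.
SLOPE = -149_595.0     # change in MSP (dollars) per 1.0 change in DSR
INTERCEPT = 234_195.0  # predicted MSP (dollars) when DSR is zero


def predicted_msp(dsr: float) -> float:
    """Predicted Median Sales Price for a Distressed Sales Rate.

    `dsr` is assumed to be a fraction between 0.0 and 1.0 (illustrative
    helper, not part of the original analysis).
    """
    return SLOPE * dsr + INTERCEPT


if __name__ == "__main__":
    # Walk through a few DSR levels mentioned in the article.
    for dsr in (0.0, 0.2, 0.3, 0.6):
        print(f"DSR {dsr:.0%}: predicted MSP ${predicted_msp(dsr):,.0f}")

    # Effect of a 10 percentage point rise in DSR (20% to 30%).
    drop = predicted_msp(0.2) - predicted_msp(0.3)
    print(f"20% -> 30% DSR lowers predicted MSP by about ${drop:,.0f}")
```

The exact figures differ slightly from the rounded $15,000 and $1,500 used in the text, because the slope is a little under 150,000.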
That brings us to the R-squared value of 0.94325. We’ve already described the direction or nature of the relationship (inverse); R-squared describes its strength. A value of 0.94325 means that approximately 94.3% of the variation in home prices can be accounted for, or explained by, the variation in the percentage of sales that are distressed. Now we can say that we have a very strong inverse relationship, and an extremely low P-value tells us that our results are statistically significant.
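To make the R-squared idea concrete, the sketch below fits a least-squares line and computes R-squared as one minus the ratio of unexplained variation to total variation. The data points are hypothetical, invented only to show the mechanics; they are not the 10 years of local data behind the 0.94325 figure, and the function names are illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys):
    """Share of the variation in ys explained by the fitted line."""
    slope, intercept = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    # Variation left over after the line is fitted (residuals).
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    # Total variation of ys around their mean.
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical (DSR, MSP) pairs, made up for illustration only.
    dsr = [0.05, 0.10, 0.20, 0.30, 0.45, 0.60]
    msp = [230_000, 215_000, 208_000, 185_000, 170_000, 142_000]

    slope, intercept = fit_line(dsr, msp)
    print(f"slope = {slope:,.0f}, intercept = {intercept:,.0f}")
    print(f"R-squared = {r_squared(dsr, msp):.3f}")
```

A negative slope gives the direction of the relationship, while an R-squared close to 1 indicates that the points sit close to the fitted line.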
There you have it. We’ve used statistics to show that the distressed sales rate is very strongly associated with home prices. The nature of the relationship is inverse and the magnitude of the relationship is very strong. Some of you are thinking “well, duh, I could’ve told you that more foreclosures correspond to lower prices.” And that’s true. Yet we hear so much about how supply and demand are the primary drivers of home prices. This brief study suggests that the product mix effect, meaning the share of sales activity that is distressed, may be an even stronger determinant of home prices than the old standby, supply and demand.
For some funny correlations, see this.