Topic: Forecasting 10-year stock market returns

rachael talcott · « **on:** May 01, 2016, 12:10:53 PM »

Yes, I know that correlation does not always indicate causation, and the future may differ significantly from the past. But as a scientist, I also know that sometimes correlation does indicate causation and that there are regularities in nature, even if the regularities are messy rather than perfect. Shiller's PE10 makes me nervous about getting into the stock market for this reason. But I've been reading a bit more, in particular this: https://personal.vanguard.com/pdf/s338.pdf

In figure 2, the authors show that no one model predicts future performance very well. Shiller's PE10 is the best, with an R^2 of 0.43 (i.e., 43% of the observed variance in stock performance can be explained by that variable). They also list some other variables, like debt/GDP and dividend yield, which have weaker correlations. I would think that the obvious thing to do would be to combine the variables into a simple model to see if the correlation became stronger if you accounted for multiple variables at once.

So I dug out the data and tried it myself. The best model I have come up with (for the years that the paper uses) has an R2 of about 0.7. Surely I can't be the first person to think of this. I realize that some variables will be spurious, but in a model with a lot of variables (mine has four) wouldn't the effect of the spurious variables be diluted out?
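On the "diluted out" question: in ordinary least squares, adding a predictor can never lower the in-sample R^2, so spurious variables tend to inflate it rather than wash out. A minimal standard-library sketch with simulated data (n = 30, one real predictor, then pure-noise columns added one at a time; all values are hypothetical, not market data) shows the effect.

```python
import random


def ols_r2(columns, y):
    """In-sample R^2 of an OLS fit of y on the predictor columns plus an intercept."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented normal equations [X^T X | X^T y], solved by Gauss-Jordan elimination.
    A = [
        [sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
        + [sum(X[r][i] * y[r] for r in range(n))]
        for i in range(k)
    ]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[piv] = A[piv], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


rng = random.Random(0)
n = 30
signal = [rng.gauss(0, 1) for _ in range(n)]
y = [0.5 * s + rng.gauss(0, 1) for s in signal]  # one real predictor plus noise

columns = [signal]
r2_path = [ols_r2(columns, y)]
for _ in range(8):
    columns.append([rng.gauss(0, 1) for _ in range(n)])  # pure-noise "predictor"
    r2_path.append(ols_r2(columns, y))

print([round(r, 3) for r in r2_path])
```

Each printed value should be at least as large as the one before it, even though only the first column carries any signal.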

Also, the authors found that there was a positive correlation (R2 = 0.2) between government debt/GDP and future stock returns. This was the opposite of what was expected, and they just dismiss it -- "we would not expect such a correlation to persist." But when I look at the same correlation for the years 1900-1928 (the authors only went back to 1929) the correlation holds with R2=0.58. It seems plausible to me that borrowing money might stimulate the economy in the near future. Why not use this as a predictive factor?

They also include 10-year rainfall as a "reality check" and predict no correlation. When they find a small correlation, they interpret it as spurious. But wouldn't an extended drought be expected to have some effect on the economy?

I am attaching two graphs -- one shows a scatter plot of 10-year forward returns (inflation adjusted, 1929 to 2006 as in the Vanguard paper) vs. the four-factor model (PE10, rainfall, debt/GDP, and the metric from http://www.philosophicaleconomics.com/2013/12/the-single-greatest-predictor-of-future-stock-market-returns/, which was not in the Vanguard paper). The other shows a timeline with the model and actual 10-year forward returns.

Thoughts? Is this sort of thing published anywhere?

Grog · « **Reply #1 on:** May 02, 2016, 03:12:04 AM »

I won't comment on your findings, but it seems to me you might be interested in the following website:
http://www.philosophicaleconomics.com

Specifically about Shiller CAPE:

http://www.philosophicaleconomics.com/2015/03/payout/


Indexer · « **Reply #2 on:** May 02, 2016, 04:48:32 AM »

The good news, these models, especially the one linked with the average investor stock allocation model, seem to have high correlations to actual returns.

The bad news, they still say stocks will return more than bonds and cash over the next 10 years. So you can't really act on the data.

You can adjust your expected returns for the next ten years. On that note most financial institutions have been saying you should probably look at returns in the 6-8% range as opposed to the 8-10% range for the foreseeable future.

I like having one more tool in my toolbelt, so thank you for the links, but this tool isn't saying anything different than other valuation metrics right now. Markets are a bit overvalued so we should expect lower than average returns for the next 10 years.

rachael talcott · « **Reply #3 on:** May 02, 2016, 05:52:19 AM »

Grog: Thanks. I've seen that blog, and it is very interesting.

Indexer: I actually can act on it. I am not in stocks at all right now and have a little money that I could choose to put in. Do you know *why* financial institutions are predicting lower returns? I keep hearing about Shiller valuation, but pretty much every other indicator I can find gives a rosier forecast than PE10.

Seppia · « **Reply #4 on:** May 02, 2016, 06:05:15 AM »

To my knowledge the P/B metric gives more or less the same expected return as Shiller PE.
What metrics are you referring to?

maizefolk · « **Reply #5 on:** May 02, 2016, 06:21:02 AM »

The argument against Shiller PE is that accounting rules have changed so that what counts as "profit" is necessarily a smaller slice of revenue than it was in the past. I don't know enough about corporate-scale accounting to evaluate the truth or relevance of that statement.

As the number of explanatory variables in your model goes up, you need more total datapoints to prove that you have a real link and haven't just overfit your model to the data you have. Between 1929 and 2006 there are fewer than eight non-overlapping ten-year periods. With four explanatory variables and seven datapoints, I know it would be possible to generate a model with a high R^2 value but no true statistical significance. However, what they've done is use a much larger number of sliding windows (1971-1980 shares 90% of its history with 1972-1981 and likely had similar starting PE10s and debt-to-GDP ratios), so I'm not sure how many sliding-window datapoints one would need to validate a four-factor model.

This is the really frustrating thing about (macro) economics. The datasets just aren't big enough.
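The window arithmetic is easy to check. Assuming ten-year windows must fit entirely inside 1929-2006, there are 69 sliding windows but only 7 non-overlapping ones, and adjacent windows such as 1971-1980 and 1972-1981 share 90% of their years. The helper names below are hypothetical.

```python
def decade_windows(first_year, last_year, length=10):
    """All sliding windows (start, end) of `length` years inside [first_year, last_year]."""
    return [(s, s + length - 1) for s in range(first_year, last_year - length + 2)]


def non_overlapping(windows, length=10):
    """Every `length`-th window, so no two selected windows share a year."""
    return windows[::length]


def overlap_fraction(a, b):
    """Fraction of window a's years that also fall in window b."""
    shared = min(a[1], b[1]) - max(a[0], b[0]) + 1
    return max(shared, 0) / (a[1] - a[0] + 1)


sliding = decade_windows(1929, 2006)
print(len(sliding))                   # sliding windows (start years 1929-1997)
print(len(non_overlapping(sliding)))  # independent ten-year periods
print(overlap_fraction((1971, 1980), (1972, 1981)))
```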

Aphalite · « **Reply #6 on:** May 02, 2016, 06:28:11 AM »

From Seppia's thread:

Quote from: Aphalite on April 30, 2016, 02:26:54 PM

Investor returns is growth rate + dividend as a percentage of capital + percentage change in valuation (what Bogle calls speculative element)

The growth rate might be lower in the future (maybe 2-4%, though buybacks will boost it more than some observers are expecting), and dividends are around 2% if you are looking at a total market index, but the reason most people are screaming about a low-return environment is high valuation (i.e., your example of Shiller PE). But Shiller PE is overstating how expensive the stock market is because interest rates have never been this low. The opportunity cost of a 2% Treasury bond is the lowest it's ever been, pushing intrinsic security prices way up from what you can see in a portfolio visualizer (and security earning power grows over the ten-year Treasury holding period).

Your examples from history mostly point to a decreasing interest rate environment over the last 40-odd years, and most people's assumption is that rates will start increasing again, which should drive down valuations due to opportunity cost. Well, now you're speculating on interest rate movements, and people have been expecting inflation to take off for five years now. I think stocks are probably on the higher side of their fair value range, but I don't think they're overvalued to the point that real returns in the future will be 2% less than the historical average of 6%. Just my 2 cents.
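The decomposition quoted here (growth + dividend yield + valuation change) can be sketched as a small function. The inputs in the example are hypothetical; the valuation term annualizes the change in P/E over the holding period, and the additive form is an approximation rather than an exact identity. As beltim points out further down, the growth term has to capture what retained earnings buy, or a company that pays no dividend will look like it returns nothing.

```python
def bogle_return(growth, dividend_yield, pe_start, pe_end, years):
    """Additive approximation: annual return ~ earnings growth + dividend yield + annualized P/E change."""
    valuation_change = (pe_end / pe_start) ** (1 / years) - 1  # Bogle's "speculative" component
    return growth + dividend_yield + valuation_change


# Hypothetical inputs: 3% growth (middle of the 2-4% range above), 2% dividends,
# and a P/E that drifts from 25 to 20 over ten years.
print(round(bogle_return(0.03, 0.02, 25, 20, 10), 4))
print(round(bogle_return(0.03, 0.02, 25, 25, 10), 4))  # no valuation change
```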

FrugalFan · « **Reply #7 on:** May 02, 2016, 06:30:12 AM »

This is pretty interesting, don't know why I didn't think to try something like this.

rachael talcott · « **Reply #8 on:** May 02, 2016, 07:03:23 AM »

Quote from: Seppia on May 02, 2016, 06:05:15 AM

To my knowledge the P/B metric gives more or less the same expected return as Shiller PE.
What metrics are you referring to?

http://www.philosophicaleconomics.com/2013/12/the-single-greatest-predictor-of-future-stock-market-returns/

Also debt/GDP

Dividend yield does predict lower than average, so there's one that agrees with PE10.

rachael talcott · « **Reply #9 on:** May 02, 2016, 07:13:08 AM »

Quote from: Aphalite on May 02, 2016, 06:28:11 AM

From Seppia's thread:

Quote from: Aphalite on April 30, 2016, 02:26:54 PM
Investor returns is growth rate + dividend as a percentage of capital + percentage change in valuation (what Bogle calls speculative element)

The growth rate might be lower in the future (maybe 2-4%, though buybacks will boost it more than some observers are expecting), and dividends are around 2% if you are looking at a total market index, but the reason most people are screaming about a low-return environment is high valuation (i.e., your example of Shiller PE). But Shiller PE is overstating how expensive the stock market is because interest rates have never been this low. The opportunity cost of a 2% Treasury bond is the lowest it's ever been, pushing intrinsic security prices way up from what you can see in a portfolio visualizer (and security earning power grows over the ten-year Treasury holding period).

Your examples from history mostly point to a decreasing interest rate environment over the last 40-odd years, and most people's assumption is that rates will start increasing again, which should drive down valuations due to opportunity cost. Well, now you're speculating on interest rate movements, and people have been expecting inflation to take off for five years now. I think stocks are probably on the higher side of their fair value range, but I don't think they're overvalued to the point that real returns in the future will be 2% less than the historical average of 6%. Just my 2 cents.

Interesting. Thanks.

Aphalite · « **Reply #10 on:** May 02, 2016, 07:58:49 AM »

Quote from: rachael talcott on May 02, 2016, 07:03:23 AM

http://www.philosophicaleconomics.com/2013/12/the-single-greatest-predictor-of-future-stock-market-returns/

Also debt/GDP

Dividend yield does predict lower than average, so there's one that agrees with PE10.

Seems like this article hits on the point I was trying to make, from the conclusion paragraph:

"A Note on “Overvaluation”

There’s a raging debate right now between bulls and bears over whether the U.S. stock market is presently overvalued. The debate rages on because the term is poorly defined. What, precisely, does it mean to say that something is “overvalued”?

When we say that the stock market is “overvalued”, we might mean that it’s currently valued more expensively than it typically has been in the past. Over its history, the U.S. stock market has offered, on average, some expected total return–say 8% to 10%. But now it’s priced for 5% or 6% (using our metric). So it’s “overvalued.”

Fair enough, bulls shouldn’t disagree. There are tons of reasons why the present stock market is unlikely to produce the 8% to 10% returns that it has produced, on average, throughout history. On almost every relevant measure, it’s starting out from a higher-than-average level.

The more important question, however, is this: why should the stock market offer investors the average historical return right now? If, over the next 10 years, bonds are offering investors 2.8%, and cash is offering them less than 1%, why should stocks be priced to offer them 8% to 10%?"

10 year treasury bonds are now at 1.8%

rachael talcott · « **Reply #11 on:** May 02, 2016, 08:05:12 AM »

Quote from: maizeman on May 02, 2016, 06:21:02 AM

As the number of explanatory variables in your model goes up, you need more total datapoints to prove that you have a real link and haven't just overfit your model to the data you have. Between 1929 and 2006 there are fewer than eight non-overlapping ten-year periods. With four explanatory variables and seven datapoints, I know it would be possible to generate a model with a high R^2 value but no true statistical significance. However, what they've done is use a much larger number of sliding windows (1971-1980 shares 90% of its history with 1972-1981 and likely had similar starting PE10s and debt-to-GDP ratios), so I'm not sure how many sliding-window datapoints one would need to validate a four-factor model.

This is the really frustrating thing about (macro) economics. The datasets just aren't big enough.

This seems like a legitimate concern and worth more thought. I wonder if the correlation goes away with 5-year windows...

beltim · « **Reply #12 on:** May 02, 2016, 08:08:08 AM »

Quote from: Aphalite on May 02, 2016, 06:28:11 AM

From Seppia's thread:

Quote from: Aphalite on April 30, 2016, 02:26:54 PM
Investor returns is growth rate + dividend as a percentage of capital + percentage change in valuation (what Bogle calls speculative element)

This formula is demonstrably untrue since it doesn't take into account current earnings, which is often the most important determinant of investor returns. Consider a company with no real growth rate (i.e. its earnings increase exactly with inflation), that pays no dividend. What should its return to investors be? Your formula says 0 outside of valuation changes, which is obviously wrong.

Aphalite · « **Reply #13 on:** May 02, 2016, 08:11:24 AM »

Quote from: beltim on May 02, 2016, 08:08:08 AM

This formula is demonstrably untrue since it doesn't take into account current earnings, which is often the most important determinant of investor returns. Consider a company with no real growth rate (i.e. its earnings increase exactly with inflation), that pays no dividend. What should its return to investors be? Your formula says 0 outside of valuation changes, which is obviously wrong.

Beltim, I agree with you, I cut off the first part of my response to Seppia because it wasn't relevant to this thread, here's the whole exchange (relevant parts to your objection are bolded):

Quote from: Aphalite on April 30, 2016, 02:26:54 PM

Quote from: Seppia on April 28, 2016, 07:04:50 AM
There is also some basic math involved at the extremes

To exaggerate/simplify: if you have a Shiller PE of 50, you have to assume returns around 2% unless you have reasonable expectation that earnings are going to increase exponentially (and stay at the higher level) in the future.

This isn't right for investor returns

It is the formula for if you owned the entire asset and earnings did not grow any further.

Investor returns is growth rate + dividend as a percentage of capital + percentage change in valuation (what Bogle calls speculative element)

The growth rate might be lower in the future (maybe 2-4%, though buybacks will boost it more than some observers are expecting), and dividends are around 2% if you are looking at a total market index, but the reason most people are screaming about a low-return environment is high valuation (i.e., your example of Shiller PE). But Shiller PE is overstating how expensive the stock market is because interest rates have never been this low. The opportunity cost of a 2% Treasury bond is the lowest it's ever been, pushing intrinsic security prices way up from what you can see in a portfolio visualizer (and security earning power grows over the ten-year Treasury holding period).

Your examples from history mostly point to a decreasing interest rate environment over the last 40-odd years, and most people's assumption is that rates will start increasing again, which should drive down valuations due to opportunity cost. Well, now you're speculating on interest rate movements, and people have been expecting inflation to take off for five years now. I think stocks are probably on the higher side of their fair value range, but I don't think they're overvalued to the point that real returns in the future will be 2% less than the historical average of 6%. Just my 2 cents.
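Seppia's rule of thumb in the quoted exchange (a Shiller PE of 50 implies returns around 2%) is the earnings yield, 1/PE. Adding growth gives the Gordon growth-model return, but earnings yield plus growth equals that return only under the assumption that all earnings are paid out; that assumption is exactly what the retained-earnings discussion in this thread is about. The function name is hypothetical.

```python
def implied_return(pe, growth=0.0):
    """Earnings yield (1/PE) plus growth; the Gordon growth-model return only if all earnings are paid out."""
    return 1.0 / pe + growth


for cape in (10, 16, 25, 50):
    print(cape, f"{implied_return(cape):.1%}")
```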

Aphalite · « **Reply #14 on:** May 02, 2016, 08:13:34 AM »

Quote from: beltim on May 02, 2016, 08:08:08 AM

This formula is demonstrably untrue since it doesn't take into account current earnings, which is often the most important determinant of investor returns. Consider a company with no real growth rate (i.e. its earnings increase exactly with inflation), that pays no dividend. What should its return to investors be? Your formula says 0 outside of valuation changes, which is obviously wrong.

Additionally, I would argue that if there were no dividends, the Company would be holding its retained earnings in cash, and thus valuation would compensate for it

If the company had to reinvest all of its earnings back into the operations (i.e. maintenance capex) and earnings still did not grow, then you would definitely have a 0 return investment. All of its cash flow is spoken for, it's a terrible investment

AdrianC · « **Reply #15 on:** May 02, 2016, 08:15:56 AM »

Quote from: beltim on May 02, 2016, 08:08:08 AM

Quote from: Aphalite on May 02, 2016, 06:28:11 AM
From Seppia's thread:

Quote from: Aphalite on April 30, 2016, 02:26:54 PM
Investor returns is growth rate + dividend as a percentage of capital + percentage change in valuation (what Bogle calls speculative element)

This formula is demonstrably untrue since it doesn't take into account current earnings, which is often the most important determinant of investor returns. Consider a company with no real growth rate (i.e. its earnings increase exactly with inflation), that pays no dividend. What should its return to investors be? Your formula says 0 outside of valuation changes, which is obviously wrong.

Your company is spending every penny of its earnings and is achieving zero growth.

beltim · « **Reply #16 on:** May 02, 2016, 08:19:08 AM »

Quote from: Aphalite on May 02, 2016, 08:13:34 AM

Quote from: beltim on May 02, 2016, 08:08:08 AM
This formula is demonstrably untrue since it doesn't take into account current earnings, which is often the most important determinant of investor returns. Consider a company with no real growth rate (i.e. its earnings increase exactly with inflation), that pays no dividend. What should its return to investors be? Your formula says 0 outside of valuation changes, which is obviously wrong.

Additionally, I would argue that if there were no dividends, the Company would be holding its retained earnings in cash, and thus valuation would compensate for it

If the company had to reinvest all of its earnings back into the operations (i.e. maintenance capex) and earnings still did not grow, then you would definitely have a 0 return investment. All of its cash flow is spoken for, it's a terrible investment

Depends on what you mean by valuation changes, I suppose. Valuation usually means price/earnings ratio, which should be different as the company retains earnings. But I would argue that this isn't a "speculative element" which is a really poor way to frame it.

beltim · « **Reply #17 on:** May 02, 2016, 08:20:01 AM »

Quote from: AdrianC on May 02, 2016, 08:15:56 AM

Quote from: beltim on May 02, 2016, 08:08:08 AM
Quote from: Aphalite on May 02, 2016, 06:28:11 AM
From Seppia's thread:

Quote from: Aphalite on April 30, 2016, 02:26:54 PM
Investor returns is growth rate + dividend as a percentage of capital + percentage change in valuation (what Bogle calls speculative element)

This formula is demonstrably untrue since it doesn't take into account current earnings, which is often the most important determinant of investor returns. Consider a company with no real growth rate (i.e. its earnings increase exactly with inflation), that pays no dividend. What should its return to investors be? Your formula says 0 outside of valuation changes, which is obviously wrong.

Your company is spending every penny of its earnings and is achieving zero growth.

Nothing in my example says anything about spending. The logical assumption would be that it retains earnings, just like Aphalite understood.

Aphalite · « **Reply #18 on:** May 02, 2016, 09:00:35 AM »

Quote from: beltim on May 02, 2016, 08:19:08 AM

Depends on what you mean by valuation changes, I suppose. Valuation usually means price/earnings ratio, which should be different as the company retains earnings. But I would argue that this isn't a "speculative element" which is a really poor way to frame it.

That's a good point - I'm only framing it the way Bogle did because he's a hero in these parts, that's all - there's all sorts of different reasons valuation could change, like you pointed out

JZinCO · « **Reply #19 on:** May 02, 2016, 09:08:28 AM »

Quote from: rachael talcott on May 01, 2016, 12:10:53 PM

In figure 2, the authors show that no one model predicts future performance very well. Shiller's PE10 is the best, with an R^2 of 0.43 (i.e., 43% of the observed variance in stock performance can be explained by that variable). They also list some other variables, like debt/GDP and dividend yield, which have weaker correlations. I would think that the obvious thing to do would be to combine the variables into a simple model to see if the correlation became stronger if you accounted for multiple variables at once.

So I dug out the data and tried it myself. The best model I have come up with (for the years that the paper uses) has an R2 of about 0.7. Surely I can't be the first person to think of this. I realize that some variables will be spurious, but in a model with a lot of variables (mine has four) wouldn't the effect of the spurious variables be diluted out?

Also, the authors found that there was a positive correlation (R2 = 0.2) between government debt/GDP and future stock returns. This was the opposite of what was expected, and they just dismiss it -- "we would not expect such a correlation to persist." But when I look at the same correlation for the years 1900-1928 (the authors only went back to 1929) the correlation holds with R2=0.58. It seems plausible to me that borrowing money might stimulate the economy in the near future. Why not use this as a predictive factor?

They also include 10-year rainfall as a "reality check" and predict no correlation. When they find a small correlation, they interpret it as spurious. But wouldn't an extended drought be expected to have some effect on the economy?

I know you are a scientist so this should be in your wheelhouse.
(1) Recall that the coefficient of determination (R^2) is not a model selection tool. Each time you add an independent factor, your R^2 will go up (it can never go down), no matter how uncorrelated that factor is with your dependent variable. By using R^2 this way, you easily run the risk of overfitting.
(2) If you are adding a lot of factors, you should first run a correlation plot to see how the independent factors co-vary. If you do have cross-correlation, then the ordinary least squares regression you ran does not produce the 'best linear unbiased estimator'. You need to address the cross-correlation, e.g. with a generalized linear model that incorporates the covariance matrix of your independent factors.
(3) If you are making OLS models, you should at least run a post hoc ANOVA to look at the significance of each factor.
(4) A more robust investigation of each independent factor is to use the Lindeman, Merenda, and Gold bootstrapped sequential sums of squares.
(5) Consider using a fit statistic that adequately penalizes adding more factors, such as log-likelihood, AIC or BIC.
(6) Consider using an unbiased algorithm for model selection, such as all-subsets regression.
(7) Returns are auto-correlated. When you have data taken over time or space you need to consider that observations closer in time or space tend to be more similar than observations taken over larger lag distances. To look at this, simply plot a correlogram of your dataset. For that reason I would scrap your entire approach and consider a time-series analysis.
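Points (1) and (5) can be made concrete with a few lines of standard-library Python. Adjusted R^2 rescales the unexplained variance by the degrees of freedom, and AIC/BIC add an explicit penalty per estimated coefficient. The AIC/BIC formulas here assume Gaussian errors and drop constants shared by all models fit to the same data; k counts every coefficient, intercept included. The n = 69 case is a hypothetical count of overlapping start years, which overstates the number of independent observations.

```python
import math


def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (plus an intercept)."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than estimated coefficients")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


def gaussian_aic_bic(rss, n, k):
    """AIC and BIC, up to a shared additive constant, for an OLS fit with k coefficients."""
    base = n * math.log(rss / n)
    return base + 2 * k, base + k * math.log(n)


# R^2 = 0.7 with four predictors, as in the original post.
print(adjusted_r2(0.7, n=7, p=4))   # seven independent decades: roughly 0.1
print(adjusted_r2(0.7, n=69, p=4))  # treating overlapping windows as independent
```

Lower AIC or BIC is better when comparing models fit to the same data; BIC's penalty of ln(n) per coefficient is harsher than AIC's 2 once n exceeds about 7.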

No offense, but this is amateur technical analysis, and it will leave you without pants once the tide goes out.

JZinCO · « **Reply #20 on:** May 02, 2016, 09:13:18 AM »

Quote from: rachael talcott on May 02, 2016, 08:05:12 AM

Quote from: maizeman on May 02, 2016, 06:21:02 AM

As the number of explanatory variables in your model goes up, you need more total datapoints to prove that you have a real link and haven't just overfit your model to the data you have. Between 1929 and 2006 there are fewer than eight non-overlapping ten-year periods. With four explanatory variables and seven datapoints, I know it would be possible to generate a model with a high R^2 value but no true statistical significance. However, what they've done is use a much larger number of sliding windows (1971-1980 shares 90% of its history with 1972-1981 and likely had similar starting PE10s and debt-to-GDP ratios), so I'm not sure how many sliding-window datapoints one would need to validate a four-factor model.

This is the really frustrating thing about (macro) economics. The datasets just aren't big enough.

This seems like a legitimate concern and worth more thought. I wonder if the correlation goes away with 5-year windows...

'Datasets aren't big enough' is a bad, bad excuse. There are many ways to handle small datasets, including Monte Carlo, K-fold cross validation, jackknifing and bootstrapping. Not only can they be used to virtually create larger datasets while maintaining the distributional properties of your data and its covariance matrix, but they also let you test your model. For example, K-fold cross validation essentially uses multiple sliding and overlapping subsets of the dataset.
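A common adaptation of K-fold cross validation for time-ordered data, sketched here as an illustration rather than JZinCO's exact procedure, keeps each test fold contiguous and "purges" training observations whose windows overlap the test windows. The window count (69 start years, 1929-1997) and the gap of 9 are assumptions matching ten-year windows inside 1929-2006; `blocked_kfold` is a hypothetical helper name.

```python
def blocked_kfold(n, k, gap=0):
    """Yield (train, test) index lists for k contiguous test blocks.

    Training indices within `gap` positions of the test block are dropped (purged).
    """
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n) if i < start - gap or i >= stop + gap]
        yield train, test
        start = stop


# 69 overlapping 10-year windows (start years 1929-1997); windows whose start
# years differ by 9 or less share at least one year, so purge 9 on each side.
for train, test in blocked_kfold(69, k=5, gap=9):
    print(f"test starts {1929 + test[0]}-{1929 + test[-1]}, {len(train)} training windows")
```

The purge costs training data, which is the price of not letting the model peek at years it is being tested on.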

maizefolk · « **Reply #21 on:** May 02, 2016, 10:35:47 AM »

Quote from: JZinCO on May 02, 2016, 09:13:18 AM

Quote from: rachael talcott on May 02, 2016, 08:05:12 AM
Quote from: maizeman on May 02, 2016, 06:21:02 AM

As the number of explanatory variables in your model goes up, you need more total datapoints to prove that you have a real link and haven't just overfit your model to the data you have. Between 1929 and 2006 there are fewer than eight non-overlapping ten-year periods. With four explanatory variables and seven datapoints, I know it would be possible to generate a model with a high R^2 value but no true statistical significance. However, what they've done is use a much larger number of sliding windows (1971-1980 shares 90% of its history with 1972-1981 and likely had similar starting PE10s and debt-to-GDP ratios), so I'm not sure how many sliding-window datapoints one would need to validate a four-factor model.

This is the really frustrating thing about (macro) economics. The datasets just aren't big enough.

This seems like a legitimate concern and worth more thought. I wonder if the correlation goes away with 5-year windows...

'Datasets aren't big enough' is a bad, bad excuse. There are many ways to handle small datasets, including Monte Carlo, K-fold cross validation, jackknifing and bootstrapping. Not only can they be used to virtually create larger datasets while maintaining the distributional properties of your data and its covariance matrix, but they also let you test your model. For example, K-fold cross validation essentially uses multiple sliding and overlapping subsets of the dataset.

K-fold cross validation allows you to test your model. Bootstrapping or jackknife resampling lets you assign uncertainty estimates to your model. Employing those techniques would be a great way for the OP to test the statistical significance of their result, but it doesn't allow them to squeeze more information out of seven independent datapoints, just to define with more certainty how much of what they are finding is signal and how much is noise.

Now, Monte Carlo simulations CAN give you larger datasets than you started with, but the question is how representative these new datasets are of your original data. There are surely people a lot more talented than I working in this area, but every attempt I've made to simulate stock market returns with Monte Carlo approaches has underestimated long-term (10+ year) volatility relative to actual historical time series, likely because, as you allude to in your previous post, stock market returns are autocorrelated in ways that aren't captured by my naive Monte Carlo analyses.
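One standard response to the autocorrelation problem described here is the moving-block bootstrap: instead of resampling single years independently, it resamples contiguous blocks, so short-range dependence inside each block survives into the simulated series. A minimal sketch follows; the block length and the synthetic return series are assumptions for illustration, not fitted to market data.

```python
import random


def moving_block_bootstrap(series, block_len, rng):
    """One moving-block bootstrap resample of `series` (same length).

    Random contiguous blocks are concatenated, so autocorrelation within
    each block is preserved, unlike an i.i.d. resample (block_len=1).
    """
    n = len(series)
    if not 1 <= block_len <= n:
        raise ValueError("block_len must be between 1 and len(series)")
    out = []
    while len(out) < n:
        start = rng.randrange(n - block_len + 1)
        out.extend(series[start:start + block_len])
    return out[:n]


rng = random.Random(42)
# Synthetic, hypothetical annual real returns (not market data).
annual_returns = [rng.gauss(0.06, 0.18) for _ in range(90)]
resample = moving_block_bootstrap(annual_returns, block_len=5, rng=rng)
```

With real annual returns, one would compare the spread of simulated ten-year outcomes under `block_len=1` against longer blocks; choosing the block length is itself a judgment call.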

rachael talcott · « **Reply #22 on:** May 02, 2016, 11:27:00 AM »

Quote from: Aphalite on May 02, 2016, 07:58:49 AM

Quote from: rachael talcott on May 02, 2016, 07:03:23 AM
http://www.philosophicaleconomics.com/2013/12/the-single-greatest-predictor-of-future-stock-market-returns/

Also debt/GDP

Dividend yield does predict lower than average, so there's one that agrees with PE10.

Seems like this article hits on the point I was trying to make, from the conclusion paragraph:

"A Note on “Overvaluation”

There’s a raging debate right now between bulls and bears over whether the U.S. stock market is presently overvalued. The debate rages on because the term is poorly defined. What, precisely, does it mean to say that something is “overvalued”?

When we say that the stock market is “overvalued”, we might mean that it’s currently valued more expensively than it typically has been in the past. Over its history, the U.S. stock market has offered, on average, some expected total return–say 8% to 10%. But now it’s priced for 5% or 6% (using our metric). So it’s “overvalued.”

Fair enough, bulls shouldn’t disagree. There are tons of reasons why the present stock market is unlikely to produce the 8% to 10% returns that it has produced, on average, throughout history. On almost every relevant measure, it’s starting out from a higher-than-average level.

The more important question, however, is this: why should the stock market offer investors the average historical return right now? If, over the next 10 years, bonds are offering investors 2.8%, and cash is offering them less than 1%, why should stocks be priced to offer them 8% to 10%?"

10 year treasury bonds are now at 1.8%

I was reading the graph wrong -- the "returns" axis is actually inverted -- so thanks for the correction.

rachael talcott · « **Reply #23 on:** May 02, 2016, 05:21:41 PM »

Quote from: maizeman on May 02, 2016, 06:21:02 AM

The argument against Shiller PE is that accounting rules have changed so that what counts as "profit" is necessarily a smaller slice of revenue than it was in the past. I don't know enough about corporate-scale accounting to evaluate the truth or relevance of that statement.

As the number of explanatory variables in your model goes up, you need more total datapoints to prove that you have a real link and haven't just overfit your model to the data you have. Between 1929 and 2006 there are fewer than eight non-overlapping ten-year periods. With four explanatory variables and seven datapoints, I know it would be possible to generate a model with a high R^2 value but no true statistical significance. However, what they've done is use a much larger number of sliding windows (1971-1980 shares 90% of its history with 1972-1981 and likely had similar starting PE10s and debt-to-GDP ratios), so I'm not sure how many sliding-window datapoints one would need to validate a four-factor model.

This is the really frustrating thing about (macro) economics. The datasets just aren't big enough.

http://vassarstats.net/rsig.html

According to this calculator, if you have seven data points, you need an R^2 of ~0.57 to reach significance at the 0.05 level and an R^2 of ~0.77 to reach significance at the 0.01 level. That's not as high a bar as I would have thought. It's possible, though, that there is something else wrong -- I haven't checked whether the data meet the assumptions for the Pearson correlation.

Thoughts?
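The calculator's thresholds can be reproduced without it. Under the null hypothesis of no relationship, with Gaussian errors, the R^2 of an OLS fit with p predictors plus an intercept on n points follows a Beta(p/2, (n-p-1)/2) distribution. The sketch below finds the critical R^2 using the standard library only; the Simpson integration and bisection are choices made for this sketch, not how the linked calculator works. For n = 7 it should give about 0.57 at the 5% level and about 0.765 at the 1% level for one predictor, and about 0.97 at the 5% level for four predictors, which is the relevant bar for a four-factor model fit to seven independent decades.

```python
import math


def beta_pdf(x, a, b):
    """Density of the Beta(a, b) distribution at 0 < x <= 1."""
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return x ** (a - 1) * (1 - x) ** (b - 1) / math.exp(log_beta)


def upper_tail(c, a, b, steps=2000):
    """P(X > c) for X ~ Beta(a, b), via composite Simpson's rule on [c, 1]."""
    h = (1.0 - c) / steps
    total = beta_pdf(c, a, b) + beta_pdf(1.0, a, b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * beta_pdf(c + i * h, a, b)
    return total * h / 3


def critical_r2(n, p, alpha):
    """Smallest R^2 significant at level alpha for OLS with n points and p predictors plus intercept.

    Under the null (all slopes zero, Gaussian errors) R^2 ~ Beta(p/2, (n-p-1)/2).
    """
    a, b = p / 2, (n - p - 1) / 2
    if b < 1:
        raise ValueError("this sketch needs n - p - 1 >= 2")
    lo, hi = 0.0, 1.0
    for _ in range(50):  # bisection: upper_tail is decreasing in c
        mid = (lo + hi) / 2
        if upper_tail(mid, a, b) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(critical_r2(7, 1, 0.05), 3))  # one predictor, 5% level
print(round(critical_r2(7, 1, 0.01), 3))  # one predictor, 1% level
print(round(critical_r2(7, 4, 0.05), 3))  # four predictors, 5% level
```

For a single predictor this is the same test as the usual t-test on a Pearson correlation; for several predictors it is the overall F-test of the regression.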

maizefolk · « **Reply #24 on:** May 02, 2016, 07:12:02 PM »

It's a lower bar than I would have guessed as well, but your source seems credible. However, I believe that's testing a linear relationship between only two variables. The problem is that the more explanatory variables you incorporate into your model, the more datapoints you need to avoid over-fitting. If I understood correctly, in your original post you ran a multiple linear regression using four explanatory variables. This is a bit outside my area of expertise, but I did find this:

Quote

Just like the example with multiple means, you must have a sufficient number of observations for each term in a regression model. Simulation studies show that a good rule of thumb is to have 10-15 observations per term in multiple linear regression.

That source also suggests that you could test how accurate your model is by removing data from one decade at a time, re-running the linear regression, and seeing how well the model built from the remaining decades predicts stock market returns for the missing decade.
Source: http://blog.minitab.com/blog/adventures-in-statistics/the-danger-of-overfitting-regression-models
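The "drop one decade, refit, predict it" check is a form of leave-one-group-out cross-validation. A minimal sketch, assuming the data are already in rows of predictor values with a group label per row (for example the decade in which each window starts); the helper names are hypothetical. With overlapping windows, rows in neighbouring decades still share years with the held-out block, so the result is likely somewhat optimistic.

```python
from collections import defaultdict

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_ols(xs, y):
    """OLS coefficients [intercept, b1, ..., bk] via the normal equations."""
    rows = [[1.0] + list(x) for x in xs]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)

def predict(beta, x):
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))

def leave_one_group_out(xs, y, groups):
    """For each group, fit on all other rows and return predicted - actual for the held-out rows."""
    errors = defaultdict(list)
    for g in sorted(set(groups)):
        train = [i for i, gi in enumerate(groups) if gi != g]
        beta = fit_ols([xs[i] for i in train], [y[i] for i in train])
        for i, gi in enumerate(groups):
            if gi == g:
                errors[g].append(predict(beta, xs[i]) - y[i])
    return dict(errors)
```

The sign convention (predicted minus actual) matches the one used later in this thread, so a negative error means the model predicted too low.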

JZinCO · « **Reply #25 on:** May 02, 2016, 07:56:58 PM »

Quote from: maizeman on May 02, 2016, 10:35:47 AM

K-fold cross validation allows you to test your model. Bootstrapping or jackknife resampling lets you assign uncertainty values to your model. Employing those techniques would be a great way for the OP to test the statistical significance of their result, but it doesn't allow them to squeeze more information out of seven independent datapoints, just to define with more certainty how much of what they are finding is signal and how much is noise.

Now, Monte Carlo simulations CAN give you larger datasets than you started with, but the question is how representative these new datasets are of your original data. There are surely people a lot more talented than I working in this area, but every attempt I've made to simulate stock market returns with Monte Carlo approaches has underestimated long-term (10+ year) volatility relative to actual historical time series, likely because, as you allude to in your previous post, stock market returns are auto-correlated in ways that my naive Monte Carlo analyses don't capture.

Good points. What I mean with K-folds is that if you have, say, 100 years of data, you can do some slicing and dicing so that you have many more replicates of periods shorter than 100 years. And if the data are sufficiently explained by the independent factors, it shouldn't matter that you subset the data, and K-folds should be a boon to one's efforts.

I would say that bootstrapping is amazing (a) when you want to make inferences from the empirical distribution function rather than rely on a normal distribution, and (b) when sample sizes are insufficient for conventional inference. So we cannot make more observations, but we can handle the ones we have better in regressions using the jackknife or bootstrap.
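On the earlier point that naive Monte Carlo understates 10-year volatility: a common resampling variant that keeps some of the year-to-year dependence is the moving-block bootstrap, which resamples contiguous runs of years instead of single years. This is a swapped-in technique, not something either poster said they used; the function names and the block length are assumptions for illustration.

```python
import random
import statistics

def moving_block_bootstrap(series, block_len, rng):
    """Return one resampled series of len(series), built from randomly placed contiguous blocks."""
    n = len(series)
    if not 1 <= block_len <= n:
        raise ValueError("block_len must be between 1 and len(series)")
    out = []
    while len(out) < n:
        start = rng.randrange(n - block_len + 1)  # any block that fits inside the data
        out.extend(series[start:start + block_len])
    return out[:n]  # trim the final block so the length matches the input

def decade_growth_spread(annual_returns, block_len, trials=5000, seed=1):
    """Std. dev. of the 10-year growth multiple across bootstrap paths (block_len=1 is plain i.i.d. resampling)."""
    if len(annual_returns) < 10:
        raise ValueError("need at least 10 annual returns")
    rng = random.Random(seed)
    multiples = []
    for _ in range(trials):
        path = moving_block_bootstrap(annual_returns, block_len, rng)
        growth = 1.0
        for r in path[:10]:
            growth *= 1.0 + r
        multiples.append(growth)
    return statistics.pstdev(multiples)
```

Comparing `decade_growth_spread(returns, 1)` with, say, `decade_growth_spread(returns, 5)` on a real annual series is one way to check whether i.i.d. resampling understates 10-year dispersion for that data; the 5-year block is an arbitrary choice.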

On that note, OP should post some diagnostics (residual plot, Q-Q plot). My WAG is that the normality assumption of OP's OLS model is violated and should be addressed. I am also really, really confident in the work out there showing that returns are auto-correlated, which violates the assumption that observations are independent and identically distributed, so if OP could post a correlogram or variogram of the data, that would be useful.
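A correlogram is just the sample autocorrelation plotted against lag. A minimal stdlib sketch of the usual estimator, with the approximate 95% white-noise band of plus or minus 1.96/sqrt(n) that such plots often show; the function names are my own. Overlapping 10-year windows share most of their years, so their returns will look strongly autocorrelated by construction; annual returns are the more informative input.

```python
import math

def sample_acf(x, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag (standard biased estimator)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(max_lag + 1)]

def print_correlogram(x, max_lag):
    """Text correlogram; '*' marks lags outside the approximate white-noise band."""
    band = 1.96 / math.sqrt(len(x))
    for k, r in enumerate(sample_acf(x, max_lag)):
        flag = "*" if k > 0 and abs(r) > band else " "
        print(f"lag {k:2d}  {r:+.3f} {flag}")
```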

Indexer · « **Reply #26 on:** May 02, 2016, 08:47:27 PM »

Quote from: rachael talcott on May 02, 2016, 05:52:19 AM

Indexer: I actually can act on it. I am not in stocks at all right now and have a little money that I could choose to put in. Do you know *why* financial institutions are predicting lower returns? I keep hearing about Shiller valuation, but pretty much every other indicator I can find gives a rosier forecast than PE10.

How can you act on it?

It gives basically the same prediction as Shiller CAPE and market cap/GDP, which is that markets are overvalued, but nowhere near bubble/nosebleed territory. All three of these metrics are useful over 'long' periods of time, like 10 years. They are all completely useless over a short time frame. So how can you act on it?

You are sitting in cash which earns 1% or less while this metric is saying stocks should return 5-6% over the next 10 years. How is that a good idea? Your actions are directly contradicting the very information you are presenting to us.

Shiller, by the way, is saying basically the same thing: if PE10 stays above the mean, as it has for the past two decades, it points to returns in the 2.5-5% range, which is still higher than cash.
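To make the cash-versus-stocks comparison concrete, here is the ten-year compounding at the round rates quoted in this thread (1% cash; 2.5%, 5% and 6% for stocks). These are the thread's figures, not forecasts, and the sketch ignores inflation, taxes and volatility.

```python
def growth_multiple(annual_rate, years=10):
    """Value of 1 unit after compounding annual_rate for the given number of years."""
    return (1.0 + annual_rate) ** years

for rate in (0.01, 0.025, 0.05, 0.06):
    print(f"{rate:5.1%} per year -> x{growth_multiple(rate):.3f} after 10 years")
```

At these rates, $10,000 grows to about $11,046 in cash versus about $12,801 at 2.5% and $16,289 at 5%.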

Moral of the story: The same moral to every investing story... have a goal and an asset allocation based on your goals, and stick to it. Market timing is normally worse than just getting in and staying in.

rachael talcott · « **Reply #27 on:** May 02, 2016, 10:10:44 PM »

Quote from: maizeman on May 02, 2016, 07:12:02 PM

It's a lower bar than I would have guessed as well, but your source seems credible. However, I believe that's testing a linear relationship between only two variables. The problem is that the more explanatory variables you incorporate into your model, the more datapoints you need to avoid over-fitting. If I understood correctly, in your original post you ran a multiple linear regression using four explanatory variables. This is a bit outside my area of expertise, but I did find this:

Quote
Just like the example with multiple means, you must have a sufficient number of observations for each term in a regression model. Simulation studies show that a good rule of thumb is to have 10-15 observations per term in multiple linear regression.

That source also suggests that you could test how accurate your model is by removing data from one decade at a time, re-running the linear regression, and seeing how well you can predict stock market returns for the missing decade using the model built from the remaining decades.
Source: http://blog.minitab.com/blog/adventures-in-statistics/the-danger-of-overfitting-regression-models

That seems reasonable. I pulled the most recent decade (ending in 2006) out, ran the regression without that data, and used the model to predict the missing decade. Here are the differences between predicted and actual (i.e., -0.01 means the model predicted a return one percentage point lower than actual). It's not bad, but I should probably do more than just that one.

-0.01063741
-0.014248588
-0.004224565
0.006233349
0.017554216
0.005933412
0.006278946
0.044305481
0.013233982
-0.010188546
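A quick way to summarise a held-out check like this is the mean error (bias), the mean absolute error and the root-mean-square error. The sketch below uses the ten differences listed above; because those windows overlap heavily, they amount to much less than ten independent checks.

```python
import math

# Predicted minus actual return for the held-out decade, copied from the post above.
errors = [-0.01063741, -0.014248588, -0.004224565, 0.006233349, 0.017554216,
          0.005933412, 0.006278946, 0.044305481, 0.013233982, -0.010188546]

def summarize(errs):
    """Return (bias, MAE, RMSE) for a list of prediction errors."""
    n = len(errs)
    bias = sum(errs) / n
    mae = sum(abs(e) for e in errs) / n
    rmse = math.sqrt(sum(e * e for e in errs) / n)
    return bias, mae, rmse

bias, mae, rmse = summarize(errors)
print(f"bias {bias:+.4f}  MAE {mae:.4f}  RMSE {rmse:.4f}")
```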

rachael talcott · « **Reply #28 on:** May 02, 2016, 10:17:37 PM »

Quote from: JZinCO on May 02, 2016, 09:08:28 AM

Quote from: rachael talcott on May 01, 2016, 12:10:53 PM
In figure 2, the authors show that no one model predicts future performance very well. Shiller's PE10 is the best with an R^2 of 0.43 (ie 43% of the observed variance in stock performance can be explained by that variable). They also list some other variables, like debt/GDP and dividend yield, which have weaker correlations. I would think that the obvious thing to do would be to combine the variables into a simple model to see if the correlation became stronger if you accounted for multiple variables at once.

So I dug out the data and tried it myself. The best model I have come up with (for the years that the paper uses) has an R2 of about 0.7. Surely I can't be the first person to think of this. I realize that some variables will be spurious, but in a model with a lot of variables (mine has four) wouldn't the effect of the spurious variables be diluted out?

Also, the authors found that there was a positive correlation (R2 = 0.2) between government debt/GDP and future stock returns. This was the opposite of what was expected, and they just dismiss it -- "we would not expect such a correlation to persist." But when I look at the same correlation for the years 1900-1928 (the authors only went back to 1929) the correlation holds with R2=0.58. It seems plausible to me that borrowing money might stimulate the economy in the near future. Why not use this as a predictive factor?

They also include 10-year rainfall as a "reality check" and predict no correlation. When they find a small correlation, they interpret it as spurious. But wouldn't an extended drought be expected to have some effect on the economy?

I know you are a scientist so this should be in your wheelhouse.
(1) Recall that the coefficient of determination (R^2) is not a model selection tool. Each time you add an independent factor, your R^2 will go up (it can never go down), no matter how uncorrelated that factor is with your dependent variable. By using R^2 in this way, you easily run the risk of overfitting.
(2) If you are adding a lot of factors, you should first run a correlation plot to see how the independent factors co-vary. If you do have cross-correlation, then the ordinary least squares regression you ran does not produce the 'best linear unbiased estimator'. You need to address the cross-correlation, e.g. with a generalized linear model that incorporates the covariance matrix of your independent factors.
(3) If you are making OLS models, you should at least run a post hoc ANOVA to look at the significance of each factor.
(4) A more robust investigation of each independent factor is to use the Lindeman, Merenda, and Gold (LMG) bootstrapped sequential sums of squares.
(5) Consider using a fit statistic that adequately penalizes each added factor, such as AIC or BIC (both are penalized log-likelihoods).
(6) Consider using an unbiased algorithm of model selection, such as all-subsets regression.
(7) Returns are auto-correlated. When you have data taken over time or space, you need to consider that observations closer in time or space tend to be more similar than observations taken over larger lag distances. To look at this, simply plot a correlogram of your dataset. For that reason I would scrap your entire approach and consider a time-series analysis.
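Points (1) and (5) can be made concrete with the standard formulas: adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1) for k predictors, and, for a Gaussian OLS fit, AIC = n ln(RSS/n) + 2p and BIC = n ln(RSS/n) + p ln(n) with p estimated coefficients (dropping terms shared by every model fitted to the same data). Plugging in R^2 = 0.7 with four predictors and n = 7 independent decades is a hypothetical reading of the thread, not OP's actual sample size.

```python
import math

def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 for k predictors (plus an intercept) fitted to n observations."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

def gaussian_aic_bic(rss, n, p):
    """AIC and BIC for an OLS fit with p estimated coefficients, up to a shared constant."""
    base = n * math.log(rss / n)
    return base + 2 * p, base + p * math.log(n)

print(adjusted_r_squared(0.7, n=7, k=4))   # hypothetical: 7 independent decades
print(adjusted_r_squared(0.7, n=70, k=4))  # same R^2 with ten times the data
```

With only seven independent observations, most of the apparent fit disappears once the penalty for four predictors is applied. Lower AIC or BIC is better when comparing models on the same data.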

No offense, but this is amateur technical analysis, and it will leave you without pants once the tide goes out.

I was actually hoping that someone would point me to a more professional analysis with explanation, but since what I'm getting is advice on the stats, I'm willing to keep reading and learning. But again, if you know of anything published, I'd love to read it. As I said, I can't possibly be the first person to think of combining variables like this.

I did run a correlation of all variables against each other at the beginning and just left out variables if they correlated well with variables that I already had. I did not include PE1 because it correlates with PE10. Two of my variables do correlate weakly, and I need to look at that more carefully.
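The screening step described here (compute pairwise correlations among candidate predictors and skip any that track one already kept) can be sketched as a greedy filter. The 0.7 cutoff and the example series are hypothetical. Low pairwise correlations do not rule out one predictor being a near-linear combination of several others; variance inflation factors are the usual check for that case.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def greedy_screen(candidates, threshold=0.7):
    """Keep candidates in order, skipping any whose |r| with an already-kept one exceeds threshold."""
    kept = {}
    for name, values in candidates.items():
        if all(abs(pearson(values, other)) <= threshold for other in kept.values()):
            kept[name] = values
    return list(kept)

# Hypothetical toy series, not real market data.
example = {
    "pe10": [10, 15, 20, 25, 30],
    "pe1": [11, 14, 21, 24, 31],
    "debt_gdp": [3, 1, 4, 1, 5],
}
print(greedy_screen(example))
```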

JZinCO · « **Reply #29 on:** May 03, 2016, 12:03:09 AM »

You're def. not the first one. There are a ton of folks who do things like this. I just wanted to mention some potential pitfalls so you don't become overconfident and make mistakes.

I can't point you to any resources, however. I've done some fiddling with predictive models of returns, and I don't have any confidence in the approach. If that's the way you go, though, be cautious and good luck.

Northman · « **Reply #30 on:** May 03, 2016, 12:54:44 AM »

Interesting question. I know there is a German company called StarCapital that did some predictions based on CAPE and PB.

CAPE and PB based long term stock market expectations:
http://www.starcapital.de/research/CAPE_Stock_Market_Expectations

Here is a more detailed PDF:
http://www.starcapital.de/files/publikationen/Research_2016-01_Predicting_Stock_Market_Returns_Shiller_CAPE_Keimling.pdf

maizefolk · « **Reply #31 on:** May 03, 2016, 03:26:10 AM »

Looks like a promising result!

Aphalite · « **Reply #32 on:** May 03, 2016, 06:19:33 AM »

Quote from: JZinCO on May 03, 2016, 12:03:09 AM

You're def. not the first one. There are a ton of folks who do things like this. I just wanted to mention some potential pitfalls so you don't become overconfident and make mistakes.

I can't point you to any resources, however. I've done some fiddling with predictive models of returns, and I don't have any confidence in the approach. If that's the way you go, though, be cautious and good luck.

I'm gonna echo JZinCO here. Applying statistics to investing is certainly interesting, but it's not very helpful. The main reason is that you're using price as your main indicator for analysis, and price has too much embedded within it to be useful. If you study a bit of history, many brilliant economists and mathematicians (some of them Nobel Prize winners) have gone down gloriously in flames; LTCM is one prominent example. Things like investor psychology, industry shifts (horse-and-buggy charts looked great until they didn't), and innovation don't get reflected in the price until it's too late.

If investing could be solved by statistics and math, the best investors would all be mathematicians instead of businessmen.

rachael talcott · « **Reply #33 on:** May 03, 2016, 06:57:05 AM »

Quote from: Northman on May 03, 2016, 12:54:44 AM

Interesting question. I know there is a German company called StarCapital that did some predictions based on CAPE and PB.

CAPE and PB based long term stock market expectations:
http://www.starcapital.de/research/CAPE_Stock_Market_Expectations

Here is a more detailed PDF:
http://www.starcapital.de/files/publikationen/Research_2016-01_Predicting_Stock_Market_Returns_Shiller_CAPE_Keimling.pdf

Thanks.
