What is survivor bias and why should we be concerned about it?
One way to test the profitability of a trading strategy is to backtest it. Backtesting is the process of taking historical stock prices and applying a set of trading rules to see whether trades that could have been taken in the past would have been profitable. This is a widely used and well-accepted way to test a trading idea. However, it comes with certain risks and assumptions that need to be managed if the results are to be at all meaningful.
One of these risks is overlooking survivor bias. Perhaps the best way to illustrate survivor bias is with a story from history. During World War II, some aircraft came back from missions damaged but still flying. Others did not come back at all. Efforts to reinforce or better protect the planes were based on the damage seen in the aircraft that returned. However, the aircraft that did not come back were omitted from the research, so the research was biased towards the surviving aircraft. Of course, it was the aircraft that did not come back that held the information needed. After all, assuming no mechanical failures or pilot errors unrelated to battle, the damage the missing planes received must have been the reason they went missing. The returning planes had evidently survived the damage they showed, so the areas where they were undamaged were the likely sites of fatal hits. By focussing on those undamaged areas, the engineers were able to reinforce them, and aircraft losses then fell.
The relevance to trading is that when we backtest stocks over, for instance, a ten or twenty year period, we need to know that the stocks included in the test are all the stocks that were tradeable over that period. This means we need to be sure that delisted stocks are included in our testing. It may sound obvious, but the implication is that the stock price data we use for backtesting, and not just the software, must include delisted stocks. Be aware that some vendors have been known to simply delete delisted stocks from their database or leave them out of the data stream. Moreover, the software or code used for backtesting must be able to detect the delisting date and check that date against any trades that were open at the time of the delisting.
Clearly, if there was an open trade in a stock that delisted, the test would have incurred a loss: 100% of the funds allocated to the delisted stock must appear as a loss in the testing results.
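The delisting check described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not how any particular backtesting package works: the `Trade` record, the `apply_delistings` function and the `delistings` lookup (symbol to delisting date) are all hypothetical names, and the data is assumed to come from a vendor that keeps delisted stocks.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class Trade:
    """Hypothetical record of one backtest position."""
    symbol: str
    entry_date: date
    allocated: float                  # funds committed to the position
    exit_date: Optional[date] = None  # None while the trade is still open
    pnl: float = 0.0                  # realised profit or loss


def apply_delistings(trades: List[Trade],
                     delistings: Dict[str, date],
                     as_of: date) -> List[Trade]:
    """Close trades that were open when their stock delisted.

    Following the rule in the text, the full allocation is booked as a loss.
    `as_of` is the current date in the simulation, so delistings that have
    not happened yet are ignored (no peeking into the future).
    """
    for trade in trades:
        delisted_on = delistings.get(trade.symbol)
        if delisted_on is None or delisted_on > as_of:
            continue  # still listed at this point in the test
        if trade.exit_date is None and trade.entry_date <= delisted_on:
            trade.exit_date = delisted_on
            trade.pnl = -trade.allocated  # 100% of allocated funds lost
    return trades


# Illustrative usage with made-up symbols and dates.
trades = [
    Trade("AAA", date(2010, 1, 4), 1000.0),
    Trade("BBB", date(2010, 1, 4), 1000.0),
]
delistings = {"AAA": date(2011, 6, 30)}  # hypothetical vendor data
apply_delistings(trades, delistings, as_of=date(2012, 1, 1))
```

After the call, the hypothetical AAA trade is closed on its delisting date with the whole allocation written off, while the BBB trade stays open.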
This is survivor bias. It occurs when we test only stocks that have survived for the period of the test without being delisted. Does this make a big difference to test results? Yes, it absolutely does. In fact, backtests that do not account for survivor bias can produce results several times more profitable than the same tests run with delisted stocks included.
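A small arithmetic sketch shows the direction of the effect. The figures below are invented purely for illustration and are not taken from any real backtest; the symbols, the returns and the `average_return` helper are all hypothetical.

```python
def average_return(returns):
    """Equal-weighted average of per-stock returns."""
    return sum(returns.values()) / len(returns)


# Made-up returns over the test period; -1.00 means the whole
# allocation was lost because the stock delisted.
universe = {"AAA": 0.40, "BBB": 0.10, "CCC": -1.00, "DDD": -1.00}
delisted = {"CCC", "DDD"}

# A data set with delisted stocks removed only ever sees the survivors.
survivors = {s: r for s, r in universe.items() if s not in delisted}

survivors_only = average_return(survivors)
full_universe = average_return(universe)

print(f"survivors only: {survivors_only:.1%}")
print(f"full universe:  {full_universe:.1%}")
```

With these invented numbers the survivors-only test shows an average gain of 25%, while the same test over the full universe shows an average loss of 37.5%. The size of the gap in a real test depends entirely on the data and the strategy.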
Nonetheless, backtesting is widely used to test trading ideas, even though it is prone to a wide spectrum of errors that affect the validity of the results. At best, backtesting is an approximation, but it has its uses. It offers a way to change trading rules one at a time and assess the impact of each change. While the profitability figure obtained from backtesting may not be replicable in the real world, comparing performance across several trading ideas or strategies under the same test conditions will show which one is superior.
Computer backtesting saves time compared with pen and paper or a spreadsheet, and it allows many trading ideas to be tested and compared very quickly. However, markets are always evolving to some extent and never repeat exactly. Hence the saying that past results are no guarantee of future success. At the same time, backtesting is the best tool we have for the purpose at present. The bottom line is that backtest results should be considered in light of the coding experience of the tester and the quality of the software being used.
Backtests can generate spectacular results. As the saying goes, if it looks too good to be true, it probably is. We recommend great caution in expecting to replicate backtest results in day-to-day trading. Paper trading a backtested strategy is highly recommended as a means of verifying the results given by a backtest.