Put simply, the value of a backtest depends on how it is done, and many variables affect its validity. Keep in mind that one goal of backtesting is to compare variations of the trading rules and see what impact each has on profitability. The aim should not be to simulate the market. Backtesting should only be used to establish whether a trading strategy has some merit, that is, whether it is profitable or not. In my view, accurate simulation of the market is not possible.
The Method of Backtesting
Next, consider how the backtest is conducted. Is the test run by software or a person? There is a huge difference between these two approaches. A machine test has settings for variables but does not include any human factors. Computers do not have emotions. Computers don’t have intuition either. If you consider live trading by a person, it is obvious that human decision-making will be a factor in how the trades are conducted, even if they are paper trades.
Manual Backtesting
A person carrying out backtesting needs to find software that allows a chart to be stepped forward one candle or bar at a time. Then the trading strategy should be applied to the chart, and the tester should not be aware of what is going to happen on the chart. In other words, keep the time period from the buy date going forward hidden and only expose it one day at a time. This means that the tester will not be biased by knowing what is going to happen next.
The results must be documented for each trade, probably using a spreadsheet. Even better is if the tester is not the trader. Try to convince a friend to apply your trading rules to historical charts, but again, don’t let them see the chart ahead of the buy date. Keep the test blind.
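For readers who want to script the blind, step-forward procedure rather than rely on charting software, the following is a minimal sketch in Python. The `Bar` record, the `reveal_bars` and `run_blind_test` helpers, the sample prices, and the example rule are all hypothetical and exist only for illustration. The point is that the decision function only ever receives bars up to the current one, and every decision is logged with its date.

```python
from typing import Callable, Iterator, List, NamedTuple, Optional, Tuple


class Bar(NamedTuple):
    date: str
    open: float
    high: float
    low: float
    close: float


def reveal_bars(bars: List[Bar]) -> Iterator[List[Bar]]:
    """Yield the history visible to the tester, one extra bar per step."""
    for i in range(1, len(bars) + 1):
        yield bars[:i]  # bars after index i-1 are never exposed


def run_blind_test(
    bars: List[Bar],
    decide: Callable[[List[Bar]], Optional[str]],
) -> List[Tuple[str, str, float]]:
    """Apply `decide` bar by bar and log (date, action, close) for each decision."""
    log = []
    for visible in reveal_bars(bars):
        action = decide(visible)  # sees past and current bars only
        if action is not None:
            log.append((visible[-1].date, action, visible[-1].close))
    return log


def example_rule(visible: List[Bar]) -> Optional[str]:
    """Hypothetical rule: buy when the latest close exceeds the prior bar's high."""
    if len(visible) < 2:
        return None
    return "buy" if visible[-1].close > visible[-2].high else None


# Made-up sample data, not real prices.
sample = [
    Bar("2024-01-02", 10.0, 11.0, 9.0, 10.5),
    Bar("2024-01-03", 10.5, 12.0, 10.0, 11.8),
    Bar("2024-01-04", 11.8, 12.5, 11.0, 11.2),
]
trade_log = run_blind_test(sample, example_rule)
print(trade_log)  # [('2024-01-03', 'buy', 11.8)]
```

The resulting log can be copied into the spreadsheet used to document each trade. A script like this removes look-ahead from the data, but it does not reproduce the human judgment a manual tester brings.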
Computer Backtesting
Coming back to computer backtesting, here are some of the variables that computer testing uses:
Trade Entry Price: Here, the tester is asked to set the backtest to one of several approaches. One of these is to use the open price the next day to enter a trade. This assumes the trader is using an end-of-day strategy. If the software is set to take the open the next day, how can you be sure that in the real world, the trade entry would have achieved the open price? Was there sufficient volume? Was the trade queued to achieve the open price? If there was drift because the trade entry was delayed, what effect did that have on the trade profitability?
Trade Size: What percentage of the daily volume did your trade occupy? If a stock has low volume on the buy day, your trade may be very large relative to the available volume. In that case, the position would have to be filled in parcels during the day. This changes the average entry price, yet the test still uses the open price. Some software allows a cap to be set as a percentage of daily volume; some does not. Even where it exists, this setting relies on the day's total volume, which is only known after the market closes. A real trader placing the order does not have that information.
Exit Strategy: This is similar to trade entry price. Does the trade close on the open of the market the next day? Or does it exit when your rule is met intraday? If it’s next day, how do you know this price was achievable?
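One way to gauge how much the next-open assumption matters for entries and exits is to re-run a trade with an adverse fill on both legs. The sketch below is only an illustration: the function name `trade_return`, the `slippage_pct` parameter, and the prices are assumptions, not settings from any particular backtesting package.

```python
def trade_return(entry_open: float, exit_open: float, slippage_pct: float = 0.0) -> float:
    """Return on a long trade assumed to fill near the open on entry and exit.

    slippage_pct is the assumed adverse move away from the open on each leg.
    """
    entry_fill = entry_open * (1 + slippage_pct)  # buy above the open
    exit_fill = exit_open * (1 - slippage_pct)    # sell below the open
    return exit_fill / entry_fill - 1


# Same hypothetical trade under three fill assumptions.
for slip in (0.0, 0.001, 0.005):
    result = trade_return(20.00, 20.60, slip)
    print(f"slippage {slip:.1%}: return {result:.2%}")
```

If a strategy is only profitable at zero slippage, that is a sign the tested result depends on fills that may not have been achievable.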
There are many more rules that backtesting software can be set to use, most of which a human backtester will not apply. Therefore, there can be vast differences between the tested profitability and the real-world trades going forward.
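The trade-size concern raised earlier can be approximated without using information from the future by sizing against volume from prior days only. The following is a rough sketch under that assumption. The helper names, the 5% cap, and the volume figures are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Union


def max_fillable_shares(prior_volumes: List[int], max_participation: float = 0.05) -> int:
    """Cap order size using only volume that was known before the trade day."""
    return int(mean(prior_volumes) * max_participation)


def check_order(
    order_shares: int,
    prior_volumes: List[int],
    max_participation: float = 0.05,
) -> Dict[str, Union[int, float, bool]]:
    """Compare an order with the cap and report its share of average volume."""
    cap = max_fillable_shares(prior_volumes, max_participation)
    return {
        "cap": cap,
        "fits": order_shares <= cap,
        "participation_vs_avg": order_shares / mean(prior_volumes),
    }


# Made-up volumes for the three days before the buy day.
recent_volumes = [100_000, 120_000, 80_000]
report = check_order(20_000, recent_volumes)
print(report)
```

An order flagged as not fitting would likely need to be filled in parcels, so the open price used by the test would understate the real average entry cost.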
Other Considerations
Another source of difference between tested and real-world results is survivorship bias. Backtesting software must be capable of allowing for stocks that were delisted during the test period. Imagine the impact of testing a strategy and ignoring the losses associated with having an open position when the stock delists! This means you need data and software that can track which stocks were available on which dates. If the testing is restricted to, say, the S&P 500, you will need to know which stocks were in the S&P 500 on each date during the period the backtest is set to cover.
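A point-in-time universe lookup might be sketched as follows. The tickers, dates, and the `universe_on` helper are invented for illustration. Real index membership histories have to come from a data vendor, and their format will differ.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple

# Hypothetical records: ticker -> (date added, date removed or None if still listed)
MEMBERSHIP: Dict[str, Tuple[date, Optional[date]]] = {
    "AAA": (date(2010, 1, 4), None),
    "BBB": (date(2012, 6, 1), date(2018, 3, 15)),  # later delisted
    "CCC": (date(2019, 9, 2), None),
}


def universe_on(day: date, membership: Dict[str, Tuple[date, Optional[date]]]) -> List[str]:
    """Return the tickers that were tradable members on the given day."""
    return sorted(
        ticker
        for ticker, (added, removed) in membership.items()
        if added <= day and (removed is None or day < removed)
    )


print(universe_on(date(2015, 1, 2), MEMBERSHIP))  # ['AAA', 'BBB']
print(universe_on(date(2020, 1, 2), MEMBERSHIP))  # ['AAA', 'CCC']
```

A test that used today's list (AAA and CCC) for 2015 would wrongly include CCC before it existed and would leave out BBB, which is exactly the kind of position that carries delisting losses.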
Does the Market Repeat? While stock market patterns appear similar over time, they are fractal in nature: you will never see exactly the same pattern repeat. The influences affecting the market price today are also different from those of the past. The implication is that your testing is most accurate if it is conducted on very recent price movements. The older the test data, the less relevant the result becomes. It is not irrelevant, though. Just be aware that, with respect to its equity curve, the most reliable backtest is the one run on the most recent data.
Conclusion
Backtesting is important, but there is a lot more to know than this article can cover for now. The aim here is to raise awareness of some of the issues surrounding backtesting.