The Problem with Forecasting Time Series Data in Open Systems

Building forecasting models from time series data is one of the most common tasks any Data (Insert your title here) is faced with. Our inherent need to predict exact occurrences overshadows our reasoning most of the time.

In our effort to predict future occurrences based upon time series data, we have come up with all sorts of techniques. Here is just a partial list.

We have yet to even mention the 400-plus technical indicators used to create chart art for the practitioners who “predict” stock movement.

ALL of these methods have a distinct problem, a fallacy which renders their predictive powers moot. Do you doubt me? Then just show me the method which will work often enough for everyone who uses it to become rich in the stock market! What’s that? You can’t! Well, then I guess I proved my point!

So why do all these methods fail so miserably when used on real-world data? It is because none of these methods or techniques can ever account for all of the significant variables in an open system.

They work in laboratory settings. They work in closed systems, where all of the variables are known. But let these predictive models wander free in the real world and the models fall apart.
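To make the claim concrete, here is a minimal sketch on synthetic data. Everything in it is an assumption made for illustration: the variable names, the numbers, and the use of ordinary least squares as a stand-in for whichever forecasting technique you prefer. A model is fit while a hidden driver is dormant, then scored before and after that driver wakes up.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 600

# Hypothetical data: x is the one variable the modeler can observe.
x = rng.normal(0, 1, n)

# A hidden driver the modeler never sees. It is dormant for the first
# 400 steps, then becomes active (the system "opens up").
hidden = np.zeros(n)
hidden[400:] = 5 * rng.normal(0, 1, 200)

# Assumed true process: y depends on both the seen and the unseen variable.
y = 2.0 * x + hidden + rng.normal(0, 0.1, n)

# Fit ordinary least squares on the first 300 steps, using only x.
X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X[:300], y[:300], rcond=None)

def mean_abs_error(lo, hi):
    """Mean absolute forecast error over steps lo..hi-1."""
    return float(np.mean(np.abs(X[lo:hi] @ coef - y[lo:hi])))

closed_error = mean_abs_error(300, 400)  # hidden driver still dormant
open_error = mean_abs_error(400, 600)    # hidden driver active

print(f"error while the system behaves as closed: {closed_error:.2f}")
print(f"error once the hidden variable is active: {open_error:.2f}")
```

The fitted coefficients never change between the two periods. The only thing that changes is a variable the model was never given, and that alone is enough to wreck the forecast.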

As an example, let’s predict what temperature we should set our thermostat to, depending on the weather outside and the number of people inside a building, using an RNN or ARIMA. Will it work? Actually yes, and fairly well I might add. The weather outside can only be so cold or so hot. The past data collected on how the HVAC unit responds given the building load (number of people) is well known. This is a perfect application for a time series predictive model. For all intents and purposes, we are working in a closed system. Almost every variable can be accounted for.
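Here is a rough sketch of the thermostat case. It is not an RNN or an ARIMA model; plain least squares is swapped in to keep the example short. The data is synthetic, and the building response formula is an assumption invented for illustration. The point is only that when the inputs are bounded and all of them are observed, held-out error stays close to the noise floor.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
hours = np.arange(n)

# Hypothetical inputs: outside temperature follows a bounded daily cycle,
# and occupancy is a head count between 0 and 49.
outside = 15 + 10 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 1, n)
people = rng.integers(0, 50, n)

# Assumed building response: the right setpoint depends only on the two
# observed inputs plus a little measurement noise.
setpoint = 22 - 0.1 * (outside - 15) - 0.02 * people + rng.normal(0, 0.1, n)

# Fit on the first 400 hours, evaluate on the last 100.
X = np.column_stack([np.ones(n), outside, people])
train, test = slice(0, 400), slice(400, n)
coef, *_ = np.linalg.lstsq(X[train], setpoint[train], rcond=None)

pred = X[test] @ coef
mae = float(np.mean(np.abs(pred - setpoint[test])))
print(f"mean absolute error on held-out hours: {mae:.3f} degrees")
```

Nothing outside the two measured inputs moves the target here, which is exactly what makes the system closed for practical purposes.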

Move to the markets, or to some other type of business forecasting…even the weather itself…and all bets are off. Here is a true story which illustrates the point beautifully.

There was an old man who ran a business that traded large amounts of AG futures on the CBOT. A young computer guy walked into his office proclaiming he had new software which could help predict the AG markets via charts. The old man, being willing to teach others, told the young man to proceed with his presentation and show him an example. The young man eagerly pulled up a chart of soybeans and exclaimed, “You see, here is a support level in soybeans developing right now. Soybeans will not fall below this price!”

The old man smiled and said, “So because your chart is showing you this support level, you are certain that it won’t drop below that price?”

The young man exclaimed, “Yes!”

Just then, the old man went over to his desk, picked up his phone and called down to the floor of the CBOT. “I want you to sell 10,000 contracts of front month beans!”

All of a sudden, the market got flooded with beans and the price kept dropping, right through the young man’s support level in real time. Then the old man exclaimed, “Why couldn’t your chart see that coming?”

The Search for Correlation

Most financial market analysis is built upon the idea that statistical correlations between different markets exist.

Diversified portfolios are supposed to hedge against risk by spreading investments across uncorrelated products. Here is the problem: correlations are always changing because of other variables, and they are difficult to discover. For instance, during the next black swan event, notice that both soybeans and MSFT stock will crash.

“Why? Soybeans are not correlated with the price of Microsoft?”

Well, that depends! If people start needing to liquidate good positions in order to avoid margin calls, one might find that a great many products are suddenly correlated. Diversification does not protect your portfolio.
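The liquidation effect can be sketched with made-up numbers. The two return series below are synthetic and named after soybeans and MSFT purely for illustration; the size and timing of the shock are assumptions. The series are independent until a shared forced-selling shock is added to both, at which point the measured correlation jumps.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500

# Hypothetical daily returns: two independent series in normal times.
beans = rng.normal(0, 1, n)
msft = rng.normal(0, 1, n)

# Assumed liquidation shock: for the last 100 days, forced selling hits
# every position at once, regardless of what the product is.
shock = np.zeros(n)
shock[400:] = rng.normal(-1, 3, 100)

beans_ret = beans + shock
msft_ret = msft + shock

calm_corr = float(np.corrcoef(beans_ret[:400], msft_ret[:400])[0, 1])
stress_corr = float(np.corrcoef(beans_ret[400:], msft_ret[400:])[0, 1])

print(f"correlation in the calm period:   {calm_corr:+.2f}")
print(f"correlation in the stress period: {stress_corr:+.2f}")
```

A portfolio sized on the calm-period number would look diversified right up until the moment it mattered.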

Not only do correlations change, but they are difficult to find. Have you ever considered how the weather in Manhattan might influence the mood of a Wall Street trader? Shouldn’t that idea be added into the calculus? There are too many factors to know and all of them are dynamic. Thus, the future cannot be known.

Conclusion

Utilizing time series predictive techniques in open systems is a complete waste of time. I will go so far as to make the following claim: Models which give the user only probabilities of occurrence in open systems produce greatly skewed, and thus flawed, results, because they cannot account for all of the significant variables in the system.

So the next time someone shows you a linear regression model, ask yourself, “What type of system was this data pulled from?” If it was an open system, take the sheet of paper and throw it in the trash.

The sad part is, if most businesses actually realized the truth behind this article, entire departments would be wiped out!

Jonathan Adams

August 6th, 2025