
Thoughts on AI and Investing in Stocks

Rafael Nicolas Fermin Cota


When Ben Graham listed his screens for finding good investments in 1949, running those screens required data and tools that most investors did not have access to, or the endurance to use. All of the data came from poring over annual reports, often prepared under very different accounting standards; the ratios had to be computed with slide rules or on paper; and the sorting of companies was done by hand. Even into the 1980s, access to data and powerful analytical tools was restricted to professional money managers and thus remained a competitive advantage. But as data has become easier to get, accounting more standardized and analytical tools more accessible, there is very little competitive advantage left in computing ratios (PE, PBV, debt ratio etc.) from financial statements and running screens to find cheap securities. In fact, gaining an edge purely by picking individual securities is increasingly difficult, given the wider availability of once-valuable financial data and the fact that markets are more efficient than they were in the past.
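To make the point concrete, a Graham-style value screen that once took weeks of manual work now reduces to a few lines of code. The sketch below is purely illustrative; the tickers, column names and thresholds are assumptions, not Graham's actual criteria or any screen we run.

```python
import pandas as pd

# Illustrative fundamentals table; in practice this would come from a vendor feed.
# Tickers, column names and values are invented for this sketch.
fundamentals = pd.DataFrame({
    "ticker":         ["AAA", "BBB", "CCC", "DDD"],
    "pe":             [8.5, 22.0, 11.3, 6.9],
    "pbv":            [0.9, 3.4, 1.1, 0.7],
    "debt_to_equity": [0.35, 1.20, 0.60, 0.25],
})

# A simple value screen: low PE, low price-to-book, conservative leverage.
screen = fundamentals.query("pe < 15 and pbv < 1.5 and debt_to_equity < 0.5")
print(screen.sort_values("pe"))
```

The speed with which such a screen runs today is exactly why it no longer confers an edge.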


In an industry where almost every player has access to the same core financial and fundamental data, statistical learning, a subset of artificial intelligence (AI), is a natural choice for combing through massive amounts of data and finding differentiated insights into companies that are not found in filings, earnings calls or fundamental datasets. The data used in earnings forecasts today can include satellite imagery, real-time spending information, metadata on every product in every store, and much more besides. This dramatic increase in the availability of data will profoundly change the investment landscape. As more investors adopt a data-driven investment style, the market will react faster and will increasingly anticipate traditional data sources (e.g. quarterly corporate earnings, low-frequency macroeconomic data, etc.). At the same time, the complexity of delivering information in a comprehensible, actionable format (https://rpubs.com/rafael_nicolas/tour_world_economies_businesses) means that, for the foreseeable future, AI on its own will not be able to make accurate investment decisions. As such, it will not usurp the integral role of humans in curating data and identifying investment signals. The key is an agile team of domain experts who understand what makes a market function: do we know why these stocks go up, or down, in price? Domain knowledge is most useful in engineering features from raw data and deciding which of those features belong in a model. For example, we leverage statistical learning with text in combination with data engineering (https://github.com/rnfermincota/academic/blob/main/teaching/NUS/Statistical-Learning/4-Portfolio-Mgmt/4-winning-submission.pdf) in an attempt to find catalysts that can cause the gap between price and value to change (a minimal sketch of this kind of text screening follows the list below). Some of these catalysts can take different forms:


  1. In the earnings reports, in addition to the proverbial bottom line (earnings per share), companies provide information about operating details (growth, margins, capital invested). To the extent that the pricing reflects unrealistic expectations about the future, information that highlights this in an earnings report may cause investors to reassess price.

  2. News stories about a company's plans to expand, acquire or divest businesses or to update or introduce new products can reset the pricing game and change the gap.

  3. A change in the ranks of top management or a managerial misjudgment that is made public can cause investors to hit the pause button, and this is especially true for companies that are bound to a single personality (usually a powerful founder/CEO) or derive their value from a key person.

  4. A change in the macro environment or the regulatory overlay for a company can also cause a reassessment of the gap.
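As a rough illustration of screening text for these catalysts (not the actual pipeline linked above), the sketch below scores a few invented headlines against hypothetical keyword lists, one per catalyst type. In practice, the text sources, features and models are far richer.

```python
import re
from collections import defaultdict

# Hypothetical keyword lists, one per catalyst type from the list above.
CATALYST_TERMS = {
    "earnings":   ["margin", "guidance", "revenue", "capital invested"],
    "corporate":  ["acquisition", "divest", "spin-off", "new product"],
    "management": ["ceo", "resigns", "appointed", "misconduct"],
    "macro":      ["rate hike", "regulation", "tariff", "stimulus"],
}

def score_text(text: str) -> dict:
    """Count catalyst-related terms in a piece of text (case-insensitive)."""
    text = text.lower()
    scores = defaultdict(int)
    for catalyst, terms in CATALYST_TERMS.items():
        for term in terms:
            scores[catalyst] += len(re.findall(re.escape(term), text))
    return dict(scores)

# Invented headlines, purely for illustration.
headlines = [
    "Company X cuts full-year guidance as margins compress",
    "Company Y announces acquisition of rival and a new product line",
    "Company Z CEO resigns amid regulatory probe",
]
for h in headlines:
    print(h, "->", score_text(h))
```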


It should also be noted that with these catalysts - whether interest rates, economic news or corporate profits - when the gap changes, it may not always close. A catalyst can sometimes widen the gap, because the underlying cash flows, growth and risk have themselves changed. COVID, for example, did a great job of revealing the importance of balance between reliance on signals discovered through statistical learning and human judgement (the anticipation of shifts in paradigms). Unfortunately, investment models, if not properly guided, can overfit or uncover spurious relationships and patterns. Big data and AI cannot entirely replace human judgment in investing: knowing what works on average still requires humans to assess whether the average applies. But I also believe that analysts, traders and portfolio managers will eventually be replaced by people who know how to use statistical learning to better perform fundamental and statistical analysis, and this is true across asset classes.
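One common guardrail against the overfitting described above is to require that any candidate signal survive strictly out-of-sample, walk-forward testing. The sketch below illustrates that discipline on synthetic data with a simple linear model; it is not our validation framework, just a minimal example of the idea.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.default_rng(0)

# Synthetic example: 500 periods, 10 candidate features, returns that are mostly noise.
X = rng.normal(size=(500, 10))
y = 0.05 * X[:, 0] + rng.normal(scale=1.0, size=500)

# Walk-forward splits: each fold trains only on data that precedes the test window.
oos_corr = []
for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X):
    model = Ridge(alpha=1.0).fit(X[train_idx], y[train_idx])
    pred = model.predict(X[test_idx])
    oos_corr.append(np.corrcoef(pred, y[test_idx])[0, 1])

# In-sample fit always looks flattering; the out-of-sample correlations are what matter.
print("out-of-sample correlations per fold:", np.round(oos_corr, 3))
```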


The transition to a more quantitative, data-driven investment style will not be without setbacks. Employing data scientists who lack specific economic intuition and experience with quantitative methods may not lead to the desired investment results. Many machine learning and AI concepts may sound plausible but will not lead to viable investment strategies. It is much easier for an experienced quantitative investor to design a viable investment strategy than it is for a data scientist. It is more important to understand the factors driving returns than to be able to develop complex technological solutions (e.g., models and architectures) whose marginal performance improvements do not justify their cost. I have come across data scientists misrepresenting their skills and abilities, and managers unable to distinguish between soft skills (such as the ability to talk about statistical/machine learning) and hard skills (such as the ability to design investment strategies). This has led to culture clashes and a lack of progress as measured by PnL generated by the wrong kind of AI: Artificially Inflated, not Artificial Intelligence.


Data represents another source of risk. Access to data is not information, and not all data is actionable. Data based on what we actually do now is worth a lot more than data based on what we say we will do in the future. For data to have value in investing, we need some degree of exclusivity in access to that data or a proprietary edge in processing it. The true competitive advantage can be harnessed only when we have some form of knowledge about the environment in which the data is generated and processed; if everyone has that knowledge, no one has it. This is a point worth remembering, as it is one of the reasons that fundamental managers have, for the most part, been unable to convert increased access to alternative data into investing profits. Further, certain types of data can lead into blind alleys - datasets that do not contain alpha, and signals that have too little investment capacity, decay quickly, or are simply too expensive to purchase relative to their benefit. What we discovered was that the expenses associated with analyzing certain data went far beyond the cost of the data itself. In fact, the cost of premium-priced financial data feeds is typically the least expensive aspect of obtaining usable information: firms can end up spending up to 10 times more on data integration than they do on data feeds, given the number of steps required to prepare the data for use. Given the risks and uncertain rewards, many asset managers are still wondering how far and how fast to go when adopting a more quantitative, data-driven investment framework. While many asset managers set data strategies as a top priority, only a fraction of them are completely satisfied with their current data infrastructure.
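To make "decay quickly" concrete, one simple diagnostic is to measure a signal's rank correlation (information coefficient) with returns at increasing lags and watch how fast it fades. The sketch below does this on synthetic data; the signal, half-life and scaling are all assumptions for illustration.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(1)
n_periods, half_life = 1000, 2.0

# Synthetic signal whose predictive power decays geometrically with lag.
signal = rng.normal(size=n_periods)
returns = rng.normal(scale=1.0, size=n_periods)
for lag in range(1, 6):
    returns[lag:] += 0.15 * 0.5 ** (lag / half_life) * signal[:-lag]

# Information coefficient: rank correlation between the lagged signal and returns.
for lag in range(1, 6):
    ic, _ = spearmanr(signal[:-lag], returns[lag:])
    print(f"lag {lag}: IC = {ic:.3f}")
```

A signal whose IC halves every period or two may not be worth its data cost once integration and trading frictions are included.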


Connecting data can be messy, costly, time-consuming, and error-prone. To highlight the difficulties that fund managers can face in using financial data, the LinkedIn post https://www.linkedin.com/posts/rnfc_financial-data-quality-activity-6706735330785153024-ugMT/ provides an overview of the limitations of financial data provided by two of the most reputable vendors, Bloomberg Professional Services and S&P Global Market Intelligence. By working with major data vendors for over a decade, we were able to build the necessary data infrastructure, 162 Grid, for an investment management operation at very low incremental cost, utilizing the analytical expertise and software coding skills we have developed in-house. 162 Grid supplies real-time data cleaning, verification, and quality control to improve the decision making of any fund manager (https://www.linkedin.com/m/pulse/162-grid-rafael-nicolas-fermin-cota-1e/). And finally, we built a standardized research process in which every investment hypothesis is rigorously tested and validated.
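As a hedged illustration of the kind of verification and quality control mentioned above (not the actual 162 Grid implementation), the sketch below reconciles the same fundamental field from two hypothetical vendor extracts and flags disagreements beyond a tolerance for human review.

```python
import pandas as pd

# Hypothetical extracts of the same field (trailing EPS) from two vendor feeds.
vendor_a = pd.DataFrame({"ticker": ["AAA", "BBB", "CCC"], "eps": [2.10, 5.40, 0.88]})
vendor_b = pd.DataFrame({"ticker": ["AAA", "BBB", "CCC"], "eps": [2.10, 5.75, 0.88]})

# Join on ticker and flag values that disagree by more than a relative tolerance.
merged = vendor_a.merge(vendor_b, on="ticker", suffixes=("_a", "_b"))
merged["rel_diff"] = (merged["eps_a"] - merged["eps_b"]).abs() / merged["eps_b"].abs()
merged["flag"] = merged["rel_diff"] > 0.02   # 2% tolerance, an arbitrary choice here

# Records that need human review before they feed any model or screen.
print(merged[merged["flag"]])
```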


As a logical individual who has spent the past decade working with quantitative investing techniques and risk premia investing, I gained the right mix of soft skills (such as the ability to teach data science) and hard skills (such as the ability to design a systematic strategy that produces robust weights) that allowed us to fully internalize and integrate data science and alternative data. Securing talent, in both quantity and quality, was the key to research workflow innovation and a competitive advantage. To encourage a good flow of talent over the long term, I created an education platform across all teams: data engineering, quantamental research, and portfolio construction & execution. This includes sharing internal materials that stimulate interest in programming, data engineering, statistical learning and computational finance. Luckily, I have taught two data science courses (https://github.com/rnfermincota/academic/tree/main/teaching/NUS) for over a decade with a focus on fundamental and quantitative investment styles, the consumption of increasing amounts and differentiated types of data, and the adoption of new methods of analysis such as those based on machine learning.


P.S. Just over a decade ago, my job was to build Excel/VBA-based algorithms (https://github.com/rnfermincota/academic/tree/main/teaching/Ivey/8.%20Nico-Add-Ins/backup) that allowed money managers to quickly collect data from a growing variety of online sources, including SEC filings, helping them make smarter, better-informed investment decisions before the rest of the market caught on. On a daily basis, I spent significant time and effort transforming raw data (https://github.com/rnfermincota/academic/tree/main/research/traditional_assets/database) into a representation useful for statistical modeling.
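With today's tooling, that transformation step might look like the sketch below: reshaping a raw long-format extract of financial statement line items into a tidy per-company feature table. The file layout, column names and derived ratios are assumptions for illustration, not the structure of the linked repository.

```python
import pandas as pd

# Hypothetical raw extract: one row per (ticker, fiscal period, line item).
raw = pd.DataFrame({
    "ticker": ["AAA", "AAA", "AAA", "BBB", "BBB", "BBB"],
    "period": ["2023Q4"] * 6,
    "item":   ["revenue", "net_income", "total_debt"] * 2,
    "value":  [1200.0, 150.0, 300.0, 800.0, 40.0, 500.0],
})

# Pivot the line items into columns so each row is one company-period observation.
features = raw.pivot_table(index=["ticker", "period"], columns="item", values="value")

# Derive ratios in the form a statistical model would consume them.
features["net_margin"] = features["net_income"] / features["revenue"]
features["debt_to_revenue"] = features["total_debt"] / features["revenue"]

print(features.reset_index())
```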

 
 
 


