
AI's potential to pass the Chartered Financial Analyst (CFA) exam remains uncertain, according to recent research.

The CFA exams test practical finance knowledge extensively and are notoriously difficult.


Large language models (LLMs) such as ChatGPT and GPT-4 have shown promising results on mock Chartered Financial Analyst (CFA) exams, particularly on Level I and II questions that involve pattern recognition and common finance concepts.

A recent study evaluated 23 diverse LLMs, including GPT-4 variants, on CFA exams. The study used 7 mock exams in total: 5 for Level I and 2 for Level II, with topic distribution mirroring actual CFA exam topic weights. The CFA Program consists of three levels of exams covering topics like financial analysis, portfolio management, accounting, and economics.

Key findings from this study include:

- LLMs perform well on Level I and II CFA questions, often involving pattern recognition and common finance concepts that align well with the models' training data and reasoning capabilities.
- On more complex Level III material, especially essay-style and item-set questions requiring deeper integrated reasoning, LLMs show limitations but can still provide technically accurate and relevant responses when carefully guided by scoring rubrics.
- The models' success strongly depends on the prompting strategy; advanced prompting improves results significantly by guiding the model through logical chains rather than relying on surface pattern matching alone.
- Despite strong performance, LLMs lack true self-awareness or understanding, which means their reasoning is essentially pattern matching within vast training data rather than genuine comprehension.

On the CFA Level II exam, GPT-4 scored 57-61% and ChatGPT 46-48%. Based on estimated CFA passing scores, only GPT-4 could potentially pass the exams under few-shot prompting. However, its Level II performance fell right around the estimated passing boundary.

Longer prompts with case descriptions made Level II questions harder, and few-shot prompting helped GPT-4 less on Level II than it did on Level I. Targeted training on finance concepts, formulas, and reasoning techniques could yield incremental gains for LLMs on CFA exams and financial reasoning.

Chain-of-thought prompting gave GPT-4 a 7% gain over zero-shot performance on Level II, while ChatGPT saw negligible gains from it at that level.
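To make the three prompting strategies concrete, here is a minimal Python sketch of how zero-shot, few-shot, and chain-of-thought prompts might be assembled for a multiple-choice question. The `MCQ` class, the function names, and the instruction wording are illustrative assumptions, not the prompts used in the study, and no model is called.

```python
from dataclasses import dataclass


@dataclass
class MCQ:
    """A multiple-choice question (hypothetical structure, not from the study)."""
    question: str
    choices: dict[str, str]
    vignette: str = ""  # optional case description, as in Level II item sets
    answer: str = ""    # only needed for few-shot demonstrations


def format_question(q: MCQ) -> str:
    """Render the optional case text, the question, and its labelled choices."""
    parts = []
    if q.vignette:
        parts.append(f"Case: {q.vignette}")
    parts.append(f"Question: {q.question}")
    parts.extend(f"{label}. {text}" for label, text in q.choices.items())
    return "\n".join(parts)


def zero_shot(q: MCQ) -> str:
    """Ask for the answer directly, with no worked examples."""
    return format_question(q) + "\nAnswer with a single letter."


def few_shot(q: MCQ, examples: list[MCQ]) -> str:
    """Prepend solved examples so the model can imitate the answer format."""
    demos = [format_question(e) + f"\nAnswer: {e.answer}" for e in examples]
    return "\n\n".join(demos + [zero_shot(q)])


def chain_of_thought(q: MCQ) -> str:
    """Ask the model to reason before committing to an answer."""
    return (
        format_question(q)
        + "\nThink step by step, then give the final answer as a single letter."
    )
```

A Level II style question would pass its case description in `vignette`, which makes every prompt variant longer; that is the extra length the article associates with harder questions.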

The CFA Program exams utilize multiple-choice questions, item sets with longer vignettes, and essay prompts. Level I has 180 independent questions. Level II has 88 questions grouped into 22 item sets. Level III mixes essays and item set questions.
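The question counts above imply four questions per Level II item set. A small Python sketch of that structure follows, together with a plain accuracy score of the kind a mock-exam evaluation would report. The `ExamFormat` class and `accuracy` helper are hypothetical names introduced for illustration, and Level III is left out because the article gives no counts for it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExamFormat:
    """Question layout for one exam level (illustrative structure)."""
    level: str
    questions: int
    item_sets: int = 0  # 0 means the questions are independent

    def questions_per_item_set(self) -> int:
        if self.item_sets == 0:
            raise ValueError("this exam has no item sets")
        return self.questions // self.item_sets


# Counts taken from the article.
LEVEL_I = ExamFormat("I", questions=180)
LEVEL_II = ExamFormat("II", questions=88, item_sets=22)


def accuracy(predicted: list[str], gold: list[str]) -> float:
    """Fraction of questions where the predicted letter matches the key."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


print(LEVEL_II.questions_per_item_set())  # 4
```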

While LLMs show promise in financial reasoning abilities, they do not appear able to reliably pass the CFA exams today. Obtaining the CFA charter involves passing all three exam levels, which typically takes 2-5 years. Each level has pass rates around 40-50%.

The study by researchers at Queen's University, Virginia Tech, and J.P. Morgan tested ChatGPT and GPT-4 on mock CFA Level I and Level II exams. The results highlight LLMs' current limitations in handling the nuanced domain knowledge and reasoning required for the CFA exams. However, gains from few-shot prompting indicate their ability to acquire new finance-specific knowledge.

As LLMs continue to evolve, they may find increased utility as study aids or decision support tools in the finance industry. However, the results underscore the importance of expert human oversight to ensure accurate and comprehensive financial analysis and decision-making. CFA charterholders often pursue careers in investment analysis, portfolio management, and other finance roles.

  • Targeted training on finance concepts, formulas, and reasoning techniques could yield incremental gains for large language models on CFA exams and similar finance and investing assessments.
  • Despite their strength in pattern recognition and common finance concepts, large language models such as GPT-4 and ChatGPT currently lack true self-awareness or understanding. Expert human oversight therefore remains crucial for accurate and comprehensive financial analysis and decision-making.
