LM11 Introduction To Big Data Techniques IFT Notes
1. Introduction
2. How Is Fintech Used in Quantitative Investment Analysis?
3. Advanced Analytical Tools: Artificial Intelligence and Machine Learning
4. Tackling Big Data with Data Science
Summary
This document should be read in conjunction with the corresponding reading in the 2023 Level I CFA®
Program curriculum. Some of the graphs, charts, tables, examples, and figures are copyright
2022, CFA Institute. Reproduced and republished with permission from CFA Institute. All rights
reserved.
Required disclaimer: CFA Institute does not endorse, promote, or warrant the accuracy or quality of
the products or services offered by IFT. CFA Institute, CFA®, and Chartered Financial Analyst® are
trademarks owned by CFA Institute.
Ver 1.0
1. Introduction
This learning module covers:
• What ‘Fintech’ is and how it is used in investment analysis
• A brief explanation of Big Data, artificial intelligence, and machine learning
• Applications of Big Data and Data Science to investment management
2. How Is Fintech Used in Quantitative Investment Analysis?
The term ‘Fintech’ comes from combining ‘Finance’ and ‘Technology’. Fintech refers to
technological innovation in the design and delivery of financial products and services.
Though the term ‘Fintech’ is relatively new, its earlier forms involved data processing and
automation of routine tasks. Fintech later advanced into decision-making applications based
on complex machine learning logic.
The major drivers of fintech have been:
• Rapid growth in data
• Technological advances
While Fintech spans the entire finance space, this learning module focuses on fintech
applications that are more directly relevant to quantitative analysis in the investment
industry:
• Analysis of large datasets
• Analytical tools
Big Data
Big Data refers to the vast amounts of data generated by industry, governments, individuals, and
electronic devices.
Characteristics of big data typically include:
• Volume: Over the last few decades, the amount of data that we are dealing with has
grown exponentially.
• Velocity: It refers to the speed at which data are communicated. In the past we often
worked with batch processing; however, we are now increasingly working with
real-time data.
• Variety: Historically we only dealt with structured data. However, we are now also
dealing with unstructured data such as text, audio, video, etc.
In addition to these three V’s, a fourth V is becoming increasingly important, especially when
using big data for drawing inferences or making predictions.
• Veracity: It refers to the credibility and reliability of different data sources.
Big Data can be structured (can be organized in tables), semi-structured, or unstructured
(cannot be represented in a tabular form).
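The three categories can be illustrated with a minimal Python sketch. The sample records below (the ticker ABC, its prices, and the call remark) are invented for illustration and do not come from the curriculum.

```python
import csv
import io
import json

# Structured: fits a table with fixed columns (hypothetical price records).
structured_csv = "ticker,date,close\nABC,2023-01-02,10.5\nABC,2023-01-03,10.8\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))

# Semi-structured: tagged fields, but records need not share the same fields.
semi_structured = json.loads('{"ticker": "ABC", "tags": ["earnings", "guidance"]}')

# Unstructured: free text with no predefined fields.
unstructured = "Management sounded upbeat about next quarter on the call."

print(rows[0]["close"])           # field lookup by column name
print(semi_structured["tags"])    # nested list inside a record
print(len(unstructured.split()))  # free text must be processed before analysis
```

Structured data can be queried by column name straight away, whereas the unstructured sentence has to be broken into words (or otherwise processed) before any analysis is possible.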
Sources of Big Data
Traditional data sources include annual reports, regulatory filings, trade price and volume,
etc. Alternative data include many other sources and types of data. A simple classification of
alternative data sources is shown in Exhibit 2 of the curriculum.
Individuals                    Business Processes    Sensors
Social media                   Transaction data      Satellites
News, reviews                  Corporate data        Geolocation
Web searches, personal data                          Internet of Things
                                                     Other sensors
far from the truth. For ML to work well, good human judgment is required. Human
judgment is required for questions like: which data to use, how much data to use,
which analytical techniques are relevant in the given context. Human judgment may
also be needed to clean and filter the data before it is fed to the ML algorithm. Deep
learning algorithms are used for image, pattern, and speech recognition.
Some challenges associated with machine learning are:
• Over-fitting the data: Sometimes an algorithm may try to be too precise in the way it
interprets data and predicts outcomes. This leads to over-trained models and may
result in data mining bias. We try to mitigate this issue by having a good validation
dataset.
• Black box: ML techniques can be opaque, or ‘black box’, which means their
predictions are not easy to understand or to explain.
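The over-fitting problem, and the role of a validation dataset in exposing it, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are simulated (y = 2x plus random noise), and the k-nearest-neighbour model is chosen here for simplicity, not because the curriculum prescribes it.

```python
import random

def knn_predict(train, x, k):
    """Predict y at x as the average y of the k training points nearest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def mean_squared_error(train, data, k):
    """Average squared prediction error of the k-NN model on `data`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)

rng = random.Random(0)  # fixed seed so the run is repeatable
points = []
for _ in range(200):
    x = rng.uniform(0, 10)
    points.append((x, 2 * x + rng.gauss(0, 1)))  # hypothetical: y = 2x + noise

train, validation = points[:150], points[150:]

# Compare the error on data the model has seen with data it has not seen.
for k in (1, 15):
    print(k, mean_squared_error(train, train, k), mean_squared_error(train, validation, k))
```

With k = 1 the model simply memorizes the training points, so its training error is zero, yet its error on the unseen validation points is not. That gap between training and validation error is the signature of an over-trained model.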
Despite these challenges and weaknesses, the importance of ML in finance and investment
management has been growing substantially.
An example of a heat map is a map of a city where routes with high traffic congestion are
shown in red. A tag cloud is a technique applicable to textual data. Words that appear more
often are shown in a larger font, whereas words that appear less often are shown with a
smaller font. This helps us to quickly evaluate how consumers/users are talking about a
given product.
Exhibit 3 from the curriculum shows an example of a ‘tag cloud’.
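The idea behind a tag cloud (more frequent words get a larger font) can be sketched as follows. The function name `tag_cloud_sizes`, the font-size range, and the sample review text are assumptions made for illustration; actual visualization tools use their own scaling rules.

```python
import re
from collections import Counter

def tag_cloud_sizes(text, min_size=10, max_size=40):
    """Map each word to a font size that grows with how often it appears."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    if not counts:
        return {}
    lowest, highest = min(counts.values()), max(counts.values())
    spread = max(highest - lowest, 1)  # avoid dividing by zero when all counts match
    return {
        word: min_size + (max_size - min_size) * (count - lowest) / spread
        for word, count in counts.items()
    }

reviews = "great battery great screen poor battery great"  # hypothetical user comments
sizes = tag_cloud_sizes(reviews)
for word, size in sorted(sizes.items(), key=lambda item: -item[1]):
    print(f"{word}: {size:.0f}pt")
```

Here the most frequent word receives the largest font size and the least frequent words receive the smallest, with the rest scaled linearly in between.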
Summary
LO: Describe aspects of “fintech” that are directly relevant for the gathering and
analyzing of financial data.
Fintech refers to technological innovation in the design and delivery of financial products
and services.
LO: Describe Big Data, artificial intelligence, and machine learning.
Big Data refers to vast amounts of data generated by industry, governments, individuals, and
electronic devices.
Artificial intelligence (AI) computer systems perform tasks that have traditionally required
human intelligence. They exhibit cognitive and decision-making ability comparable or
superior to that of human beings.
Machine learning (ML) refers to computer-based techniques that “extract knowledge from
large amounts of data by ‘learning’ from known examples and then generating structure or
predictions” without relying on any help from a human. In ML, the dataset is divided into
three distinct subsets: a training dataset, a validation dataset, and a test dataset. There are three
main approaches to machine learning: supervised learning, unsupervised learning, and
deep learning.
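A minimal sketch of dividing a dataset into the three subsets is shown below. The 60/20/20 proportions, the function name `split_dataset`, and the fixed seed are assumptions for illustration; the curriculum does not prescribe particular proportions.

```python
import random

def split_dataset(records, train_frac=0.6, validation_frac=0.2, seed=0):
    """Shuffle records and cut them into training, validation, and test subsets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_train = round(len(shuffled) * train_frac)
    n_validation = round(len(shuffled) * validation_frac)
    training = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_validation]
    test = shuffled[n_train + n_validation:]  # whatever remains
    return training, validation, test

training, validation, test = split_dataset(range(100))
print(len(training), len(validation), len(test))
```

Each record lands in exactly one subset, so the model is fitted on one part, tuned on another, and finally evaluated on data it has never seen.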
LO: Describe applications of Big Data and Data Science to investment management.
Text analytics refers to the use of computer programs to derive meaning from large,
unstructured text- or voice-based data.
Natural language processing (NLP) is an application of text analytics whereby computers
analyze and interpret human language.
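As a rough illustration of deriving meaning from text, the sketch below scores a sentence by counting positive and negative words. The word lists and the function name `sentiment_score` are hypothetical; production NLP systems rely on far larger lexicons or trained models.

```python
import re

# Hypothetical word lists; a real lexicon would be far larger.
POSITIVE = {"growth", "strong", "beat", "upbeat"}
NEGATIVE = {"loss", "weak", "miss", "decline"}

def sentiment_score(text):
    """Return (positive hits - negative hits) / total words, or 0.0 for empty text."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    positive = sum(word in POSITIVE for word in words)
    negative = sum(word in NEGATIVE for word in words)
    return (positive - negative) / len(words)

print(sentiment_score("Strong growth this quarter"))
print(sentiment_score("Weak sales and a decline in margins"))
```

A score above zero suggests a mostly positive tone and a score below zero a mostly negative one, under the simplifying assumption that word counts alone capture sentiment.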