A Primer on Using Data + AI for Fraud Prevention: eBook
Abstract
Fraud is a costly and growing problem. For every $1 of fraud, companies pay $3.36 in chargebacks, replacement and operational costs. Identity fraud losses soared to $56 billion in 2020. Fraudsters continue to increase the scale, speed and sophistication of attacks; from 2019 to 2020, fraud grew at an average rate of 33%, threatening the revenue and growth of companies. This eBook explains why fraud prevention is an ideal application for data and machine learning, especially in the face of modern threats, and how to future-proof your organization using the Databricks Lakehouse Platform. It also explains how financial services leaders are using data and AI to combat fraud, and it includes two Databricks Solution Accelerators with easy-to-use, best-practice notebooks so you can get a jump start on fraud detection.
• $150 billion lost by government agencies to a wide range of fraud, representing 10% of all fraud[4]
Fraud continues to grow at 33% year over year, threatening the revenue and growth of companies. It impacts customer satisfaction, loyalty and the bottom line.[5]
Moreover, fraudsters’ methods are becoming more technologically advanced. The Association of Certified Fraud Examiners (ACFE) and PwC have shared how organized crime and large-scale fraudsters are increasing the scale, speed and sophistication of fraud attacks.[6] One of the newest methods involves using machine learning (ML) and other automation techniques to commit fraud that legacy approaches can’t detect, including phishing emails, identity theft and forgery, phone and location spoofing, and the emulation of user behavior. Experts and industry leaders are now looking to machine learning and AI to understand and get ahead of fraud.
Fraudsters are using AI. Why aren’t you?
[1] CNP Fraud Costs U.S. Merchants $3.36 for Every $1 of Direct Fraud Loss
[2] U.S. sentences 14 to combined 74 years in prison for healthcare fraud
[3] Refinitiv Survey Report: Revealing the true cost of financial crime
[4] McKinsey on Government Perspectives: Adopting AI, automation, and advanced analytics in governments
[5] Fraud rate rises 33% during COVID-19 lockdown
[6] ACFE and ABFA Fraud Resources
With the growing availability of data and advances in machine learning, fraud prevention has become a key area in which ML is changing both workflows and outcomes, allowing organizations to stay ahead of criminals who are only growing more technologically advanced. Today’s businesses face an increasingly sophisticated enemy that attacks, responds and changes tactics extremely quickly. With data analytics and machine learning, companies can get ahead of these threats. Below, we discuss the main reasons why machine learning is especially well suited to taking on fraud.
Fraudsters constantly adapt their tactics, making them difficult for humans to detect.
Fraud hides under massive amounts of data
The most effective way to detect fraud is to examine the overall behavior of end users. Looking at individual transactions or orders is not enough; we need to follow the events leading up to and after each transaction. This produces large volumes of structured and unstructured data, and the best way to detect fraud in such huge volumes of data is with machine learning and AI.
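As a rough illustration of following the events around a transaction rather than the transaction alone, the Python sketch below counts one user's events in a window before and after a transaction. The `Event` type, the event kinds (`login`, `password_reset`, `new_device`, `logout`) and the 24-hour window are hypothetical choices, not part of any Databricks or library API; at scale this logic would run over a large event table rather than a Python list.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    user_id: str
    kind: str  # hypothetical event kinds, e.g. "login", "password_reset", "new_device"
    at: datetime


def context_features(user_id, txn_time, events, window=timedelta(hours=24)):
    """Summarize one user's activity in the window before and after a transaction."""
    before, after = Counter(), Counter()
    for e in events:
        if e.user_id != user_id:
            continue  # only this user's behavior matters here
        if txn_time - window <= e.at < txn_time:
            before[e.kind] += 1
        elif txn_time < e.at <= txn_time + window:
            after[e.kind] += 1
    return {
        "password_resets_before": before["password_reset"],
        "new_devices_before": before["new_device"],
        "logins_before": before["login"],
        "events_after": sum(after.values()),
    }


# Usage: a password reset and a new device shortly before a purchase
t = datetime(2021, 5, 1, 12, 0)
events = [
    Event("u1", "login", t - timedelta(hours=30)),  # outside the 24h window
    Event("u1", "password_reset", t - timedelta(hours=2)),
    Event("u1", "new_device", t - timedelta(hours=1)),
    Event("u2", "login", t - timedelta(hours=1)),  # a different user
    Event("u1", "logout", t + timedelta(minutes=5)),
]
features = context_features("u1", t, events)
```

Features like these can then be fed to a model alongside the transaction's own attributes, such as amount and merchant.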
Machine learning uses statistical models to look at past outcomes and anomalies to predict future
outcomes. A machine learning system can learn, predict and make decisions as data arrives in real time.
• Less need for manual review: Machine learning automates processes in which behaviors can be learned at the individual level and anomalies can be detected
• Ability to prevent fraud without impeding the user experience: AI brings automation to the process seamlessly and prevents fraud in advance without burdening users
• Lower operational costs than other approaches: With less manual work and more automation, data and AI require fewer resources and preempt losses associated with fraud
• More time for strategic tasks: Most companies are not in the business of fraud detection, and a machine learning fraud prevention process frees teams to focus on core activities
• Quick adaptation: Coupled with human talent and experience, data and AI work together to constantly learn and adjust to new user behaviors and trends
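Learning behavior at the individual level can be as simple as keeping a running baseline per user and flagging large deviations from it. The sketch below is one assumed approach, not a Databricks feature: it tracks each user's mean and variance of transaction amounts with Welford's online algorithm and flags amounts more than a hypothetical 3 standard deviations from that user's norm.

```python
import math
from collections import defaultdict


class RunningStats:
    """Running mean and sample variance via Welford's online algorithm."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def zscore(self, x):
        if self.n < 2:
            return 0.0  # not enough history to judge yet
        std = math.sqrt(self.m2 / (self.n - 1))
        if std == 0:
            return 0.0 if x == self.mean else math.inf  # any change from a constant habit
        return (x - self.mean) / std


baselines = defaultdict(RunningStats)  # one baseline per user


def score(user_id, amount, threshold=3.0):
    """Flag amounts far from this user's norm; only unflagged amounts update it."""
    stats = baselines[user_id]
    flagged = abs(stats.zscore(amount)) > threshold
    if not flagged:
        stats.update(amount)
    return flagged


# Usage: a user who normally spends around $20
history_flags = [score("u1", a) for a in [20, 25, 22, 18, 24, 21]]
big_flag = score("u1", 500)
normal_flag = score("u1", 23)
```

Updating only on unflagged amounts keeps a burst of fraud from becoming the new normal; the trade-off is that genuine changes in a customer's behavior need another path, such as analyst review, to be absorbed into the baseline.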
When it comes to operationalizing data and AI to build customer relationships and drive higher returns
on equity, fraud should be considered a top priority. Curbing fraudulent or malicious behavior — from
fraudulent securities trading to money laundering — is key to mitigating negative revenue impact.
Companies that want to build their own ML infrastructure need to think about supporting massive data growth. Common challenges include:
• Getting high-quality, clean data and maintaining a rich feature store for ever-evolving fraud patterns
• Using multiple vendors that are siloed in capabilities (for example, ETL, data science and scaling on demand)
• Enabling multiple data teams to scale their data pipelines and collaborate in the cloud
• Meeting ML challenges by training complex models with hundreds of features on gigabytes of data
Figure: Tools used across data and ML workloads (data science, machine learning): Amazon Redshift, Azure Synapse, Snowflake, SAP, Teradata, Google BigQuery, IBM Db2, Oracle Autonomous Data Warehouse, Hadoop, Amazon EMR, Google Dataproc, Cloudera, Apache Airflow, Apache Spark, Apache Flink, Apache Kafka, Amazon Kinesis, Azure Stream Analytics, Google Dataflow, Confluent, Jupyter, Azure ML Studio, Domino Data Lab, SAS, MATLAB, Tibco Spotfire, Amazon SageMaker, TensorFlow and PyTorch
A lakehouse architecture is well suited for fraud prevention because it provides financial services companies with a fully managed cloud platform that accelerates innovation by unifying data engineering, data analysis and data science with the rest of the business. A lakehouse environment can also easily ingest and manage the massive, real-time, constantly changing and varied types of data needed to detect fraud.
Figure: Evolution of data architectures: data warehouse (late 1980s), data lake (2011) and lakehouse (2020), and the BI, reports, data science, ML and streaming analytics workloads each supports. Source: Databricks
The Databricks Lakehouse Platform combines the best elements of data lakes and data warehouses, delivering the data management and performance typically found in data warehouses with the low-cost, flexible object stores offered by data lakes.
Databricks enables organizations to overcome these challenges with a lakehouse architecture powered by Delta Lake. An open source technology, Delta Lake is natively integrated within Databricks to provide reliability and performance. It acts as a storage layer on top of your data lake that brings ACID transactions and enforces data quality. Financial services institutions (FSIs) can ingest structured, semi-structured and unstructured data, in both batch and streaming, into a single Delta Lake to ensure that the supply of data is clean and usable. The scalability of Databricks then enables organizations to process and query this data for near real-time insights.
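Delta Lake handles schema enforcement and atomic commits itself; for instance, it rejects an append whose schema does not match the table. Purely to illustrate the idea, the toy Python class below (hypothetical, not Delta Lake's API) validates every row of a batch against a table schema before committing, so a single bad row rejects the whole batch and the table is never left holding part of it.

```python
class SchemaMismatch(ValueError):
    pass


class ToyTable:
    """In-memory stand-in for a table with schema enforcement and all-or-nothing appends."""

    def __init__(self, schema):
        self.schema = dict(schema)  # column name -> Python type
        self.rows = []
        self.version = 0

    def _check(self, row):
        if set(row) != set(self.schema):
            raise SchemaMismatch(f"columns {sorted(row)} != {sorted(self.schema)}")
        for col, typ in self.schema.items():
            if not isinstance(row[col], typ):
                raise SchemaMismatch(f"{col}={row[col]!r} is not {typ.__name__}")

    def append(self, batch):
        batch = list(batch)
        for row in batch:  # validate everything before changing any state
            self._check(row)
        self.rows.extend(batch)
        self.version += 1  # one new version per successful commit


# Usage with a hypothetical transactions schema
txns = ToyTable({"txn_id": str, "user_id": str, "amount": float})
txns.append([{"txn_id": "t1", "user_id": "u1", "amount": 12.5}])
try:
    txns.append([
        {"txn_id": "t2", "user_id": "u1", "amount": 40.0},
        {"txn_id": "t3", "user_id": "u2", "amount": "n/a"},  # bad type
    ])
except SchemaMismatch:
    pass  # the whole batch is rejected, including the valid row t2
```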
The ability for users to collaborate across multiple workspaces while isolating access at the user level is critical in financial services. Databricks’ fraud solutions address the key areas of scalability in the cloud, fraud prevention workflow management and production-grade open source ML frameworks. They allow organizations to build and maintain a modern fraud and financial crimes management infrastructure by increasing alignment between internal teams.
The Databricks Lakehouse Platform unifies data teams so they can collaborate across the entire data and AI workflow
Additionally, any AI-based model must comply with strict regulatory requirements, which means being able to reproduce the code, data and models as of any point in time. By combining MLflow and Delta Lake capabilities, models and rules can be created with a high degree of governance and trust in your data. An independent team of experts can then verify that the right parameters were used against the right set of data, resulting in the right outcome.
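The as-of requirement means being able to recover exactly which data, rules and model were in effect at a given moment. In Delta Lake this is time travel (reading with the `versionAsOf` or `timestampAsOf` option), and MLflow records the parameters and artifacts of each run. The plain-Python sketch below only illustrates the underlying idea with a hypothetical `VersionedStore` that keeps every committed snapshot and answers as-of queries with a binary search.

```python
import bisect
from datetime import datetime


class VersionedStore:
    """Keeps every committed snapshot so state can be read as of any past time."""

    def __init__(self):
        self._times = []      # commit timestamps, strictly increasing
        self._snapshots = []  # snapshot committed at the matching timestamp

    def commit(self, at, snapshot):
        if self._times and at <= self._times[-1]:
            raise ValueError("commits must be in increasing time order")
        self._times.append(at)
        self._snapshots.append(dict(snapshot))

    def as_of(self, at):
        """Return a copy of the latest snapshot committed at or before `at`."""
        i = bisect.bisect_right(self._times, at) - 1
        if i < 0:
            raise LookupError("no version exists at or before that time")
        return dict(self._snapshots[i])


# Usage: a hypothetical fraud rule threshold that changed on March 1
rules = VersionedStore()
rules.commit(datetime(2021, 1, 1), {"max_amount": 1000})
rules.commit(datetime(2021, 3, 1), {"max_amount": 500})
feb = rules.as_of(datetime(2021, 2, 15))
mar = rules.as_of(datetime(2021, 3, 1))
```

An auditor asking why a February transaction was approved can then check it against the rule that was actually in force in February, not the current one.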
All your financial services data · Reliable, real-time processing · Analytics capabilities for every use case
• DATA INGEST: Processing batch and streaming data can be slow and error-prone, impacting downstream analytics. Solution: Connect traditional data with alternative data insights.
• DATA LAKE MANAGEMENT: Data silos can limit the ability to gain a complete view of the customer. Solution: Easily handle large volumes of data from multiple sources (transactions, geospatial, demographics, etc.) built on a strong privacy foundation.
• DATA QUERY: Fragmented, siloed and inconsistent data sources for BI and data science. Solution: Rapidly and inexpensively experiment, manage and push out at scale from a single platform.
Both accelerators demonstrate the innovative use of anomaly detection with geospatial analysis in fraud
prevention, and show how easy it is to get started.
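One simple way geospatial data can feed anomaly detection is an "impossible travel" check: if two consecutive transactions on the same card imply a speed no traveler could reach, one of them is suspect. This is an assumed, illustrative rule, not the method used in the accelerators; the 900 km/h ceiling (roughly airliner cruising speed) and the 1 km tolerance for simultaneous transactions are hypothetical parameters.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def impossible_travel(prev, curr, max_kmh=900.0, tolerance_km=1.0):
    """prev and curr are (lat, lon, timestamp) for consecutive transactions on one card."""
    dist = haversine_km(prev[0], prev[1], curr[0], curr[1])
    hours = (curr[2] - prev[2]).total_seconds() / 3600
    if hours <= 0:
        return dist > tolerance_km  # same moment, different places
    return dist / hours > max_kmh


# Usage: Manila, then Singapore 30 minutes later, then a nearby Quezon City purchase
manila = (14.5995, 120.9842, datetime(2021, 6, 1, 10, 0))
singapore = (1.3521, 103.8198, datetime(2021, 6, 1, 10, 30))
quezon_city = (14.6760, 121.0437, datetime(2021, 6, 1, 11, 0))
far_flag = impossible_travel(manila, singapore)
near_flag = impossible_travel(manila, quezon_city)
```

In practice a flag like this would be one feature among many rather than a verdict on its own, since VPNs and card-not-present purchases can place a legitimate customer in two distant locations.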
members who break the rules.
• Provides a fully managed platform that removes infrastructure complexity so they can focus
on the data rather than DevOps
With Databricks, teams can quickly iterate on ML models and scale detection efforts to hundreds of billions of
market events per day. As a result, FINRA has significantly improved fraud prevention, leading to a more secure
financial future for investors in the U.S.
Impact
• Analyze 100 billion stock market events per day to identify fraud and wrongdoing
As a key digital payments platform in the Philippines, Coins.ph needs to be able to perform accurate, insightful
financial audits and prevent fraud — in real time. With more than 10 million customers accessing digital payment
services for local and international remittances, bill payments and online shopping, Coins.ph needed to find a
way to move beyond development operational processes and address more advanced business challenges.
With Databricks, Coins.ph was able to harness richer data insights to deliver ML-powered fraud detection and anti-money-laundering solutions at greater speed while optimizing financial reconciliation.
Why Databricks?
• Provides a unified platform for data teams to collaborate on data preparation and analytics and to
prototype new models
• Delta Lake ensures consistent data pipelines that feed data downstream for ML
Using Delta Lake to ingest large volumes of data in real time, data engineers at Coins.ph bring greater reliability to the data lakes and get up-to-date insights, helping them develop more robust and scalable data pipelines for experiments and quickly build new prototypes to address fraud detection.
Impact
• 14x reduction in complaints received
© Databricks 2021. All rights reserved. Apache, Apache Spark, Spark and the Spark logo are trademarks of the Apache Software Foundation. Privacy Policy | Terms of Use