Chapter 7: Big Data Storage

Topic 8B

Hyung-Eun Choi

Assistant Professor of Finance, NEOMA Business School

Big Data for Finance, Spring 2024

7.1 Introduction

Big data storage is:


Big data storage is concerned with storing and managing data
in a scalable way, satisfying the needs of applications that
require access to the data.
The ideal big data storage system would:
allow storage of a virtually unlimited amount of data.
cope both with high rates of random write and read access.
flexibly and efficiently deal with a range of different data
models.
support both structured and unstructured data.
only work on encrypted data for privacy reasons.

7.1 Introduction (Cont.)

Big data storage technologies are:


Storage technologies that in some way specifically address the
volume, velocity, or variety challenge and do not fall in the
category of relational database systems.
This does not mean that relational database systems do not
address these challenges, but alternative storage technologies
such as columnar stores and clever combinations of different
storage systems, e.g. using the Hadoop Distributed File System
(HDFS), are often more efficient and less expensive.

7.1 Introduction (Cont.)

RDBMS vs. Non-RDBMS*


An RDBMS (Relational Database Management System) is a DBMS
designed specifically for relational databases.
It stores data in a structured format of rows and columns, which
makes it easy to locate and access specific values within the database.
It is relational because the values within each table are related
to each other. Tables may also be related to other tables.
The relational structure makes it possible to run queries across
multiple tables at once.
While relational databases are ideal for storing structured data,
their rigid structure makes it difficult to add new fields and
quickly scale the database.
Examples: Oracle Database, MySQL, Microsoft SQL Server,
and IBM DB2.
*sources: https://techterms.com/
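
The point above about running queries across multiple tables can be sketched with Python's built-in sqlite3 module. This is only a minimal illustration: the table names, columns, and values are hypothetical and are not taken from the slides.

```python
import sqlite3

# In-memory database; all table names, columns, and rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clients (
        client_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    CREATE TABLE trades (
        trade_id  INTEGER PRIMARY KEY,
        client_id INTEGER NOT NULL REFERENCES clients(client_id),
        ticker    TEXT NOT NULL,
        amount    REAL NOT NULL
    );
    INSERT INTO clients VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO trades VALUES
        (10, 1, 'AAA', 1500.0),
        (11, 1, 'BBB', 500.0),
        (12, 2, 'AAA', 250.0);
""")

# One query spanning both tables: total traded amount per client.
totals = conn.execute("""
    SELECT c.name, SUM(t.amount)
    FROM clients AS c
    JOIN trades AS t ON t.client_id = c.client_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(totals)
```

The fixed schema is what makes the join possible, and it is also why adding a new field requires changing the table definition first.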

7.1 Introduction (Cont.)

RDBMS vs. Non-RDBMS (Cont.)*


Non-RDBMS, on the other hand, does NOT require a structured
schema that defines each table and the related columns.
This provides a much more flexible approach to storing data
than a relational database.
This unstructured or "semi-structured" approach is ideal for
capturing and storing user-generated content,
such as text, images, audio files, videos, click streams, and tweets,
in a highly scalable way.
Examples: Apache Hadoop, HBase, IBM Domino, and Oracle
NoSQL.
For instance, a NoSQL ("Not only SQL") database can store and
access data as key-value pairs.
NoSQL databases are especially common in cloud computing applications
and have become one of the most popular storage solutions for big data.
*sources: https://techterms.com/
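
The schema-less, key-value idea described above can be sketched in a few lines of Python. A plain dictionary stands in for the store here; the `put` and `get` helpers, the keys, and the records are all hypothetical and do not correspond to the API of any particular NoSQL product.

```python
import json

# A plain dict stands in for a key-value store; keys and records are made up.
store = {}

def put(key, value):
    """Serialize any JSON-compatible value and store it under the key."""
    store[key] = json.dumps(value)

def get(key):
    """Return the deserialized value, or None if the key is absent."""
    raw = store.get(key)
    return None if raw is None else json.loads(raw)

# Records with different fields coexist; no table definition is required.
put("tweet:1", {"user": "alice", "text": "Rates on hold again", "tags": ["ECB"]})
put("click:7", {"session": "s-42", "url": "/quotes/AAA", "ms_on_page": 3100})

print(get("tweet:1")["tags"])
print(get("click:7")["url"])
print(get("missing"))
```

Because each value is opaque to the store, a new field can appear in one record without touching any other record. The trade-off is that lookups go through the key rather than through a general query language.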

7.1 Introduction (Cont.)

How to address the three Vs in big data storage


Volume by distributed systems
Big data storage systems typically address the volume challenge
by making use of distributed architectures.
New nodes provide computational power and storage to
address increased storage requirements by scaling out.
New machines can seamlessly be added to a storage cluster and
the storage system takes care of distributing the data between
individual nodes transparently.
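
One simple way a storage system could spread data across nodes is to hash each key and use the hash to pick a node. The sketch below assumes this modulo-based placement purely for illustration; the node names, the key format, and the helper functions are hypothetical, and real systems use more elaborate schemes.

```python
import hashlib

def node_for(key, nodes):
    """Map a key to a node by hashing it (simple modulo placement)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]

def distribute(keys, nodes):
    """Return {node: [keys]} so that callers never pick a node by hand."""
    placement = {node: [] for node in nodes}
    for key in keys:
        placement[node_for(key, nodes)].append(key)
    return placement

keys = [f"trade-{i}" for i in range(1000)]

# Scaling out: the same keys are spread over three nodes, then over four.
three = distribute(keys, ["node-a", "node-b", "node-c"])
four = distribute(keys, ["node-a", "node-b", "node-c", "node-d"])

print({node: len(held) for node, held in three.items()})
print({node: len(held) for node, held in four.items()})
```

With modulo placement, adding a node changes the target of many existing keys. Production systems commonly rely on techniques such as consistent hashing to limit how much data has to move.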
Velocity and Variety trade-off
Random write access to a database can be a solution for
variety, but it can slow down query performance considerably if
it needs to provide transactional guarantees.
Graph databases are suitable storage systems to address variety,
but they are difficult to scale out quickly.

7.2 Key Insights for Big Data Storage

Potential to Transform Society and Businesses across Sectors:
Big data storage technologies are a key enabler for advanced
analytics that have the potential to transform society and the
way key business decisions are made.
Novel data storage technologies have the potential to enable
new value-generating analytics in and across various industrial
sectors, e.g., finance, energy, and media.
Lack of Standards Is a Major Barrier:
The history of NoSQL is based on solving specific technological
challenges, which led to a range of different storage
technologies.
The large range of choices, coupled with the lack of standards
for querying the data, makes it harder to exchange data stores, as
it may tie application-specific code to a certain storage solution.

7.2 Key Insights for Big Data Storage (Cont.)

Open Scalability Challenges in Graph-Based Data Stores:


Processing data based on graph data structures is beneficial in
an increasing number of applications.
It allows better capture of semantics and complex relationships
with other pieces of information coming from a large variety of
different data sources.
It remains hard to distribute graph-based data structures
efficiently across computing nodes.
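
One reason graph data is hard to distribute is that connected vertices often end up on different nodes, so following an edge requires network communication. The sketch below counts such "cut" edges under naive hash partitioning. The ring-shaped graph of accounts, the partitioning rule, and the function names are assumptions made for illustration only.

```python
import hashlib

def partition_of(vertex, n_parts):
    """Assign a vertex to a partition by hashing its identifier."""
    digest = hashlib.sha256(vertex.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_parts

def cut_edges(edges, n_parts):
    """Count edges whose two endpoints land on different partitions."""
    return sum(
        1 for u, v in edges
        if partition_of(u, n_parts) != partition_of(v, n_parts)
    )

# Hypothetical payment graph: 100 accounts, each linked to the next one.
accounts = [f"acct-{i}" for i in range(100)]
edges = [(accounts[i], accounts[(i + 1) % 100]) for i in range(100)]

for n_parts in (1, 2, 4, 8):
    print(n_parts, "partitions:", cut_edges(edges, n_parts), "cut edges")
```

A single partition has no cut edges. Once vertices are spread over several partitions without regard to the graph's structure, a large share of traversals has to cross machine boundaries.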
Privacy and Security Is Lagging Behind:
Although there are several projects and solutions that address
privacy and security, the protection of individuals and securing
their data lags behind the technological advances of data
storage systems.
e.g., the Facebook hack and the Equifax data breach.

7.3 Social and Economic Impact of Big Data Storage

Every firm is, to some extent, a data firm now:


Enterprises can now store and analyse more data at a lower cost
while at the same time enhancing their analytical capabilities.
The emergence of a data-driven society and economy with huge
transformational potential.
Health sector: better health services through better integration and
analysis of health-related data.
Media sector: the analysis of social media has the potential to
transform journalism by summarizing news created by a large
number of individuals.
What about the financial sector?

The move towards a data-driven economy:


Many sectors are heavily impacted by the maturity and
cost-effectiveness of technologies that are able to handle big
datasets.
Open data initiatives: the EU, the U.S., etc.
Technology vendors for the Hadoop ecosystem (e.g., Cloudera)

7.4 Big Data Storage State-of-the-Art
7.4.1.1 NoSQL Databases
NoSQL databases are designed for scalability, often by
sacrificing consistency.
Compared to relational databases, they often use low-level,
non-standardized query interfaces, which makes them more
difficult to integrate into existing applications that expect an SQL
interface.
NoSQL database models:
Key-Value Stores: allow storage of data in a schema-less way,
with each object accessed by a single key, e.g., Redis.
Columnar Stores: store data tables as sections of columns of
data rather than as rows of data, which is how most relational
DBMSs store them.
Document Databases: store semi-structured documents, e.g., in XML
or JSON.
Graph Databases: store data in graph structures, making
them suitable for storing highly associative data such as social
network graphs, e.g., Neo4j.
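
The difference between a row-oriented and a columnar layout can be sketched with plain Python structures. The records, field names, and the `to_columns` helper are hypothetical, and real columnar stores add compression and on-disk formats that this sketch ignores.

```python
# Hypothetical price records in a row-oriented layout: one dict per row.
rows = [
    {"ticker": "AAA", "day": 1, "close": 100.0},
    {"ticker": "AAA", "day": 2, "close": 102.0},
    {"ticker": "BBB", "day": 1, "close": 50.0},
]

def to_columns(records):
    """Pivot row records into one list per column (columnar layout)."""
    columns = {name: [] for name in records[0]}
    for record in records:
        for name, value in record.items():
            columns[name].append(value)
    return columns

columns = to_columns(rows)

# Averaging one field reads a single list in the columnar layout,
# whereas the row layout has to visit every full record.
avg_from_columns = sum(columns["close"]) / len(columns["close"])
avg_from_rows = sum(record["close"] for record in rows) / len(rows)

print(columns["close"])
print(avg_from_columns, avg_from_rows)
```

Analytical queries that aggregate a few fields over many records tend to favor the columnar layout, while workloads that read or write whole records at a time tend to favor rows.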