BDA - M 3 - NoSQL
Agenda
• Introduction to NoSQL
• NoSQL Business Drivers
• NoSQL Data Architecture Patterns: Key-value stores, Graph stores,
Column family Bigtable stores, Document stores,
• Variations of NoSQL architectural patterns,
• NoSQL Case Studies
• NoSQL solution for big data, Understanding the types of big data
problems;
• Analyzing big data with a shared-nothing architecture;
• Choosing distribution models: master-slave versus peer-to-peer;
• NoSQL systems to handle big data problems.
Database Components
• A database: a large container in which all the information about a website or
an application is stored in a structured format, such as tables or a hierarchy.
• Each table in the database carries a key that uniquely identifies each of its
records.
• Operational Database: A database that creates and updates data in real time.
It is designed for executing and handling the daily data operations of a
business. For example, an organization uses operational databases to manage
its daily transactions.
• All the RDBMS like MySQL, Informix, Oracle, MS Access and SQL
Server use SQL as their standard database language.
• 1998 - Carlo Strozzi uses the term NoSQL for his lightweight,
open-source relational database
• 2000 - Graph database Neo4j is launched (an enterprise-strength
graph database)
• 2004 - Google BigTable is launched (wide-column and key-value
NoSQL database)
• 2005 - CouchDB is launched (an open-source document-oriented
NoSQL database)
• 2007 - The research paper on Amazon Dynamo is released
(key-value NoSQL database)
• 2008 - Facebook open-sources the Cassandra project (open-source,
distributed, wide-column store, NoSQL database management system)
• 2009 - The term NoSQL is reintroduced
Features of NoSQL
Non-relational
• NoSQL databases do not follow the relational model
• Do not provide tables with flat fixed-column records
• Work with self-contained aggregates or BLOBs
• Do not require object-relational mapping or data
normalization
• Generally omit complex features such as query languages, query
planners, referential integrity, joins, and ACID transactions
Schema-free
● NoSQL databases are either schema-free or have relaxed schemas
● Do not require the schema of the data to be defined up front
● Allow heterogeneous data structures within the same domain
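As a rough illustration of heterogeneous structures in one domain, the sketch below keeps three differently shaped records in the same list. The "users" domain, the field names, and the helper functions are hypothetical and only serve the example; plain Python dicts stand in for a real schema-free store.

```python
# Hypothetical "users" domain: every record is a plain dict and no schema
# is declared anywhere, so each record may carry different fields.
users = [
    {"id": 1, "name": "Asha", "email": "asha@example.com"},
    {"id": 2, "name": "Ravi", "phones": ["555-0101", "555-0102"]},
    {"id": 3, "name": "Mei", "address": {"city": "Pune"}, "premium": True},
]


def field_names(records):
    """Collect every field name that appears in at least one record."""
    names = set()
    for record in records:
        names.update(record.keys())
    return sorted(names)


def with_field(records, field):
    """Return only the records that happen to carry the given field."""
    return [record for record in records if field in record]


print(field_names(users))
print([user["name"] for user in with_field(users, "phones")])
```

Because nothing enforces a common shape, code that reads such records has to check whether a field is present, as with_field does here.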
Simple API
• Offer easy-to-use interfaces for storing and querying data
• APIs allow low-level data manipulation and selection methods
• Text-based protocols are common, mostly HTTP REST (REpresentational State Transfer)
with JSON
• REST: an architectural style for distributed hypermedia systems
• Web-enabled databases run as internet-facing services
Distributed
• NoSQL databases can run across multiple nodes in a distributed fashion
• Offer auto-scaling and fail-over capabilities
• ACID guarantees are often sacrificed for scalability and throughput
• Shared-nothing architecture: nodes share no memory or disk, which means less
coordination and higher distribution.
Advantages of NoSQL
• Scale (horizontal): SQL databases are typically scaled vertically ⇒ you handle more
load on a single server by increasing resources such as RAM, CPU or SSD. NoSQL databases
are scaled horizontally ⇒ you handle more traffic by sharding, that is, by adding more servers
• Simple data model (fewer joins)
• Offers a flexible schema design which can easily be altered without downtime or
service disruption
• Handles big data, coping with data velocity, variety, volume, and complexity
• Flexible, as it can handle structured, semi-structured and unstructured data.
• Often cheaper than a relational database
• Built-in caching: can eliminate the need for a separate caching layer to store
data
• Uses large binary objects for storing large data
• Bulk upload
• Excels at distributed database and multi-data center operations
• Real-time analysis: it can serve as the primary data source for online applications.
• No single point of failure, easy replication
• Lower administration: NoSQL databases don’t need a dedicated high-performance
server
Disadvantages of NoSQL
• No standardization rules
No. | SQL | NoSQL
2 | Not suited for hierarchical data storage | Best suited for hierarchical data storage
9 | Gives read scalability only | Gives both read and write scalability
11 | Used to handle data coming in at low velocity | Used to handle data coming in at high velocity
NoSQL is a type of database which helps to perform operations on big data and
store it in a valid format.
https://youtu.be/zG6CHYCx6ag
Key Value Pair Based
● Data is stored in key/value pairs. The design is meant to handle
large volumes of data and heavy load.
● Key-value stores keep data as a hash table in which each key is
unique and the value can be JSON, a BLOB (Binary Large Object),
a string, etc.
● A value, which can be basically any piece of data or information, is
stored with a key that identifies its location.
● It is one of the most basic kinds of NoSQL database. It is used much
like collections, dictionaries, or associative arrays.
Key-value stores let the developer store schema-less data. They
work well for shopping cart contents.
● Redis, Dynamo, and Riak are examples of key-value stores. Dynamo is
described in Amazon’s Dynamo paper, and Riak is based on that design;
Redis is an independent in-memory design.
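The key/value idea above can be sketched with a small in-memory class. This is a toy under stated assumptions, not the API of Redis, Dynamo, or Riak: the class name KeyValueStore, its put/get/delete methods, and the key naming ("cart:user42") are all hypothetical.

```python
import json


class KeyValueStore:
    """Toy in-memory key-value store; values are opaque to the store."""

    def __init__(self):
        self._data = {}  # a hash table: unique key -> value

    def put(self, key, value):
        # Writing to an existing key simply overwrites the old value.
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        """Remove a key; return True if it existed."""
        if key in self._data:
            del self._data[key]
            return True
        return False


store = KeyValueStore()
# The value can be a JSON string, raw bytes (a BLOB), or any other object.
store.put("cart:user42", json.dumps({"items": ["sku-1", "sku-7"]}))
store.put("avatar:user42", b"\x89PNG...")

cart = json.loads(store.get("cart:user42"))
print(cart["items"])
```

The store never inspects the value, which is why lookups are only possible by key and why the shopping-cart case (one key per user, one blob per cart) fits well.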
Benefits of Key-Value Store
• Simplicity. Key-value databases are quite simple to use. The
straightforward commands and the lack of enforced data types for values
make work easier for programmers.
• Speed. This simplicity makes key-value databases quick to respond,
provided that the rest of the environment around them is well built and
optimized.
• Scalability. Relational databases are mostly scaled vertically, whereas
key-value stores also scale out horizontally across many nodes.
• Easy to move. The absence of a query language means that the
database can be easily moved between different systems without
having to change the architecture.
• Reliability. Built-in redundancy covers for a lost storage node:
duplicated data takes the place of what was lost.
When to use a key-value database
1. When your application needs to handle lots of small, continuous reads and
writes, which may be volatile. Key-value databases offer fast in-memory
access.
2. When storing basic information, such as customer details; storing web
pages with the URL as the key and the web page as the value; storing
shopping-cart contents, product categories, or e-commerce product details.
3. For applications that neither require frequent updates nor need to support
complex queries.
https://youtu.be/fnvsAj1-z2g
• Row Key. Each row has a unique key, which is a unique identifier for that
row.
• Column. Each column contains a name, a value, and a timestamp.
• Name. This is the name of the name/value pair.
• Value. This is the value of the name/value pair.
• Timestamp. This provides the date and time that the data was inserted.
This can be used to determine the most recent version of data.
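A minimal sketch of the row key / column (name, value, timestamp) structure described above. The class ColumnFamily and its methods are hypothetical names, timestamps are passed in as plain integers to keep the example deterministic, and real column-family stores differ in many details.

```python
from collections import defaultdict


class ColumnFamily:
    """Toy column family: row key -> column name -> versions of the value."""

    def __init__(self):
        # Each column keeps a list of (timestamp, value) pairs.
        self._rows = defaultdict(lambda: defaultdict(list))

    def insert(self, row_key, name, value, timestamp):
        self._rows[row_key][name].append((timestamp, value))

    def latest(self, row_key, name):
        """Return the value with the highest timestamp, or None if absent."""
        versions = self._rows.get(row_key, {}).get(name)
        if not versions:
            return None
        return max(versions, key=lambda version: version[0])[1]


users = ColumnFamily()
users.insert("user42", "email", "old@example.com", timestamp=100)
users.insert("user42", "email", "new@example.com", timestamp=200)
# A late-arriving write with an older timestamp does not win.
users.insert("user42", "email", "older@example.com", timestamp=50)

print(users.latest("user42", "email"))
```

The timestamp, not the arrival order, decides which version counts as the most recent one.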
Benefits of Column Store Databases
• Compression. Column stores are very efficient at data compression and/or
partitioning.
• Aggregation queries. Due to their structure, columnar databases perform
particularly well with aggregation queries (such as SUM, COUNT, AVG, etc).
• Scalability. Columnar databases are very scalable. They are well suited to
massively parallel processing (MPP), which involves having data spread across a
large cluster of machines – often thousands of machines.
• Fast to load and query. Columnar stores can be loaded very quickly, even for
tables with around a billion rows, so you can start querying and analysing
soon after loading.
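To illustrate why aggregation queries suit a columnar layout, the sketch below converts a few row-oriented records into one list per column and then aggregates a single column. The sample orders and the helper to_columns are hypothetical, and the sketch assumes every row has the same fields.

```python
# Row-oriented layout: each record holds all of its fields together.
rows = [
    {"order_id": 1, "region": "east", "amount": 120},
    {"order_id": 2, "region": "west", "amount": 80},
    {"order_id": 3, "region": "east", "amount": 200},
    {"order_id": 4, "region": "west", "amount": 100},
]


def to_columns(records):
    """Pivot row records into a column-oriented layout: name -> list of values."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# SUM, COUNT and AVG touch only the "amount" column; the other columns
# are never read, which is the core advantage for aggregation queries.
total = sum(columns["amount"])
count = len(columns["amount"])
average = total / count

print(total, count, average)
```

Values of one column stored together also tend to be similar, which is one reason column stores compress well.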
https://towardsdatascience.com/8-examples-to-query-a-nosql-database-fc3dd1c9a8c
Graph-Based Database
• A graph database stores entities as well as the relationships among
those entities.
• Each entity is stored as a node, and each relationship as an edge.
• An edge describes a relationship between two nodes.
• Every node and edge has a unique identifier.
• Graphs primarily work on the concept of multi-relational data
‘pathways’.
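The node/edge model above can be sketched as follows. The Graph class, the identifiers ("n1", "e1"), and the KNOWS label are hypothetical; this is not the API of Neo4j or any other graph database, only an in-memory illustration of following relationship pathways.

```python
from collections import deque


class Graph:
    """Toy graph store: nodes and edges each have a unique identifier."""

    def __init__(self):
        self.nodes = {}  # node id -> properties
        self.edges = {}  # edge id -> (source id, label, target id)

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def add_edge(self, edge_id, source, label, target):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges[edge_id] = (source, label, target)

    def neighbors(self, node_id, label=None):
        """Targets reachable over one outgoing edge, optionally by label."""
        return [
            target
            for source, edge_label, target in self.edges.values()
            if source == node_id and (label is None or edge_label == label)
        ]

    def path_exists(self, start, goal):
        """Breadth-first search along directed edges."""
        seen, queue = {start}, deque([start])
        while queue:
            current = queue.popleft()
            if current == goal:
                return True
            for nxt in self.neighbors(current):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return False


g = Graph()
g.add_node("n1", name="Alice")
g.add_node("n2", name="Bob")
g.add_node("n3", name="Carol")
g.add_edge("e1", "n1", "KNOWS", "n2")
g.add_edge("e2", "n2", "KNOWS", "n3")

print(g.neighbors("n1"), g.path_exists("n1", "n3"))
```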
• But unlike web pages, filesystems can list all the files in a directory
without having to open the files.
• If the file content is large, it would be inefficient to load all of the files into memory
each time you want a listing of the files.
• To make this easier and more efficient, a key-value store can be modified to include
additional information in the structure of the key, indicating that the key-value pair is
associated with another key-value pair. This creates a collection: a general-purpose
structure used to group resources.
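One possible way to encode grouping information in the key is a path-like prefix, sketched below. The key layout ("/docs/...") and the helper list_collection are assumptions made for illustration; a plain dict stands in for the key-value store.

```python
# Hypothetical store: path-like keys carry the grouping information.
store = {
    "/docs/readme.txt": "large file content ...",
    "/docs/guide.txt": "large file content ...",
    "/images/logo.png": b"large binary content ...",
}


def list_collection(kv_store, prefix):
    """List the keys in a collection without reading any of the values."""
    if not prefix.endswith("/"):
        prefix += "/"
    # Iterating over a dict yields keys only, so values are never loaded.
    return sorted(key for key in kv_store if key.startswith(prefix))


print(list_collection(store, "/docs"))
```

The listing works on keys alone, which mirrors how a filesystem lists a directory without opening the files in it.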
Grouping Items
• The implementation of a collection system can also vary
dramatically based on what NoSQL data pattern you use.
● Medical imaging systems like CAT scans and MRIs need to convert raw
image data into formats that are useful to doctors and patients.
● As another bulk image-processing example, the New York Times converted
3.3 million scans of old newspaper articles into web formats using tools
like Amazon EC2 and Hadoop for a few hundred dollars.
Typical Big Data Use Cases
• Public web page data —
• Publicly accessible pages are full of information that
organizations can use to be more competitive.
• They contain news stories, RSS feeds (Really Simple Syndication, a web feed
that allows users and applications to access updates to websites), new product
information, product reviews, and blog postings.
• Not all of the information is authentic.
• There are millions of pages of fake product reviews
created by competitors or third parties paid to disparage
other sites.
• Finding out which product reviews are valid is a topic for
careful analysis.
• Remote sensor data —
• Small, low-power sensors can now track almost any aspect of our
world.
• Road sensors can warn about traffic jams in real time and suggest
alternate routes.
• You can even track the moisture in your garden, lawn, and indoor
plants to suggest a watering plan for your home
• Event log data —
Computer systems create logs of read-only events from web page hits
(also called clickstreams), email messages sent, or login attempts.
Each of these events can help organizations understand who’s using what
resources and when systems may not be performing according to
specification.
Event log data can be fed into operational intelligence tools to send alerts
to users when key indicators fall out of acceptable ranges.
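As a small sketch of the alerting idea, the code below counts failed login attempts per user in an event log and reports users whose count exceeds a threshold. The event format, the "login_failed" type, and the threshold value are all hypothetical; real operational intelligence tools work on far larger, streaming logs.

```python
from collections import Counter

# Hypothetical event log: each entry is a read-only record of one event.
events = [
    {"type": "login_failed", "user": "ravi"},
    {"type": "page_hit", "user": "asha"},
    {"type": "login_failed", "user": "ravi"},
    {"type": "login_failed", "user": "asha"},
    {"type": "login_failed", "user": "ravi"},
]


def failed_login_alerts(log, threshold):
    """Return users whose failed-login count is above the acceptable range."""
    counts = Counter(
        event["user"] for event in log if event["type"] == "login_failed"
    )
    return sorted(user for user, count in counts.items() if count > threshold)


print(failed_login_alerts(events, threshold=2))
```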
The peer-to-peer model stores all the information about the cluster on each
node in the cluster. If any node crashes, the other nodes can take over and
processing can continue.
NoSQL systems handle big data problems
• Take a look at four of the most popular ways NoSQL systems handle big
data challenges.
1. Moving queries to the data, not data to the queries
• With the exception of large graph databases, most NoSQL systems use
commodity processors that each hold a subset of the data on their local
shared-nothing drives.
• When a client wants to send a general query to all nodes that hold data,
it’s more efficient to send the query to each node than it is to transfer
large datasets to a central processor.
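The idea of sending the query to each node can be sketched as a scatter-gather call: every node evaluates the query against its own local subset and returns only a small partial result. The Node class, its run_query method, and the sample records are hypothetical, and the nodes are simulated in one process rather than over a network.

```python
class Node:
    """Simulated shared-nothing node holding its own subset of the data."""

    def __init__(self, records):
        self.records = records  # stays on the node's local drive

    def run_query(self, predicate):
        # The query runs where the data lives; only a count travels back.
        return sum(1 for record in self.records if predicate(record))


def count_matching(nodes, predicate):
    """Send the query to every node and merge the partial results."""
    return sum(node.run_query(predicate) for node in nodes)


cluster = [
    Node([{"status": "error"}, {"status": "ok"}]),
    Node([{"status": "ok"}, {"status": "ok"}]),
    Node([{"status": "error"}, {"status": "error"}]),
]

errors = count_matching(cluster, lambda record: record["status"] == "error")
print(errors)
```

Each node returns a single integer instead of its full record set, so the amount of data moved does not grow with the size of the dataset.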
2. Using hash rings to evenly distribute data on a cluster
One of the most challenging problems with distributed databases is figuring out a
consistent way of assigning a document to a processing node.
A hash ring is a good way to spread big data loads evenly over many servers: each
document gets a 40-character key that behaves like a random value (for example, a
hexadecimal SHA-1 hash), and the key's position on the ring determines which node
stores the document.
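A minimal hash ring sketch, assuming SHA-1 as the source of the 40-character keys. The class HashRing, the node names, and the use of 100 ring positions per node are illustrative assumptions, not the scheme of any particular NoSQL product.

```python
import bisect
import hashlib


def sha1_key(text):
    """Return a 40-character hexadecimal SHA-1 key for the given text."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


class HashRing:
    """Toy consistent hash ring mapping document keys to node names."""

    def __init__(self, nodes, replicas=100):
        if not nodes:
            raise ValueError("the ring needs at least one node")
        # Each node is placed at several positions to even out the load.
        self._ring = sorted(
            (sha1_key(f"{node}#{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self._keys = [key for key, _ in self._ring]

    def node_for(self, document_id):
        """Walk clockwise from the document's key to the next node position."""
        key = sha1_key(document_id)
        index = bisect.bisect_right(self._keys, key) % len(self._keys)
        return self._ring[index][1]


nodes = ["node-a", "node-b", "node-c"]
ring = HashRing(nodes)
print(len(sha1_key("doc-1")), ring.node_for("doc-1"))
```

Because equal-length lowercase hex strings sort in the same order as the numbers they encode, the keys can be compared as plain strings. When a node is added, only the documents whose keys fall just before the new node's positions move to it; all other assignments stay the same.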