
Document Databases (MongoDB)

• Documents are the main concept in document databases.

• The database stores and retrieves documents, which can be JSON, BSON, and so on. These documents are self-describing, hierarchical tree data structures which can consist of maps, collections, and scalar values.

• The documents stored are similar to each other but do not have to be exactly the same.

• Document databases store documents in the value part of the key-value store.
How terminology compares in Oracle & MongoDB

The _id is a special field found on all documents in MongoDB, playing a role similar to ROWID in Oracle. In MongoDB, _id can be assigned by the user, as long as it is unique.
Every object within the MongoDB database contains this unique identifier
_id to distinguish that object from every other object. It is added
automatically to every document you create in a collection.
Figure: The MongoDB database model
Figure: A typical relational database model
What Is a Document Database?

{
"firstname": "Martin",
"likes": [ "Biking", "Photography" ],
"lastcity": "Boston"
}
The above document can be considered a row in
a traditional RDBMS.
Let’s look at another document
{
"firstname": "Pramod",
"citiesvisited": [ "Chicago", "London", "Pune", "Bangalore" ],
"addresses":
[
{ "state": "AK",
"city": "DILLINGHAM",
"type": "R"
},
{ "state": "MH",
"city": "PUNE",
"type": "R"
}
],
"lastcity": "Chicago"
}
• Looking at the documents, we can see that they are
similar, but have differences in attribute names.
This is allowed in document databases.
• The schema of the data can differ across
documents, but these documents can still belong to
the same collection—unlike an RDBMS where
every row in a table has to follow the same schema.
• We represent a list of citiesvisited as an array, and a list of addresses as a list of documents embedded inside the main document.
• Embedding child documents as subobjects inside
documents provides for easy access and better
performance.
• If you look at the documents, you will see that some of
the attributes are similar, such as firstname or city.
At the same time, there are attributes in the second
document which do not exist in the first document,
such as addresses, while likes is in the first document
but not the second.

• This different representation of data is not the same as in an RDBMS, where every column has to be defined; if a column does not have data, it is marked as empty or set to null.

• In documents, there are no empty attributes; if a given attribute is not found, we assume that it was not set or not relevant to the document. Documents allow new attributes to be created without the need to define them or to change the existing documents.
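The flexible-schema behavior described above can be sketched with plain Python dictionaries, using the two sample documents shown earlier. This is only an illustration of the semantics, not MongoDB itself: a missing attribute is simply absent, and embedded child documents are reached by ordinary nested access.

```python
# Two documents in the same "collection": similar, but not identical schemas.
collection = [
    {
        "firstname": "Martin",
        "likes": ["Biking", "Photography"],
        "lastcity": "Boston",
    },
    {
        "firstname": "Pramod",
        "citiesvisited": ["Chicago", "London", "Pune", "Bangalore"],
        "addresses": [
            {"state": "AK", "city": "DILLINGHAM", "type": "R"},
            {"state": "MH", "city": "PUNE", "type": "R"},
        ],
        "lastcity": "Chicago",
    },
]

# An attribute that is absent is simply "not set"; there is no null column.
for doc in collection:
    print(doc["firstname"], "likes:", doc.get("likes"))

# Embedded child documents are reached by ordinary nested access.
print(collection[1]["addresses"][0]["city"])  # DILLINGHAM
```

Note that the first document has no addresses attribute at all, rather than holding an empty or null value for it.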
Some of the popular document databases

• MongoDB
• CouchDB
• Terrastore
• OrientDB
• RavenDB
• Lotus Notes
MongoDB Features

• While there are many specialized document databases, we will use MongoDB as a representative of the feature set.

• Keep in mind that each product has some features that may not be found in other document databases.
• Let’s take some time to understand how MongoDB
works. Each MongoDB instance has multiple
databases, and each database can have multiple
collections.
• When we compare this with an RDBMS: an RDBMS instance corresponds to a MongoDB instance, the schemas in an RDBMS are similar to MongoDB databases, and the RDBMS tables are collections in MongoDB.
• When we store a document, we have to choose which database and collection the document belongs in; for example, database.collection.insert(document), which is usually represented as db.collection.insert(document).
MongoDB
• MongoDB is a cross-platform, document-oriented database that provides high performance, high availability, and easy scalability. MongoDB works on the concepts of collections and documents.

Database
• A database is a physical container for collections. A single MongoDB server typically has multiple databases.
Collection
• A collection is a group of MongoDB documents. It is the equivalent of an RDBMS table. A collection exists within a single database. Collections do not enforce a schema, so documents within a collection can have different fields. Typically, all documents in a collection serve a similar or related purpose.

Document
• A document is a set of key-value pairs. Documents have a dynamic schema, which means that documents in the same collection do not need to have the same set of fields or structure, and common fields in a collection's documents may hold different types of data.
MongoDB Commands
• Command to Start MongoDB
sudo service mongodb start

• Command to Stop MongoDB
sudo service mongodb stop

• Command to Restart MongoDB
sudo service mongodb restart

• Command to Open the MongoDB Shell
mongo
MongoDB - Create Database
• use Command
MongoDB's use DATABASE_NAME command is used to create a database. The command creates a new database if it doesn't exist; otherwise, it switches to the existing database.
– Syntax
Basic syntax of use DATABASE statement is as follows −
use DATABASE_NAME
– Example
If you want to use a database with name <mydb>, then use DATABASE
statement would be as follows −
>use mydb
It displays:
switched to db mydb

• db Command
To check your currently selected database, use the command db
>db
It displays:
mydb
• show dbs Command
If you want to check your databases list, use the command show
dbs.
>show dbs
It displays:
local 0.78125GB
test 0.23012GB
Your newly created database (mydb) is not present in the list. To display a database, you need to insert at least one document into it.

• insert command
>db.movie.insert({"name":"tutorials point"})
>show dbs
local 0.78125GB
mydb 0.23012GB
test 0.23012GB
In MongoDB, the default database is test. If you didn't create any database, collections will be stored in the test database.
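The lazy creation shown above (mydb appears in show dbs only after its first insert) can be illustrated with a toy in-memory store. This is a hypothetical simulation of the semantics, not MongoDB's implementation: use only switches context, and the database materializes on first insert.

```python
class ToyMongo:
    """Toy in-memory store illustrating MongoDB's lazy database creation."""

    def __init__(self):
        self.databases = {"local": {}, "test": {}}  # pre-existing databases
        self.current = "test"                       # default database is test

    def use(self, name):
        # 'use' only switches context; it does not materialize the database
        self.current = name
        return "switched to db " + name

    def insert(self, collection, document):
        # the database (and collection) appear on the first insert
        db = self.databases.setdefault(self.current, {})
        db.setdefault(collection, []).append(document)

    def show_dbs(self):
        return sorted(self.databases)

store = ToyMongo()
print(store.use("mydb"))   # switched to db mydb
print(store.show_dbs())    # mydb is not listed yet
store.insert("movie", {"name": "tutorials point"})
print(store.show_dbs())    # now mydb appears
```

The same pattern applies to collections: they are created implicitly on first insert, as shown later in the createCollection section.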
MongoDB - Drop Database

dropDatabase() Method
MongoDB's db.dropDatabase() command is used to drop an existing database.

Syntax
Basic syntax of dropDatabase() command is as follows −

>db.dropDatabase()
This will delete the selected database. If you have not selected any database, it will delete the default test database.

Example
First, check the list of available databases by using the command, show dbs.

>show dbs
local 0.78125GB
mydb 0.23012GB
test 0.23012GB
>
If you want to delete the new database <mydb>, then the
dropDatabase() command would be as follows −

>use mydb
switched to db mydb
>db.dropDatabase()
{ "dropped" : "mydb", "ok" : 1 }
>
Now check the list of databases.

>show dbs
local 0.78125GB
test 0.23012GB
>
MongoDB - Create Collection

createCollection() Method
MongoDB's db.createCollection(name, options) method is used to create a collection.

Syntax
Basic syntax of createCollection() command is as follows −

>db.createCollection(name, options)
In the command, name is the name of the collection to be created. options is a document used to specify the configuration of the collection.

Parameter  Type      Description
name       String    Name of the collection to be created
options    Document  (Optional) Specifies options about memory size and indexing

The options parameter is optional, so you need to specify only the name of the collection. Following is the list of options you can use −
Field        Type     Description
capped       Boolean  (Optional) If true, enables a capped collection. A capped collection is a fixed-size collection that automatically overwrites its oldest entries when it reaches its maximum size. If you specify true, you also need to specify the size parameter.
autoIndexId  Boolean  (Optional) If true, automatically creates an index on the _id field. Default value is true.
size         Number   (Optional) Specifies a maximum size in bytes for a capped collection. If capped is true, you also need to specify this field.
max          Number   (Optional) Specifies the maximum number of documents allowed in the capped collection.

While inserting a document, MongoDB first checks the size field of the capped collection, then the max field.
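The overwrite behavior of a capped collection bounded by max documents works like a fixed-length ring buffer; Python's collections.deque with maxlen shows the same oldest-first eviction. Note this sketch models only the max document count, whereas a real capped collection is bounded primarily by its size in bytes.

```python
from collections import deque

# A capped "collection" limited to max = 3 documents.
capped = deque(maxlen=3)

# Insert five documents; the deque evicts the oldest automatically.
for i in range(5):
    capped.append({"_id": i, "event": "log entry %d" % i})

# The two oldest entries (_id 0 and 1) were overwritten.
print([doc["_id"] for doc in capped])  # [2, 3, 4]
```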

Syntax of createCollection() method without options is as follows −

>use test
switched to db test
>db.createCollection("mycollection")
{ "ok" : 1 }
>

You can check the created collection by using the command show collections.

>show collections
mycollection
system.indexes
The following example shows the syntax of the createCollection() method with a few important options −

>db.createCollection("mycol", { capped : true, autoIndexId : true, size : 6142800, max : 10000 })
{ "ok" : 1 }
>
In MongoDB, you don't need to create a collection explicitly. MongoDB creates the collection automatically when you insert a document.

>db.tutorialspoint.insert({"name" : "tutorialspoint"})
>show collections
mycol
mycollection
system.indexes
tutorialspoint
>
MongoDB - Drop Collection
drop() Method
MongoDB's db.collection.drop() is used to drop a collection from the database.

Syntax
Basic syntax of drop() command is as follows −
db.COLLECTION_NAME.drop()

Example
First, check the available collections in your database mydb.

>use mydb
switched to db mydb

>show collections
mycol
mycollection
system.indexes
tutorialspoint
>
Now drop the collection with the name mycollection.

>db.mycollection.drop()
true
>

Again, check the list of collections in the database.

>show collections
mycol
system.indexes
tutorialspoint
>

The drop() method returns true if the selected collection is dropped successfully; otherwise, it returns false.
MongoDB - Datatypes

• String − This is the most commonly used datatype to store data. Strings in MongoDB must be valid UTF-8.

• Integer − This type is used to store a numerical value. Integers can be 32-bit or 64-bit depending upon your server.

• Boolean − This type is used to store a boolean (true/false) value.

• Double − This type is used to store floating-point values.

• Arrays − This type is used to store arrays, lists, or multiple values in one key.

• Timestamp − Used to store a timestamp. This can be handy for recording when a document has been modified or added.

• Object − This datatype is used for embedded documents.

• Null − This type is used to store a null value.

• Symbol − This datatype is used identically to a string; however, it's generally reserved for languages that use a specific symbol type.

• Date − This datatype is used to store the current date or time in UNIX time format. You can specify your own date and time by creating an object of Date and passing day, month, and year into it.

• Object ID − This datatype is used to store the document's ID.

• Binary data − This datatype is used to store binary data.

• Code − This datatype is used to store JavaScript code in the document.

• Regular expression − This datatype is used to store regular expressions.
Consistency
• Consistency in a MongoDB database is configured by using replica sets and choosing to wait for the writes to be replicated to all the slaves or a given number of slaves.

• Every write can specify the number of servers the write has to be propagated to before it returns as successful.

• A command like db.runCommand({ getlasterror : 1 , w : "majority" }) tells the database how strong a consistency guarantee you want.
• For example, if you have one server and specify the w as majority, the write will return immediately since there is only one node.

• If you have three nodes in the replica set and specify w as majority, the write will have to complete at a minimum of two nodes before it is reported as a success.

• You can increase the w value for stronger consistency, but you will suffer on write performance, since the writes now have to complete at more nodes.
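The "majority" rule above is simple arithmetic: with n nodes in the replica set, a write must be acknowledged by floor(n/2) + 1 of them. A minimal sketch:

```python
def majority(replica_set_size):
    """Minimum number of nodes a write must reach for w = 'majority'."""
    return replica_set_size // 2 + 1

# One node: the write returns immediately (a majority of 1 is 1).
print(majority(1))  # 1
# Three nodes: the write must complete on at least two of them.
print(majority(3))  # 2
# Larger sets need correspondingly more acknowledgments.
print(majority(5))  # 3
```

This is why increasing the replica-set size while keeping w at majority raises the number of nodes each write must reach, trading write latency for stronger consistency.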
• Replica sets also allow you to increase the read performance by allowing reading from slaves by setting slaveOk; this parameter can be set on the connection, or database, or collection, or individually for each operation.

Mongo mongo = new Mongo("localhost:27017");
mongo.slaveOk();

• Here we are setting slaveOk per operation, so that we can decide which operations can work with data from the slave node.

DBCollection collection = getOrderCollection();
BasicDBObject query = new BasicDBObject();
query.put("name", "Martin");
DBCursor cursor = collection.find(query).slaveOk();
• Similar to various options available for read, you
can change the settings to achieve strong write
consistency, if desired.
• By default, a write is reported successful once the
database receives it; you can change this so as to
wait for the writes to be synced to disk or to
propagate to two or more slaves. This is known as
WriteConcern: You make sure that certain writes
are written to the master and some slaves by
setting WriteConcern to REPLICAS_SAFE.
• Shown below is code where we are setting the WriteConcern for all writes to a collection:

DBCollection shopping = database.getCollection("shopping");
shopping.setWriteConcern(REPLICAS_SAFE);
• WriteConcern can also be set per operation by specifying it on the insert call:

WriteResult result = shopping.insert(order, REPLICAS_SAFE);

• There is a tradeoff that you need to carefully think about, based on your application needs and business requirements, to decide what settings make sense for slaveOk during read or what safety level you desire during write with WriteConcern.
Transactions
• Transactions, in the traditional RDBMS sense, mean
that you can start modifying the database with
insert, update, or delete commands over different
tables and then decide if you want to keep the changes
or not by using commit or rollback.

• These constructs are generally not available in NoSQL solutions: a write either succeeds or fails.

• Transactions at the single-document level are known as atomic transactions. Transactions involving more than one operation are generally not possible, although there are products such as RavenDB that do support transactions across multiple operations.
• By default, all writes are reported as successful.

• Finer control over the write can be achieved by using the WriteConcern parameter.

• We ensure that an order is written to more than one node before it's reported successful by using WriteConcern.REPLICAS_SAFE.

• Different levels of WriteConcern let you choose the safety level during writes; for example, when writing log entries, you can use the lowest level of safety, WriteConcern.NONE.
final Mongo mongo = new Mongo(mongoURI);
mongo.setWriteConcern(REPLICAS_SAFE);
DBCollection shopping = mongo.getDB(orderDatabase)
    .getCollection(shoppingCollection);
try {
    WriteResult result = shopping.insert(order, REPLICAS_SAFE);
    // Writes made it to primary and at least one secondary
} catch (MongoException writeException) {
    // Writes did not make it to a minimum of two nodes including primary
    dealWithWriteFailure(order, writeException);
}
Availability

• The CAP theorem dictates that we can have only two of Consistency, Availability, and Partition Tolerance.

• Document databases try to improve on availability by replicating data using the master-slave setup. The same data is available on multiple nodes and the clients can get to the data even when the primary node is down.

• Usually, the application code does not have to determine if the primary node is available or not. MongoDB implements replication, providing high availability using replica sets.
• In a replica set, there are two or more nodes participating in an asynchronous master-slave replication. The replica-set nodes elect the master, or primary, among themselves. Assuming all the nodes have equal voting rights, some nodes can be favored for being closer to the other servers, for having more RAM, and so on; users can affect this by assigning a priority (a number between 0 and 1000) to a node.

• All requests go to the master node, and the data is replicated to the slave nodes. If the master node goes down, the remaining nodes in the replica set vote among themselves to elect a new master; all future requests are routed to the new master, and the slave nodes start getting data from the new master.
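The effect of priority on elections can be sketched as follows. This is an illustrative simulation only: a real replica set runs a distributed election protocol in which priority is one of several factors, and the node names and priority values here are hypothetical.

```python
def elect_primary(nodes):
    """Pick a primary among healthy, electable nodes: highest priority wins.

    nodes: list of dicts with 'name', 'priority' (0-1000), and 'healthy'.
    A priority of 0 means the node can never become primary.
    """
    candidates = [n for n in nodes if n["healthy"] and n["priority"] > 0]
    if not candidates:
        return None
    return max(candidates, key=lambda n: n["priority"])["name"]

replica_set = [
    {"name": "mongoA", "priority": 10, "healthy": True},  # primary datacenter
    {"name": "mongoB", "priority": 5,  "healthy": True},  # primary datacenter
    {"name": "mongoC", "priority": 0,  "healthy": True},  # never electable
]

print(elect_primary(replica_set))   # mongoA

# mongoA fails: the remaining nodes elect a new primary.
replica_set[0]["healthy"] = False
print(elect_primary(replica_set))   # mongoB
```

Giving the nodes in the primary datacenter higher priorities, as in the figure discussed below, keeps elections from promoting a remote node while a local one is available.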
• When the node that failed comes back online, it joins in as a slave and catches up with the rest of the nodes by pulling all the data it needs to get current.

• The following figure is an example configuration of replica sets.

• We have two nodes, mongo A and mongo B, running the MongoDB database in the primary datacenter, and mongo C in the secondary datacenter.

• If we want nodes in the primary datacenter to be elected as primary nodes, we can assign them a higher priority than the other nodes. More nodes can be added to the replica sets without having to take them offline.
Figure: Replica set configuration with higher priority
assigned to nodes in the same datacenter
• The application writes to or reads from the primary (master) node. When the connection is established, the application only needs to connect to one node (primary or not, it does not matter) in the replica set, and the rest of the nodes are discovered automatically.

• When the primary node goes down, the driver talks to the new primary elected by the replica set.

• The application does not have to manage any of the communication failures or node selection criteria.
• Using replica sets gives you the ability to have a highly available document data store.

• Replica sets are generally used for
– Data redundancy
– Automated failover
– Read scaling
– Server maintenance without downtime
– Disaster recovery

• Similar availability setups can be achieved with CouchDB, RavenDB, Terrastore, and other products.
Scaling
• The idea of scaling is to add nodes or change data
storage without simply migrating the database to a
bigger box. We are not talking about making
application changes to handle more load; instead, we
are interested in what features are in the database
so that it can handle more load.

• Scaling for heavy-read loads can be achieved by adding more read slaves, so that all the reads can be directed to the slaves. Given a heavy-read application, with our 3-node replica-set cluster, we can add more read capacity to the cluster as the read load increases just by adding more slave nodes to the replica set to execute reads with the slaveOk flag. The following figure shows horizontal scaling for reads.
Figure :Adding a new node, mongo D, to an existing replica-set cluster
• Once the new node, mongo D, is started, it needs to
be added to the replica set.
rs.add("mongod:27017");

• When a new node is added, it will sync up with the existing nodes, join the replica set as a secondary node, and start serving read requests.

• An advantage of this setup is that we do not have to restart any other nodes, and there is no downtime for the application either.
• When we want to scale for write, we can start
sharding the data. Sharding is similar to partitions in
RDBMS. With RDBMS, partitions are usually on the
same node, so the client application does not have to
query a specific partition but can keep querying the
base table; the RDBMS takes care of finding the right
partition for the query and returns the data.
• In sharding, the data is also split by a certain field, but then moved to different Mongo nodes. The data is dynamically moved between nodes to ensure that shards are always balanced. We can add more nodes to the cluster and increase the number of writable nodes, enabling horizontal scaling for writes.
• db.runCommand( { shardcollection :
"ecommerce.customer", key : {firstname : 1} } )
• Splitting the data on the first name of the customer
ensures that the data is balanced across the shards
for optimal write performance; furthermore, each
shard can be a replica set ensuring better read
performance within the shard.
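The routing implied by sharding on firstname can be sketched as a simple range-based lookup. This is a deliberately simplified illustration: real MongoDB maintains chunk ranges in config servers and rebalances them dynamically, and the shard names and letter ranges below are hypothetical.

```python
def shard_for(firstname, ranges):
    """Route a document to a shard by the first letter of its shard key."""
    first = firstname[0].upper()
    for (low, high), shard in ranges:
        if low <= first <= high:
            return shard
    raise ValueError("no shard covers key " + firstname)

# Three shards, each owning a range of the key space.
ranges = [
    (("A", "H"), "shard0"),
    (("I", "P"), "shard1"),
    (("Q", "Z"), "shard2"),
]

print(shard_for("Martin", ranges))  # shard1
print(shard_for("Pramod", ranges))  # shard1
print(shard_for("Adam", ranges))    # shard0
```

Every query or write on the customer collection is routed this way by the shard key, which is why choosing that key well matters so much.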

• When we add a new shard to this existing sharded cluster, the data will now be balanced across four shards instead of three. As all this data movement and infrastructure refactoring is happening, the application will not experience any downtime, although the cluster may not perform optimally when large amounts of data are being moved to rebalance the shards.
Figure: MongoDB sharded setup where each shard is a
replica set
• The shard key plays an important role. You may
want to place your MongoDB database shards closer
to their users, so sharding based on user location
may be a good idea.

• When sharding by customer location, all user data for the East Coast of the USA is in the shards that are served from the East Coast, and all user data for the West Coast is in the shards that are on the West Coast.
Suitable Use Cases
• Event Logging
Applications have different event logging needs; within
the enterprise, there are many different applications
that want to log events. Document databases can
store all these different types of events and can act as
a central data store for event storage. Events can be
sharded by the name of the application where the event
originated or by the type of event such as
order_processed or customer_logged.

• Content Management Systems, Blogging Platforms
Since document databases have no predefined schemas and usually understand JSON documents, they work well in content management systems or applications for publishing websites, managing user comments, user registrations, and profiles.
• Web Analytics or Real-Time Analytics
Document databases can store data for real-time
analytics; since parts of the document can be
updated, it’s very easy to store page views or unique
visitors, and new metrics can be easily added without
schema changes.

• E-Commerce Applications
E-commerce applications often need to have flexible
schema for products and orders, as well as the
ability to evolve their data models without expensive
database refactoring or data migration.
When Not to Use

• Complex Transactions Spanning Different Operations
If you need to have atomic cross-document operations, then document databases may not be for you. However, there are some document databases that do support these kinds of operations, such as RavenDB.

• Queries against Varying Aggregate Structure
Flexible schema means that the database does not enforce any restrictions on the schema. Since the data is saved as an aggregate, if the design of the aggregate is constantly changing, you need to save the aggregates at the lowest level of granularity: basically, you need to normalize the data. In this scenario, document databases may not work.
