Lesson 09 - 04 NoSQL Redis
Databases
The classical approach: store data in a relational database
MySQL, PostgreSQL, H2, SQLite, HSQLDB, and many more...
NoSQL introduction
NoSQL refers to non-relational database management systems, which differ from traditional RDBMSs in several significant ways. Carlo Strozzi used the term NoSQL in 1998 to name his lightweight, open-source relational database that did not expose the standard SQL interface.
In 2009, Eric Evans reused the term to refer to databases that are non-relational, distributed, and do not conform to ACID.
The term NoSQL should be read as "Not Only SQL", not as "No to SQL" or "Never SQL".
ACID
Atomicity
"all or nothing": if one part of the transaction fails, the entire transaction fails, and the database state is left unchanged
Consistency
Ensures that any transaction will bring the database from one valid state to another. Any data written to the database must be valid according to all defined rules (constraints, cascades, triggers, etc.).
Isolation
Ensures that the concurrent execution of transactions results in a system state that could have been obtained if the transactions were executed serially
Durability
Means that once a transaction has been committed, it will remain so, even in the event of power loss, crashes, or errors
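To make atomicity and consistency concrete, here is a minimal in-memory sketch, not a real database: a transfer between two accounts either applies both updates or neither. The `Bank` class, the account names, and the "no negative balance" rule are hypothetical, chosen only for illustration; the sketch is single-threaded and says nothing about isolation or durability.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical single-threaded in-memory "database" illustrating atomicity and consistency.
class Bank {
    private Map<String, Long> balances = new HashMap<>();

    Bank(Map<String, Long> initial) {
        balances.putAll(initial);
    }

    long balance(String account) {
        return balances.getOrDefault(account, 0L);
    }

    // All or nothing: the updates are applied to a private copy, which replaces
    // the committed state only if every step succeeds.
    boolean transfer(String from, String to, long amount) {
        Map<String, Long> working = new HashMap<>(balances);
        working.merge(from, -amount, Long::sum);
        working.merge(to, amount, Long::sum);
        // Consistency rule (hypothetical constraint): no balance may go negative.
        for (long b : working.values()) {
            if (b < 0) {
                return false; // abort: the committed state is left unchanged
            }
        }
        balances = working; // commit
        return true;
    }

    public static void main(String[] args) {
        Bank bank = new Bank(Map.of("alice", 100L, "bob", 20L));
        System.out.println(bank.transfer("alice", "bob", 30));  // true
        System.out.println(bank.transfer("bob", "alice", 500)); // false
        System.out.println(bank.balance("alice") + " " + bank.balance("bob")); // 70 50
    }
}
```

The failed second transfer leaves both balances exactly as the first transfer committed them, which is the "database state is left unchanged" part of atomicity.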
NoSQL introduction
Two trends:
1. The exponential growth of the volume of data generated by users and systems
2. The increasing interdependency and complexity of data, accelerated by the Internet, Web 2.0, and social networks
NoSQL databases are useful when working with huge quantities of data whose nature does not require a relational model
NoSQL characteristics
Does not use SQL as its query language
NoSQL databases are not primarily built on tables, and generally do not use SQL for data manipulation
Eventual consistency
Given a sufficiently long period of time over which no changes are sent, all updates can be expected to propagate eventually through the system and all the replicas will be consistent
Conflict resolution:
Read repair: the correction is done when a read finds an inconsistency. This slows down the read operation.
Write repair: the correction takes place during a write operation, if an inconsistency has been found, slowing down the write operation.
Asynchronous repair: the correction is not part of a read or write operation.
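Read repair can be sketched as follows, assuming each replica stores a value together with a version number and that the highest version wins. This is a simplification (real systems often use timestamps or vector clocks), and `ReadRepair`, `Versioned`, `Replica`, and `readWithRepair` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ReadRepair {
    // Hypothetical versioned value: a higher version means a newer write.
    static final class Versioned {
        final String value;
        final long version;
        Versioned(String value, long version) { this.value = value; this.version = version; }
    }

    // Hypothetical replica: a plain key -> versioned value map.
    static final class Replica {
        final Map<String, Versioned> data = new HashMap<>();
    }

    static Versioned readWithRepair(List<Replica> replicas, String key) {
        // 1. Read from every replica and keep the newest version.
        Versioned newest = null;
        for (Replica r : replicas) {
            Versioned v = r.data.get(key);
            if (v != null && (newest == null || v.version > newest.version)) newest = v;
        }
        // 2. Write the newest value back to stale or missing replicas.
        //    These extra writes are why read repair slows down reads.
        if (newest != null) {
            for (Replica r : replicas) {
                Versioned v = r.data.get(key);
                if (v == null || v.version < newest.version) r.data.put(key, newest);
            }
        }
        return newest;
    }

    public static void main(String[] args) {
        List<Replica> replicas = new ArrayList<>();
        for (int i = 0; i < 3; i++) replicas.add(new Replica());
        replicas.get(0).data.put("k", new Versioned("old", 1));
        replicas.get(1).data.put("k", new Versioned("new", 2)); // replica 2 has no value yet
        System.out.println(readWithRepair(replicas, "k").value);    // new
        System.out.println(replicas.get(2).data.get("k").value);    // new (repaired)
    }
}
```

After one read, all three replicas hold the newest value, so the replicas become consistent without waiting for a background process.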
BASE vs ACID
BASE is an alternative to ACID
Basically Available
Soft state
Eventual consistency
Scalability
Scalability is the ability of a system to handle a growing amount of work in a capable manner, or its ability to be enlarged to accommodate that growth.
Scale horizontally (scale out)
Add more nodes to a system, such as adding a new computer to a distributed software application
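When scaling out, the data must be spread over the nodes. A minimal way to do this is to partition keys by hash. The sketch below uses plain modulo hashing, an assumption made for illustration only (Redis Cluster, for example, uses a different scheme based on hash slots); `Partitioner`, `nodeFor`, and `distribute` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class Partitioner {
    // Maps a key to one of nodeCount nodes; floorMod keeps the result
    // non-negative even when hashCode() is negative.
    static int nodeFor(String key, int nodeCount) {
        return Math.floorMod(key.hashCode(), nodeCount);
    }

    // Groups keys into per-node buckets to show how the load is spread.
    static List<List<String>> distribute(List<String> keys, int nodeCount) {
        List<List<String>> nodes = new ArrayList<>();
        for (int i = 0; i < nodeCount; i++) nodes.add(new ArrayList<>());
        for (String key : keys) nodes.get(nodeFor(key, nodeCount)).add(key);
        return nodes;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 1000; i++) keys.add("user:" + i);
        // Print how many keys land on each of 4 nodes.
        for (List<String> node : distribute(keys, 4)) {
            System.out.println(node.size());
        }
    }
}
```

With plain modulo hashing, changing the number of nodes remaps most keys, which is why production systems usually prefer consistent hashing or a fixed set of slots.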
CAP theorem
States that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:
Consistency
All nodes see the same data at the same time
Availability
A guarantee that every request receives a response about whether it was successful or failed
Partition tolerance
The system continues to operate despite arbitrary message loss or failure of part of the system
Brewer's Conjecture
In 2000, Eric Brewer proposed the conjecture in his keynote speech at the ACM Symposium on Principles of Distributed Computing (PODC). Slides: http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf
Formal proof
In 2002, Seth Gilbert and Nancy Lynch of MIT formally proved Brewer's conjecture to be correct
http://lpd.epfl.ch/sgilbert/pubs/BrewersConjectureSigAct.pdf
We have shown that it is impossible to reliably provide atomic, consistent data when there are partitions in the network.
It is feasible, however, to achieve any two of the three properties: consistency, availability, and partition tolerance. In particular, most real-world systems today are forced to settle with returning most of the data, most of the time.
Redis
Redis is an open-source, advanced key-value data store. It is often referred to as a data structure server, since keys can hold strings, hashes, lists, sets, and sorted sets. Redis works with an in-memory dataset. It is possible to persist the dataset either by dumping it to disk every once in a while (RDB snapshots) or by appending each command to a log (the append-only file, AOF).
Installation
Linux: http://redis.io/download
Windows
1. Clone from Git repo:
https://github.com/MSOpenTech/redis
Configuration
Configuration file: /redis/redis.conf
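It is possible to change the port if you wish:
port 6379
For a development environment it is useful to change the data persistence policy. Each "save <seconds> <changes>" line takes a snapshot when at least <changes> keys have changed within <seconds> seconds:
save 900 1
save 300 10
save 60 10000
save 10 1
The added last line saves the dataset after 10 seconds if at least 1 key has changed.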
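The save rules can be read as "snapshot if ANY rule is satisfied". A minimal sketch of that evaluation follows, assuming the seconds since the last save and the number of changed keys are tracked elsewhere. `SaveRules`, `SaveRule`, and `shouldSave` are hypothetical names that approximate the rule described above; they are not Redis internals.

```java
import java.util.List;

class SaveRules {
    // One "save <seconds> <changes>" line from redis.conf (hypothetical model).
    static final class SaveRule {
        final long seconds;
        final long changes;
        SaveRule(long seconds, long changes) { this.seconds = seconds; this.changes = changes; }
    }

    // A snapshot is due if any rule matches: enough time has passed since the
    // last save and at least `changes` keys were modified in that time.
    static boolean shouldSave(List<SaveRule> rules, long secondsSinceLastSave, long changedKeys) {
        for (SaveRule r : rules) {
            if (secondsSinceLastSave >= r.seconds && changedKeys >= r.changes) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<SaveRule> rules = List.of(
            new SaveRule(900, 1), new SaveRule(300, 10),
            new SaveRule(60, 10000), new SaveRule(10, 1));
        System.out.println(shouldSave(rules, 12, 1)); // true, because of "save 10 1"
        System.out.println(shouldSave(rules, 5, 1));  // false, too early for every rule
    }
}
```

Without the extra "save 10 1" line, a single changed key would wait up to 900 seconds before being snapshotted, which is why the shorter rule is convenient during development.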
Useful Commands
Print all keys:
KEYS *
Remove all keys from all databases
FLUSHALL
Synchronously save the dataset to disk
SAVE
Redis keys
Keys are binary safe: any binary sequence can be used as a key, and the empty string is also a valid key. Very long keys are not a good idea, since they cost memory and make key lookups more expensive. Very short keys are often not a good idea either: "user:1000:password" is much more readable than "u:1000:pwd", and the extra space is minor.
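A small helper keeps key names consistent with the readable "object-type:id:field" style. The sketch below is hypothetical: the `Keys` class, its methods, and the "user:<id>:feeds" format are assumptions, shown as one possible shape for a helper like the RedisKeys class used later in this lesson, whose real definition is not shown.

```java
class Keys {
    // Joins parts with ':' following the "object-type:id:field" convention.
    static String key(Object... parts) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) sb.append(':');
            sb.append(parts[i]);
        }
        return sb.toString();
    }

    // Hypothetical key for a user's feed list.
    static String feeds(String userId) {
        return key("user", userId, "feeds");
    }

    public static void main(String[] args) {
        System.out.println(key("user", 1000, "password")); // user:1000:password
        System.out.println(feeds("42"));                   // user:42:feeds
    }
}
```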
Redis Strings
Most basic kind of Redis value
Binary safe: can contain any kind of data, for instance a JPEG image or a serialized Ruby object. A string value can be at most 512 megabytes long. Strings can be used as atomic counters with the commands in the INCR family (INCR, INCRBY, DECR, ...) and can be extended with the APPEND command.
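The semantics of INCR and APPEND can be modeled in plain Java to show what "atomic counter" means. This is a hypothetical in-memory model (`MiniStrings`), not a Redis client: as with INCR, a missing key counts as 0 and the new value is returned; as with APPEND, the new length is returned. `ConcurrentHashMap.merge` runs atomically per key, so no increment is lost even with several threads.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory model of INCR and APPEND semantics (not a Redis client).
class MiniStrings {
    private final ConcurrentHashMap<String, String> store = new ConcurrentHashMap<>();

    // Like INCR: a missing key starts at 0; a non-numeric value throws
    // NumberFormatException (Redis would return an error instead).
    long incr(String key) {
        String updated = store.merge(key, "1",
            (old, one) -> Long.toString(Long.parseLong(old) + 1));
        return Long.parseLong(updated);
    }

    // Like APPEND: creates the key if missing; returns the new length.
    int append(String key, String suffix) {
        return store.merge(key, suffix, String::concat).length();
    }

    public static void main(String[] args) throws InterruptedException {
        MiniStrings s = new MiniStrings();
        Thread[] workers = new Thread[4];
        for (int t = 0; t < workers.length; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < 1000; i++) s.incr("page:views");
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();
        System.out.println(s.incr("page:views"));          // 4001
        System.out.println(s.append("greeting", "Hello"));  // 5
        System.out.println(s.append("greeting", " World")); // 11
    }
}
```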
Redis Lists
Lists of strings, sorted by insertion order
Elements are added to a Redis list by pushing them onto the head (left, LPUSH) or the tail (right, RPUSH) of the list.
Maximum length: 2^32 - 1 elements.
Example: model a timeline in a social network, using LPUSH to add new elements and LRANGE to retrieve recent items.
Use LPUSH together with LTRIM to create a list that never exceeds a given number of elements.
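In Redis, a capped timeline of three items would be LPUSH timeline <item> followed by LTRIM timeline 0 2, and LRANGE timeline 0 9 would read it. The sketch below mirrors those semantics in plain Java to show the effect; `CappedTimeline` and the key name "timeline" are hypothetical, and this is not a Redis client.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical in-memory model of a capped Redis list (not a Redis client).
class CappedTimeline {
    private final LinkedList<String> items = new LinkedList<>();
    private final int maxSize;

    CappedTimeline(int maxSize) { this.maxSize = maxSize; }

    // LPUSH followed by LTRIM 0 (maxSize - 1): newest item at the head,
    // oldest items dropped from the tail once the cap is exceeded.
    void push(String item) {
        items.addFirst(item);
        while (items.size() > maxSize) {
            items.removeLast();
        }
    }

    // LRANGE 0 (n - 1): the n most recent items, newest first.
    List<String> recent(int n) {
        return new ArrayList<>(items.subList(0, Math.min(n, items.size())));
    }

    public static void main(String[] args) {
        CappedTimeline timeline = new CappedTimeline(3);
        for (String post : new String[] {"p1", "p2", "p3", "p4", "p5"}) {
            timeline.push(post);
        }
        System.out.println(timeline.recent(10)); // [p5, p4, p3]
        System.out.println(timeline.recent(2));  // [p5, p4]
    }
}
```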
Redis Hashes
A map between string fields and string values
The perfect data type to represent objects
HMSET user:1000 username antirez password P1pp0 age 34
HGETALL user:1000
HSET user:1000 password 12345
HGETALL user:1000
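The command sequence above can be mirrored with a map of maps, which shows why hashes fit objects well: one key per object, one field per attribute. `MiniHashes` is a hypothetical in-memory model, not a Redis client; it uses a LinkedHashMap only so the printout is predictable (Redis does not promise a field order for HGETALL).

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory model of Redis hash commands (not a Redis client).
class MiniHashes {
    private final Map<String, Map<String, String>> store = new HashMap<>();

    // Like HSET key field value: creates the hash if needed, overwrites the field.
    void hset(String key, String field, String value) {
        store.computeIfAbsent(key, k -> new LinkedHashMap<>()).put(field, value);
    }

    // Like HMSET key f1 v1 f2 v2 ...: fields and values alternate.
    void hmset(String key, String... fieldsAndValues) {
        if (fieldsAndValues.length % 2 != 0) {
            throw new IllegalArgumentException("expected field/value pairs");
        }
        for (int i = 0; i < fieldsAndValues.length; i += 2) {
            hset(key, fieldsAndValues[i], fieldsAndValues[i + 1]);
        }
    }

    // Like HGETALL key: an empty map when the key does not exist.
    Map<String, String> hgetall(String key) {
        return new LinkedHashMap<>(store.getOrDefault(key, Map.of()));
    }

    public static void main(String[] args) {
        MiniHashes h = new MiniHashes();
        h.hmset("user:1000", "username", "antirez", "password", "P1pp0", "age", "34");
        h.hset("user:1000", "password", "12345");
        System.out.println(h.hgetall("user:1000")); // {username=antirez, password=12345, age=34}
    }
}
```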
Redis Operations
It is possible to run atomic operations on these data types, such as:
appending to a string
incrementing the value in a hash
pushing to a list
computing set intersection, union, and difference
getting the member with the highest ranking in a sorted set
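The set and sorted-set operations listed above correspond to commands such as SINTER, SUNION, SDIFF, and ZREVRANGE key 0 0. The plain-Java sketch below only illustrates their results; Redis computes them server-side and atomically. `SetOps` and the sample data are hypothetical, and TreeSet is used only for a sorted, predictable printout. For equal scores, `top` picks the lexicographically greatest member, matching the descending order ZREVRANGE uses for ties.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration of set and sorted-set results (not a Redis client).
class SetOps {
    static Set<String> inter(Set<String> a, Set<String> b) { // like SINTER
        Set<String> r = new TreeSet<>(a);
        r.retainAll(b);
        return r;
    }

    static Set<String> union(Set<String> a, Set<String> b) { // like SUNION
        Set<String> r = new TreeSet<>(a);
        r.addAll(b);
        return r;
    }

    static Set<String> diff(Set<String> a, Set<String> b) { // like SDIFF
        Set<String> r = new TreeSet<>(a);
        r.removeAll(b);
        return r;
    }

    // Highest-scored member, like ZREVRANGE key 0 0; null for an empty map.
    static String top(Map<String, Double> scores) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            double s = e.getValue();
            boolean higher = s > bestScore
                || (s == bestScore && (best == null || e.getKey().compareTo(best) > 0));
            if (higher) {
                best = e.getKey();
                bestScore = s;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Set<String> a = Set.of("redis", "mysql", "h2");
        Set<String> b = Set.of("redis", "hbase");
        System.out.println(inter(a, b)); // [redis]
        System.out.println(union(a, b)); // [h2, hbase, mysql, redis]
        System.out.println(diff(a, b));  // [h2, mysql]
        System.out.println(top(Map.of("alice", 10.0, "bob", 42.5, "carol", 7.0))); // bob
    }
}
```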
Java clients:
Jedis - a blazingly small and sane Redis Java client
Spring Data Redis
Maven dependency:
<dependency>
    <groupId>org.springframework.data</groupId>
    <artifactId>spring-data-redis</artifactId>
    <version>1.1.0.RELEASE</version>
</dependency>
Java Code
FeedRedisDAOImpl
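import java.util.ArrayList;
import java.util.List;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.data.redis.support.collections.DefaultRedisList;
import org.springframework.data.redis.support.collections.RedisList;

public class FeedRedisDAOImpl implements FeedRedisDAO {

    @Autowired
    private RedisTemplate<String, Feed> template;

    // Adds a feed to the user's Redis list.
    public void addFeed(Long userId, Feed feed) {
        RedisList<Feed> feeds = feeds(userId);
        feeds.add(feed);
    }

    // Returns the user's feeds as a detached ArrayList copy.
    public List<Feed> getFeeds(Long userId) {
        RedisList<Feed> feeds = feeds(userId);
        return new ArrayList<Feed>(feeds);
    }

    // Wraps the Redis list stored under the user's feed key, capped at 50 elements.
    private RedisList<Feed> feeds(Long userId) {
        DefaultRedisList<Feed> feeds = new DefaultRedisList<Feed>(
                RedisKeys.feeds(userId.toString()), template);
        feeds.setMaxSize(50);
        return feeds;
    }
}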
Resources
Redis
http://www.redis.io