Database

A database is an organized collection of data.

DBMS (Database Management System)

A database management system (DBMS) provides efficient, reliable, convenient, and safe multi-user storage of and access to massive amounts of persistent data.

ACID

ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties that guarantee that database transactions are processed reliably.

Atomicity
- Atomicity requires that each transaction be "all or nothing": if one part of the transaction fails, the entire transaction fails, and the database state is left unchanged.
Consistency
- The consistency property ensures that any transaction will bring the database from one valid state to another.
Isolation
- The isolation property ensures that the concurrent execution of transactions results in a system state that would be obtained if transactions were executed serially, i.e., one after the other.
Durability
- Durability means that once a transaction has been committed, it will remain so, even in the event of power loss, crashes, or errors.
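A minimal sketch of atomicity, using Python's built-in sqlite3 module. The `accounts` table, the `transfer` helper, and the balances are hypothetical names chosen for illustration. The credit is deliberately applied first, so that a failing debit shows the earlier write being undone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint makes an overdraft an invalid state.
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
with conn:
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one all-or-nothing transaction."""
    try:
        # `with conn` commits on success and rolls back if an exception escapes.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint; the credit was rolled back too.
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


first = transfer(conn, "alice", "bob", 30)    # both updates are committed
second = transfer(conn, "alice", "bob", 500)  # debit fails, so nothing is applied
print(first, second, balances(conn))
```

The second transfer leaves both balances exactly as they were after the first one, which is the "all or nothing" behavior described above.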

Types of Databases

SQL (Structured Query Language)

Relational Database

The relational model is centered on this idea: the organization of data into collections of two-dimensional tables called “relations.”

Data Model

Collections of two-dimensional tables called “relations”
Table with columns and rows
A column describes one property (attribute) of the data
A row holds the actual values for those columns
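The table, column, and row ideas above can be sketched with Python's built-in sqlite3 module. The `authors` and `books` tables and their contents are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Each table is a relation: columns describe properties, rows hold values.
CREATE TABLE authors (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
-- Note: SQLite only enforces REFERENCES when PRAGMA foreign_keys is ON.
CREATE TABLE books (
    id        INTEGER PRIMARY KEY,
    title     TEXT NOT NULL,
    author_id INTEGER NOT NULL REFERENCES authors(id)
);
INSERT INTO authors (id, name) VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO books (id, title, author_id) VALUES
    (1, 'Notes', 1), (2, 'Compilers', 2), (3, 'Engines', 1);
""")

# A join reassembles data that normalization split across two tables.
rows = conn.execute("""
    SELECT books.title, authors.name
    FROM books
    JOIN authors ON authors.id = books.author_id
    ORDER BY books.title
""").fetchall()

for title, author in rows:
    print(title, "by", author)
```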

Pros

Flexible and well-established
Stable, standardized products available
Standard data access language through SQL
Costs and risks associated with large development efforts and with large databases are well understood
The fundamental structure is easily understood and the design and normalization process is well defined

Cons

Performance problems associated with reassembling simple data structures into their more complicated real-world representations
Lack of support for complex base types, e.g., drawings
SQL is limited when accessing complex data
Knowledge of the database structure is required to create ad hoc queries

Examples of Relational Databases

MySQL
Microsoft SQL Server (MSSQL)
PostgreSQL

NoSQL (Not only SQL)

Document Database

A document-oriented database is a computer program designed for storing, retrieving, and managing document-oriented information, also known as semi-structured data. Document-oriented databases are one of the main categories of NoSQL databases and the popularity of the term "document-oriented database" (or "document store") has grown with the use of the term NoSQL itself.

Data Model

Collection of documents
A document is a key-value collection
Index-centric, lots of map-reduce
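A toy, in-memory sketch of this data model, not how any particular document database is implemented. The `Collection` class and its method names are hypothetical. It assumes every document carries a unique `_id` that is inserted only once.

```python
from collections import defaultdict


class Collection:
    """Toy document collection with optional secondary indexes."""

    def __init__(self):
        self.docs = {}      # _id -> document (a plain dict of key-value pairs)
        self.indexes = {}   # field -> {value -> set of _ids}

    def create_index(self, field):
        index = defaultdict(set)
        for doc_id, doc in self.docs.items():
            if field in doc:
                index[doc[field]].add(doc_id)
        self.indexes[field] = index

    def insert(self, doc):
        doc_id = doc["_id"]
        self.docs[doc_id] = doc
        # Keep every existing index up to date with the new document.
        for field, index in self.indexes.items():
            if field in doc:
                index[doc[field]].add(doc_id)

    def find_by(self, field, value):
        if field in self.indexes:
            # Indexed field: direct lookup instead of scanning the collection.
            ids = self.indexes[field].get(value, set())
            return [self.docs[i] for i in sorted(ids)]
        # Unindexed field: fall back to a full scan.
        return [d for d in self.docs.values() if d.get(field) == value]


users = Collection()
users.create_index("city")
users.insert({"_id": 1, "name": "Kim", "city": "Seoul"})
users.insert({"_id": 2, "name": "Lee", "city": "Busan", "tags": ["admin"]})
users.insert({"_id": 3, "name": "Park", "city": "Seoul"})

seoul = users.find_by("city", "Seoul")
```

Documents in the same collection do not need the same fields (only one has `tags`), and queries are fast only on fields that have an index.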

Pros

Simple, powerful data model
Scalable

Cons

Poor for interconnected data
Query model is limited to keys and indexes
Larger queries require map-reduce

Examples

MongoDB
CouchDB

Key-value Database

A key-value store, or key-value database, is a computer program designed for storing, retrieving, and managing associative arrays, a data structure more commonly known today as a dictionary or hash. Dictionaries contain a collection of objects, or records, which in turn have many different fields within them, each containing data. These records are stored and retrieved using a key that uniquely identifies the record, and is used to quickly find the data within the database.

Data Model

Global key-value mapping
Big scalable hash map
Highly fault tolerant

Pros

Simple data model
Scalable
Redis and Memcached are useful for caching

Cons

Create your own "foreign keys"
Poor for complex data
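The "create your own foreign keys" point can be sketched as follows. A plain dict stands in for a remote key-value store, and the `put`, `get`, and `order_with_user` helpers and the key naming scheme are assumptions made for illustration.

```python
import json

store = {}  # stand-in for a key-value store: opaque key -> opaque bytes


def put(key, value):
    store[key] = json.dumps(value).encode("utf-8")


def get(key):
    raw = store.get(key)
    return None if raw is None else json.loads(raw)


put("user:1", {"name": "Kim"})
# The order refers to the user only by convention: the store never checks
# that "user:1" exists, so the application owns this "foreign key".
put("order:100", {"user_key": "user:1", "total": 25})


def order_with_user(order_key):
    order = get(order_key)
    user = get(order["user_key"])  # a second lookup, done by the application
    return {**order, "user": user}


print(order_with_user("order:100"))
```

What a relational database does with one join takes two lookups here, and nothing stops a write from leaving a dangling `user_key`.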

Examples

Riak
LevelDB
DynamoDB
Redis
Memcached

Big Table

BigTable maps two arbitrary string values (a row key and a column key) and a timestamp (hence a three-dimensional mapping) into an associated arbitrary byte array. It is not a relational database and can be better defined as a sparse, distributed multi-dimensional sorted map. BigTable is designed to scale into the petabyte range across "hundreds or thousands of machines, and to make it easy to add more machines [to] the system and automatically start taking advantage of those resources without any reconfiguration".
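A single-process sketch of the mapping (row key, column key, timestamp) -> bytes described above. The `SparseTable` class is hypothetical and ignores distribution entirely. It only illustrates the sparse, sorted, versioned map.

```python
import bisect


class SparseTable:
    """Toy sorted map: (row key, column key, timestamp) -> bytes."""

    def __init__(self):
        self._cells = {}  # (row, column) -> list of (timestamp, value), ascending
        self._rows = []   # row keys kept sorted, so range scans are cheap

    def put(self, row, column, timestamp, value):
        versions = self._cells.setdefault((row, column), [])
        bisect.insort(versions, (timestamp, value))
        i = bisect.bisect_left(self._rows, row)
        if i == len(self._rows) or self._rows[i] != row:
            self._rows.insert(i, row)

    def get(self, row, column, timestamp=None):
        """Return the newest value, or the newest one at or before `timestamp`."""
        versions = self._cells.get((row, column), [])
        if timestamp is not None:
            versions = [v for v in versions if v[0] <= timestamp]
        return versions[-1][1] if versions else None

    def scan(self, start_row, end_row):
        """Row keys in the half-open range [start_row, end_row), in sorted order."""
        lo = bisect.bisect_left(self._rows, start_row)
        hi = bisect.bisect_left(self._rows, end_row)
        return self._rows[lo:hi]


table = SparseTable()
table.put("row-b", "contents:html", 1, b"<html>v1</html>")
table.put("row-b", "contents:html", 2, b"<html>v2</html>")
table.put("row-a", "anchor:home", 1, b"Home")
table.put("row-c", "anchor:home", 1, b"Other")

latest = table.get("row-b", "contents:html")
older = table.get("row-b", "contents:html", timestamp=1)
in_range = table.scan("row-a", "row-c")
```

Only cells that were written take up space, which is why very wide tables with mostly empty columns are affordable in this model.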

Pros

Naturally indexed
Scalable

Cons

Poor for interconnected data

Relational Database vs Big Table

Fixed Schema vs Schema-less
Row-oriented datastore vs Column-oriented datastore
Designed to store Normalized Data vs Designed to store Denormalized Data
Contains thin tables vs Contains wide and sparsely populated tables
Typically relies on manual sharding to spread data across machines vs Supports automatic partitioning

Data Model

A big table, with column families
Map reduce for querying and processing
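A minimal single-machine sketch of the map-reduce pattern over sparse rows. The `map_reduce` helper, the column names, and the data are made up for illustration. A real system would run the map and reduce phases in parallel across many machines.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run mapper on every record, group emitted values by key, then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


# Rows use "family:qualifier" column names; missing columns are simply absent.
rows = [
    {"row": "u1", "info:country": "KR", "stats:visits": 3},
    {"row": "u2", "info:country": "US", "stats:visits": 5},
    {"row": "u3", "info:country": "KR"},
    {"row": "u4", "info:country": "KR", "stats:visits": 4},
]


def visits_by_country(row):
    # Map phase: emit (country, visits) only for rows that have the column.
    if "stats:visits" in row:
        yield row["info:country"], row["stats:visits"]


def total(key, values):
    # Reduce phase: combine all values emitted for one key.
    return sum(values)


totals = map_reduce(rows, visits_by_country, total)
print(totals)
```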

Examples

HBase
Cassandra

Graph Database

NoSQL was never a meaningful database classification. It is a label that can be applied to any database that does not have full support for SQL – and there are different reasons for choosing not to rely on SQL.

Neither relational databases nor the other NoSQL databases are good at storing and navigating highly connected data structures. Social computing, recommendation systems, business intelligence, and scientific computing for bioinformatics are examples: their data usually takes the form of a graph.

Data Model

Node with properties
Named relationship with properties
Hypergraph, sometimes
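A small in-memory sketch of nodes with properties and named relationships with properties, plus a hop-limited traversal. The people, relationship names, and the `reachable` helper are hypothetical. A graph database would express the same query in its own query language.

```python
from collections import defaultdict, deque

# Nodes with properties.
nodes = {
    "alice": {"kind": "person"},
    "bob": {"kind": "person"},
    "carol": {"kind": "person"},
    "dave": {"kind": "person"},
}

# Named, directed relationships with their own properties.
relationships = [
    ("alice", "KNOWS", "bob", {"since": 2015}),
    ("bob", "KNOWS", "carol", {"since": 2018}),
    ("carol", "KNOWS", "dave", {"since": 2020}),
    ("alice", "LIKES", "dave", {}),
]

# Each node keeps its own outgoing relationships, so a traversal only
# touches the neighborhood it walks through.
outgoing = defaultdict(list)
for src, name, dst, props in relationships:
    outgoing[src].append((name, dst, props))


def reachable(start, name, max_hops):
    """Breadth-first search along relationships called `name`, up to max_hops."""
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue
        for rel_name, dst, _props in outgoing[node]:
            if rel_name == name and dst not in seen:
                seen.add(dst)
                found.append(dst)
                queue.append((dst, hops + 1))
    return found


friends_of_friends = reachable("alice", "KNOWS", 2)
print(friends_of_friends)
```

In a relational schema the same question needs one self-join per hop, which is where the graph model tends to pay off.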

Pros

Powerful data model as general as RDBMS
Connected data locally indexed
Easy to query

Cons

Sharding is difficult
Requires understanding a whole new kind of data structure

Example

Neo4j

Scaling

Vertical Scaling (Scale-up)

Generally refers to adding more processors and RAM, buying a more expensive and robust server.

Pros

Less power consumption than running multiple servers
Cooling costs are less than scaling horizontally
Generally less challenging to implement
Lower licensing costs
Sometimes uses less network hardware than scaling horizontally

Cons

Price
Greater risk of hardware failure causing bigger outages
Generally severe vendor lock-in and limited upgradeability in the future

Horizontal Scaling (Scale-out)

Generally refers to adding more servers, each with fewer processors and less RAM. This is usually cheaper overall and can in theory scale almost without limit, although in practice there are usually limits imposed by software or other attributes of an environment's infrastructure.
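One common way to spread data over several servers is hash partitioning. The sketch below is a minimal illustration under that assumption. The server names, the keys, and the `shard_for` helper are hypothetical.

```python
import hashlib


def shard_for(key, servers):
    """Pick a server for `key` with a stable hash, so every client agrees."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


servers = ["db-0", "db-1", "db-2"]
keys = ["user:1", "user:2", "user:3", "order:100", "order:101"]

placement = {key: shard_for(key, servers) for key in keys}
for key, server in placement.items():
    print(key, "->", server)
```

With simple modulo placement, changing the number of servers remaps most keys. Consistent hashing is a common way to reduce that movement.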

Pros

Much cheaper than scaling vertically
Easier to achieve fault tolerance
Easy to upgrade

Cons

More licensing fees
Bigger footprint in the Data Center
Higher utility cost (Electricity and cooling)
Possible need for more networking equipment (switches/routers)

Author

Name     : Leonardo Taehwan Kim
Email    : contact@thefinestartist.com
Website  : http://www.thefinestartist.com