Reasons to Use a NoSQL Database
SQL (Structured Query Language) is used to query a relational database, where data is organized into relations (tables). Relational databases served most data storage and retrieval needs for many years.
But as the World Wide Web grew, relational database systems could not meet all the needs of web-scale information systems. Consequently, NoSQL storage and retrieval systems were developed, largely between 2005 and 2010.
The term NoSQL has been described variously as non-SQL, non-relational, and not only SQL. What all of these imply is that a NoSQL database doesn’t use the relational model of data. The term does not mean that no SQL is used; in fact, many NoSQL databases support a SQL-like query language.
Semi-Structured and Unstructured Data
Data on the Web can be unstructured, semi-structured, or structured. The data could be binary files, image files, or free-form documents in various formats, such as PDF, PPT, XLS spreadsheets, XML, or JSON. NoSQL databases are designed for semi-structured and unstructured data.
Agile Database Schema
While a relational database table must conform to a fixed schema, a NoSQL database doesn’t need to. Essentially, a NoSQL database is schema-free. Because the data stored in a NoSQL database varies in structure and changes over time, the database schema needs to be agile.
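As a rough illustration of what schema-free storage means in practice, the following Python sketch keeps documents with different fields side by side in one collection. The catalog list and the find helper are hypothetical names invented for this example; they stand in for a document store and are not the API of any particular NoSQL product.

```python
# Hypothetical in-memory "collection": each document is a plain dict,
# and no two documents are required to share the same fields.
catalog = [
    {"id": 1, "title": "Intro to NoSQL", "format": "PDF", "pages": 12},
    {"id": 2, "title": "Q3 Report", "format": "XLS", "sheets": ["summary", "raw"]},
    {"id": 3, "title": "Logo", "format": "PNG", "size": {"width": 64, "height": 64}},
]


def find(collection, **criteria):
    """Return documents that contain every given field with the given value.

    A document that lacks one of the fields simply does not match.
    """
    return [
        doc
        for doc in collection
        if all(key in doc and doc[key] == value for key, value in criteria.items())
    ]


# Adding a new field to one document needs no schema change or migration.
catalog[0]["tags"] = ["database", "tutorial"]

pdfs = find(catalog, format="PDF")
tagged = find(catalog, tags=["database", "tutorial"])
```

In a relational table, the tags field would first have to be added as a column for every row; here only the document that needs it carries it.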
Distributed, Scalable, and Highly Available
The priorities of a NoSQL database are different from those of a relational database. A web-scale database that has to serve queries by millions of users concurrently must be highly distributed, scalable, and available. A NoSQL database typically spans several nodes or machines across different regions.
Scalability is achieved by adding or removing machines, which is called horizontal scaling. In contrast, a relational database is typically scaled by adding more capacity to the same database system, which is called vertical scaling. Because some of the machines in a distributed cluster can fail, a distributed, scalable NoSQL database cluster is designed to be fault-tolerant and highly available, typically by replicating data across nodes.
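The idea of horizontal scaling can be sketched in a few lines of Python. The example below spreads keys over a list of nodes by hashing each key, then repeats the placement after a fourth node is added. The node names and the node_for and distribute helpers are assumptions made for illustration only. The simple modulo placement used here relocates many keys whenever the node count changes, which is why real distributed stores commonly use schemes such as consistent hashing instead.

```python
import hashlib


def node_for(key, nodes):
    """Pick a node for a key by hashing the key (simple modulo placement)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


def distribute(keys, nodes):
    """Return a mapping of node name to the list of keys placed on it."""
    placement = {node: [] for node in nodes}
    for key in keys:
        placement[node_for(key, nodes)].append(key)
    return placement


keys = [f"user:{i}" for i in range(1000)]

# Scale out by adding a machine: the same keys are spread over more nodes.
three_nodes = distribute(keys, ["node-a", "node-b", "node-c"])
four_nodes = distribute(keys, ["node-a", "node-b", "node-c", "node-d"])

for name, placement in (("3 nodes", three_nodes), ("4 nodes", four_nodes)):
    print(name, {node: len(held) for node, held in placement.items()})
```

Each node holds roughly an equal share of the keys, so adding a node lowers the load per node without making any single machine larger.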
Consistency refers to the requirements of how updates to data are viewed by different users accessing a database. Relational database systems typically require strong or immediate consistency, which implies that updates or changes to data are viewed consistently by all users accessing the same data—one user doesn’t get an outdated version of the data while another gets the new version of the data.
In a web-scale information system, availability and performance are high priorities, and to serve these needs NoSQL databases compromise consistency to some extent. Many NoSQL databases are eventually consistent, which means that an update reaches every copy of the data eventually, but not immediately. A user accessing a NoSQL database could be served an outdated version of the data because of the latency involved in propagating an update across a large-scale distributed system.
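The stale read described above can be reproduced with a toy model. In the following Python sketch, a write is applied to one replica right away and queued for the others, so a read from a lagging replica returns the old value until propagation runs. The EventuallyConsistentStore class and its methods are hypothetical and greatly simplified; real systems propagate updates asynchronously over a network and must also resolve conflicting writes.

```python
class EventuallyConsistentStore:
    """Toy model: a write lands on replica 0 and reaches the others later."""

    def __init__(self, replica_count=3):
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = []  # (replica_index, key, value) updates not yet applied

    def write(self, key, value):
        # Apply to the first replica immediately, queue for the rest.
        self.replicas[0][key] = value
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, value))

    def read(self, key, replica_index):
        return self.replicas[replica_index].get(key)

    def propagate(self):
        # Deliver queued updates in order; afterwards all replicas agree.
        for index, key, value in self.pending:
            self.replicas[index][key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("price", 10)
store.propagate()          # every replica now holds price = 10

store.write("price", 12)   # update has not reached replicas 1 and 2 yet
stale = store.read("price", replica_index=2)   # 10, the outdated value
fresh = store.read("price", replica_index=0)   # 12, the new value

store.propagate()
converged = [store.read("price", i) for i in range(3)]  # [12, 12, 12]
```

Between the second write and the second call to propagate, two users reading from different replicas see different prices, which is exactly the situation strong consistency rules out.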
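Some NoSQL databases offer tunable consistency, which lets each operation trade consistency against latency and availability. The CAP theorem states that a distributed system cannot guarantee consistency, availability, and partition tolerance all at once. Because network partitions cannot be ruled out in a distributed cluster, many NoSQL databases relax consistency to remain highly available.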
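One common way tunable consistency is exposed, assumed here for illustration rather than taken from any specific database, is a quorum setting: with N replicas, a write waits for W acknowledgements and a read consults R replicas. If R + W > N, every read set shares at least one replica with every write set, so a read can always see the latest acknowledged write. The Python sketch below checks that rule by brute force for N = 3; the function name is hypothetical.

```python
from itertools import combinations


def quorums_always_overlap(n, w, r):
    """True if every choice of w written replicas shares at least one
    member with every choice of r read replicas, out of n replicas."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


N = 3
results = {
    (w, r): quorums_always_overlap(N, w, r)
    for w in range(1, N + 1)
    for r in range(1, N + 1)
}

for (w, r), overlaps in sorted(results.items()):
    label = "read sees latest write" if overlaps else "stale read possible"
    print(f"W={w} R={r}: {label}")
```

Small values of W and R give faster, more available operations at the risk of stale reads, while larger values give stronger consistency at the cost of waiting on more replicas.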
When to Use NoSQL
In recent years, some relational databases have added features typical of NoSQL databases, such as auto-sharding across a distributed cluster and support for unstructured and schema-free data. NoSQL databases are still favored, however, for web-scale information systems that prioritize big data requirements, agility, high availability, and performance.