Benefits of Using Columnar Storage in Relational Database Management Systems

By Deepak Vohra - May 11, 2020

Relational database management systems (RDBMS) store data in rows and columns. Most relational databases store data row-wise by default, but a few RDBMS, including MariaDB and PostgreSQL, provide the option to store data column-wise, which is a useful feature.

The benefits of columnar storage are based on a couple of assumptions:

A typical database table has very few columns compared to rows—a table could have 10 columns and a million rows of data. A large data set would typically need to be stored across multiple disks
The database queries are for a subset of the data rather than complete rows of data

Let’s look at the benefits of being able to use columnar storage for data.

What’s the difference?

When RDBMS store data row-wise, in the following example, the ID, first name, and last name would be stored as follows, in which “;” demarcates data stored together:

1,John,Smith;2,Deepak,Vohra;3,Brad,Jones

Columnar storage for the same data would be:

1,2,3;John,Deepak,Brad;Smith,Vohra,Jones

Columnar storage

Querying a subset of data

A query on a subset of data on row-oriented RDBMS has to scan several rows of data on the hard disks before finding the data that matches the query. Columnar databases are more efficient to query a subset of data because once the column containing the data has been found, the corresponding row values may be obtained. The query is optimized to scan only the necessary columns instead of scanning rows and rows of data. The query performance of a columnar database is much better than a row-oriented database if the query is over a subset of data.

Working with big data

If a table has only 10 rows of data and three columns, the query performance wouldn’t be much different. The main benefit of columnar data is realized if a table has many more rows than columns, and you really start to see this benefit when working with big data. The MariaDB ColumnStore engine could be used to process petabytes of data with linear scalability and high performance.

Indexes aren’t needed

If a row-oriented storage engine, such as InnoDB engine in MySQL and MariaDB, is used for processing large data sets, the big data would need to be indexed to improve query performance. Indexes also improve the performance of queries over a subset of data. But indexes add additional disk storage. In comparison, a columnar storage engine data set doesn’t need to be indexed, as it responds to queries over a subset of data much faster inherently.

More suitable for OLAP

Columnar storage is more suitable for analytical queries—such as online analytical processing, or OLAP—rather than transactional queries—such as online transaction processing, or OLTP. Analytic queries with columnar storage return a response in real time. MariaDB’s ColumnStore would be a good choice to use for big data analytic queries involving complex joins and aggregations.

Easier to compress

Another benefit of columnar storage is compression efficiency. As columnar data is of uniform type, compression efficiency is improved over compressing row-oriented data with possibly dissimilar data types.

Tags:

Up Next

Choosing the Right Threat Modeling Methodology

May 8, 2020

Get TechWell Insights Delivered Weekly

All TechWell Insights by this Author

Related Insights

About the Author

Deepak Vohra

Deepak is a Sun Certified Java Programmer and Web Component Developer, and has worked in the fields of XML, Java programming and Java EE for ten years. Deepak is the co-author of the Apress book Pro XML Development with Java Technology and was the technical reviewer for the O'Reilly book WebLogic: The Definitive Guide. Deepak was also the technical reviewer for the Course Technology PTR book Ruby Programming for the Absolute Beginner. Deepak is also the author of the Packt Publishing books JDBC 4.0 and Oracle JDeveloper for J2EE Development, Processing XML Documents with Oracle JDeveloper 11g, EJB 3.0 Database Persistence with Oracle Fusion Middleware 11g, and Java EE Development in Eclipse IDE. Deepak is a Docker Mentor and has published 5 books on Docker and Kubernetes.