Object Storage: File System Database Mash-Up
Cloud storage is a booming business, with consumer subscriptions projected to exceed 625 million, up 25 percent from the half billion reached in 2012. That figure doesn't even touch the commercial side of cloud storage, where subscribers number only a fraction as many but the average data set per company is in the 200-terabyte range.
Have you ever wondered where your cloud data actually lives? How do providers store all those exabytes (an exabyte is 1,000 petabytes, or 1 million terabytes) of unstructured data? Wonder no more. A new generation of cutting-edge data storage, modeled on the distributed database rather than the traditional hierarchical file system, is revolutionizing how storage is built and delivered in the cloud.
The dirty little secret of the storage business, of course, is that disks are treated as commodities: cheap and essentially unreliable. With an industry-average annual failure rate of 10 percent, this is not an unreasonable assumption. Traditional storage vendors work around the problem with heavy disk management and data duplication, but this approach is costly and scales poorly.
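To make the 10 percent figure concrete, this short Python sketch estimates how many disks a pool should expect to lose. It assumes disks fail independently and that failures are spread evenly across the year; the 1,000-disk pool is a hypothetical size, not a figure from any vendor.

```python
def expected_failures(num_disks: int, annual_failure_rate: float) -> float:
    """Expected number of disks that fail in one year, assuming each disk
    fails independently with the given annual probability."""
    return num_disks * annual_failure_rate


def expected_failures_per_week(num_disks: int, annual_failure_rate: float) -> float:
    """Spread the yearly expectation evenly over 52 weeks (a simplification)."""
    return expected_failures(num_disks, annual_failure_rate) / 52


if __name__ == "__main__":
    pool = 1_000  # hypothetical pool size
    afr = 0.10    # the 10 percent annual failure rate cited above
    print(f"{expected_failures(pool, afr):.0f} failures per year")
    print(f"{expected_failures_per_week(pool, afr):.1f} failures per week")
```

Under those assumptions, a 1,000-disk pool loses about 100 disks a year, or roughly two every week, so replacing failed drives is routine work rather than a rare event.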
Others, such as X-IO Technologies, take a very different approach. Working directly with Seagate, the company takes advantage of built-in disk firmware to ensure that data is not lost at the bit level. Because it doesn't need to maintain multiple copies of every bit, it can build highly efficient, intelligent storage.
At the other extreme, object storage systems borrow concepts from distributed databases to store data as what are termed objects. Think of objects as the equivalent of database records, except that instead of being highly structured, an object can hold any kind of digital data: files, images, binary keys, or anything else composed of ones and zeros.
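As a rough illustration, an object can be modeled as an opaque byte payload addressed by a key and accompanied by free-form metadata. The `StoredObject` class and its `checksum` method are hypothetical names chosen for this sketch, not any product's API. Note that nothing inspects or constrains the contents of `data`, which is what distinguishes an object from a structured database record.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """Hypothetical object model: a key, an opaque payload, and metadata.

    No schema is imposed on `data`; it can be a photo, a text file,
    or any other sequence of bytes.
    """
    key: str
    data: bytes
    metadata: dict = field(default_factory=dict)

    def checksum(self) -> str:
        # SHA-256 of the payload, usable to check that a copy is intact.
        return hashlib.sha256(self.data).hexdigest()


if __name__ == "__main__":
    photo = StoredObject("photos/2013/beach.jpg", b"\xff\xd8\xff\xe0",
                         {"content-type": "image/jpeg"})
    note = StoredObject("notes/todo.txt", b"buy more disks")
    for obj in (photo, note):
        print(obj.key, len(obj.data), obj.checksum()[:12])
```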
These objects are distributed over the entire storage pool using hash keys or other algorithmic placement methods. Objects are retrieved by their key values, and those keys are themselves copied across the pool for redundancy and access speed.
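Consistent hashing is one widely used way to turn a key into a storage location. The sketch below is a generic textbook version, not the placement scheme of any product named in this article: each node is hashed onto a ring at several virtual points to even out load, an object lives on the first node clockwise from its key's hash, and extra copies go to the next distinct nodes. `HashRing`, `nodes_for`, and the node names are hypothetical.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # Map a string to a 64-bit position on the ring.
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes (a generic sketch)."""

    def __init__(self, nodes, vnodes_per_node: int = 100):
        points = []
        for node in nodes:
            for i in range(vnodes_per_node):
                points.append((_hash(f"{node}#{i}"), node))
        points.sort()
        self._points = points                        # sorted (position, node)
        self._positions = [pos for pos, _ in points]  # positions only, for bisect
        self._num_nodes = len(set(nodes))

    def nodes_for(self, key: str, replicas: int = 1):
        """Return `replicas` distinct nodes, walking clockwise from the key."""
        if replicas > self._num_nodes:
            raise ValueError("not enough nodes for the requested replica count")
        i = bisect.bisect(self._positions, _hash(key))
        chosen = []
        while len(chosen) < replicas:
            _, node = self._points[i % len(self._points)]  # wrap around the ring
            if node not in chosen:
                chosen.append(node)
            i += 1
        return chosen


if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
    print(ring.nodes_for("photos/2013/beach.jpg", replicas=3))
```

Because placement depends only on the key and the node list, any client can compute where an object lives without consulting a central directory, and adding a node moves only the keys that fall into that node's new slices of the ring.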
A typical object storage system, such as Amazon S3 or OpenStack Swift, stores at least three copies of each object so that data survives the loss of individual disks or nodes. Emerging commercial object storage vendors include Scality, Cleversafe, and Basho, which, interestingly, started out as a distributed database company.
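Full replication has a direct capacity cost. Assuming plain three-way copies with no other overhead, this sketch applies the arithmetic to the 200-terabyte average data set mentioned earlier; the function names are illustrative only.

```python
def raw_capacity_tb(logical_tb: float, copies: int = 3) -> float:
    """Raw disk capacity needed to hold `logical_tb` of data with full replication."""
    return logical_tb * copies


def storage_efficiency(copies: int = 3) -> float:
    """Fraction of raw capacity that holds unique data under full replication."""
    return 1 / copies


if __name__ == "__main__":
    # 200 TB is the average per-company data set cited earlier in the article.
    print(raw_capacity_tb(200))           # prints 600 (TB of raw disk)
    print(f"{storage_efficiency():.0%}")  # prints 33%
```

In other words, two-thirds of the raw disk holds redundant copies; this is the overhead that X-IO's approach, described above, avoids by not keeping multiple copies.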
As the appetite for massive amounts of cheap storage grows, Storage-as-a-Service vendors face increasing pressure to lower the cost of delivering the service. The good news is that cloud storage providers are well positioned to leverage emerging object storage and other cloud-friendly products to deliver cheap, horizontally scalable, unstructured storage at massive scale to their data-hungry customers.
Traditional storage vendors that rely on costly block-level management software and hierarchical file systems, on the other hand, will quickly be cut out of the equation.