apache | TechWell

apache

Breaking Down Apache’s Hadoop Distributed File System

Apache Hadoop is a framework for big data. One of its main components is the Hadoop Distributed File System (HDFS), which stores that data. You might expect a storage framework that holds large quantities of data to require state-of-the-art infrastructure for a file system that does not fail, but quite the contrary is true.

Deepak Vohra
Comparing Apache Hadoop Data Storage Formats

Apache Hadoop can store data in several supported file formats. To decide which one you should use, analyze their properties and the type of data you want to store. Let's look at query time, data serialization, whether the file format is splittable, and whether it supports compression, then review some common use cases.

Deepak Vohra
Comparing Apache Sqoop, Flume, and Kafka

Apache Sqoop, Flume, and Kafka are tools used in data science. All three are open source, distributed platforms designed to move data and operate on unstructured data. Each also supports big data at the scale of petabytes and exabytes, and all are written in Java. But there are some differences between these platforms.

Deepak Vohra
Beyond Docker—Containers, Take 2

Everyone in the development community seems to be talking about Docker containers. Given the interest in developing architectures and deployment tools that can spin up new applications faster than ever in virtualized and cloud environments, it is time to dig deeper into what all the fuss is about.

Beth Cohen
Three Software Bugs You May Have Missed

We know it’s hard to keep up with the constant bombardment of software news in the tech world. In this roundup, we present you with three software bugs that you should probably be aware of, especially if you are a Microsoft or CloudStack user.

Jonathan Vanian
Netflix Has a Eureka Moment

Netflix recently announced Eureka, an open source project that provides a cloud service registry and a cloud load balancer. If dynamic registry updates and run-time instance discovery are among your cloud project's requirements, consider using Eureka in your Java PaaS framework.

Chris Haddad