Who will win? Cloudera, MongoDB or DataStax?

Selwyn Zhou

Today, let's talk a little about three popular Big Data solution providers: Cloudera, MongoDB and DataStax. Everyone talks about "Big Data", yet no one seems to know exactly what "Big Data" is. These three companies' products may help you solve some big data problems. Of course, there are many other great vendors, such as Hortonworks and Netezza. We will talk about them later.


Cloudera

CDH (Cloudera Distribution Including Apache Hadoop), Cloudera's open-source Apache Hadoop distribution, targets enterprise-class deployments of that technology. Cloudera includes HBase in its stack and is a leader in the Hadoop market; its Hadoop-based Enterprise Data Hub can function as an analytical platform. CDH contains the core elements of Hadoop that provide reliable, scalable distributed processing of large data sets (chiefly MapReduce and HDFS), as well as other enterprise-oriented components that provide security, high availability, and integration with hardware and other software.

MongoDB

MongoDB is a leading vendor in the NoSQL market. It is pitched as an operational database for highly scalable applications, while operational data in MongoDB can be snapshotted into Cloudera's data hub in parallel for analysis. That analysis can happen in near real time through the Shark framework or Impala, and the results can then be passed back to MongoDB to trigger the display of personalized content or a most-appropriate offer based on the analysis within Hadoop.
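To make that round trip a bit more concrete, here is a minimal sketch of the snapshot, analyze and write-back loop in plain Python. It does not use MongoDB, Hadoop, Shark or Impala at all: the in-memory lists and dicts only stand in for them, and every name in it (operational_events, snapshot, analyze, write_back, offer_category) is a made-up assumption for illustration, not any vendor's API.

```python
from collections import defaultdict

# Stand-in for an operational MongoDB collection: one document per page view.
# All field names and values here are hypothetical.
operational_events = [
    {"user": "alice", "category": "laptops"},
    {"user": "alice", "category": "laptops"},
    {"user": "alice", "category": "phones"},
    {"user": "bob", "category": "cameras"},
]


def snapshot(events):
    """Copy operational data so the analysis never touches the live store."""
    return [dict(event) for event in events]


def analyze(snapshot_rows):
    """Batch step (the Hadoop side): find each user's most-viewed category."""
    counts = defaultdict(lambda: defaultdict(int))
    for row in snapshot_rows:
        counts[row["user"]][row["category"]] += 1
    # sorted() makes ties resolve alphabetically instead of by insertion order.
    return {
        user: max(sorted(cats), key=cats.get)
        for user, cats in counts.items()
    }


def write_back(offers, profile_store):
    """Push results back to the operational store to drive personalization."""
    for user, category in offers.items():
        profile_store.setdefault(user, {})["offer_category"] = category
    return profile_store


profiles = write_back(analyze(snapshot(operational_events)), {})
print(profiles)
```

In a real deployment each of the three functions would be a separate system (an export job, a query engine, a database update), but the shape of the loop stays the same: the operational store serves the application, and the analytical side only ever sees a copy.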


DataStax

DataStax's business model centers on selling an enterprise distribution of the open-source Apache Cassandra project, which adds extensions to Cassandra along with analytics and search functionality. DataStax positions its product as the fastest, most scalable distributed database technology, delivering Apache Cassandra to the enterprise. However, it still straddles the NoSQL and Hadoop roles: DataStax's software distribution, for example, includes both the Cassandra NoSQL database and Hadoop, and the two run on the same cluster. What's more, DataStax and other high-scalability database vendors have been busy adding to and touting the analytic query capabilities of their databases.