Big Data: What Is It, and How Does It Apply to Your Business?
“You can’t manage what you don’t measure.”
This old adage still holds true, and it is the main reason for the recent explosion of digital data. Big data is a direct result of managers needing to know their business and make better-informed decisions.
Several years ago, I ran across the web site of a large electronics retail chain that is still one of the leaders today. A message on the site indicated that only part of their inventory was published there. They were apparently “testing the waters” of product promotion and a limited amount of mail ordering. Having spent many years building data systems and analytics applications, I found this absurd. Why couldn’t they use their existing product and point-of-sale data? Evidently they had several disparate data sets, and hadn’t yet matured enough as an organization to analyze their data from one source.
I’d like to discuss the “Three Vs.” The term “Big Data” is a bit of a misnomer. Gartner characterizes big data along three dimensions: Velocity, Variety, and Volume. Most people I meet with ask why big data is even applicable to them, since they don’t have much volume. I explain that Velocity and Variety are just as much a part of big data as Volume.
Consider a company like Amazon.com, which is one of the pioneers of big data. Amazon processes a vast number of orders every second (velocity), schedules delivery through disparate carriers (variety), and stores vast amounts of customer order information (volume). Furthermore, it uses predictive algorithms to determine customer preferences and make product recommendations. This last practice is becoming more and more prevalent across the web. You’ve probably seen something similar to “You may also like…” at the bottom of your favorite retailer’s ordering page.
Netflix does something similar. Based on your previous selections, it recommends what else you may be interested in viewing. Like Amazon, it supplements this data with predictive algorithms based on your ratings of what you view. The more you use the system, the better the recommendations become. This also lets Netflix see what its customer base prefers and views most, and tailor its offerings accordingly.
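To make the “You may also like…” idea a little more concrete, here is a minimal sketch in Python. It is not how Amazon or Netflix actually build recommendations; it assumes the simplest possible approach, counting which items show up together in the same order. The function names (`build_cooccurrence`, `also_like`) and the sample orders are made up purely for illustration.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence(orders):
    """Count how often each pair of items appears in the same order."""
    co = defaultdict(Counter)
    for order in orders:
        # set() ignores duplicate items within a single order
        for a, b in permutations(set(order), 2):
            co[a][b] += 1
    return co


def also_like(co, item, k=3):
    """Return up to k items most often bought together with `item`."""
    return [other for other, _ in co.get(item, Counter()).most_common(k)]


# Hypothetical order history, invented for this example
orders = [
    ["tv", "hdmi cable", "wall mount"],
    ["tv", "hdmi cable"],
    ["laptop", "mouse"],
    ["tv", "soundbar", "hdmi cable"],
]

co = build_cooccurrence(orders)
print(also_like(co, "tv", k=1))  # ['hdmi cable']
```

Real recommendation engines go well beyond this, weighing ratings, viewing history, and many other signals, but the principle is the same: the more order or viewing data the system sees, the more reliable the counts become.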
To be quite honest, I tend to see Velocity and Variety come up more often than Volume. Although data volumes are generally growing, new sources of data, both structured and unstructured, are constantly coming to light. For example, the Internet of Things (IoT) is connecting many different devices, such as power meters, appliances, televisions, and practically anything else you can think of.
So, with velocity and variety growing, how do you address this? What is the best solution for big data? Well, that (like everything else in this industry) is quite a loaded question. A number of additional questions need to be answered first. For example:
- What are your main goals? Are you looking for a consolidated analytics platform, or a large transactional system?
- Are you open to cloud-based solutions?
- Which of the “Three Vs” is the most important?
  - Velocity: If you are looking for real-time analytics, consider in-memory solutions like SAP HANA or Microsoft SQL Server 2014. Other solutions, such as Cassandra, give you a horizontally scalable transactional system able to process large amounts of transactional data. Facebook and Netflix are examples of companies that have run Cassandra at this kind of scale.
  - Variety: I’ve had clients with five or more different types of data sources, and multiple instances of each type. In a case like this, you need to ensure your ETL processes can effectively support all of your different sources. One example is mining Twitter feeds and marrying customer sentiment with demand.
  - Volume: Pure and simple: scalability. Large data solutions need to be scalable, with the ability to grow easily. Solutions such as SAP HANA offer scalability along with built-in data compression. Hadoop-based solutions scale out easily and at minimal cost.
- What kind of budget do you have to work with?
  - High: If budget is not a constraint, or the expected ROI justifies a larger budget, SAP HANA is likely to be among the largest and fastest options.
  - Medium: This is probably the largest market segment, and probably the most difficult to navigate. Solutions here range from a single stand-alone product to a composite of multiple products.
  - Low: This is definitely Hadoop territory. Created by Doug Cutting and Mike Cafarella and developed extensively at Yahoo, Hadoop is an open-source Apache project. Companies like Hortonworks and Cloudera provide production-ready, supported platforms for large data sets.
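The Variety example above, marrying customer sentiment with demand, is easier to picture with a small sketch. The Python below assumes two hypothetical sources that have already been extracted: tweets that some upstream step has scored for sentiment, and daily unit sales from a point-of-sale system. All field names, function names, and numbers are invented for illustration; a real ETL process would also handle extraction, cleansing, and loading into a target store.

```python
from collections import defaultdict
from statistics import mean

# Source 1 (hypothetical): tweets already scored from -1.0 to 1.0
tweets = [
    {"date": "2014-06-01", "product": "tv", "sentiment": 0.5},
    {"date": "2014-06-01", "product": "tv", "sentiment": 0.25},
    {"date": "2014-06-02", "product": "tv", "sentiment": -0.5},
]

# Source 2 (hypothetical): daily unit sales as (date, product, units)
sales = [
    ("2014-06-01", "tv", 120),
    ("2014-06-02", "tv", 95),
    ("2014-06-03", "tv", 101),
]


def average_sentiment(tweets):
    """Average the sentiment scores per (date, product) key."""
    scores = defaultdict(list)
    for t in tweets:
        scores[(t["date"], t["product"])].append(t["sentiment"])
    return {key: mean(vals) for key, vals in scores.items()}


def join_sentiment_with_demand(tweets, sales):
    """Attach average sentiment to each sales row; None if no tweets match."""
    sentiment = average_sentiment(tweets)
    return [
        {
            "date": date,
            "product": product,
            "units_sold": units,
            "avg_sentiment": sentiment.get((date, product)),
        }
        for date, product, units in sales
    ]


for row in join_sentiment_with_demand(tweets, sales):
    print(row)
```

The point of the sketch is the shared key: once both sources are reduced to the same date and product grain, they can be analyzed from one place, which is exactly what disparate data sets make difficult.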
Overall, it is important to take a strategic, methodical approach to your big data goals. Start with your objectives, and weigh your options. A big data partner like ATCG Solutions can help you.