Speeding Time to Insight: Why Self-Service Data Prep is the Future of Big Data
Did you know the amount of information in the digital universe can fill a stack of iPad Air tablets reaching two-thirds of the way to the moon? That’s 157,674 miles. According to a report by EMC and IDC, the average household created enough new data last year to fill sixty-five 32GB iPhones. By 2020, each household will generate enough data to fill 318 iPhones per year. Growth at this rate can feel as if it is approaching the speed of light, but the pace is not surprising given the recent technological advances that have produced more data overall. Who knew that a dishwasher would ever produce data, or how chatty my car would be? Ten years ago, we never would have thought that something like a Fitbit could monitor us 24/7 and produce reports based on the data behind all of our movements.
Every tweet or post we make on social media is but one of the many daily digital footprints we leave, footprints that are instantaneously woven together to construct the digital story of who we are and what we love. Will the human desire to record our habits and behaviors, and the corresponding expansion of data records, soon surpass our ability to process and correlate them?
The Big Data Insights Conundrum
Brilliant people have crowd-sourced and delivered breathtaking innovation in data storage and processing efficiency: the Hadoop Distributed File System, MongoDB, Cassandra, MapReduce, and other technologies and companies with names that evoke images of friendly elephants and the power of prophecy. These hyper-growth technologies and the companies behind them were created to help the world get value from data, working to optimize information processing and minimize the storage problems created by Big Data.
It feels great to crunch yottabytes of records, to mash them together in an effort to detect new patterns that might tell us things we don’t know. It’s part of what makes us human, that search for connections amid chaos in the hope that it will lead us to new insights.
Yet, as I visit with companies around the world, it seems like we are “panning for gold” rather than mining for insights. We have created amazing new capabilities to store, process, and therefore correlate data, yet sometimes there is a feeling that we are not any closer to surfacing new insights. Being able to correlate more data does not necessarily mean that we are getting better at understanding causation. Often we get bogged down in the vast amounts of data we are storing and processing. We feel compelled to analyze every single bit of data available to us in our efforts to uncover insights. While thorough, this approach leads to a very long and convoluted process before we ever discover insights, if we uncover any at all. Sometimes that slow, all-or-nothing approach results in analysis paralysis, where we simply have too much data and are unable to combine it in any intelligible way that would show value.
The Need for Speed
A few months ago, I was working with a team of 40 business users who depended on centralized, IT-led data processing and a requirements process to assemble the data they needed to author mission-critical financial reports. As I learned more about the project, I realized it was taking them six months to get actionable data from which they could begin to author their reports. That’s far too long. As the speed and variety at which data is produced continues to escalate exponentially, we must include speed as a priority of our Big Data analysis. A six-month lag between obtaining data and beginning to analyze it means the data could be obsolete by the time it is fully analyzed and insights are discovered. We live in a real-time world. Companies that realize this and take steps to produce business insights much more quickly than they do today could gain a competitive advantage. According to Forrester analyst Boris Evelson, “Faster access to insights will make companies more agile. Companies that have the same quality of information as their competitors but get it sooner and can turn it into action faster will outpace their peers.” But somehow, as new technologies have made it possible to collect, save and process increasingly massive amounts of data, the ability to prepare and analyze that same data has not kept pace. This gap in the analytic process is ripe for innovation.
Using Self-Service Data Prep to Speed Discovery and Achieve Business Insights
The key to moving from “panning” for insights to more productive strategies is to provide context to the data. Contextual knowledge typically resides with the business user who is striving to achieve a consequential business result, but who is hampered by having to wait for access to critical data from the IT ‘gatekeepers’ or the data scientists who created the analytic app. With self-service data prep, companies are able to reinvent the way business users prepare data, assemble data, author analytics and operationalize them. Companies can now bring business users more directly in touch with the data, allowing them to gather, design, test, debug and operationalize the analysis for themselves. This removes the bottleneck of having to wait for the data, or of being unable to consume the data for lack of programming skills. In turn, it allows them to accelerate the analytic supply chain, from identifying a business problem to achieving a business result, in hours rather than days or weeks.
"As the speed and variety at which data is produced continues to escalate exponentially, we must include speed as a priority of our Big Data analysis"
Industry analyst firm Gartner recently predicted that by 2017, “most business users and analysts in organizations will have access to self-service data prep tools to prepare data for analysis.” This kind of adoption shows that self-service data prep has the potential to completely disrupt the analytic supply chain, rapidly speeding time to insight and empowering business users to find new opportunities for problem solving in their data. To avoid being outmaneuvered by the pace of data creation, we must continue to improve the whole analytic process, and make accelerating the time to actionable insights a priority.