Preparing to Invest in Your Data and Analytics Ecosystem
“The vast majority are still struggling to understand the nuances between big data infrastructure, processing and analytics. Very often the best place to start is by empowering the business stakeholders with the ability to measure and distribute the basic and most important information.”
Lately it seems like the world of data management and analytics has been overtaken by buzzwords. Terms like big data, fast data, machine learning, deep learning, and artificial intelligence (AI) – not to mention being lumped in with the latest hashtags like agile, DevOps, and blockchain – have obscured the value of the data management function and how organizations can really benefit from it.
Buzzwords aside, if we pause to unpack the data trends of the moment, it all boils down to the ability to take large volumes of data from any internal or external source (no matter how big or small) in real time and come up with progressively smarter digital services that tell us whether and how we're going to be successful. Add in the requirement to turn on a dime and do the whole thing again with different datasets, systems, people, and processes, and you have a good idea of some of the very real challenges facing IT departments today.
Many organizations are quick to proclaim their entry into the XYZ (insert big data, analytics, AI, et al.) space, yet most haven't figured out how to prioritize which data, information, or outcomes are most important to measure. How, then, can you know what value the answers (or questions) uncovered by your data analytics process will provide?
To complicate things further, many platforms and tools advertise their ability to do big data, NoSQL, analytics, graphs, visualization, you name it. How do you assess their capabilities and fit for your organization? What kind of talent do you need to rent, buy or train to get the solutions to deliver on promised value?
These are some of the basic technology questions you need to ask before investing in a solution or stack of solutions. Unfortunately for a lot of organizations, this type of assessment isn’t being done.
Take business alignment, for example. How aligned are the business stakeholders with the technology delivery teams? One way to think about it is as a ratio: the greater the misalignment, the greater the overall expense (+rework, +turnover, and so on) of implementing your data and analytics strategy. Note: by misalignment, I mean the tendency business and technology teams have, if left unchecked, to try to perform each other's functions.
Let’s assume for a moment that there is a clear strategy in place that is business-led, regarding how data and analytic capabilities can drive revenue opportunities and operational efficiencies across the organization. Also for the sake of argument, let’s assume that the technology organization is closely aligned with the business and ready to execute against its priorities. Where do you start?
There are many choices to make in terms of where to invest resources and time. My recommendation is to align around very specific business outcomes whose success attributes can only be measured or improved by investing in data processing and technology solutions. For example, to attract more profitable customers, one must determine:

1. What is a customer?
2. Which customers are most profitable?
3. The segments the most profitable customers can be grouped into
4. The dependent and independent variables that describe these customers and segments
5. How these data points will be collected (volume, frequency, quality, etc.)
6. The mathematical models that will yield the best results, not only for predicting which future segments yield more profitable customers, but for identifying the steps the firm can take to influence those outcomes
7. The technology that can enable the collection, storage, calculation, and distribution of these results to stakeholders at various levels of the organization
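To make the assessment concrete, here is a minimal sketch of the quantification steps above on synthetic data. Every column name, the quartile-based segmentation, and the choice of a plain least-squares model are illustrative assumptions, not a prescription.

```python
# Sketch of the "profitable customers" assessment on synthetic data.
# All variables and thresholds are hypothetical placeholders.
import numpy as np

rng = np.random.default_rng(7)
n = 200

# Steps 1-2: one row per customer; profit is the dependent variable.
orders = rng.poisson(5, n).astype(float)        # independent variable
avg_order_value = rng.normal(50.0, 10.0, n)     # independent variable
cost_to_serve = rng.normal(40.0, 5.0, n)
profit = orders * avg_order_value - cost_to_serve

# Step 3: a crude segmentation -- quartiles of profit (segments 0..3).
segments = np.digitize(profit, np.quantile(profit, [0.25, 0.5, 0.75]))

# Step 6: a least-squares fit predicting profit from behavioral variables.
X = np.column_stack([np.ones(n), orders, avg_order_value])
coef, *_ = np.linalg.lstsq(X, profit, rcond=None)

# Which segment is most profitable on average?
best = max(range(4), key=lambda s: profit[segments == s].mean())
print("most profitable segment:", best)
```

In practice the segmentation would come from a proper clustering or RFM-style analysis and the model from a real training pipeline, but the shape of the work (define, measure, segment, model) is the same.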
This type of assessment is table stakes, the outcome of which has to be in place before investing in these often very costly endeavors.
There are other considerations, but the above takes a specific business challenge faced by many in the for-profit space and breaks it down into actionable task items of data definition (governance), quantification (analytics), integration, quality, and architecture. The tools and platforms that enable these functions are as diverse as the languages they are both written and spoken in (pun intended).
The most important aspect is to recognize the relative importance of each facet (individual machines in a data factory, instruments in a data symphony, etc.) as it relates to the final deliverable, which, in this case, is a machine learning model that can scale and be automated against changing, heterogeneous data.
If, in your design, you realize that most of your data consists of file-based, non-relational objects, you may discard more traditional database platforms and look instead at a solution built on Hadoop or a more standalone NoSQL appliance. If most of the data comes from internal and external web traffic and is stored in loosely modeled document form, then starting small with a NoSQL document store may be in order. That store may eventually need to link back to a file-based Hadoop system, and then be modeled into a graph datastore of relationships and inferences that can more easily be read into a machine-learning model for the predictive workload. Don't rule out the traditional RDBMS either, as it will likely still have a role for a while to come for highly normalized enterprise data structures and reporting.
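The document-to-graph-to-model flow above can be sketched in miniature. The field names, the referral relationship, and the derived features are all hypothetical; the point is only how loosely modeled documents become graph edges and then flat feature rows.

```python
# Hypothetical web-traffic documents, loosely modeled (fields vary freely).
docs = [
    {"id": "u1", "viewed": ["p1", "p2"], "referrer": "u2"},
    {"id": "u2", "viewed": ["p2"], "referrer": None},
    {"id": "u3", "viewed": ["p1", "p3"], "referrer": "u1"},
]

# Graph step: an adjacency map of who referred whom -- the kind of
# relationship a graph datastore would hold and let you traverse.
referrals = {d["id"]: d["referrer"] for d in docs if d["referrer"]}

# Feature step: flatten each document into one row a tabular
# machine-learning model could consume.
rows = [
    {"id": d["id"],
     "pages_viewed": len(d["viewed"]),
     "was_referred": int(d["referrer"] is not None)}
    for d in docs
]
print(rows)
```

A real pipeline would land the raw documents in a document store or HDFS first and use a graph database for the traversal, but the transformation stages line up with the ones described above.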
The above is just one use-case example spanning a hybrid of technologies and methods. The challenge is often figuring out where you have the capacity, talent, and appetite to invest in newer approaches that not only solve a problem but do so more cheaply and faster than applying legacy point solutions and workaround methods. If not, there's always the option to ETL data into a relational structure and put a business intelligence tool on top of it. That method has been proven for years, yet organizations still struggle to implement those basics. Either way, if you aren't aiming these results at very tangible, measurable outcomes with financial benefit, then no amount of expertise in big, yet artificially intelligent, data solutions is going to help you.
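The proven ETL-plus-BI path mentioned above is small enough to sketch end to end. The table and column names are hypothetical; the final query stands in for the aggregate a BI tool would surface.

```python
# ETL hypothetical raw records into a relational structure, then run the
# kind of aggregate query a BI dashboard would sit on top of.
import sqlite3

raw = [
    {"customer": "a", "region": "east", "amount": 120.0},
    {"customer": "b", "region": "west", "amount": 75.5},
    {"customer": "a", "region": "east", "amount": 60.0},
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (customer TEXT, region TEXT, amount REAL)")
con.executemany("INSERT INTO sales VALUES (:customer, :region, :amount)", raw)

rows = con.execute(
    "SELECT region, ROUND(SUM(amount), 2) FROM sales "
    "GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('east', 180.0), ('west', 75.5)]
```

Unremarkable by design: the value comes from the measurable question the query answers, not from the novelty of the stack.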
The vast majority are still struggling to understand the nuances between big data infrastructure, processing, and analytics. Very often the best place to start is by empowering the business stakeholders with the ability to measure and distribute the basic and most important information. This will then likely breed the appetite for more and more sophisticated methods of collecting, interpreting, and sharing data – which then empowers the producers and consumers to more effectively govern it – thus, organically evolving the data and analytics center of excellence, rather than branding one and hoping for the best.