How IT Tames Big Data
1. The Clamor for Big Data: A Solution for Every Problem
Every day we hear exciting, paradigm-changing examples of how big data revolutionizes customer service, leads to scientific discovery, delivers highly personalized goods and products and helps solve problems before equipment breaks. As the trumpets of success grow louder, insurance, one of the most data-centric of all industries, is asking itself, and being asked by its customers, how it can use big data to help them manage risk more effectively and at lower cost.
2. While Looking at Big Data, IT Can Feel Like a Fish Out of Water
Analysis of big data is typically a business-driven activity that relies on strong business domain expertise and on statistical and data analysis skills that are not typical of IT expertise or practice. Although data analysis languages such as R and Python use programming techniques familiar to IT, too few IT specialists have the statistical knowledge, background and training to perform statistical analysis correctly.
As tempting as old paradigms are, big data does not follow well-known IT solution strategies, architectures and design patterns. Big data analysis is an exploratory process, so many traditional IT assumptions, such as relying on the business user to explain which data is important, do not hold.
Although big data may turn things upside down, IT has a critical role in successfully extracting value from it, but that role and the means of participation are very different. Because big data is a new paradigm, new approaches can often be built in partnership with business users without threatening the established processes and agreements that support enterprise systems and data. However, practical experience indicates that some traditionally IT-led areas, such as business intelligence and data governance, may need to change.
3. Why is Big Data a Challenge for Insurance IT?
Simply put, it doesn't follow traditional rules. Traditional enterprise data strategies evolved to simplify access to structured data in core transactional systems: policy administration systems, claims systems, customer servicing systems, etc. Big data tries to supplement known structured sources with additional data to answer new questions, or to answer old questions better. What's the right price for this risk? Should I renew this customer's policy? Is this claim complex, and will it need a specialized adjuster? Is this claim fraudulent?
It's very difficult to know up front what information will be useful. Traditional IT methods would suggest interviewing business participants, examining processes, documenting rules and identifying information already known to be important. The power of big data, by contrast, is in finding the seldom-seen correlations and insights that are not readily available without deep analysis.
This means that typical strategies for organizing data for analysis have no obvious starting point, and they usually never finish if they try to bring every piece of information into a traditional data warehouse. In addition, many methods for analyzing new data create new attributes of their own, and the resulting rate of change is simply too great for a traditional enterprise data strategy to keep up with.
4. Big Data Brings Us into a World of Data Evolution
The output of a big data analysis (more data, often scores or derived characteristics) is not valuable until it is proven to provide insights that help solve important business problems. Big data analysis works in the world of hypotheses, generating many candidate insights. Over time and with hard work, analyses can identify which hypotheses actually provide insights that impact the business. Once identified, insights take time to socialize and put to use.
Big data evolves in stages:
1. Analytic output
2. Proven insights
3. Critical guidance for business decisions
4. Automated guidance integrated into business processes
With each evolutionary step, big data insights grow in importance, use and impact. Once analytic insights are truly accepted and used to improve business decisions, they have proven enterprise value. Efforts to incorporate insights into traditional enterprise data warehouses and transactional systems make sense only when big data findings shift from being unproven experiments to high-value inputs into ongoing business decisions.
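One way to make the staged evolution concrete is to track each insight's stage explicitly. The following is a minimal sketch, not a prescribed design: the names `InsightStage`, `Insight` and `ready_for_enterprise_integration` are hypothetical, and the rule that integration becomes worthwhile at the "critical guidance" stage is an interpretation of the paragraph above.

```python
from dataclasses import dataclass
from enum import IntEnum


class InsightStage(IntEnum):
    """The four evolutionary stages listed above, in order."""
    ANALYTIC_OUTPUT = 1
    PROVEN_INSIGHT = 2
    CRITICAL_GUIDANCE = 3
    AUTOMATED_GUIDANCE = 4


@dataclass
class Insight:
    name: str  # hypothetical label, e.g. a score produced by an analysis
    stage: InsightStage = InsightStage.ANALYTIC_OUTPUT

    def promote(self) -> None:
        """Advance exactly one stage; stages cannot be skipped."""
        if self.stage is InsightStage.AUTOMATED_GUIDANCE:
            raise ValueError(f"{self.name} is already at the final stage")
        self.stage = InsightStage(self.stage + 1)

    def ready_for_enterprise_integration(self) -> bool:
        """Assumed rule: integrate only once the insight guides decisions."""
        return self.stage >= InsightStage.CRITICAL_GUIDANCE
```

A new insight starts as analytic output, so `ready_for_enterprise_integration()` is false until it has been promoted twice. Keeping the check in one place lets the threshold be changed if an organization draws the line elsewhere.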
5. Supporting Big Data Analysis by Supporting Data Acquisition
Big data analysis requires the ability to store and ingest enormous amounts of information. IT has an excellent opportunity to partner with analytics teams to build out big data storage systems, establish standardized architectures and educate analytics teams on the aspects of applications management and deployment that apply to systems and analysis. IT's assistance will also be needed to build the structured data flows into big data environments so externally acquired data can be related to core transactional systems information, such as policy, claims, CRM, etc.
Care must be taken to recognize that early stage big data requires a very flexible sandbox for analysis and innovation. Data analysts should have a simple means for ingesting new data sources, experimenting with them and storing the experimental output of their analysis while still being able to relate it to known internal data. Efforts to standardize experimental data structures should be light, mostly guidelines, until analyses are complete and the value of the data is proven.
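As an illustration of relating loosely structured sandbox data to known internal data, the sketch below attaches externally acquired rows to internal policy records through a shared key. It assumes a hypothetical `policy_id` field on both sides and uses plain dictionaries so that the external rows can carry any attributes; the function name and field names are invented for the example.

```python
from collections import defaultdict


def relate_to_policies(policies, external_rows, key="policy_id"):
    """Attach each external row to the internal policy sharing its key.

    Returns (enriched, unmatched): every enriched policy gains an
    "external" list; unmatched holds external rows with no known policy.
    """
    # Group external rows by key without imposing any schema on them.
    by_key = defaultdict(list)
    for row in external_rows:
        by_key[row.get(key)].append(row)

    known = {p[key] for p in policies}
    enriched = [{**p, "external": by_key.get(p[key], [])} for p in policies]
    unmatched = [r for r in external_rows if r.get(key) not in known]
    return enriched, unmatched
```

Returning the unmatched rows rather than discarding them keeps experimental data available for later analysis, which fits the light-touch approach to structure described above.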
Typically, Hadoop clusters (often called data lakes) have been adopted to store large volumes of heterogeneously structured data. Clusters leverage distributed processing and storage capacity to provide a very flexible data storage environment. The maturity of Hadoop-friendly tools varies: established vendors are catching up by extending their products, while many new products have come forward. Traditional business intelligence (BI) capabilities are still evolving. Lighter versions, often described as "visualization," are available, but the end-user self-service capabilities and data abstraction found in mature enterprise tools are still being developed. Few tools that can analyze data in Hadoop clusters support a view of end-to-end analysis deployment. This is common in the predictive analytics space in general, where many tools are used for research and completely separate tools are used for deployment. Translating analysis from one tool set to another is very inefficient, introduces potential errors, increases costs and delays the impact of insights in the marketplace.
IT can help define the end-to-end deployment process, help select tools that meet analyst requirements (tools analysts actually want to work with) and design the end-to-end integration of big data analytic tools. Postponing this decision only invites conflict, unexpected costs and potentially failed projects.
Few organizations other than IT are designed to maintain systems environments, secure them and keep them running 24x7 or at agreed-upon service levels. Analytics organizations are seldom prepared for, or even aware of, the depth of this responsibility. IT has much to offer and can partner with analytics organizations to learn where it can provide ongoing systems support.
Through architecture, systems design, tools selection and a forward thinking view of analytics application management, IT has much to offer as a partner to analytics organizations.
6. Supporting the Rapid Deployment of Analytic Insights
Rapid insight deployment is often a challenge for IT. Traditional IT systems have been developed using a waterfall method in a well-defined sequence. More modern techniques, based on a very mature understanding of the end-to-end systems development process, accelerate application development through agile approaches.
Neither IT method effectively addresses the end-to-end analysis life cycle, though each can inform steps within it.
Insights from big data analysis fall into two basic categories: insights for consideration or insights for transactional integration into existing systems. Insights for consideration are often depicted graphically or in tabular form and usually benefit from typical business intelligence capabilities. Insights for integration are calculated values and often inputs into transactional systems. Predictive applications may generate scores or new attributes and they are often delivered to transactional systems through a web services interface.
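To illustrate an insight for integration, the sketch below shows the kind of request handler a scoring web service might wrap: it accepts a JSON claim and returns a JSON score that a transactional system could consume. Everything specific here is an assumption made for illustration, including the field names, the weights, the 0.6 routing threshold and the function names; a real service would call a fitted model and sit behind whatever web framework the organization uses.

```python
import json


def complexity_score(claim: dict) -> float:
    """Toy score in [0, 1]; the weights are illustrative, not a fitted model."""
    amount = min(float(claim.get("claimed_amount", 0)) / 100_000, 1.0)
    parties = min(int(claim.get("parties_involved", 1)) / 5, 1.0)
    return round(0.7 * amount + 0.3 * parties, 3)


def handle_score_request(body: str) -> str:
    """Take a JSON request body and return a JSON response body."""
    claim = json.loads(body)
    score = complexity_score(claim)
    return json.dumps({
        "claim_id": claim.get("claim_id"),
        "complexity_score": score,
        "needs_specialist": score >= 0.6,  # hypothetical routing threshold
    })
```

Keeping the scoring function separate from the request handling means the same calculation can be reused in batch analysis and in the deployed service, which reduces the risk of re-implementing it differently in each tool set.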
Lack of end-to-end deployment capabilities can delay the deployment of insights by anywhere from two months to over a year. Insights for consideration are the most easily and rapidly deployed, but they suffer greatly if traditional IT BI data ingestion processes and data warehouses are used. Many strategic insights need only be deployed to a small audience, while other insights might be iteratively deployed and tested by field users until ready for broader rollout. Some insights are so important that they need to be deployed immediately, yet existing BI deployment architectures may lack the flexibility to make them available quickly. These concerns argue for an analytics-managed BI environment that draws upon big data analyses from the data lake. Depending on tool selection, a data reporting environment may be required if current BI tools cannot access information in the lake. Ultimately, additional staged versions of the BI environment will be required to separate the development, test and deployment stages. It's possible that preliminary stages can be established for the analytics organization, with broader deployment handed off to an enterprise IT-managed BI solution as the data's importance evolves.