
Data Lake: Building a Bridge between Technology and Business


Shaun Rankin, SVP Data Management, Citizens Bank
Is data a Technology or a Business function? Debates over trade-offs such as “who owns the data?”, “who can edit data in production?”, or even “where should the CDO function reside?” tend to look past a fundamental principle of the data industry: effectively managing data is a two-way street. It is not solely Business, nor solely Technology. Data is an enterprise asset that exists where the handshake between business and technology occurs. Data lakes and open source tools are erasing archaic frameworks used for decades to establish policy, process, infrastructure, application design, and controls. Those who influence the “hearts and minds” of the organization to reset some of these effective but outdated practices will greatly accelerate the time to value from the data lake. Here are several decision points that may require re-examination to increase momentum.
Production Environment
A production environment, where the business operates to meet the needs of its customers, requires stability and controls to support the business function for which it is designed. This stability helps support teams meet their required service level agreements.
A data lake supporting analytics is an organic, iterative environment for testing hypotheses. Once a hypothesis is proven, the “EUREKA” moment, that insight needs to be shared. Analysts may also bring their own data and mix it with the production data. The needs change daily, and the process cannot be defined to the level required when designing a stable production application.
Establishing self-service analytics with sandboxes in the production environment can help bridge the gap between the traditional production data environment and the organic data blending that analytics requires. Allow analysts to load data into a dedicated workspace that can be joined to the production data in the data lake. Self-service capabilities allow information to be captured, blended, and shared without running a software development project. Some mature organizations have implemented business-driven schedules for implementing repeatable processes.
Controls can be placed on the sandboxes, such as a 90-day expiration or charge-back methods, to help establish outer boundaries for compute, storage, and network traffic.
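As a rough illustration of the sandbox controls described above, the following Python sketch models an expiration date, a storage quota, and a charge-back amount. Only the 90-day lifetime comes from the text; the `Sandbox` class, the quota, and the rate are hypothetical values chosen for the example, not a reference to any particular platform.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# The 90-day lifetime comes from the article; the quota and rate are assumed.
SANDBOX_LIFETIME_DAYS = 90
STORAGE_QUOTA_GB = 500        # hypothetical outer boundary for storage
RATE_PER_GB_MONTH = 0.05      # hypothetical charge-back rate, in dollars


@dataclass
class Sandbox:
    owner: str
    department: str
    created: date
    storage_gb: float

    def expires_on(self) -> date:
        """Date on which the sandbox is due for removal or renewal."""
        return self.created + timedelta(days=SANDBOX_LIFETIME_DAYS)

    def is_expired(self, today: date) -> bool:
        return today >= self.expires_on()

    def over_quota(self) -> bool:
        return self.storage_gb > STORAGE_QUOTA_GB

    def monthly_charge(self) -> float:
        """Amount charged back to the owning department for storage."""
        return round(self.storage_gb * RATE_PER_GB_MONTH, 2)


sandbox = Sandbox("analyst1", "MARKETING", date(2024, 1, 1), 120.0)
print(sandbox.expires_on(), sandbox.over_quota(), sandbox.monthly_charge())
```

In practice these checks would probably run on a schedule, with the business owner deciding whether an expiring sandbox is renewed, promoted to a repeatable process, or removed.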
Consider a marketing initiative that purchases data to analyze price sensitivity or market basket offerings within a specific segment. Once the team finds a profitable segment, the EUREKA moment, the marketing process begins across production channels. The result is analytics on production customer data, blended with purchased market data, sending leads to production channels.
Provisioning
Controlling access to data tends to occur within technology, sometimes as part of a project, following information security guidelines. As a result, access control tends to be designed from an application or source perspective. This model falls down in several key areas.
- Technology designs access within a project and may not understand business utilization deeply enough to assess the inherent risk appropriately. This drives toward tighter controls than the level of risk warrants.
- Justification processes are performed on an individual basis, which leads to inconsistent entitlements.
- Multitude of sources. Data lakes need information from many sources. If each source has a different provisioning process, provisioning in the lake becomes complicated.
One concept for addressing this is to ensure the appropriate first line of defense is defined. For most companies, the first line of defense is the business or department directly, not IT. This reduces, but doesn’t eliminate, the role that technology plays and can enable a more balanced approach to risk and control. It also elevates the accountability of the business in defining the right level of risk.
A second opportunity is to change the design from where data is sourced to where it is consumed. This is commonly known as a role-based provisioning process. Aligning access control with a specific department ties the data needs more closely to relevant policies and justifications. Role-based design also eliminates the complexity of provisioning data from multiple sources.
Practical Application:
With a role called FINANCE, all GL, transaction, and account-related information can be provisioned. Most roles in finance would have a consistent business need to know. Relevant policies, e.g., Sarbanes-Oxley, would be applied to anyone with this role. On the other hand, a second role called MARKETING could require masked customer data, with the relevant privacy or GLBA policies implemented with this role.
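The FINANCE and MARKETING roles above might be sketched in Python along the following lines. The role names and policies come from the text; the dataset names, the masked fields, and the helper functions `can_read` and `mask_record` are assumptions made for illustration, not the API of any specific access-control product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    name: str
    datasets: frozenset                      # data domains the role may read
    policies: tuple                          # policies applied to every holder
    masked_fields: frozenset = frozenset()   # fields hidden from this role


# Role names and policies follow the article; dataset and field names are assumed.
ROLES = {
    "FINANCE": Role(
        name="FINANCE",
        datasets=frozenset({"gl", "transactions", "accounts"}),
        policies=("Sarbanes-Oxley",),
    ),
    "MARKETING": Role(
        name="MARKETING",
        datasets=frozenset({"customers"}),
        policies=("Privacy", "GLBA"),
        masked_fields=frozenset({"name", "ssn"}),
    ),
}


def can_read(role_name: str, dataset: str) -> bool:
    """Access follows the consuming role, not the system the data came from."""
    role = ROLES.get(role_name)
    return role is not None and dataset in role.datasets


def mask_record(role_name: str, record: dict) -> dict:
    """Return a copy of the record with the role's masked fields hidden."""
    masked = ROLES[role_name].masked_fields
    return {key: ("***" if key in masked else value) for key, value in record.items()}


print(can_read("FINANCE", "gl"))
print(mask_record("MARKETING", {"customer_id": 42, "name": "Ann", "segment": "A"}))
```

Because entitlements hang off the role, adding a new source to the lake means mapping its data to existing roles once, rather than creating another per-source provisioning process.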
One Size Does Not Fit All
Each use of data has a unique context. Performance, quality, ease of use, cost, and speed to decision can all vary. Regulatory reporting may require a GL-reconciled data mart with full lineage from when data is acquired to when it is put on a report. A question such as “Hurricane Katrina is coming; do we have any customers, inventory, or facilities that will be impacted?” has a different level of urgency. There, the data quality can be the best at hand.
Segmenting your data lake to support different analytic needs will help drive the success of the data consumer. Many analysts are used to wrangling data from disparate sources. Other information consumers need a more structured organization of the data assets.
Plan for a variety of tiers, perhaps starting with three: RAW, FULLY ORGANIZED, and one in between. Be prepared to allow data to be used across sandbox, raw, fully organized, and other tiers.
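One way to picture the tiers is as an ordered label on each data set, with every use case stating the minimum tier it can accept. The Python sketch below assumes this framing; the `IN_BETWEEN` name, the catalog entries, and the `usable_datasets` helper are all hypothetical.

```python
from enum import IntEnum


class Tier(IntEnum):
    RAW = 1
    IN_BETWEEN = 2         # placeholder name; the article only says "in between"
    FULLY_ORGANIZED = 3


# Hypothetical catalog mapping data sets to the tier they currently sit in.
CATALOG = {
    "branch_locations": Tier.RAW,
    "customer_addresses": Tier.IN_BETWEEN,
    "gl_reconciled_mart": Tier.FULLY_ORGANIZED,
}


def usable_datasets(min_tier: Tier) -> list:
    """List the data sets that meet or exceed the tier a use case requires."""
    return sorted(name for name, tier in CATALOG.items() if tier >= min_tier)


# Regulatory reporting needs fully organized data; an urgent question can use anything.
print(usable_datasets(Tier.FULLY_ORGANIZED))
print(usable_datasets(Tier.RAW))
```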
