
Big Data: More Grey than Black or White


Anthony Scriffignano, Ph.D., SVP, Chief Data Scientist, Dun & Bradstreet
Let’s face it: we take digital everything for granted, often accepting insights at face value without questioning the underlying assumptions. Recently, I had a great dining experience thanks to an “unfavorable” online restaurant review that cited “very spicy food.” I love spicy food. A search filtered for “favorable” reviews would have excluded this choice. The seemingly objective “unfavorable” rating was in reality more nuanced, in a way that could be interpreted only by reading the review itself rather than the rating. If we struggle with this sort of data dichotomy, imagine the challenge of developing an algorithm to scan millions of pieces of nuanced data from multiple sources. Yet we trust such algorithms more every day, sometimes with chilling implications.
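The trap in this anecdote can be reproduced with a toy Python sketch. The reviews, the restaurant names, and the three-star `MIN_STARS` cut-off are invented for illustration, and keyword matching is a deliberately crude stand-in for real text analysis. The point is only that a rating-only filter discards the review a spice lover would actually want to read.

```python
from dataclasses import dataclass

@dataclass
class Review:
    restaurant: str
    stars: int  # 1-5 rating
    text: str

# Hypothetical sample data, invented for illustration.
REVIEWS = [
    Review("Thai Garden", 2, "Very spicy food, could barely finish it."),
    Review("Plain Diner", 4, "Mild, safe, and consistent."),
]

MIN_STARS = 3  # hypothetical cut-off for a "favorable" search

def favorable_only(reviews):
    """Rating-only filter: keeps restaurants whose review meets the star threshold."""
    return [r.restaurant for r in reviews if r.stars >= MIN_STARS]

def matches_preference(reviews, liked_terms):
    """Text-aware filter: keeps any review that mentions a term the diner likes,
    regardless of its star rating."""
    return [
        r.restaurant
        for r in reviews
        if any(term in r.text.lower() for term in liked_terms)
    ]

print(favorable_only(REVIEWS))                 # ['Plain Diner']
print(matches_preference(REVIEWS, ["spicy"]))  # ['Thai Garden']
```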
In many ways, we inappropriately act as if outcomes are either right or wrong, even though we know the world around us doesn’t behave that way (“Does this luggage contain anything dangerous?”). There is no defensible premise for simply scaling our black-and-white approach to data from years ago to the dynamic nature of data that is really neither black nor white.
Computable Data: Sound Waves and Singing Angels
Our brains create computable data (i.e., data “consumable by an algorithm”) just by listening to music. Sound arrives as pressure waves; the nuance comes from the brain’s interpretation of those waves, which is what makes music “computable.” The brain also curates, drawing on past experiences, learned and innate responses, and many other pieces of data to derive higher-order meaning. But it all starts with making data computable.
The first basic computational scenario to consider is when data is ingested for the first time from the “real world,” as when taking a picture. Light striking the lens of a digital camera is raw data. The optics in the lens transform that data first, although it is still in analog form. As the data is digitized inside the camera, it becomes computable (insert the sound of singing angels here). The magic of color correction, image detection, and light balancing comes to us courtesy of algorithms that process the ingested data while combining it with previously curated information (for example, attaching metadata such as time, date, and location).
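Both examples above come down to the same ingest step: sampling a continuous signal at discrete instants and quantizing each sample to a finite set of values. The Python sketch below illustrates that step with a simulated tone. The 8 kHz sample rate, the 8-bit depth, and the metadata field names are assumptions chosen for illustration, not how any particular camera or audio device works.

```python
import math

SAMPLE_RATE_HZ = 8_000   # hypothetical sampling rate
BIT_DEPTH = 8            # hypothetical quantizer resolution
LEVELS = 2 ** BIT_DEPTH  # 256 discrete amplitude codes

def analog_signal(t: float) -> float:
    """Stand-in for a continuous real-world signal: a 440 Hz tone in [-1, 1]."""
    return math.sin(2 * math.pi * 440 * t)

def digitize(n_samples: int) -> list[int]:
    """Sample the signal at fixed instants, then quantize each sample to an integer code."""
    codes = []
    for n in range(n_samples):
        x = analog_signal(n / SAMPLE_RATE_HZ)     # sampling: continuous time -> discrete instants
        code = round((x + 1) / 2 * (LEVELS - 1))  # quantization: amplitude in [-1, 1] -> 0..255
        codes.append(code)
    return codes

# 80 samples at 8 kHz cover 10 ms of the tone.
record = {
    "samples": digitize(80),
    # Metadata stored alongside the raw codes (hypothetical field names).
    "sample_rate_hz": SAMPLE_RATE_HZ,
    "bit_depth": BIT_DEPTH,
    "source": "simulated 440 Hz tone",
}
print(len(record["samples"]), record["samples"][:4])
```

Once the signal is a list of integers plus metadata, any downstream algorithm (filtering, pitch detection, compression) can operate on it; the precision lost to the 256 levels cannot be recovered later.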
One of the most overlooked truths in today’s world is the rising availability of computable data. Failing to consider the new kinds of questions we can ask with all of these new types of data could be the single greatest mistake we make.
Unstructured Data: Playgrounds and Adverbs
We have become numb to the dramatically increasing amount of data available to an enterprise. Many organizations are struggling to deal with the quantity and variety of information they already have, let alone new sources of data. It is therefore especially tempting to ignore data that is not conveniently packaged (i.e., digitized, with metadata) by labeling it “unstructured.” A playground offers a great way to ease into this phenomenon.
Few things initially look more unstructured than a school playground during recess. Closer inspection reveals the first hint of underlying organization and governance: a playground monitor watches to make sure that certain things do or don’t happen. Further inspection reveals lines and numbers painted on the ground, implying some sort of game rules. One might notice children of different ages, genders, or cultures following different social norms that can be inferred from their behavior. All of a sudden, what seemed unstructured looks a little more structured. This is a great analogy for much “unstructured” data.
To make classic “unstructured data” such as social media text more computable, we might inspect the text and decompose it via entity extraction (extracting elements such as nouns, verbs, and adverbs), sentiment analysis (attributing a mood), and language detection (determining the primary language). Like a brain processing music or a camera processing an image, this decomposition loses information. The trick is to understand the implications of that loss and to do something consistent with the extracted information. The curation step, combining the extracted information with other previously computed information, is the key to deriving meaningful insight.
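The three decomposition steps can be sketched in a few lines of Python. This is a toy under stated assumptions: the word lists are hypothetical, “entity extraction” is approximated by picking out capitalized words (real systems use part-of-speech taggers and named-entity models), and sentiment and language are scored by simple word counts. The last lines show the information loss the paragraph warns about.

```python
import re
from collections import Counter

# Tiny hypothetical lexicons; production systems use trained models, not word lists.
POSITIVE = {"love", "great", "delicious"}
NEGATIVE = {"hate", "awful", "bland"}
STOPWORDS = {
    "en": {"the", "and", "is", "this", "i", "it"},
    "es": {"el", "la", "y", "es", "esto", "yo"},
}

def tokenize(text: str) -> list[str]:
    """Lowercase word tokens (letters, including common Spanish accents)."""
    return re.findall(r"[a-záéíóúñ']+", text.lower())

def candidate_entities(text: str) -> list[str]:
    """Crude stand-in for entity extraction: capitalized words after the first word."""
    words = re.findall(r"[A-Za-z']+", text)
    return [w for w in words[1:] if w[0].isupper()]

def sentiment(tokens: list[str]) -> int:
    """Lexicon score: +1 for each positive word, -1 for each negative word."""
    return sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)

def detect_language(tokens: list[str]) -> str:
    """Choose the language whose stopwords occur most often in the tokens."""
    hits = {lang: sum(t in words for t in tokens) for lang, words in STOPWORDS.items()}
    return max(hits, key=hits.get)

def decompose(text: str) -> dict:
    tokens = tokenize(text)
    return {
        "entities": candidate_entities(text),
        "sentiment": sentiment(tokens),
        "language": detect_language(tokens),
        # A bag of words keeps word counts but throws away word order.
        "bag_of_words": Counter(tokens),
    }

post = "I love the spicy curry at Thai Garden and the service is great"
result = decompose(post)
print(result["entities"], result["sentiment"], result["language"])  # ['Thai', 'Garden'] 2 en

# Information loss: two sentences with opposite meanings decompose identically.
print(decompose("the dog bit the man")["bag_of_words"]
      == decompose("the man bit the dog")["bag_of_words"])  # True
```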
Unstructured data may be an oxymoron. In almost all cases, there is some treatment that can be applied to derive some higher-order meaning from a collection of non-random data.
Methods: Standard Deviations and Standards for Deviation
Transforming data into computable information and inferring its structure allow us to consider nuance through methods that create, transform, or understand data. Common methods include measuring central tendency (for example, the mean) or dispersion (for example, the standard deviation). Another common method is regression, which uses past (longitudinal) data to estimate a relationship in the form of a prediction equation. These methods underpin much of today’s interpretation of numerical (digital) data. They are, however, notoriously dangerous when applied to new types of data and to the variation underlying subjective decisions, such as whether a piece of luggage contains something “dangerous,” especially since the underlying behaviors change even as they are being observed over time.
The good news is that there are many other methods that can help us look at this kind of question in a scientific way. Examples include machine learning and heuristic evaluation. There are also emerging technologies, such as quantum computing, that will allow us to ingest, organize, manipulate, and understand data that is not binary.
The challenge of problem formulation is to always question why the method we select is the best method for the question at hand and the data available.
Computable data, the treatment of unstructured data, and method selection are only three considerations when addressing nuance in data, but they are a very important three. The journey to understanding data in richer, deeper ways is daunting, but enormously exciting and rewarding. We sit at the cusp of a new era, in which exciting new types of data will be used in ways we have yet to understand. There is no better time to explore a world of data that is neither black nor white.
