On Big Data & Not Being Evil
In 2013, researchers from the Massachusetts Institute of Technology described in the journal Science their successful reidentification of individuals in a “deidentified” genomic dataset published by the National Institutes of Health (NIH), using only publicly accessible data on the Internet. This and other well-publicized incidents suggest that “deidentified” data may never truly be so. NIH grant solicitations currently require applicants to attest that data will be “fully deidentified,” but the power of combining datasets (Big Data) has outpaced the rules.
The first wave of Big Data hit biomedicine in 2003 with the first sequencing of the human genome, which had taken a decade to complete and cost over a billion dollars. Farsighted individuals were already warning of a “tsunami of data” bearing down on ill-equipped infrastructures. Since then, the cost of sequencing has dropped exponentially, outpacing Moore’s law. In January 2014, the instrument manufacturer Illumina announced a new sequencer that it claimed could sequence a human genome for $1,000, producing about 250 gigabytes of raw data per genome (storage and analysis not included). Pundits argued the semantics of this claim, but as those of us in academic medical centers had known for some time, the tsunami is upon us. Any sequencer presumes that the user already has a robust IT infrastructure: storage, high-performance computing resources, and network bandwidth. Storage is especially critical because discarding data after analysis is not an option; funding agencies and medical journals require researchers to make it available on request. Yet allocating petabytes of storage indefinitely for rarely accessed data is not a palatable option either.
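A back-of-envelope calculation makes the storage pressure concrete. Using only the figures cited above (roughly 250 gigabytes of raw data per $1,000 genome) and an assumed, illustrative throughput of 100 genomes per day, a short sketch:

```python
# Back-of-envelope estimate of how quickly sequencing fills storage.
# Figures from the article: ~$1,000 and ~250 GB of raw data per genome.
# The 100-genomes-per-day throughput is a hypothetical example, not a cited figure.
GB_PER_GENOME = 250
GB_PER_PETABYTE = 1_000_000  # decimal (SI) units

genomes_per_petabyte = GB_PER_PETABYTE // GB_PER_GENOME
print(f"Genomes per petabyte of raw data: {genomes_per_petabyte:,}")   # 4,000

genomes_per_day = 100  # assumed throughput for illustration
daily_tb = genomes_per_day * GB_PER_GENOME / 1_000
print(f"Raw data per day at {genomes_per_day} genomes/day: {daily_tb:.0f} TB")  # 25 TB

days_to_fill_pb = genomes_per_petabyte / genomes_per_day
print(f"Days to fill 1 PB: {days_to_fill_pb:.0f}")  # 40
```

At that pace a single facility would accumulate a petabyte of raw data, which journals and funders expect to be retained, in under six weeks.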
Biomedicine is not waiting for us to find a solution; it is racing ahead. Organizations such as the Institute for Systems Genetics at NYU Langone Medical Center are establishing biology production lines with the potential to generate petabyte-scale volumes of new data annually. Meanwhile, healthcare is preparing for whole-genome sequencing, previously a research activity, to become a routine part of patient care. As a result, genome sequence data will be part of every patient record within the next few years.