On Big Data & Not Being Evil
In 2013, researchers from the Massachusetts Institute of Technology reported in the journal Science that they had successfully reidentified individuals in a “deidentified” genomic dataset published by the National Institutes of Health (NIH), using only publicly accessible data on the Internet. This and other well-publicized cases suggest that “deidentified” data may never truly be so. NIH grant solicitations currently require applicants to attest that data will be “fully deidentified,” but the power of combining datasets (Big Data) has outpaced the rules.
The first wave of Big Data hit biomedicine in 2003 with the first sequencing of the human genome, which had taken a decade to complete and cost over a billion dollars. Farsighted individuals, however, were already warning of a “tsunami of data” bearing down on ill-equipped infrastructures. The cost of sequencing has since dropped exponentially, outpacing Moore’s law. In January 2014, the instrument manufacturer Illumina announced a new sequencer claimed to sequence a human genome for $1,000, yielding about 250 gigabytes of raw data per genome; storage and analysis are not included in that price. Pundits argued the semantics of the claim, but as those of us in academic medical centers have known for some time, the tsunami is upon us. Any sequencer presumes that the user already has a robust IT infrastructure of storage, high-performance computing resources, and network bandwidth. Storage in particular is critical, because discarding data once analyzed is not an option; funding agencies and medical journals require researchers to make it available on request. Yet allocating petabytes of storage indefinitely for rarely accessed data is not a palatable option either.
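The scale of the storage problem follows directly from the figures above. A minimal back-of-envelope sketch in Python, assuming the 250-gigabyte-per-genome figure from the Illumina claim and a purely hypothetical sequencing throughput:

```python
# Back-of-envelope raw-storage estimate for a sequencing operation.
# GB_PER_GENOME comes from the Illumina figure cited above;
# GENOMES_PER_YEAR is a hypothetical throughput for illustration only.
GB_PER_GENOME = 250
GENOMES_PER_YEAR = 10_000  # hypothetical

raw_tb_per_year = GB_PER_GENOME * GENOMES_PER_YEAR / 1_000  # terabytes
raw_pb_per_year = raw_tb_per_year / 1_000                   # petabytes

print(f"{raw_tb_per_year:,.0f} TB/year (~{raw_pb_per_year:.1f} PB/year)")
```

Even at this modest hypothetical throughput, raw data alone reaches petabyte scale annually, before any derived analyses or backups are counted.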
Biomedicine is not waiting for us to find a resolution; it is racing ahead. Organizations such as the Institute for Systems Genetics at NYU Langone Medical Center are establishing biology production lines with the potential to generate petabyte-scale volumes of new data annually. Furthermore, healthcare is preparing for whole-genome sequencing, previously a research activity, to become a routine part of patient care. As a result, genome sequence data will be part of every patient record within the next few years.