On Big Data & Not Being Evil
In 2013, researchers from the Massachusetts Institute of Technology detailed in the journal Science their successful reidentification of individuals in a “deidentified” genomic dataset published by the National Institutes of Health (NIH), using only publicly accessible data on the Internet. This and other well-publicized incidents suggest that “deidentified” data may never truly be so. NIH grant solicitations currently require applicants to attest that data will be “fully deidentified,” but the power of combining datasets (Big Data) has outpaced the rules.
The first wave of Big Data hit biomedicine in 2003 with the first sequencing of the human genome, an effort that took a decade and cost over a billion dollars. Even then, farsighted individuals were warning of a “tsunami of data” bearing down on ill-equipped infrastructures. Since then, the cost of sequencing has dropped exponentially, outpacing Moore’s law. In January 2014, the instrument manufacturer Illumina announced a new sequencer claimed to sequence human genomes for $1,000 apiece, each genome yielding about 250 gigabytes of raw data (storage and analysis not included). Pundits debated the semantics of that claim, but as those of us in academic medical centers had known for some time, the tsunami is upon us. Any sequencer presumes that the user already has a robust IT infrastructure: storage, high-performance computing resources, and network bandwidth. Storage is especially critical because discarding data once analyzed is not an option; funding agencies and medical journals require researchers to make it available on request. Yet allocating petabytes of storage indefinitely for rarely accessed data is not a palatable option either.
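The scale of the problem follows from simple arithmetic. A minimal sketch, using the 250-gigabytes-per-genome figure cited above and an assumed (hypothetical) throughput of 18,000 genomes per year for a high-end sequencing operation:

```python
# Back-of-envelope estimate of raw-storage demand from routine sequencing.
# Throughput is an illustrative assumption, not a vendor specification.
GB_PER_GENOME = 250          # raw output per human genome, as cited above
GENOMES_PER_YEAR = 18_000    # assumed annual throughput of one sequencing center

def petabytes_per_year(genomes: int, gb_each: float) -> float:
    """Annual raw-data volume in petabytes (decimal units: 1 PB = 1,000,000 GB)."""
    return genomes * gb_each / 1_000_000

print(f"{petabytes_per_year(GENOMES_PER_YEAR, GB_PER_GENOME):.1f} PB/year")
# → 4.5 PB/year
```

At those assumed rates, a single center accumulates petabytes of raw data annually before any analysis products are counted, which is why indefinite retention becomes untenable.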
Biomedicine is not waiting for us to resolve this; it is racing ahead. Organizations such as the Institute for Systems Genetics at NYU Langone Medical Center are establishing biology production lines with the potential to generate petabyte-scale volumes of new data annually. Meanwhile, healthcare is preparing for whole-genome sequencing, previously a research activity, to become a routine part of patient care. As a result, genome sequence data will be part of every patient record within the next few years.