Andy Palmer, Co-founder and CEO
Today, enterprises want to use all the data at their disposal for analytics, whether it comes from internal data sources, external public data sources, or feeds from the Internet of Things. The challenge lies in connecting these diverse, siloed data sources and enriching them so that they can be quickly put to work for the business. Data-Tamer, a company based in Cambridge, Mass., is working on solving this problem.
Founded in 2013 by Michael Stonebraker, a renowned computer scientist specializing in database research, and Andy Palmer, a serial entrepreneur who has started many technology companies, Data-Tamer has its roots in research that Stonebraker conducted at MIT with researchers from UC Berkeley, Brown University, Brandeis, and Qatar Computing Research Institute (QCRI). The research explores the use of machine learning algorithms to help a company connect information from different data sources, asking a human for help when the algorithmic results have low confidence. Altogether, the research was aimed at enabling organizations to broadly integrate and “curate” many existing and future data sources efficiently at scale. Data-Tamer is now applying this research to the problem enterprises face in connecting a large variety of internal and external data sources.
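The idea of letting an algorithm decide the confident cases and asking a human about the uncertain ones can be sketched in a few lines of Python. This is an illustrative sketch only, not Data-Tamer's actual method: a simple string-similarity score stands in for a machine learning model's confidence, and the names `route_pair` and `MatchDecision`, as well as both thresholds, are hypothetical choices made for this example.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class MatchDecision:
    left: str
    right: str
    score: float
    route: str  # "auto_match", "auto_reject", or "ask_human"


def similarity(a: str, b: str) -> float:
    # Stand-in for a model's confidence: case-insensitive string similarity in [0, 1].
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def route_pair(left: str, right: str,
               accept: float = 0.9, reject: float = 0.5) -> MatchDecision:
    # Thresholds are arbitrary assumptions; a real system would tune them.
    score = similarity(left, right)
    if score >= accept:
        route = "auto_match"    # confident enough to link automatically
    elif score < reject:
        route = "auto_reject"   # confident enough to keep the records apart
    else:
        route = "ask_human"     # low confidence: queue for an expert
    return MatchDecision(left, right, score, route)


if __name__ == "__main__":
    pairs = [
        ("Customer_ID", "customer_id"),
        ("abcdefgh", "abcdefxy"),
        ("abc", "xyz"),
    ]
    for left, right in pairs:
        decision = route_pair(left, right)
        print(f"{left!r} vs {right!r}: {decision.score:.2f} -> {decision.route}")
```

In a sketch like this, only the pairs in the middle band reach a person, which is how expert time is kept for the cases the algorithm cannot settle on its own.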
The young company is fast gaining traction across market segments including IT services, pharmaceuticals, and telecommunications. While most current approaches to connecting and curating data are costly and do not efficiently address the problem of scale, Data-Tamer stands out with a solution designed to address any type of curation problem. For example, a web aggregator may need to curate 80,000 URLs, while another company may need to curate 8,000 spreadsheets. “At this scale, data curation cannot be a manual effort, but must entail machine learning approaches with a human assist when necessary,” says Palmer, the CEO of the company.
We’re working at the intersection of cutting-edge academic research and focused commercialization, building great products and delighting customers.
Moreover, curation must be incremental, because new data sources are uncovered over time and must be curated as they appear. “We have run Data-Tamer on dozens of real-world enterprise curation problems, and it has brought down curation cost by about 90 percent, relative to other currently deployed production software,” Palmer adds.
A large biopharmaceutical company with hundreds of decentralized and relatively autonomous laboratories wanted an integrated view of all the results data across those laboratories. Instead of manually curating more than 10,000 spreadsheets, the IT organization turned to Data-Tamer, which applied advanced machine learning algorithms alongside targeted expert guidance to deliver an integrated view of curated experimental results from many hundreds of laboratories and thousands of individual scientists. This capability enabled the IT organization to facilitate data analysis across many otherwise disparate organizations and scientific disciplines.
Data-Tamer is still at an early stage, but according to Palmer, “It’s a logical time for the company to evolve as people are starting to appreciate the product. With two years incubating the project at MIT and another two years to begin commercializing it, we are really excited to launch the Data-Tamer product publicly over the next three months.”