The company’s big data strategy, entity information management, and governance services generate business insights from a variety of data, backed by sound master data management and data governance strategies. “We enable our clients to efficiently ingest and aggregate data from various sources using Hadoop & Spark ecosystem and other modern techniques—Talend and Camel for the ETL, mediation, and routing,” says Sudharsan Madabusi, President and CEO, Treselle Systems. Treselle has expertise in data visualization technologies and can perform time-series analysis, geospatial analysis, forecasting, classification, clustering, graph-based visualization, and back-testing analysis. The company also has extensive experience with multiple NoSQL databases, big data SQL engines, big data cloud computing, statistical and quantitative analysis, text search and NLP, and big data quality assurance.
Treselle’s Big Data R&D team constantly explores technologies that solve interesting use cases for its clients. For example, Treselle implemented OrientDB as a polyglot persistence mechanism for one of its clients because of OrientDB’s multi-model capabilities: data is stored in a document database, yet relationships between documents still support graph-style queries. Furthermore, Treselle recommended Apache Drill, a flexible query execution framework with self-describing data exploration across different data stores. By keeping data at its source, Drill avoids the excessive data movement and synchronization issues that cause staleness across MongoDB, MySQL, S3, and flat files.
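The keep-data-at-source pattern can be sketched with Apache Drill’s REST API, which accepts a SQL statement and federates it across storage plugins. The sketch below is illustrative, not Treselle’s actual deployment: the storage-plugin names (`mongo`, `mysql`), database paths, and column names are assumptions, and the query requires a running Drill instance on the default web port.

```python
# Hedged sketch: a federated query across MongoDB and MySQL through Apache
# Drill's REST API, so neither dataset is copied or synced beforehand.
# Plugin names, table paths, and columns below are illustrative assumptions.
import json
import urllib.request

DRILL_URL = "http://localhost:8047/query.json"  # Drill's default web port

def build_drill_payload(sql: str) -> dict:
    """Build the JSON body Drill's REST query endpoint expects."""
    return {"queryType": "SQL", "query": sql}

# A cross-store join: practitioner documents in MongoDB joined against
# reference rows in MySQL, with no ETL step in between.
SQL = """
SELECT m.npi, m.name, r.specialty
FROM mongo.healthcare.practitioners m
JOIN mysql.reference.specialties r ON m.specialty_id = r.id
LIMIT 10
"""

def run_query(sql: str) -> dict:
    """POST the query to Drill and return the decoded JSON result."""
    req = urllib.request.Request(
        DRILL_URL,
        data=json.dumps(build_drill_payload(sql)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:  # needs a live Drill instance
        return json.load(resp)
```

Because Drill infers schemas on read, the same SQL works against flat files or S3 buckets simply by swapping the storage-plugin prefix in the table path.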
We enable our clients to efficiently ingest, transform, aggregate, and gain better insights from a variety of data sources
In one instance, a client in the healthcare sector, processing more than 4,000 data sources, was struggling with ineffective entity identification, disambiguation, and linkage, since the same information about healthcare practitioners arrived in various formats. The client’s existing data processing technique was too tedious, slow, and error-prone to handle large data sets. The client required a system that could effectively process its data and integrate with its data management platform, identifying and linking the entities within the 40 million nodes and relationships managed in Neo4j. Treselle’s Big Data engineering team utilized various advanced technologies and integration points to perform data manipulation, munging, cleansing, and transformation. The team used the Hadoop ecosystem with Pig and custom user-defined functions for batch-mode data transformation, applied R’s text engineering capabilities for cleansing, entity identification, and linking, and integrated these backend systems with an OpenRefine GUI carrying custom GREL macro expressions to provide Excel-like features on the web for the client’s data science team. This reduced the time the client’s data scientists spent on data munging from days to hours and enriched the data through the client’s internal APIs.
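The core of the entity identification and linkage step can be illustrated with a small sketch: normalize practitioner names that arrive in different formats, then cluster records whose normalized forms are near-identical. This is a minimal illustration, not Treselle’s pipeline (which used Pig and R); the field formats, title list, and similarity threshold are assumptions.

```python
# Hedged sketch of entity identification and linkage: group practitioner
# records that refer to the same person despite formatting differences.
# The normalization rules and 0.9 threshold are illustrative assumptions.
from difflib import SequenceMatcher

def normalize(name: str) -> str:
    """Lowercase, strip punctuation and honorifics, sort tokens so
    'Doe, Jane' and 'Jane Doe' normalize to the same string."""
    name = name.lower().replace(",", " ").replace(".", " ")
    tokens = [t for t in name.split() if t not in {"dr", "md", "do"}]
    return " ".join(sorted(tokens))

def same_entity(a: str, b: str, threshold: float = 0.9) -> bool:
    """Fuzzy match on the normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold

records = [
    "Dr. Jane Doe, MD",
    "Doe, Jane",
    "JANE DOE",
    "John Q. Smith",
]

# Naive O(n^2) linkage; pipelines at the 40-million-node scale would add
# blocking keys (e.g. NPI prefix or surname initial) to prune comparisons.
clusters: list[list[str]] = []
for rec in records:
    for cluster in clusters:
        if same_entity(rec, cluster[0]):
            cluster.append(rec)
            break
    else:
        clusters.append([rec])
# clusters now holds one group with the three "Jane Doe" variants
# and a singleton group for "John Q. Smith".
```

Once records are linked, each cluster can be written back as a single canonical entity with edges to its source records, which maps naturally onto a graph store such as Neo4j.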
The average organization today collects more data than ever before, and the variety of data types that are stored, managed, and analyzed has increased exponentially. Engineering talent with different data skills is needed to ingest, transform, aggregate, model, analyze and create insights. “We help businesses by providing the talent to build strong data teams that include data engineers, modelers, scientists, and BI analysts,” concludes Madabusi.