Building Levees to Manage the Data Flood

Adam Bowen, Worldwide Lead of Innovation, Delphix

Any time we interact digitally with anyone or anything, we generate scads of data. The global internet population is now estimated at well over 3.2 billion people, a number that keeps growing as connectivity spreads. Physical items like home thermostats and pacemakers generate data, too. And that doesn't begin to touch the log data produced every day by companies across the world. By 2020, experts estimate that there will be 5,200 GB of data for every person on earth.

With these numbers increasing exponentially as we continue down the road to digitization, it's clear that we've only begun to scratch the surface of what can be accomplished with a holistic view of all that data. However, the sheer volume of data and its rapid growth can overwhelm our ability to process the information. To take advantage of the new insights possible from all types of data, companies have to find a way to stanch the flood.

Data Flow Restriction:

Incoming data pools quickly in an organization's data lake. Just as a real lake is fed by rivers and streams, a data lake is fed by data rivers and data streams (binaries, flat files, Sybase, Oracle, MSSQL, and so on). In nature, when heavy rains fall or a waterway becomes choked, a river can quickly overflow its banks and wreak considerable mayhem on the surrounding ecosystem. The same thing happens with data. When data arrives faster than you can read, process, and analyze it, the surrounding environment quickly becomes encumbered or disrupted in the form of storage exhaustion, business intelligence misinformation, application development delays, and production outages.


The same effects occur when constraints restrict your data flow: ticketing-system handoff delays between departments, the inability to quickly refresh full data sets, or cumbersome data rewind processes. The issue is compounded as organizations increasingly rely on a nimble IT department following a DevOps model of application deployment. With pressure to constantly update and improve application performance, IT teams will maintain up to 10 non-production tributaries for every production data river, regularly used for development, testing, QA, staging, training, and business intelligence initiatives.
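To make the strain concrete, here is a back-of-the-envelope sketch in Python. Only the ten-copies figure comes from the paragraph above; the database size and network throughput are illustrative assumptions.

    # Back-of-the-envelope comparison of full physical copies against the
    # production source alone. The figures below are illustrative assumptions.

    PROD_SIZE_TB = 5          # assumed size of one production data source
    NON_PROD_COPIES = 10      # the "up to 10 non-production tributaries" above

    full_copies_tb = PROD_SIZE_TB * NON_PROD_COPIES
    print(f"Production source:       {PROD_SIZE_TB} TB")
    print(f"10 full physical copies: {full_copies_tb} TB of extra storage")

    # Refreshing all ten copies over a 1 Gb/s link (~0.45 TB/hour of
    # effective throughput, another assumption) would take roughly:
    refresh_hours = full_copies_tb / 0.45
    print(f"Naive full refresh time: ~{refresh_hours:.0f} hours")

Even with modest assumptions, a full refresh of every copy ties up the network for days, which is exactly the kind of handoff delay described above.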

The ebbs and flows of data will come, and they are often driven by external factors beyond our control. The best you can do is be prepared and agile enough to adapt. You must learn to swim.

Jumping into the Big Data Lake:

Traditional approaches to data integration such as ESBs, EAIs, and ETL came from the age when ERPs were the heavyweights of enterprise software, and they were well suited to the needs of their time. Large installation footprints, heavy reliance on network processing, and older languages and protocols cause them to fall short now, when agility above all else is required to accommodate an ever-expanding number of data sources. To keep drowning middleware implementations from becoming a failed tactic, organizations need to begin migrating to more modern, cloud-based models such as data virtualization.

Just because Big Data is bigger than anything before it doesn't mean it has to be so unwieldy. Multiple sets of production environments can very quickly become exorbitantly expensive, but losing granularity in that data would be detrimental to the many different teams within a company that all need copies. Much as desktop virtualization eases the strain that heavy application workloads put on PC performance, data virtualization alleviates the mounting strain that provisioning large volumes of data places on network and storage systems. By virtualizing non-production data sources, an organization can increase its storage capacity by up to 90 percent, enabling engineers to develop “flood control systems” up to two times faster and quickly adapt to changing needs. By allowing individuals access to full virtual environments, an organization ensures that its engineers will know exactly how its systems will behave when called upon in live applications, and won't find itself drowning in untested data.
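A minimal sketch of the copy-on-write idea behind data virtualization, as a conceptual illustration rather than a description of any vendor's engine: each virtual copy shares the unchanged production blocks and stores only its own changes, which is why the savings can approach the 90 percent figure cited above.

    # Conceptual copy-on-write model: virtual copies share baseline blocks
    # and store only the blocks they change. Illustrative, not production code.

    class VirtualCopy:
        def __init__(self, baseline):
            self.baseline = baseline      # shared, read-only blocks
            self.delta = {}               # private copy-on-write changes

        def read(self, block_id):
            return self.delta.get(block_id, self.baseline[block_id])

        def write(self, block_id, data):
            self.delta[block_id] = data   # only the changed block is stored

    baseline = {i: f"prod-block-{i}" for i in range(1_000)}   # "production" blocks
    copies = [VirtualCopy(baseline) for _ in range(10)]       # 10 non-prod copies

    copies[0].write(7, "dev-change")                          # one developer edits a block

    physical = len(baseline) * (1 + len(copies))              # prod + 10 full copies
    virtual = len(baseline) + sum(len(c.delta) for c in copies)
    print(f"Blocks stored with full copies:    {physical}")
    print(f"Blocks stored with virtual copies: {virtual}")

In this toy model the ten virtual copies add a single changed block on top of the baseline, instead of ten full duplicates of it.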

Big Data, Big Threats:

A fundamental shift in the types of data we have at our disposal is coming. Non-traditional data sources such as voice and video will take us beyond text-based data, and we'll be able to glean more from communication channels such as Slack, Yammer, Messenger, and countless social media sites.

This level of personalization in the data not only expands the possibilities for insightful business decisions; it also drastically increases an organization's data security risk. The value of big data grows with the amount of data provisioned onto the platform, and centralizing all of this information on a single platform inherently increases the risk to that data.

Large-scale hacks of personal data have crippled massive organizations such as Target and Home Depot in recent years, and they can have an even more damaging effect on smaller businesses, which cannot bounce back as quickly from such attacks. One way to prevent the leak of sensitive customer information is data masking, a process that hides the real values in a dataset and replaces them with synthetic but comparable data. This allows IT teams to operate under the guise of "business as usual" without taking sensitive information out of its secure environment and into a place where cyber-criminals can reach it through a system's vulnerabilities.
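A minimal masking sketch in Python; the field names and masking rules are illustrative assumptions, not a description of any particular masking product. Sensitive values are replaced deterministically with synthetic but comparable ones, so masked copies stay internally consistent while real customer data never leaves the secure environment.

    # Minimal data-masking sketch: replace sensitive values with synthetic but
    # comparable ones. Field names and rules here are illustrative assumptions.
    import hashlib
    import random

    def mask_record(record, seed="masking-key"):
        # Derive a per-record RNG from a secret seed plus the original value,
        # so the same input always masks to the same output and joins across
        # masked tables still line up.
        digest = hashlib.sha256((seed + record["email"]).encode()).hexdigest()
        rng = random.Random(digest)
        return {
            "email": f"user{rng.randrange(10**8):08d}@example.com",
            "card_number": "4000" + "".join(str(rng.randrange(10)) for _ in range(12)),
            # Non-sensitive analytics fields pass through untouched.
            "plan": record["plan"],
            "monthly_spend": record["monthly_spend"],
        }

    original = {"email": "jane.doe@acme.com", "card_number": "4111111111111111",
                "plan": "pro", "monthly_spend": 129.00}
    print(mask_record(original))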

Sink or Swim:

Big Data has already had an immense impact on how business is done, and as data volumes continue to surge, the best ways to handle the influx of information will keep changing. There's no question that a flood of data is coming. Organizations must find tools that will help them brave the storm.
