In business, as with data, time is of the essence. The faster you can get insights, the more quickly you can make critical business decisions. That’s where real-time data analysis appeals. However, real-time data is hard for businesses to get to grips with – especially the management and architecture side of things. And if you’re only just realising the value of your data, going real-time is much like trying to sprint before you can walk.
Real-time data is not for the faint-hearted. To make it work effectively, you need a robust architecture and efficient processes in place. For businesses willing to put in the groundwork, though, it pays off in the long run. Indeed, many in the industry argue that real-time will become the standard for most analysis in the future.
Using real-time data effectively requires you to understand it. That’s a tricky one, as every CIO and data leader will have different definitions of what real-time actually means to them. The phrase ‘real-time’ might even be a bit of a misnomer. Often it refers to near real-time.
For some, real-time means getting insights from data fast enough to enact changes. Others may say it’s literally processing data and getting insights almost instantly. Others still argue that no matter how fast you process data, some latency will always remain… so real-time doesn’t actually exist at all.
Putting that existential crisis aside, there are currently a couple of ways to analyse data in (almost) real-time: processing each event individually as a continuous stream, or running very frequent micro-batches that approximate a stream.
The technique you choose depends primarily on your use case – and this is crucial. You should also consider the resources you can commit to real-time analysis, such as your team’s current capabilities, technology and budget. Doing it well takes significant time and investment.
The rise of the Internet of Things (IoT), machine learning, the cloud and smart cities all plug into real-time analysis. Through real-time data, retailers can use beacon technology to measure footfall and even identify customers walking through their doors. Because it’s real-time, the retailer will then be able to deliver promotions to its customers at the very moment they are in-store. Companies like PayPal rely on real-time analytics to spot suspicious patterns and fraud. As smart home technology and IoT increase in popularity, so too will the appeal of real-time data. Many businesses are already considering its potential.
“75% of survey respondents recently stated that untimely data has inhibited business opportunities, with 27% indicating it has negatively impacted productivity”
Real-time analysis helped one online retailer cut abandoned transactions by 80%. Instead of using historical data to identify potential bottlenecks and issues after the fact, real-time data allowed the retailer to spot problems as customers were checking out. Where problems were found, agents contacted customers to help them complete their purchase. As a result, abandoned transactions fell from 5% to 1%.
In agriculture, sensors collect and send environmental data in real-time. This allows farmers to respond quickly to needs and problems in the field. Real-time data has led to more efficient agriculture with greater yields, higher productivity and less negative environmental impact.
Relying on lagging metrics is a common problem for many businesses when using data. Forecasting based on historical data isn’t as accurate as using real-time data. Yesterday’s success doesn’t guarantee future triumph.
There are many reasons to consider real-time data analysis. However, if your current capabilities require a massive upgrade, then you should consider whether resources will be better spent on other data projects. Without clear goals for real-time data analysis, you’ll fail to derive good value from it. Do you really need to use real-time data – or are you getting caught up in the hype?
The biggest hurdles that organisations face when using real-time data are poor infrastructure and a lack of skills. Many current data sets already arrive in a stream format suitable for real-time analysis (think stock market data and social media feeds). However, legacy systems are unable to store and process them, so an investment in updating the tech stack is required. The data team may also need training on the new systems.
Most databases currently used by businesses aren’t able to process real-time data. The legacy approach of Extract, Transform, and Load (ETL) is designed around batch processes, which is incompatible with real-time data analysis. Worse still, by the time most data reaches a warehouse, the opportunity for real-time processing has passed. And more than half of developers are still getting to grips with the technology needed for real-time.
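To make the contrast concrete, here is a minimal, hypothetical Python sketch (the function and field names are illustrative, not from any specific ETL tool): a batch ETL job only produces results once the whole scheduled run completes, while a streaming processor updates its derived state the moment each event arrives.

```python
# --- Batch ETL (hypothetical): collect records, then process them in bulk ---
def batch_etl(records):
    """Extract, transform and load a whole batch in one scheduled run."""
    # Transform step: results exist only after the entire batch has run
    return [{"value": r["value"] * 2} for r in records]

# --- Streaming (hypothetical): fold each event into state as it arrives ---
def process_event(event, running_total):
    """Update derived state immediately, so the insight is available at once."""
    return running_total + event["value"]

events = [{"value": v} for v in (10, 20, 30)]

# Batch: one answer, only at the end of the run
print(batch_etl(events))  # -> [{'value': 20}, {'value': 40}, {'value': 60}]

# Stream: a fresh running total after every single event
total = 0
for e in events:
    total = process_event(e, total)
    print(total)  # prints 10, then 30, then 60
```

The point of the sketch is the timing, not the arithmetic: in the batch path, no insight exists until the whole job finishes; in the streaming path, state is current after every event.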
There are two main types of data architecture able to support real-time analysis: Lambda and Kappa.
Lambda uses a hybrid model that analyses historical data and newly arrived data at the same time. Without getting too technical, the architecture involves two layers – batch and speed – to process the two types of data. This lets it handle both, but it is more complex, less responsive, and requires more maintenance than Kappa.
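As a rough, hypothetical sketch of the Lambda idea (class and method names are illustrative): a batch layer periodically recomputes a complete view from the full event log, a speed layer keeps an incremental view of events that arrived since the last batch run, and queries merge the two.

```python
# Hypothetical Lambda-style sketch: batch layer + speed layer, merged at query time.
class LambdaCounter:
    """Counts events per key using a toy Lambda architecture."""

    def __init__(self):
        self.master_log = []   # append-only log of all events (source of truth)
        self.batch_view = {}   # recomputed periodically from the full log
        self.speed_view = {}   # incremental view of events since the last batch run

    def ingest(self, key):
        self.master_log.append(key)
        # Speed layer: update the real-time view immediately
        self.speed_view[key] = self.speed_view.get(key, 0) + 1

    def run_batch(self):
        # Batch layer: recompute the complete view from scratch, then clear
        # the speed layer (those events are now covered by the batch view)
        self.batch_view = {}
        for key in self.master_log:
            self.batch_view[key] = self.batch_view.get(key, 0) + 1
        self.speed_view = {}

    def query(self, key):
        # Serving layer: merge the batch and speed views for an answer
        # that is both complete and up to the moment
        return self.batch_view.get(key, 0) + self.speed_view.get(key, 0)

counter = LambdaCounter()
counter.ingest("page_view")
counter.ingest("page_view")
print(counter.query("page_view"))  # -> 2 (served from the speed layer alone)
counter.run_batch()
counter.ingest("page_view")
print(counter.query("page_view"))  # -> 3 (batch view 2 + speed view 1)
```

The two code paths (ingest vs run_batch) are exactly the duplication that makes Lambda more complex to maintain.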
Kappa takes in streams of data from feeds like the Internet of Things and social media. This data is placed in temporary storage and then ingested into an engine like Apache Spark or Flink. It’s worth researching these engines, as each works slightly differently and suits different use cases.
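In a Kappa-style design, by contrast, there is only one code path: everything is treated as a stream, and any historical view is rebuilt simply by replaying the log through the same streaming logic. A minimal, hypothetical Python sketch (a real deployment would use an engine such as Spark or Flink rather than this toy loop):

```python
# Hypothetical Kappa-style sketch: one streaming code path for everything.
def streaming_count(events, state=None):
    """The single processing function: folds each event into the state."""
    state = dict(state or {})
    for key in events:
        state[key] = state.get(key, 0) + 1
    return state

# Live processing: fold new events into the current state as they arrive
state = streaming_count(["click", "view"])
state = streaming_count(["click"], state)
print(state)  # -> {'click': 2, 'view': 1}

# Reprocessing: rebuild the historical view by replaying the whole log
# through the *same* function - no separate batch code path is needed
log = ["click", "view", "click"]
assert streaming_count(log) == state
```

The replay at the end is the defining Kappa property: one function serves both real-time updates and full historical recomputation, which is why Kappa tends to be simpler to maintain than Lambda.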
Real-time data has its benefits, but to really unlock its value you need to do your homework. Don’t be tempted to take shortcuts – no race is truly won by cheating. The same goes for your real-time data. Take the time to research your options and the best technology solutions for your goals. And consider whether you really need real-time at this point – it comes down to solid use cases.
Real-time analysis is likely to become the gold standard in the future. For now, though, a hybrid approach – using historical analysis for some insights and real-time for others – will offer the best value and quickest results.