Sub-second analytical BI time to value still a pipe dream

Internet search engines with instant query responses may have misled enterprises into believing all analytical queries should deliver split-second answers.

With the advent of Big Data analytics hype and the rapid convenience of internet searches, enterprises might be forgiven for expecting to have all answers to all questions at their fingertips in near real time.


Unfortunately, getting trusted answers to complex questions is a lot more complicated and time-consuming than simply typing a search query. Behind the scenes of any internet search, a great deal of preparation has already been done in order to serve up the appropriate answers. Google, for instance, dedicates vast amounts of high-end resources and all of its time to preparing the data necessary to answer a search query instantly. But even Google cannot answer broad questions or make forward-looking predictions. Where the data is known and trusted, has been prepared and had rules applied, and the search parameters are limited, as with a property website, almost instant answers are possible, but this is not true BI or analytics.

Within the enterprise, matters become a lot more complicated. When the end user seeks an answer to a broad query, such as when a marketing firm wants to assess social media to find an affinity for a certain range of products over a six-month period, a great deal of ‘churn’ must take place in the background to deliver answers. This is not a split-second process, and it may deliver only general trend insights rather than trusted, quality data that can serve as the basis for strategic decisions.

When end users wish to run a query and are given the power to process their own BI/analytics, lengthy churn must take place. Every time a query, report or instance of data access is converted into useful BI/analytical information for end consumers, a great deal of preparation work has to be done along the way: identify data sources > access > verify > filter > pre-process > standardize > lookup > match > merge > de-dup > integrate > apply rules > transform > pre-process > format > present > distribute/channel.
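To make the chain of preparation steps concrete, here is a minimal sketch in Python of three of them (verify, standardize, de-dup) applied in sequence. The field names `customer` and `product`, the sample rows and the function names are all hypothetical and chosen only for illustration; a real pipeline would include the remaining stages and far larger data volumes.

```python
from typing import Callable

Record = dict[str, str]
Stage = Callable[[list[Record]], list[Record]]


def verify(rows: list[Record]) -> list[Record]:
    """Drop rows that are missing a customer or a product (hypothetical rule)."""
    return [r for r in rows if r.get("customer") and r.get("product")]


def standardize(rows: list[Record]) -> list[Record]:
    """Trim whitespace and lower-case every value so that rows can be compared."""
    return [{k: v.strip().lower() for k, v in r.items()} for r in rows]


def de_dup(rows: list[Record]) -> list[Record]:
    """Keep the first occurrence of each (customer, product) pair."""
    seen: set[tuple[str, str]] = set()
    unique: list[Record] = []
    for r in rows:
        key = (r["customer"], r["product"])
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


def run_pipeline(rows: list[Record], stages: list[Stage]) -> list[Record]:
    """Feed the output of each stage into the next one."""
    for stage in stages:
        rows = stage(rows)
    return rows


# Illustrative raw input: one duplicate in disguise and one incomplete row.
raw = [
    {"customer": " Alice ", "product": "Widget"},
    {"customer": "alice", "product": "widget "},
    {"customer": "", "product": "Gadget"},
    {"customer": "Bob", "product": "Gadget"},
]

clean = run_pipeline(raw, [verify, standardize, de_dup])
print(clean)
```

Even in this toy form, every stage has to pass over every row, which is why the same chain applied to millions of rows cannot be expected to finish in a split second.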

Because most queries have to traverse, link and process millions of rows of data and possibly trillions of words from within the data sources, this background churn could take hours, days or even longer.

A recent TDWI study found that organisations are dissatisfied with the time it takes for the chain of processes involved in BI, analytics and data warehousing to deliver valuable data and insights to business users. The organisations attributed this, in part, to ill-defined project objectives and scope, a lack of skilled personnel, data quality problems, slow development or an inability to access all relevant data.

The problem is that most business users are not BI experts and do not all have analytical minds, so the discover-and-report method may be iterative (and therefore slow), and in many cases the results are not of the quality expected. The results may also be inaccurate, because data quality rules may not have been applied and data linking may not be correct, as it would be in a typical data warehouse where data has been qualified and predefined or derived. In a traditional situation, with a structured data warehouse where all the preparation is done once, in one place, and then shared many times, supported by quality data and predefined rules, it may be possible to get sub-second answers. But often even in this scenario sub-second insights are not achieved, since time to insight also depends on properly designed data warehouses, server power and network bandwidth.
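The difference between preparing data once and sharing it many times, and redoing the preparation for every query, can be sketched as follows. This is only an illustration under stated assumptions: the `PreparedWarehouse` class, the `adhoc_total` function and the sample sales rows are hypothetical, and a real warehouse does far more than sum a column.

```python
from collections import defaultdict

# Illustrative rows: (region, product, amount). Not real data.
SALES = [
    ("north", "widget", 120.0),
    ("north", "gadget", 80.0),
    ("south", "widget", 50.0),
]


class PreparedWarehouse:
    """Runs the preparation step once, then answers every query by lookup."""

    def __init__(self, rows):
        self._rows = rows
        self._totals = None
        self.prep_runs = 0  # how many times the expensive step has run

    def _prepare(self):
        self.prep_runs += 1
        totals = defaultdict(float)
        for region, _product, amount in self._rows:
            totals[region] += amount
        return dict(totals)

    def total_for(self, region):
        if self._totals is None:  # prepare lazily, and only once
            self._totals = self._prepare()
        return self._totals.get(region, 0.0)


def adhoc_total(rows, region):
    """Self-service style: rescans all the raw rows on every single query."""
    return sum(amount for r, _product, amount in rows if r == region)


warehouse = PreparedWarehouse(SALES)
for region in ("north", "south", "north"):
    print(region, warehouse.total_for(region), adhoc_total(SALES, region))
print("preparation runs:", warehouse.prep_runs)
```

Both approaches return the same totals here, but the prepared version pays the preparation cost once regardless of how many queries follow, while the ad hoc version pays it on every query.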

Users tend to confuse search and discovery on flat raw data that already exists with information and insight generation at the next level. In more complex BI/analytics, each time a query is run, all the preparation work has to be done from the beginning, and the necessary churn can take a significant amount of time.

Therefore, demanding faster BI ‘time to value’ and expecting answers in sub-seconds could prove to be a costly mistake. While it is possible to gain some form of output in sub-seconds, these outputs will likely not be qualified, trusted insights that can deliver real strategic value to the enterprise.

By Mervyn Mooi, Director at Knowledge Integration Dynamics (KID)



SA companies are finally on the MDM and DQ bandwagon

Data integration and data quality management have become important factors for many South African businesses, says Johann van der Walt, MDM practice manager at Knowledge Integration Dynamics (KID).

We have always maintained that solid data integration and data quality management are essential building blocks for master data management (MDM), and we’re finally seeing that customers believe this too. One of the primary drivers behind this is the desire for service-oriented architecture (SOA) solutions, for which MDM is a prerequisite if they are to be effective. SOA relies on core data such as products, customers, suppliers, locations and employees. On top of that core data, companies develop the capacity for lean manufacturing, supplier collaboration, e-commerce and business intelligence (BI) programmes. Master data also informs transactional systems and analytics systems, so poor-quality master data can significantly impact revenues and customer service as well as company strategies.

Taken in the context of a single piece of data, MDM simply means ensuring there is one central record of a customer’s name, a product ID or a street address, for example. But McKinsey found in 2013 that companies employing in excess of 1 000 people have, on average, around 200 terabytes of data. Getting even small percentages of that data wrong can have wide-ranging ramifications for operational and analytical systems, particularly as companies attempt to roll out customer loyalty programmes or new products, let alone develop new business strategies. It can also negatively impact business performance management and compliance reporting. In the operational context, transactional processing systems refer to the master data for order processing, for example.



MDM is not metadata, which refers to technical details about the data. Nor is it data quality. However, MDM must have good-quality data in order to function correctly. These are not new concerns. The need for both MDM and good-quality data has existed for as long as there have been multiple data systems operating in companies. Today, though, these concerns are exacerbated by the volume of data, the complexity of data, the most acute demand for compliance in the history of business, and the proliferation of business systems such as CRM, ERP and analytics. Add to that the fact that many companies use multiple instances of these systems across their various operating companies, divisions and business units, sometimes spanning multiple geographies and time zones with language variations. Together these factors create a melting pot of potential error with far-reaching consequences unless MDM is correctly implemented on the basis of good-quality data.

None of these concerns even touches on big data or the cloud. Without first ensuring MDM is properly and accurately implemented around the core systems, companies don’t have a snowball’s hope in Hell of succeeding at any big data or cloud data initiatives. Big data adds successive layers of complexity, depending on the scope of the data and the variety of sources. Shuffling data into the cloud, too, introduces a complexity that the vast majority of businesses, especially those outside of the top 500, simply cannot cope with. With big data alone, companies can expect their stored data to grow by an average of 60% annually, according to IDC. That can be a frightening prospect for CIOs and their IT teams when they are still struggling to get to grips with the data feeding core systems.
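To put the 60% figure in perspective, the short sketch below projects storage growth over a few years. It assumes the growth compounds annually and, purely for illustration, starts from the 200-terabyte average cited earlier; the function names and the three-year horizon are hypothetical choices, not part of the IDC or McKinsey findings.

```python
import math


def projected_size(start_tb: float, annual_growth: float, years: int) -> float:
    """Size after `years` of compound growth (assumes growth compounds yearly)."""
    return start_tb * (1.0 + annual_growth) ** years


def doubling_time_years(annual_growth: float) -> float:
    """Years needed for the stored volume to double at a constant growth rate."""
    return math.log(2.0) / math.log(1.0 + annual_growth)


start_tb = 200.0  # illustrative starting point
growth = 0.60     # 60% per year, assumed to compound

for year in range(4):
    print(f"year {year}: {projected_size(start_tb, growth, year):.1f} TB")
print(f"doubling time: {doubling_time_years(growth):.2f} years")
```

Under these assumptions the volume roughly quadruples in three years and doubles in about a year and a half, which is the scale of growth that master data and data quality processes would have to keep pace with.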

While MDM is no longer a buzzword, and data quality is an issue as old as data itself, they are certainly crucial elements that South African companies are addressing today.