Sub-second analytical BI time to value still a pipe dream

Internet search engines with instant query responses may have misled enterprises into believing that all analytical queries should deliver split-second answers.

With the hype around Big Data analytics and the instant convenience of internet searches, enterprises might be forgiven for expecting to have answers to every question at their fingertips in near real time.


Unfortunately, getting trusted answers to complex questions is far more complicated and time consuming than typing a search query. Behind the scenes of any internet search, a great deal of preparation has already been done in order to serve up the appropriate answers. Google, for instance, dedicates vast amounts of high-end resources, around the clock, to preparing the data needed to answer a search query instantly. Yet even Google cannot answer broad questions or make forward-looking predictions. Where the data is known and trusted, the data has been prepared, rules have been applied and the search parameters are limited, as on a property website, near-instant answers are possible, but this is not true BI or analytics.

Within the enterprise, matters become a lot more complicated. When the end user seeks an answer to a broad query, such as a marketing firm assessing social media to find affinity for a certain range of products over a six-month period, a great deal of ‘churn’ must take place in the background to deliver answers. This is not a split-second process, and it may deliver only general trend insights rather than trusted, quality data that can serve as the basis for strategic decisions.

When end users run their own queries and are given the power to process their own BI/analytics, lengthy churn must take place. Every time a query, report or instance of data access is converted into useful BI/analytical information for end consumers, a whole chain of preparation work has to be done along the way: identify data sources > access > verify > filter > pre-process > standardise > look up > match > merge > de-duplicate > integrate > apply rules > transform > format > present > distribute.
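To make that chain concrete, the following is a minimal Python sketch of such a preparation pipeline. The stage functions, sample records and the affinity rule are hypothetical placeholders for illustration only; a real implementation would hand each stage off to source systems, data quality tools and ETL jobs.

# Minimal sketch of the preparation chain described above.
# Each stage is a hypothetical placeholder; a real implementation
# would call out to source systems, quality tools and ETL jobs.

def identify_sources():
    # In practice: catalogue lookups and source-system registration.
    return [{"id": "  1001 ", "product": "Widget", "sentiment": "positive"},
            {"id": "1001", "product": "widget", "sentiment": "positive"},
            {"id": "1002", "product": None, "sentiment": "negative"}]

def verify_and_filter(records):
    # Drop records that fail basic verification (missing key fields).
    return [r for r in records if r.get("id") and r.get("product")]

def standardise(records):
    # Trim whitespace and normalise case and formats.
    return [{k: v.strip().lower() if isinstance(v, str) else v
             for k, v in r.items()} for r in records]

def deduplicate(records):
    # Match and merge on the business key, keeping the first survivor.
    seen, survivors = set(), []
    for r in records:
        if r["id"] not in seen:
            seen.add(r["id"])
            survivors.append(r)
    return survivors

def apply_rules_and_transform(records):
    # Apply business rules, e.g. derive an affinity flag.
    return [dict(r, affinity=(r["sentiment"] == "positive")) for r in records]

def present(records):
    # Format and distribute; here we simply print a summary.
    for r in records:
        print(f"customer {r['id']}: affinity={r['affinity']}")

# identify > access/verify > filter > standardise > match/merge/de-dup >
# apply rules > transform > present
present(apply_rules_and_transform(deduplicate(standardise(
    verify_and_filter(identify_sources())))))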

Because most queries have to traverse, link and process millions of rows of data and possibly trillions of words from within the data sources, this background churn could take hours, days or even longer.

A recent TDWI study found that organisations are dissatisfied with the time it takes for the chain of processes involved in BI, analytics and data warehousing to deliver valuable data and insights to business users. The organisations attributed this, in part, to ill-defined project objectives and scope, a lack of skilled personnel, data quality problems, slow development, or an inability to access all relevant data.

The problem is that most business users are not BI experts and do not have analytical minds, so the discover-and-report approach can be iterative (and therefore slow), and in many cases the outputs are not of the quality expected. The results may also be inaccurate, because data quality rules may not have been applied and data may not be linked correctly, as it would be in a typical data warehouse where data has been qualified and pre-defined or derived. In the traditional situation, with a structured data warehouse where all the preparation is done in one place, once only, and then shared many times, supported by quality data and predefined rules, sub-second answers may be possible. But even in this scenario sub-second insights are often not achieved, since time to insight also depends on properly designed data warehouses, server power and network bandwidth.
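The ‘prepare once, share many times’ point can be illustrated with a small Python sketch; the raw events and the aggregation rule are assumptions made purely for illustration.

# Sketch contrasting ad hoc scanning of raw data with a "prepare once,
# share many times" summary, as in a structured data warehouse.
# The raw events and the aggregation rule are illustrative only.

from collections import defaultdict

raw_events = [
    {"product": "widget", "month": "2017-01", "mentions": 3},
    {"product": "widget", "month": "2017-02", "mentions": 5},
    {"product": "gadget", "month": "2017-01", "mentions": 2},
]  # in reality: millions of rows traversed on every ad hoc query

def ad_hoc_query(product):
    # Every run re-reads and re-processes the raw data (the "churn").
    return sum(e["mentions"] for e in raw_events if e["product"] == product)

# Preparation done once, up front, under agreed quality rules...
summary = defaultdict(int)
for e in raw_events:
    summary[e["product"]] += e["mentions"]

def warehouse_query(product):
    # ...so each query is a simple lookup that can return in sub-seconds.
    return summary[product]

print(ad_hoc_query("widget"))     # recomputed from the raw data each time
print(warehouse_query("widget"))  # served from the prepared summary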

Users tend to confuse search and discovery on flat, raw data that is already there with information and insight generation at the next level. In more complex BI/analytics, each time a query is run all the preparation work has to be done from the beginning, and the necessary churn can take a significant amount of time.

Demanding faster BI ‘time to value’ and expecting answers in sub-seconds could therefore prove a costly mistake. While it is possible to get some form of output in sub-seconds, those outputs are unlikely to be qualified, trusted insights that can deliver real strategic value to the enterprise.

By Mervyn Mooi, Director at Knowledge Integration Dynamics (KID)

 


Big data follows the BI evolution curve

Big data analysis in South Africa is still at an early stage of maturity, and has yet to evolve in much the same way as BI did 20 years ago, says Knowledge Integration Dynamics.

By Mervyn Mooi, director at Knowledge Integration Dynamics (KID)

Big data analysis tools aren’t ‘magical insight machines’ spitting out answers to all business’s questions: as is the case with all business intelligence tools, there are lengthy and complex processes that must take place behind the scenes before actionable and relevant insights can be drawn from the vast and growing pool of structured and unstructured data in the world.


South African companies of all sizes have an appetite for big data analysis, but because the country’s big data analysis segment is relatively immature, they are still focused on their big data strategies and the complexity of actually getting the relevant data out of this massive pool of information. We find many enterprises currently looking at technologies and tools like Hadoop to help them collate and manage big data. There are still misconceptions around the tools and methodologies for effective big data analysis: companies are sometimes surprised to discover they are expecting too much, and that a great deal of ‘pre-work’, strategic planning and resourcing is necessary.

Much like BI in its early days, big data analysis starts as a relatively unstructured, ad hoc discovery process; but once patterns are established and models are developed, the process becomes a structured one.

And in the same way that BI tools depend on data quality and relationship linking, big data depends on some form of qualification before it is used. The data needs to be profiled for flaws, which must then be cleansed (quality); it must be put into relevancy (relationships); and it must be timeous in the context of what is being searched or reported on. Methods must be devised to qualify much of the unstructured data, as a big question remains around how trusted and accurate information from the internet will be.
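As a simple illustration of what profiling the data for flaws can look like in practice, here is a small Python sketch; the sample records, field name and 13-digit format rule are assumptions for illustration, not a prescribed standard.

# Minimal sketch of profiling a dataset for flaws before it is used;
# the sample records and the checks are illustrative assumptions.

import re

records = [
    {"customer_id": "8001015009087", "city": "Johannesburg"},
    {"customer_id": "", "city": "Cape Town"},
    {"customer_id": "80010150090", "city": None},
]

def profile(records, field, pattern):
    values = [r.get(field) for r in records]
    return {
        "rows": len(values),
        "missing": sum(1 for v in values if not v),
        "distinct": len({v for v in values if v}),
        "format_violations": sum(
            1 for v in values if v and not re.fullmatch(pattern, v)),
    }

# Example: a 13-digit identifier is expected in customer_id.
print(profile(records, "customer_id", r"\d{13}"))
# -> {'rows': 3, 'missing': 1, 'distinct': 2, 'format_violations': 1}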

The reporting and application models that use this structured and unstructured data must also be addressed, and the models must be tried and tested. In the world of sentiment analysis and trend forecasting based on ever-changing unstructured data, automated models are not always the answer. Effective big data analysis also demands human intervention from highly skilled data scientists who have both business and technical experience. These skills are still scarce in South Africa, but we are finding a growing number of large enterprises retaining small teams of skilled data scientists to develop models and analyse reports.

As local big data analysis matures, we will find enterprises looking to strategise on their approaches, the questions they want to answer, what software and hardware to leverage and how to integrate new toolsets with their existing infrastructure. Some will even opt to leverage their existing BI toolsets to address their big data analysis needs.  BI and big data are already converging, and we can expect to see more of this taking place in years to come.

Governance: still the biggest hurdle in the race to effective BI

By Mervyn Mooi, director at Knowledge Integration Dynamics (KID)

Whether you’re talking traditional big-stack BI solutions or new visual analytics tools, it’s an unfortunate fact that enterprises still buy into the candy-coated vision of BI without fully addressing the underlying factors that make BI successful, cost-effective and sustainable.

Information management is a double-edged sword. Well-architected, governed and sustainable BI will deliver the kind of data business needs to make strategic decisions. But BI projects built on ungoverned, unqualified data and information, and undermined by ‘rebel’ or shadow BI, will deliver skewed and inaccurate information, and any enterprise basing its decisions on bad information is making a costly mistake. Too many organisations have been doing the latter, resulting in failed BI implementations and investment losses.

For more than a decade, we at Knowledge Integration Dynamics have been urging enterprises to formalise and architect their enterprise information management (EIM) competencies based on best-practice or industry standards, which follow an architected approach and are subjected to governance.

 

EIM is a complex environment that needs to be governed, and it encompasses data warehousing, business intelligence (BI), traditional data management, enterprise information architecture (EIA), data integration (DI), data quality management (DQM), master data management (MDM), data management life cycle (DMLC), information life cycle management (ILM), records and content management (ECM), metadata management and security/privacy management.

Effective governance is an ongoing challenge, particularly in an environment where business must move at an increasingly rapid pace and information changes all the time.

For example, tackling governance in the context of data quality starts with the matching and merging of historic data, to ensure design and storage conventions are aligned and all data is accurate according to set rules and standards. It is not just a matter of plugging in a BI solution that gives you results: it may require up to a year of careful design and architecture to integrate data from various departments and sources in order to feed the BI system. The conventions across departments within a single organisation are often dissimilar, and all data has to be integrated and qualified. Even data as apparently straightforward as a customer’s ID number may be incorrect, with digits transposed, coded differently between source systems or simply missing. The organisation must therefore decide which data source or integration rule to trust, so that data warehouses comply with quality rules and legislative standards and can form the foundation of the 360-degree view of the customer that executive management aspires to. But integrating the data and addressing data quality is only one area where effective governance must be applied.
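To illustrate the kind of rule involved in deciding which source to trust for a customer’s ID number, here is a simple Python sketch of a survivorship rule; the source rankings, sample records and the 13-digit plausibility check are illustrative assumptions, not a prescribed standard.

# Sketch of a simple survivorship rule for a customer ID that differs
# between source systems; rankings and records are illustrative only.

source_trust = {"core_banking": 1, "crm": 2, "marketing": 3}  # 1 = most trusted

candidate_records = [
    {"source": "marketing",    "customer": "C-42", "id_number": "8001015009078"},
    {"source": "core_banking", "customer": "C-42", "id_number": "8001015009087"},
    {"source": "crm",          "customer": "C-42", "id_number": ""},
]

def is_plausible(id_number):
    # Basic quality rule: present and 13 digits long.
    return id_number.isdigit() and len(id_number) == 13

def survivor(records):
    # Keep only values that pass the quality rule, then prefer the
    # value from the most trusted source.
    valid = [r for r in records if is_plausible(r["id_number"])]
    return min(valid, key=lambda r: source_trust[r["source"]])

print(survivor(candidate_records))
# -> the core_banking record wins over the marketing value with transposed digits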

Many organisations wrongly assume that nothing in their data changes. In reality, the organisation must cater for constant change. In a bank, for example, customer reporting could be dramatically incorrect if the data fails to reflect that certain customers have moved to new cities, or that branch hierarchies have changed. Linking and change tracking are therefore crucial to ensuring data integrity and accurate current and historic reporting.
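One common way to handle this kind of change tracking is a type-2 slowly changing dimension, where the old record is closed off rather than overwritten. The Python sketch below shows the idea; the customer, cities and dates are invented purely for illustration.

# Sketch of type-2 slowly changing dimension style change tracking,
# keeping both current and historic views correct when a customer moves.

from datetime import date

customer_history = [
    {"customer": "C-42", "city": "Durban",
     "valid_from": date(2015, 1, 1), "valid_to": None},  # None = current row
]

def track_change(history, customer, new_city, change_date):
    # Close off the current row and open a new one instead of
    # overwriting, so historic reports still reflect the old city.
    for row in history:
        if row["customer"] == customer and row["valid_to"] is None:
            row["valid_to"] = change_date
    history.append({"customer": customer, "city": new_city,
                    "valid_from": change_date, "valid_to": None})

track_change(customer_history, "C-42", "Johannesburg", date(2017, 6, 1))

def city_as_at(history, customer, as_at):
    # Answer "where was this customer on a given date?" from the history.
    for row in history:
        if (row["customer"] == customer and row["valid_from"] <= as_at
                and (row["valid_to"] is None or as_at < row["valid_to"])):
            return row["city"]

print(city_as_at(customer_history, "C-42", date(2016, 3, 1)))  # Durban
print(city_as_at(customer_history, "C-42", date(2018, 3, 1)))  # Johannesburg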

And automation takes you only so far: you can automate to the nth degree, but you still require data stewards to carry out certain manual verifications to ensure that the data is correct and remains so. Organisations also need to know who is responsible and accountable for their data, and be able to monitor and control the lifecycle process from one end to the other. The goals are to eliminate multiple versions of the truth (results), keep a trail back to sources and ensure that only the trusted version of the truth is integrated into systems.
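A trail back to sources can be as simple as recording, for every delivered result, where the data came from, which rules were applied and who is accountable. The Python sketch below illustrates the idea; the structure and field names are assumptions, not a specific product’s metadata model.

# Sketch of a simple lineage trail linking a published result back to
# its sources, rules and accountable steward; fields are illustrative.

lineage_log = []

def record_lineage(target, sources, rules, steward):
    lineage_log.append({
        "target": target,    # the figure or report delivered to the business
        "sources": sources,  # where the data came from
        "rules": rules,      # quality/transformation rules applied
        "steward": steward,  # who is accountable for this data
    })

record_lineage(
    target="monthly_customer_affinity_report",
    sources=["core_banking.customers", "crm.interactions"],
    rules=["de-duplicate on customer_id", "exclude unverified records"],
    steward="retail_data_steward",
)

def trace(target):
    # Trail back from a result to its sources, rules and steward.
    return [entry for entry in lineage_log if entry["target"] == target]

print(trace("monthly_customer_affinity_report"))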

Another challenge in the way of effective information management is the existence of ‘rebel’ or shadow data systems. In most organisations, departments frustrated by slow delivery from IT, or with unique data requirements, start working in silos: creating their own spreadsheets, duplicating data and processes, and not feeding the data back into the central architecture. This undermines effective data governance and results in huge overall costs for the company. Instead, all users should follow the correct processes and table their requirements, and the BI system should be architected to cater for these new requirements. It all needs to come through the central architecture: in this way, the entire ecosystem can be governed effectively and data and information can be delivered from one place, making management easier and more cost-effective.

The right information management processes also have to be put in place, and they must be sustainable. This is where many BI projects fail: an organisation builds a solution and it lasts only a year, because no supporting frameworks were put in place to sustain it. Organisations need to take a standards-based, architected approach to ensure EIM and governance are sustained and perpetuated.

New BI solutions and best-practice models emerge continually, but they will not solve business and operational problems if they are implemented in an ungoverned environment, much as a luxury car may have all the features you need but will not perform as it should unless the driver is disciplined.

 

Knowledge Integration Dynamics, Mervyn Mooi, (011) 462-1277, mervyn.mooi@kidgroup.co.za