Expand data horizons for greater analytics value

Attempting to find purposeful insights in data could be a futile exercise unless you look beyond the silos.

As advanced data analytics technologies become mainstream, companies risk becoming too dependent on the outputs of their analytics tools, which can serve up biased results unless sound analytics models govern the way in which the data is interrogated.

While data is your friend, and the only valid way for organisations to strategise based on fact, data analytics tools can only deliver the outputs they have been asked for. If the pool of data being analysed is too limited, or there is no end objective or purpose for using the results after the scientific methods have been applied to the data, then the whole exercise is virtually futile.


It is seldom enough to drill down into a limited data repository and base broad strategic decisions on the findings. In effect, this would be like a novelty manufacturer assessing only the pre-festive season sales and concluding that Christmas trees are a perennial best-seller. Common sense tells us this will not be the case, and that Christmas trees won’t sell at all in January. But in the case of more complex products and services, trends and markets are not as easy to predict. This is where analytics comes in. Crucially, analytics must look beyond specific domain insights and seek a broader view for a more objective insight.

Comparisons and correlations

A factory may deploy analytics to determine which products to focus on to increase profits, for example. But where the questioning is too narrow, the results will not support strategic growth goals. The company must qualify and complement the questioning with comparatives. It is not enough to assess which products are the biggest sellers – the factory also needs to determine what products are manufactured at the lowest cost, and which deliver the highest return. By bringing together more components and correlating the data on the lowest cost products, highest return products and top sellers, the factory is positioned to make better strategic decisions.
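The factory example can be sketched in a few lines of Python. The figures below are invented for illustration, but they show the point: a composite score that blends normalised sales volume, unit cost and margin ranks products differently from sales volume alone.

```python
# Illustrative sketch only: hypothetical product figures, not from the article.
# Correlating three separate measures -- sales volume, unit cost and margin --
# gives a more rounded view than ranking on sales alone.

sales = {"widget": 9000, "gadget": 7500, "gizmo": 4000}        # units sold
unit_cost = {"widget": 52.0, "gadget": 18.0, "gizmo": 11.0}    # cost to make
margin = {"widget": 0.05, "gadget": 0.22, "gizmo": 0.30}       # return per unit

def composite_score(product):
    """Blend normalised sales, cost and margin into one comparable score."""
    sales_score = sales[product] / max(sales.values())
    cost_score = min(unit_cost.values()) / unit_cost[product]  # cheaper is better
    margin_score = margin[product] / max(margin.values())
    return round((sales_score + cost_score + margin_score) / 3, 3)

ranked = sorted(sales, key=composite_score, reverse=True)
for p in ranked:
    print(p, composite_score(p))
```

With these hypothetical numbers, the top seller (widget) ends up last on the composite view, because its cost is high and its margin thin: exactly the kind of insight a sales-only query would miss.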

In South Africa, many companies do not approach analytics in this way. They have a set of specific insights they want, and once they find them, they stop there. In this siloed approach, the results are not correlated against a broader pool of data for more objective outcomes. This may be due in part to factors such as the time and cost required for ongoing comparison and correlation, but it is also due to a lack of maturity in the market.

In mature organisations, data sciences are applied to all possible angles, cues and information resources to produce insights that monetise or franchise the data. It is not just a case of finding unknown trends and insights – the discovery has to be purposeful as well.



Avoiding the data spaghetti junction

Mervyn Mooi.

 

Despite all their efforts and investments in data quality centres of excellence, some enterprises are still grappling with data quality issues, and this at a time when data is more important than ever before.

The effects of poor data quality are felt throughout the enterprise, impacting everything from operations to customer experience, costing companies an estimated $3 trillion a year in the US alone.

Data quality will become increasingly crucial as organisations seek to build on their data to benefit from advances in analytics (including big data), artificial intelligence and machine learning.

We find organisations unleashing agile disruptors into their databases without proper controls in place; business divisions failing to standardise their controls and definitions; and companies battling to reconcile data too late in the lifecycle, often resulting in a ‘spaghetti junction’ of siloed, duplicated and non-standardised data that cannot deliver on its potential business value for the company.

Controls at source

Data quality as a whole has improved in recent years, particularly in banks and financial services facing the pressures of compliance.

However, this improvement is largely on the wrong side of the fence, after the data has been captured. This may stem from challenges experienced decades ago, when validation of data being captured by thousands of clerks could slow down systems and result in customers having to wait in banks and stores while their details were captured.


But this practice has continued to this day in many organisations, which still qualify data after capture and so add unnecessary additional layers of resources for data cleaning.

Ensuring data quality should start with pre-emptive controls, with strict entry validation and verification rules, and data profiling of both structured and unstructured data.
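A minimal sketch of what such pre-emptive, at-source controls might look like. The fields and rules below are hypothetical; the point is that a record is rejected or flagged before it ever lands in the database, rather than being cleaned afterwards.

```python
import re

# Hypothetical sketch of at-source entry validation: each field has a rule
# that must pass at capture time, before the record is persisted.
RULES = {
    "email": lambda v: bool(re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v)),
    "id_number": lambda v: v.isdigit() and len(v) == 13,   # e.g. SA ID format
    "name": lambda v: bool(v.strip()),
}

def validate_at_capture(record):
    """Return the list of fields that fail their entry rules."""
    return [f for f, rule in RULES.items() if not rule(record.get(f, ""))]

good = {"name": "A. Smith", "email": "a.smith@example.com",
        "id_number": "8001015009087"}
bad = {"name": "", "email": "not-an-email", "id_number": "123"}

print(validate_at_capture(good))   # []
print(validate_at_capture(bad))    # ['email', 'id_number', 'name']
```

The same rule set can be profiled against existing data to find records that would no longer pass capture, which is one way to scope a cleaning backlog.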

Controls at the integration layer

Standardisation is crucial in supporting data quality, but in many organisations different rules and definitions are applied to the same data, resulting in duplication and an inability to gain a clear view of the business and its customers.

For example, the definition of the data entity called a customer may differ from one bank department to another: for the retail division, the customer is an individual, while for the commercial division, the customer is a registered business, with the directors of that business also registered as customers. The bank will then have multiple versions of what a customer is, and when data is integrated, there will be multiple definitions and structures involved.

Commonality must be found in definitions, with common structures and rules applied to reduce this complexity. Relationships in the data must also be understood, with data profiling applied to assess its quality.
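A hypothetical sketch of such an integration-layer mapping. The record layouts for the two divisions are invented for illustration; what matters is that both are normalised into one agreed customer structure before the data is merged.

```python
# Hypothetical sketch: two divisions describe "customer" differently; an
# integration layer maps both onto one shared definition before merging.

retail_record = {"cust_name": "T. Ndlovu", "type": "individual",
                 "id_no": "7502230000000"}
commercial_record = {"company": "Acme (Pty) Ltd", "reg_no": "2001/012345/07",
                     "directors": ["T. Ndlovu"]}

def to_common_customer(record):
    """Normalise a divisional record into the shared customer definition."""
    if "company" in record:                     # commercial division layout
        return {"customer_key": record["reg_no"],
                "customer_name": record["company"],
                "customer_type": "juristic"}
    return {"customer_key": record["id_no"],    # retail division layout
            "customer_name": record["cust_name"],
            "customer_type": "natural"}

for r in (retail_record, commercial_record):
    print(to_common_customer(r))
```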

Controls at the physical layer

Wherever a list of data exists, reference data should also be standardised across the organisation instead of using a myriad of conventions across various business units.

The next prerequisites for data quality are cleaning and data reconciliation. Incorrect, incomplete and corrupt records must be addressed, standardised conventions, definitions and rules applied, and a reconciliation must be done. What you put in must balance with what you take out. By using standardised reconciliation frameworks and processes, data quality and compliance are supported.
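The "what you put in must balance with what you take out" principle can be sketched as a simple count-and-total reconciliation. The batch data below is invented for illustration.

```python
# Hypothetical sketch of a reconciliation check: record counts and amounts on
# the way in must balance with what arrives at the target after loading.

source_batch = [("INV-1", 100.00), ("INV-2", 250.50), ("INV-3", 75.25)]
target_batch = [("INV-1", 100.00), ("INV-2", 250.50)]   # INV-3 went missing

def reconcile(source, target):
    """Compare counts and totals; report any records that failed to land."""
    missing = sorted(set(source) - set(target))
    return {
        "count_in": len(source),
        "count_out": len(target),
        "total_in": round(sum(a for _, a in source), 2),
        "total_out": round(sum(a for _, a in target), 2),
        "balanced": not missing and len(source) == len(target),
        "missing": missing,
    }

result = reconcile(source_batch, target_batch)
print(result)
```

Running the same framework after every load, rather than ad hoc, is what turns reconciliation from a firefighting exercise into a standing control.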

Controls at the presentation layer

On the front end where data is consumed, there should be a common data portal and standard access controls, or view into the data. While the consumption and application needs of each organisation vary, 99% of users do not need report authoring capabilities, and those who do should not have the ability to manipulate data out of context or in an unprotected way.

With a common data portal and standardised access controls, data quality can be better protected.

Several practices also support data quality: starting with a thorough needs analysis and defining data rules and standards in line with both business requirements and in compliance with legislation.

Architecture and design must be carefully planned, with an integration strategy adopted that takes into account existing designs and meta-data. Development initiatives must adhere to data standards and business rules, and the correctness of meta-data must be verified.

Effective testing must be employed to verify the accuracy of the test results and designs; and deployment must include monitoring, audit, reconciliation counts and other best practices.

With these controls and practices in place, the organisation achieves tight, well-governed and sustained data quality.

InfoFlow, Hortonworks to offer Hadoop skills courses

Veemal Kalanjee, MD of InfoFlow.

 

Local data management company InfoFlow has partnered with international software firm Hortonworks to provide enterprise Hadoop training courses to the South African market.

Hadoop is an open source software framework for storing data and running applications on clusters of commodity hardware.

While the global Hadoop market is expected to soar, with revenue reaching $84.6 billion by 2021, the sector is witnessing a severe lack of trained and talented technical experts globally.

Hortonworks develops and supports open source Hadoop data platform software. The California-based company says its enterprise Hadoop is in high demand in SA, but to date, Hadoop skills have been scarce and costly to acquire locally.

InfoFlow provides software, consulting and specialised services in business intelligence solutions, data warehousing and data integration.

The company will deliver localised expert resources and Hadoop training support programmes to a wide range of local companies across the financial services, retail, telecommunications and manufacturing sectors.

“There is huge demand in SA for enterprise Hadoop skills, with large enterprises having to fly expensive resources into the country to give their enterprise Hadoop projects guidance and structure,” says Veemal Kalanjee, MD of InfoFlow.

“Instead of moving existing skills around across various clients, Hortonworks wants to take a longer term approach by cross-skilling people through the training and leveraging the graduate programme run by InfoFlow.”

This partnership makes InfoFlow the only accredited Hortonworks training entity in Sub-Saharan Africa, adds Kalanjee.

The Hortonworks training will be added to InfoFlow’s broader portfolio of accredited Informatica Intelligent Data Platform graduate programmes across data management and data warehousing, governance, security, operations and data access.

The Hortonworks-InfoFlow partnership will bring to Johannesburg the only Hortonworks training and testing site in SA, according to the companies.

Local professionals will be able to attend classes focusing on a range of Hortonworks product training programmes at InfoFlow’s training centre in Fourways, Johannesburg.

The courses to be offered include Hortonworks Hadoop Essentials (HDP-123), Hortonworks Hadoop Developer Quick Start and Hortonworks Hadoop Admin Foundations.

“There is currently no classroom-based training available on Hortonworks locally and if clients require this, the costs are too high. Having classroom-based training affords clients the ability to ask questions, interact on real-world challenges they are experiencing and apply the theory learnt in a lab environment, set up specifically for them.”

InfoFlow will have an accredited trainer early next year and will provide the instructor-led, classroom Hortonworks training at reduced rates, concludes Kalanjee.

Bots set to multi-task in SA’s insurance sector

Robotic process automation will make waves in the insurance market, offering cost savings, efficiencies and improved risk management.

 

Robotic process automation (RPA) is still relatively new to South Africa, with mainly the major players moving to deploy it to manage certain repetitive and manual processes.

But RPA presents significant promise in many sectors where manual processes delay operations and add costs in a price-sensitive market.

The insurance industry is one sector that stands to achieve multiple gains from deploying RPA: through intelligent automation, insurers can achieve more streamlined processes, improved customer service, lower overheads and reduced risk.

RPA is akin to deploying an army of workers, or bots, to automate processes in both customer-facing and internal functions. From managing invoices and onboarding new customers, to validating data, assessing risk and confirming the market value of insured items, RPA tools can replace human resources, delivering outputs faster and more accurately.

It takes over very mundane manual tasks, like downloading an e-mail attachment and copying it to a directory, or capturing data to a standardised template. By automating rules-based steps, companies can eliminate data entry and capture errors, and reduce the number of resources needed to complete these processes.
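The e-mail-attachment example can be sketched with Python's standard email library. The message contents, file name and target directory below are invented for illustration; a production bot would read from a mailbox rather than build its own message.

```python
import email
import pathlib
from email.message import EmailMessage

# Hypothetical sketch of the kind of rules-based step a bot takes over:
# pull an attachment off an incoming e-mail and file it in a claims directory.

def build_sample_mail():
    """Construct a stand-in for an incoming claim e-mail (illustrative only)."""
    msg = EmailMessage()
    msg["Subject"] = "Claim 4711"
    msg.set_content("Claim form attached.")
    msg.add_attachment(b"form-data", maintype="application",
                       subtype="pdf", filename="claim_4711.pdf")
    return msg.as_bytes()

def save_attachments(raw_mail, out_dir):
    """Extract every attachment and write it to the target directory."""
    out = pathlib.Path(out_dir)
    out.mkdir(exist_ok=True)
    msg = email.message_from_bytes(raw_mail)
    saved = []
    for part in msg.walk():
        name = part.get_filename()
        if name:
            (out / name).write_bytes(part.get_payload(decode=True))
            saved.append(name)
    return saved

files = save_attachments(build_sample_mail(), "claims_inbox")
print(files)  # ['claim_4711.pdf']
```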


In customer onboarding alone, where the process could cost hundreds of rands per customer, RPA supports both manual and self-service onboarding, and can then automatically check for blacklisting, confirm the market value of insured items and redirect the customer data to the correct service and finance departments.

Streamlining claims processes

The core value behind RPA is realised through automating a process which follows logical, rule-based steps, as with a claims process. Once claim information is captured, there are defined steps that need to be followed in order to assess whether a claim is valid and the communication necessary between the insurer and the claimant, based on the information collated. By introducing automation in this step, the communication is streamlined, accurate and timeous.

Part of any claims process is the phase of estimating what the loss is. Traditionally, this is a manual process of the claimant and estimator/assessor having numerous discussions to come to agreement on the value of the loss. With RPA, this can be streamlined by having the bot access vendor applications to assess the replacement value of the loss, which then forms the basis of the claim. This has both a benefit from the insurer’s side, where the process is shortened, and from the claimant’s side, where the estimation is objectively decided on.

Imagine a claims process where the insurer receives an e-mail from a claimant with an attached claim form, images of the loss as well as proof of purchases of these items. The e-mail is scanned, attachments extracted and sent to the appropriate systems for either capturing or further processing with human intervention.

This is exactly what RPA achieves. The benefit is that a claim can start being assessed almost immediately, since all relevant information for processing is automatically captured in the correct systems, without human error or delay.

Not only is RPA efficient at extracting data from forms, it also validates data on forms and, in some instances, corrects it. This mitigates problems further downstream in the claims process caused by incorrect data. It also helps mitigate the risk of fraud.

RPA has the ability to log all actions and reconcile stages within a process down to a low granularity. This is particularly important in the payment phase of claims processing to ensure the correct amount is paid to the claimant. RPA prevents incorrect payments before they happen, instead of waiting for audit findings to report on this.

In future, robots will also be used widely in the real-time review of social media streams to assess claims severity and reduce fraud. RPA will also receive and route advanced telematics data (including video imagery) that will be instantaneously captured during car accidents and downloaded from the cloud.

CX, integration benefits of RPA

One of the less acclaimed benefits of RPA (productivity and cost-saving being the most popular) is customer experience. Driving self-service within digital organisations is a priority, and allowing a claimant to register, manage their portfolio or submit a claim through an RPA-enabled app on their mobile device is one example of self-service. Not only does intelligent self-service improve the customer experience, it also drives down costs significantly.

Integration with other enabling technologies is one of the most important features of any RPA technology. Whether it is invoking a bot through an API, or being able to pass data gathered from a claim form to a downstream data-centric process, RPA technologies will have to integrate into existing systems and new AI-powered systems to prove the true value they can offer.


Veemal Kalanjee is MD of Infoflow, part of the Knowledge Integration Dynamics (KID) group. He has an extensive background in data management sciences, having graduated from Potchefstroom University with an MSc in computer science. He subsequently worked at KID for seven years in various roles within the data management space. Kalanjee later moved to Informatica SA as a senior pre-sales consultant, and recently moved back to the KID Group as MD of Infoflow, which focuses on data management technologies, in particular, Informatica.

InfoFlow has partnered with Hortonworks to deliver scenario-based Hortonworks training.

 
Each course is taught by a Certified Hortonworks Instructor and includes a combination of instructor-led lectures, classroom discussions and comprehensive hands-on-lab exercises.
 
InfoFlow and Hortonworks University will be presenting the training in Johannesburg.
 
Cost: R12 000 per person per day.
Register 3 people from the same company on the same course for R11 000 pppd
If the same person attends all 3 courses, the total cost is R 94 500 (R10 500 pppd)
HDP-123 Hortonworks Hadoop Essentials (1 Day) – 29 Oct or 5 Nov
—————————————————–
A technical overview of Apache Hadoop. It includes high-level information about concepts, architecture, operation, and uses of the Hortonworks Data Platform (HDP) and the Hadoop ecosystem. The course provides an optional primer for those who plan to attend a hands-on, instructor-led course.
No previous Hadoop or programming knowledge is required.
 
DEV-201 Hortonworks Hadoop Developer Quick Start (4 days) – 30 Oct to 2 Nov
—————————————————–
For developers who need to create applications to analyse Big Data stored in Apache Hadoop using Apache Pig and Apache Hive, and to develop applications on Apache Spark. Topics include: an essential understanding of HDP and its capabilities, Hadoop, YARN, HDFS, MapReduce/Tez, data ingestion, using Pig and Hive to perform data analytics on Big Data, and an introduction to Spark Core, Spark SQL, Apache Zeppelin and additional Spark features.
 
Students should be familiar with programming principles and have experience in software development. SQL and light scripting knowledge is also helpful. No prior Hadoop knowledge is required.
ADM-221 Hortonworks Hadoop Admin Foundations (4 days) 6 Nov – 9 Nov
—————————————————–
For systems administrators who will be responsible for the design, installation, configuration, and management of the Hortonworks Data Platform (HDP). The course provides in-depth knowledge and experience in using Apache Ambari as the operational management platform for HDP. This course presumes no prior knowledge or experience with Hadoop.
 
Students must have experience working in a Linux environment with standard Linux system commands and shell scripts.
No previous Hadoop or programming knowledge is required.
 
InfoFlow is a member of the Knowledge Integration Dynamics (KID) Group of Companies
 
Please contact yolanda.komen@kidgroup.co.za should you be interested in attending.
 

Consumer permission is not compliance

GDPR and POPI compliance demand restructuring of data management practices, and deep data and process mapping.

Mervyn Mooi.


The of Europe’s General Protection Regulation (GDPR) has sparked a flurry of mails and notices from businesses and suppliers asking consumers to allow them to use their personal information for brand marketing and purposes.

Companies have added opt-in notices to their sites and briefed their teams on GDPR and POPI compliance. Unfortunately for them, these measures are far from adequate for what is required to comply with data protection and privacy regulation.

Superficial GDPR and POPI compliance (such as getting consumer permission to send them information and taking broad steps to improve information security) is not true data governance, and many organisations fail to realise this.

Having policies in place or protecting information inside a system is not enough. Even data protected within an organisation can be misused or leaked by employees, whether deliberately or through an action as apparently innocent as passing on a sales lead or a job applicant’s CV to a colleague.

Effective governance and data protection still rests heavily on the discipline of the people handling the information. Therefore, when anyone in the company can access unprotected data and information, any governance mechanisms in place will be at risk.

How stringent Europe’s enforcement of GDPR will be has yet to be seen, and although South African law is not yet fully equipped to handle individuals’ lawsuits against companies for failing to protect their personal information, it is only a matter of time before someone challenges an organisation around the protection of personal information. And this is where the onus will be on the company to prove what measures it took to protect the information.


Contingent measures for protecting data should be put in place should the discipline of people falter. One such measure, pivotal to enabling and proving governance, is the mapping of the rules, conditions, checks and standards (RCCSs) transcribed from the regulations or accords (from GDPR, covering data privacy, through to POPI, King III, BCBS 239, KYC and PCI) to the accountable and responsible people, to the data domains, and to the control points of the processes that handle the data and information within an organisation. These mappings need to be captured and maintained within a registry.
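A minimal sketch of what an RCCS registry entry might look like. The rules, owners, domains and control-point names below are invented for illustration; the essential idea is that each transcribed rule is linked to an accountable person, a data domain and a process control point, and can be queried from one place.

```python
# Hypothetical sketch of an RCCS mapping registry: each rule transcribed from
# a regulation is linked to an owner, a data domain and a process control point.

registry = [
    {"rule": "Consent required before direct marketing",
     "source": "GDPR / POPI",
     "owner": "Marketing Data Steward",
     "data_domain": "customer_contact",
     "control_point": "campaign-extract"},
    {"rule": "Retain transaction records for the prescribed period",
     "source": "BCBS 239",
     "owner": "Finance Data Owner",
     "data_domain": "transactions",
     "control_point": "archive-job"},
]

def rules_for_domain(domain):
    """Answer the audit question: which rules govern this data domain?"""
    return [e["rule"] for e in registry if e["data_domain"] == domain]

print(rules_for_domain("customer_contact"))
```

In practice such a registry lives inside a metadata management tool rather than a script, but the lookup it must support is exactly this one: from a data domain or process, back to every rule and owner that applies.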

Effective governance and data protection still rests heavily on the discipline of the people handling the information.

Building an effective and future-proof RCCS registry can be a lengthy process. But the creation and maintenance of this registry is readily achieved within the practice of metadata management, which already holds the mappings; these then simply need to be linked to the policies, procedures and guidelines from the accords and regulations.

A registry typically evolves over time, mapping RCCSs to people, processes and data; ultimately proving that all rules, policies and procedures are physically implemented across all processes where the data is handled.

Once the mapping registry is in place, it becomes easier to identify and prevent data breaching or information leakage. More importantly, it also allows the organisation to ensure its data management rules and handling thereof are fully aligned with legislation across the organisation.

An effective digital RCCS mapping registry allows the auditor and responsible parties to easily link processes and data to legislation and policies, or to drill down to individual data fields to track compliance throughout its lifecycle/lineage.

But even if an organisation has all the measures and controls in place to ensure GDPR RCCSs are implemented, governance (including that for the protection of data and information) still needs to be proved in terms of presentation or reporting.

In other words, a full data and process tracking (or lineage) and reporting capability needs to be in place, managed by a data governance organisational structure of people and regulated by a data governance framework which includes an engagement model that would be necessary between all responsible, accountable, consulted and informed parties.

For many, this could mean rebuilding their data management operating and system models from the ground up. Organisations should be taking steps now to put in place metadata management as the foundation for enabling compliance.

To build their ability to prove governance, organisations must prioritise this 'governance' mapping exercise. Few companies have achieved this 'sweet spot' of data governance.

As the legislative environment changes and individuals begin challenging misuse of personal information, companies will increasingly be called on to show deep mapping and deep governance. Few, if any, do this today, but the implementation of GDPR serves as a useful reminder that this process should start now.

Benefits of GDPR compliance

The General Data Protection Regulation deadline could be a useful catalyst for getting data life cycle management back under control.

 

The General Data Protection Regulation (GDPR), set for enforcement from 25 May, will impact all local companies trading with clients in Europe. But, like the local Protection of Personal Information Act (POPIA), the GDPR also presents a good set of best practice guidelines for any company to follow. For many companies that have let their data management and life cycle management standards slide, the GDPR could provide the incentive they need to get back on track.

While most enterprises take data governance seriously and have, in the past, set data management frameworks, the realities of business today introduce the risk of standards not being applied. Up to five years ago, companies were moving towards single-view, uniform standards approaches to data management. Then the pace of business increased, new paradigms emerged, and outcomes-based and agile approaches to development made their impact felt. Business units began overtaking the IT and enabling teams, driving the adoption of new systems and processes and allowing these new additions to dictate the data management standards. Agile deliveries, seeking 'quick and dirty' wins, often neglected to capitalise on or use existing enterprise data management frameworks and standards.

In a rapidly evolving environment, companies may find multiple processes and business units working in pockets on the same data domain or for similar purposes, each tending to define its own standards for the management of data. This results in overlapping and duplicated processes and resources. If, for example, there are 20 processes scattered across 20 systems dealing in isolation with data acquisition, data profiling and data quality, then should there be data anomalies, the company would have to find and fix them in 20 places (after first confirming which of those processes are correct or incorrect). Not only is this inefficient, it causes inconsistencies and mistrust.

Play by the rules

In contrast, effective management of the data from creation to destruction, within the parameters of the data governance framework, would ensure data has the same rules across the enterprise, regardless of the business units or processes utilising it. This effective management is in line with GDPR compliance, which also demands clear audit trails on personal data collection, storage, recall, processing and destruction.

The realities of business today introduce the risk of standards not being applied.

The five steps that make up the model for GDPR can be implemented or articulated with an effective data management life cycle (DMLC) process. The DMLC is the key to articulating (or mapping) the data rules, conditions and standards in context of data governance practice.

Properly governed, each data set/theme/entity can have a DMLC process, with each stage on the DMLC having specific rules with regards to the management or handling of the data. This can then be applied consistently, from one place, across all systems and business processes where it may be created, used and destroyed.
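A minimal sketch of a DMLC definition with stage-level rules. The entity, stage names and rules below are invented for illustration; the point is that the rules live in one place and are looked up wherever the data is handled, rather than being redefined per system.

```python
# Hypothetical sketch: one DMLC definition per data entity, with rules attached
# to each lifecycle stage, applied consistently from a single place.

DMLC = {
    "customer": {
        "create": ["validate ID number", "capture consent flag"],
        "use": ["mask ID in reports", "check consent before marketing"],
        "archive": ["encrypt at rest"],
        "destroy": ["purge after retention period", "log destruction"],
    }
}

def rules_for(entity, stage):
    """Look up the governing rules for an entity at a lifecycle stage."""
    return DMLC.get(entity, {}).get(stage, [])

print(rules_for("customer", "destroy"))
```

Any system or process touching customer data would consult the same definition, which is what makes the audit trail on collection, storage, processing and destruction consistent across the enterprise.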

A key enabler for this is metadata management. As companies see the importance of more effective governance and data management, metadata management as a priority is coming through strongly, since it shows the lineage of the resources to support change management and new projects. It also allows for economising and re-use of information or data resources and artefacts.

With regulations such as POPIA and GDPR, the focus is on data quality, governance and security, and lineage is key to all of these. This lineage relates not just to tracking the change history of the data content, but also to the definitions and structure of the data. Companies need both effective metadata management and the DMLC process.

Seeing that business is taking the lead in many instances, it is also well placed to lead in terms of standards for DMLC and data governance rules, while technical teams (including architecture and IT) are tasked with implementing them.

Defining data standards and rules is not only a technical task; it also requires an understanding of the data and a grasp of business requirements and legislation. Serving as translators between business and technical, the governance teams, whose role has evolved strongly over recent years, must articulate GDPR requirements and map them to the data domains, systems and processes, ensuring the company is able to prove governance and compliance, not just on paper, but in practice.

Visit www.kid.co.za