Consumer permission is not compliance

GDPR and POPI compliance demand restructuring of data management practices, and deep data and process mapping.

Mervyn Mooi.

The implementation of Europe’s General Data Protection Regulation (GDPR) has sparked a flurry of mails and notices from businesses and suppliers asking consumers to allow them to use their personal information for brand marketing and other purposes.

Companies have added opt-in notices to their sites and briefed their teams on GDPR and POPI compliance. Unfortunately for them, these measures are far from adequate for what is required to comply with data protection and privacy regulation.

Superficial GDPR and POPI compliance (such as getting consumer permission to send them information and taking broad steps to improve information security) is not true data governance, and many organisations fail to realise this.

Having policies in place or protecting information inside a system is not enough. Even data protected within an organisation can be misused or leaked by employees, whether deliberately or through an action as apparently innocent as passing on a sales lead or a job applicant’s CV to a colleague.

Effective governance and data protection still rest heavily on the discipline of the people handling the information. Therefore, when anyone in the company can access unprotected data and information, any governance mechanisms in place will be at risk.

How stringent Europe’s enforcement of GDPR will be has yet to be seen, and although South African law is not yet fully equipped to handle individuals’ lawsuits against companies for failing to protect their personal information, it is only a matter of time before someone challenges an organisation around the protection of personal information. And this is where the onus will be on the company to prove what measures it took to protect the information.

Contingent measures for protecting data should be put in place in case the discipline of people falters. One such measure (which is pivotal for enabling and proving governance) is the mapping of the rules, conditions, checks and standards (RCCSs), as transcribed from the regulations or accords (including GDPR covering data privacy through to POPI, King III, BCBS239, KYC and PCI), to the respective accountable and responsible people, to the data domains and to the control points of processes that handle the data/information within an organisation. These mappings need to be captured and maintained within a registry.

Building an effective and future-proof RCCS registry can be a lengthy process. But the creation and maintenance of this registry is easily achieved within the practice of metadata management, which already shows the mappings; these then simply need to be linked to policies, procedures and guidelines from the accords and regulations.

A registry typically evolves over time, mapping RCCSs to people, processes and data; ultimately proving that all rules, policies and procedures are physically implemented across all processes where the data is handled.

Once the mapping registry is in place, it becomes easier to identify and prevent data breaches or information leakage. More importantly, it also allows the organisation to ensure its data management rules and handling thereof are fully aligned with legislation across the organisation.

An effective digital RCCS mapping registry allows the auditor and responsible parties to easily link processes and data to legislation and policies, or to drill down to individual data fields to track compliance throughout its lifecycle/lineage.
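As a rough illustration of what such a registry might hold, the sketch below maps each rule to its source regulation, an accountable person, the data fields it covers and the process control points where it applies. Every name here (`RccsEntry`, `RccsRegistry`, the rule IDs, roles and field names) is hypothetical, and the rule descriptions are made-up examples rather than transcriptions of GDPR or POPI text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RccsEntry:
    """One rule, condition, check or standard mapped to people, data and processes."""
    rule_id: str
    source: str             # regulation or accord the rule was transcribed from
    description: str
    accountable: str        # person or role accountable for the rule
    data_fields: tuple      # data fields the rule applies to
    control_points: tuple   # process steps where the rule is enforced


class RccsRegistry:
    """In-memory stand-in; a real registry would live in a metadata repository."""

    def __init__(self):
        self._entries = []

    def register(self, entry):
        self._entries.append(entry)

    def rules_for_field(self, data_field):
        """Drill down: which rules govern this data field?"""
        return [e for e in self._entries if data_field in e.data_fields]

    def unmapped_fields(self, known_fields):
        """Fields with no rule mapped to them, i.e. potential governance gaps."""
        covered = {f for e in self._entries for f in e.data_fields}
        return sorted(set(known_fields) - covered)


registry = RccsRegistry()
registry.register(RccsEntry(
    rule_id="R-001",
    source="GDPR",
    description="Illustrative rule: consent recorded before marketing contact",
    accountable="Head of Marketing",
    data_fields=("customer.email", "customer.consent_flag"),
    control_points=("campaign_export",),
))
registry.register(RccsEntry(
    rule_id="R-002",
    source="POPI",
    description="Illustrative rule: applicant CVs shared only with the hiring panel",
    accountable="HR Manager",
    data_fields=("applicant.cv",),
    control_points=("cv_forwarding",),
))

print([e.rule_id for e in registry.rules_for_field("customer.email")])
print(registry.unmapped_fields(["customer.email", "applicant.cv", "customer.id_number"]))
```

The second query is the one an auditor would probably care about most: it lists data fields that no rule has been mapped to yet.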

But regardless of whether an organisation has all measures and controls in place to ensure GDPR RCCSs are implemented, governance (including that for the protection of data/information) still needs to be proved through presentation or reporting.

In other words, a full data and process tracking (or lineage) and reporting capability needs to be in place, managed by a data governance organisational structure of people and regulated by a data governance framework which includes an engagement model that would be necessary between all responsible, accountable, consulted and informed parties.

For many, this could mean rebuilding their data management operating and system models from the ground up. Organisations should be taking steps now to put in place metadata management as the foundation for enabling compliance.

To build their ability to prove governance, organisations must prioritise this “governance” mapping exercise. Few companies have achieved this ‘sweet spot’ of data governance.

As the legislative environment changes and individuals begin challenging misuse of personal information, companies will increasingly be called on to show deep mapping and deep governance. Few, if any, do this today, but the implementation of GDPR serves as a useful reminder that this process should start now.

Benefits of GDPR compliance

The General Data Protection Regulation deadline could be a useful catalyst for getting data life cycle management back under control.

 

The General Data Protection Regulation (GDPR), set for enforcement from 25 May, will impact all local companies trading with clients in Europe. But, like the local Protection of Personal Information Act (POPIA), the GDPR also presents a good set of best practice guidelines for any company to follow. For many companies that have left their data management and life cycle management standards to slide, the GDPR could provide the incentive they need to get back on track.

While most enterprises take data governance seriously and have, in the past, set data management frameworks, the realities of business today introduce the risk of standards not being applied. Up to five years ago, companies were moving towards single view, uniform standards approaches to data management. Then the pace of business increased, new paradigms emerged, and outcomes-based and agile approaches to development made their impact felt. Business units began overtaking the enabling teams, driving the adoption of new systems and processes and allowing these new additions to dictate the data management standards. Agile deliveries, seeking ‘quick and dirty’ wins, often neglected to capitalise on or use existing enterprise data management frameworks and standards.

In a rapidly evolving environment, companies may find multiple processes and business units working in pockets for the same data domain or similar purposes, tending to define their own standards for management of data. This results in overlapping and duplicated processes and resources. If, for example, there are 20 processes scattered across 20 systems dealing in isolation with data acquisition, data profiling and data quality, then should there be data anomalies, the company would have to look for and fix them in 20 places (that is, after confirming which of those processes are correct or incorrect). Not only is this inefficient, it causes inconsistencies and mistrust.

Play by the rules

In contrast, effective management of the data from creation to destruction, within the parameters of the data governance framework, would ensure data has the same rules across the enterprise, regardless of the business units or processes utilising it. This effective management is in line with GDPR compliance, which also demands clear audit trails on personal data collection, storage, recall, processing and destruction.

The five steps that make up the model for GDPR can be implemented or articulated with an effective data management life cycle (DMLC) process. The DMLC is the key to articulating (or mapping) the data rules, conditions and standards in context of data governance practice.

Properly governed, each data set/theme/entity can have a DMLC process, with each stage on the DMLC having specific rules with regards to the management or handling of the data. This can then be applied consistently, from one place, across all systems and business processes where it may be created, used and destroyed.
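A minimal sketch of the idea, assuming a single hypothetical "customer" entity with four life cycle stages: the rules for each stage are defined once, in one place, and any system handling the record calls the same check. The stage names, rule descriptions and record fields are illustrative assumptions, not a prescribed DMLC model.

```python
from enum import Enum


class Stage(Enum):
    CREATE = "create"
    STORE = "store"
    USE = "use"
    DESTROY = "destroy"


# One central definition of the rules per stage for a hypothetical customer entity.
# Each rule is a (description, check) pair; the check receives the record as a dict.
CUSTOMER_DMLC = {
    Stage.CREATE: [
        ("consent recorded", lambda r: r.get("consent") is True),
    ],
    Stage.STORE: [
        ("id number masked", lambda r: str(r.get("id_number", "")).startswith("***")),
    ],
    Stage.USE: [
        ("consent still valid", lambda r: r.get("consent") is True),
    ],
    Stage.DESTROY: [
        ("retention period elapsed", lambda r: r.get("retention_expired") is True),
    ],
}


def violations(dmlc, stage, record):
    """Return the descriptions of every rule the record fails at this stage."""
    return [desc for desc, check in dmlc[stage] if not check(record)]


record = {"consent": True, "id_number": "1234567890", "retention_expired": False}
print(violations(CUSTOMER_DMLC, Stage.CREATE, record))   # no failures expected
print(violations(CUSTOMER_DMLC, Stage.STORE, record))    # unmasked id number is flagged
```

Because every process calls `violations` against the same rule table, a rule change is made once rather than in each system that touches the data.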

A key enabler for this is metadata management. As companies see the importance of more effective governance and data management, metadata management as a priority is coming through strongly, since it shows the lineage of the resources to support change management and new projects. It also allows for economising and re-use of information or data resources and artefacts.

With regulations such as POPIA and GDPR, the focus is on data quality, governance and security, and lineage is key to all of these. This lineage relates not just to tracking the change history of the data content, but also to the definitions and structure of the data. Companies need both effective metadata management and the DMLC process.

Seeing that business is taking the lead in many instances, it is also well placed to lead in terms of standards for DMLC and data governance rules, while technical teams (including architecture and IT) are tasked with implementing them.

Defining data standards and rules is not only a technical task; it also requires an understanding of the data and a grasp of business requirements and legislation. Serving as translators between business and technical, the governance teams, whose role has evolved strongly over recent years, must articulate GDPR requirements and map them to the data domains, systems and processes, ensuring the company is able to prove governance and compliance, not just on paper, but in practice.

Visit www.kid.co.za

 

Blockchain in the compliance arsenal

By Mervyn Mooi

Blockchain technology may support some data management efforts, but it’s not a silver bullet for compliance.

Amid growing global interest in the potential for blockchain technologies to support data management, enterprises may be questioning its role in compliance, particularly as the deadline looms for compliance with the European Union General Data Protection Regulation (GDPR).

For South African enterprises, compliance with the Protection of Personal Information (POPI) Act and alignment with the GDPR are a growing concern. Because GDPR and POPI are designed to foster best practice in data governance, it is in the best interests of any company to follow their guidelines for data quality, access, life cycle management and process management – no matter where in the world they are based.

At the same time, blockchain is attracting worldwide interest from a storage efficiency and optimisation point of view, and many companies are starting to wonder whether it can effectively support data management, security and compliance. One school of thought holds that moving beyond crypto-currency, blockchain’s decentralised data management systems and ledgers present new opportunities for more secure, more efficient data storage and processing.

However, there are still questions around how blockchain will align with best practice in data management and whether it will effectively enhance data security.

Currently, blockchain technology for storing data may be beneficial for historic accounting and tracking/lineage purposes (as it is immutable), but there are numerous factors that limit blockchain’s ability to support GDPR/POPI and other compliance requirements.

Immutability pros and cons

Because public blockchains are immutable, once data is stored in blockchains, it cannot be changed or deleted. This supports auditing by keeping a clear record of the original, and every instance of change made to the data. While blockchain stores the lineage of data in an economical way, it will not address data quality and integration issues, however.
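The immutability described above comes from each block carrying a hash of its predecessor. The sketch below is a deliberately simplified hash chain, not a real blockchain (there is no network, consensus or distribution), and the function names and sample records are assumptions for illustration. It shows why a stored record cannot be silently edited: changing it invalidates the chain.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, data, prev_hash):
    """Hash the block contents together with the previous block's hash."""
    payload = json.dumps({"index": index, "data": data, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, data):
    """Add a new block that is linked to the current end of the chain."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    index = len(chain)
    chain.append({
        "index": index,
        "data": data,
        "prev": prev_hash,
        "hash": block_hash(index, data, prev_hash),
    })


def is_valid(chain):
    """Recompute every hash and check each block points at its predecessor."""
    prev_hash = GENESIS
    for block in chain:
        if block["prev"] != prev_hash:
            return False
        if block["hash"] != block_hash(block["index"], block["data"], block["prev"]):
            return False
        prev_hash = block["hash"]
    return True


chain = []
append_block(chain, {"customer": "C-17", "email": "old@example.com"})
append_block(chain, {"customer": "C-17", "email": "new@example.com"})
print(is_valid(chain))   # True: both versions of the record are kept and linked

chain[0]["data"]["email"] = "edited@example.com"
print(is_valid(chain))   # False: editing history breaks the chain
```

The same property that gives a clear audit trail also means the original record cannot simply be removed without breaking every later block.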

It should also be noted that this same immutability could raise compliance issues around the GDPR’s right to be forgotten guidelines. These dictate the circumstances under which records should be deleted or purged.

In a public blockchain environment, this is not feasible. Indeed, in many cases, it would not be realistic or constructive to destroy all records, and this is an area where local enterprises would need to carefully consider how closely they want to align with GDPR, and whether encryption to put data beyond use would suffice to meet GDPR’s right to be forgotten guidelines.

Publicly stored data concerns

In addition to the right to be forgotten issue, there is the challenge that data protection, privacy and accessibility are always at risk if data is stored in a public domain, such as the cloud or a blockchain environment. Therefore, enterprises considering the storage optimisation benefits of blockchain would also have to consider whether the core and confidential data is locally stored on private chains, and more importantly, whether those chains are subjected to security and access rules and whether the chain registries in the blockchain distributed environment are protected and subject to availability rules.

Blockchain environments also potentially present certain processing limitations: enterprises will have to consider whether blockchain will allow for parts of the chain stored for a particular business entity, such as a customer (or its versions), to be accessed and processed separately by different parties (data subjects) and/or processes.

Data quality question

The pros and cons of blockchain’s ability to support storage, management and security of data in the environment are just one side of the compliance coin: data quality is also a requirement of best practice data management. This is not a function of blockchain and therefore cannot be guaranteed by blockchain. Indeed, blockchain will store even unqualified data prior to its being cleansed and validated.

Enterprises will need to be aware of this, and consider how and where such data will be maintained. The issues of data integration and impact analysis also lie outside the blockchain domain.

IDC notes: “While the functions of the blockchain may be able to act independently of legacy systems, at some point blockchains will need to be integrated with systems of record,” and says there are therefore opportunities for “blockchain research and development projects, [to] help set standards, and develop solutions for management, integration, interoperability, and analysis of data in blockchain networks and applications”.

While blockchain is set to continue making waves as ‘the next big tech thing’, it remains to be seen whether this developing technology will have a significant role to play in compliance and overall data management in future.

Sub-second analytical BI time to value still a pipe dream

Internet search engines with instant query responses may have misled enterprises into believing all analytical queries should deliver split second answers.

With the advent of Big Data analytics hype and the rapid convenience of internet searches, enterprises might be forgiven for expecting to have all answers to all questions at their fingertips in near real time.

Unfortunately, getting trusted answers to complex questions is a lot more complicated and time consuming than simply typing a search query. Behind the scenes on any internet search, a great deal of preparation has already been done in order to serve up the appropriate answers. Google, for instance, dedicates vast amounts of high-end resources and all of its time to preparing the data necessary to answer a search query instantly. But even Google cannot answer broad questions or make forward-looking predictions. In cases where the data is known and trusted, the data has been prepared and rules have been applied, and the search parameters are limited, such as with a property website, almost instant answers are possible, but this is not true BI or analytics.

Within the enterprise, matters become a lot more complicated. When the end-user seeks an answer to a broad query – such as when a marketing firm wants to assess social media to find an affinity for a certain range of products over a six-month period – a great deal of ‘churn’ must take place in the background to deliver answers. This is not a split-second process, and it may deliver only general trend insights rather than trusted, quality data that can serve as the basis for strategic decisions.

When the end user wishes to do a query and is given the power to process their own BI/Analytics, lengthy churn must take place. Every time a query, report or instance of data access is converted into useful BI/Analytical information for end-consumers, there is a whole lot of preparation work to be done along the way, i.e. identify data sources > access > verify > filter > pre-process > standardise > look up > match > merge > de-dup > integrate > apply rules > transform > pre-process > format > present > distribute/channel.
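A few of the preparation steps listed above can be sketched as a chain of small functions. This is only an illustration under assumed data: the record layout (`email`, `order_id`, `amount`) and the four chosen steps are hypothetical, and a real pipeline would involve far more steps and far more data.

```python
def verify(rows):
    """Drop rows missing the fields later steps rely on."""
    return [r for r in rows if r.get("email") and r.get("amount") is not None]


def standardise(rows):
    """Normalise formatting so that matching can work."""
    return [{**r, "email": r["email"].strip().lower()} for r in rows]


def dedup(rows):
    """Keep the first occurrence of each (email, order_id) pair."""
    seen, out = set(), []
    for r in rows:
        key = (r["email"], r["order_id"])
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out


def integrate(rows):
    """Merge rows per customer into a total spend figure."""
    totals = {}
    for r in rows:
        totals[r["email"]] = totals.get(r["email"], 0) + r["amount"]
    return totals


def run_pipeline(rows, steps):
    """Apply each preparation step in order, feeding its output to the next."""
    for step in steps:
        rows = step(rows)
    return rows


raw = [
    {"email": " Ann@Example.com ", "order_id": 1, "amount": 100},
    {"email": "ann@example.com", "order_id": 1, "amount": 100},   # duplicate once standardised
    {"email": "bob@example.com", "order_id": 2, "amount": 50},
    {"email": "", "order_id": 3, "amount": 70},                   # fails verification
]

print(run_pipeline(raw, [verify, standardise, dedup, integrate]))
```

Skipping the standardise step would leave the two "Ann" rows unmatched and double her total, which is the kind of inaccuracy that creeps in when preparation is rushed.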

Because most queries have to traverse, link and process millions of rows of data and possibly trillions of words from within the data sources, this background churn could take hours, days or even longer.

A recent TDWI study found that organisations are dissatisfied with the time it takes for the chain of processes involved for BI, analytics and data warehousing to deliver valuable data and insights to business users. The organisations attributed this, in part, to ill-defined project objectives and scope, a lack of skilled personnel, data quality problems, slow development or inability to access all relevant data.

The problem is that most business users are not BI experts and do not all have analytical minds, so the discover and report method may be iterative (therefore slow) and in many cases the outputs/results are not of the quality expected. The results may also be inaccurate as data quality rules may not have been applied, and data linking may not be correct, as it would be in a typical data warehouse where data has been qualified and pre-defined/derived. In a traditional situation, with a structured data warehouse where all the preparation is done in one place, and once only, and then shared many times, supported by quality data and predefined rules, it may be possible to get sub-second answers. But often even in this scenario, sub-second insights are not achieved, since time to insight also depends on properly designed data warehouses, server power and network bandwidth.
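The contrast between preparing data once and preparing it on every query can be sketched as follows. The sales tuples and function names are hypothetical, and the example ignores everything else that affects response time (warehouse design, server power, bandwidth); it shows only where the preparation work happens.

```python
from collections import defaultdict

# Hypothetical raw facts: (month, product, amount)
sales = [
    ("2018-01", "A", 120),
    ("2018-01", "B", 80),
    ("2018-02", "A", 200),
]


def answer_from_raw(sales, month):
    """Discover-style query: scan and aggregate the raw data every time it is asked."""
    return sum(amount for m, _, amount in sales if m == month)


def build_summary(sales):
    """Warehouse-style preparation: aggregate once, up front."""
    summary = defaultdict(int)
    for month, _, amount in sales:
        summary[month] += amount
    return dict(summary)


def answer_from_summary(summary, month):
    """Shared many times: each query is a simple lookup on prepared data."""
    return summary.get(month, 0)


summary = build_summary(sales)
print(answer_from_raw(sales, "2018-01"))
print(answer_from_summary(summary, "2018-01"))
```

Both calls return the same figure; the difference is that the first repeats the scan for every user and every query, while the second pays the preparation cost once.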

Users tend to confuse search and discover on flat raw data that’s already there, with information and insight generation at the next level. In more complex BI/Analytics, each time a query is run, all the preparation work has to be done from the beginning and the necessary churn can take a significant amount of time.

Therefore, demanding faster BI ‘time to value’ and expecting answers in sub-seconds could prove to be a costly mistake. While it is possible to gain some form of output in sub-seconds, these outputs will likely not be qualified, trusted insights that can deliver real strategic value to the enterprise.

By Mervyn Mooi, Director at Knowledge Integration Dynamics (KID)

 

Answers in no time

Internet search engines with instant query responses may have misled enterprises into believing all analytical queries should deliver split-second answers.

With the advent of big data analytics hype and the rapid convenience of Internet searches, enterprises might be forgiven for expecting to have all answers to all questions at their fingertips in near real-time.

Unfortunately, getting trusted answers to complex questions is a lot more complicated and time-consuming than simply typing a search query. Behind the scenes on any Internet search, a great deal of preparation has already been done in order to serve up the appropriate answers.

Google, for instance, dedicates vast amounts of high-end resources and all of its time to preparing the data necessary to answer a search query instantly. But, even Google cannot answer broad questions or make forward-looking predictions.

In cases where the data is known and trusted, the data has been prepared and rules have been applied, and the search parameters are limited, such as with a property Web site, almost instant answers are possible, but this is not true business intelligence (BI) or analytics.

Behind the scenes

Within the enterprise, matters become a lot more complicated. When the end-user seeks an answer to a broad query – such as when a marketing firm wants to assess social media to find an affinity for a certain range of products over a six-month period – a great deal of ‘churn’ must take place in the background to deliver answers. This is not a split-second process, and it may deliver only general trend insights rather than trusted, quality data that can serve as the basis for strategic decisions.

When end-users wish to do a query and are given the power to process their own BI/analytics, lengthy churn must take place. Every time a query, report or instance of data access is converted into useful BI/analytical information for end-consumers, there is a whole lot of preparation work to be done along the way, i.e. identify data sources > access > verify > filter > pre-process > standardise > look up > match > merge > de-dup > integrate > apply rules > transform > pre-process > format > present > distribute/channel.

Because most queries have to traverse, link and process millions of rows of data and possibly trillions of words from within the data sources, this background churn could take hours, days or even longer.

A recent TDWI study found organisations are dissatisfied with the time it takes for the chain of processes involved for BI, analytics and data warehousing to deliver valuable data and insights to business users. The organisations attributed this, in part, to ill-defined project objectives and scope, a lack of skilled personnel, data quality problems, slow development or inability to access all relevant data.

The problem is most business users are not BI experts and do not all have analytical minds, so the ‘discover and report’ method may be iterative (therefore slow), and in many cases, the outputs/results are not of the quality expected. The results may also be inaccurate as data quality rules may not have been applied, and data linking may not be correct, as it would be in a typical data warehouse where data has been qualified and pre-defined/derived.

In a traditional situation, with a structured data warehouse where all the preparation is done in one place, and once only, and then shared many times, supported by quality data and predefined rules, it may be possible to get sub-second answers.

But, often, even in this scenario, sub-second insights are not achieved, since time to insight also depends on properly designed data warehouses, server power and network bandwidth.

Users tend to confuse search and discover on flat raw data that’s already there, with information and insight generation at the next level. In more complex BI/analytics, each time a query is run, all the preparation work has to be done from the beginning and the necessary churn can take a significant amount of time.

Therefore, demanding faster BI ‘time to value’ and expecting answers in sub-seconds could prove to be a costly mistake. While it is possible to gain some form of output in sub-seconds, these outputs will likely not be qualified, trusted insights that can deliver real strategic value to the enterprise.

Old business issues drive a spate of data modernisation programmes

 

By Mervyn Mooi, director at Knowledge Integration Dynamics (KID)

The continued evolution of all things is obviously also felt in the data warehousing and business intelligence fields and it is apparent that many organisations are currently on a modernisation track.

But why now? Behind it all is exponential growth and accumulation of data and businesses are actively seeking to derive value, in the form of information and insights, from the data. They need this for marketing, sales and performance measurement purposes and to help them face other business challenges. All the business key performance indicators or buzzwords are there: wallet share, market growth, churn, return on investment (ROI), margin, survival, customer segments, competition, productivity, speed, agility, efficiency and more. Those are business factors and issues that require stringent management for organisational success.

Take a look at Amazon’s recommended lists and you’ll see how evident and crucial these indicators are. Or peek into a local bank’s, retailer’s or other financial institution’s rewards programmes.

Social media has captured the media limelight in terms of new data being gathered, explored and exploited. But it’s not the only one. Mobility, cloud and other forms of big data, such as embedded devices and the Internet of Things, collectively offer a smorgasbord of potential that many companies are mining for gold, while others need to entrench such value in their IT and marketing sleuths if they’re to remain in the game. Monetisation of the right data, information and functionality, at the right time, is paramount.

The tech vendors have been hard at work to crack the market and give their customers what they need to get the job done. One of the first things they did was come out with new concepts of working with data using the old technologies. They introduced tactical strategies like centre of excellence, enterprise resource planning, application and information integration, sand-pitting and more. They also realised the need to bring the techies out of the IT cold room and put them in front of business-people so that they could get the reports the business needed to be competitive, agile, efficient and all the other buzzwords. That had limited success.

In the meantime the vendors were also developing modern and state-of-the-art technologies that people can use. The old process of having techies write reports that would be fed to business-people on a monthly basis was not efficient, not agile, not competitive and generally not at all what they needed. What they needed were tools that could hook into any source or system, that could be accessed and massaged by the business-people themselves and that could be relied upon for off-the-shelf integration and reporting. Besides that, big data was proving to be complex and required a new and useable strategy that would be scalable and affordable to both the organisation and the man on the street.

Hadoop promised to help that along. Hadoop is a framework based on open source technology that can give other benefits such as better return on investment by using clusters of low cost servers. And it can chew through petabytes of information quickly. The key is integrating Hadoop into mainstream analytics applications.

Columnar databases make clever use of the properties of the underlying storage technologies that enable compression economies and make searching through the data quicker and more efficient. There’s a lot of techie mumbo jumbo that makes it work but suffice to say that searching information puts the highest overhead on systems and networks so it’s a natural area to address first.
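One way to picture the compression economies is run-length encoding of a column whose values repeat. The sketch below is a toy illustration, assuming a tiny hypothetical table; real columnar engines use a range of encodings, and nothing here reflects any specific product.

```python
from itertools import groupby

# Hypothetical row-oriented table
rows = [
    {"region": "Gauteng", "product": "A", "units": 3},
    {"region": "Gauteng", "product": "B", "units": 5},
    {"region": "Gauteng", "product": "A", "units": 2},
    {"region": "Western Cape", "product": "A", "units": 4},
    {"region": "Western Cape", "product": "B", "units": 1},
]


def to_columns(rows):
    """Pivot row-oriented records into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows)

# Five stored region values shrink to two (value, count) pairs.
print(run_length_encode(columns["region"]))

# An aggregate over one column reads only that column, not every full row.
print(sum(columns["units"]))
```

Storing like values together is what makes both effects possible: repeats sit next to each other, and a query touching one attribute can skip the rest.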

NoSQL is also known as Not only SQL because it provides storage and retrieval modelled not only on tables, common to relational databases, but also by column, document, key values, graphs, lists, URLs and more. Its designs are simpler, horizontal scaling is better – which improves the ability to add low cost systems to improve performance – and it offers better control over availability.

Data appliances are just as the name suggests: plug and play, data warehousing in a box, systems, software and the whole caboodle. Just pop it in and: “Presto,” you’ve got a ton more capacity and capability. These technologies employ larger, massively parallel, and faster in-memory processing techniques.

Those technologies, and there are others like them, solve the original business issues mentioned upfront. They deliver the speed of analytics that companies need today, they give companies the means to gather, store and view the data differently, which can lead to new insights, they can grow or scale as the company’s data demands change, their techies and business-people alike are more productive using the new tools, and they bring a whole raft of potential ROI benefits. ROI, let’s face it, is becoming a bigger issue in environments where demands are always growing, never diminishing, and where financial directors are increasingly furrow browed with an accumulation of nervous tics.

Large businesses aren’t about to rip out their existing investments – there’s the implicit ROI again – but will rather evolve what they have. The way organisations are working to change reporting and analytics, though, will have an impact on the skills that they require to sustain their environments. Technical and business tasks are being merged and that’s why there’s growing demand for so-called data scientists.

Data scientists are supposed to be the do-it-all guys, right from data sourcing and discovery to franchising insightful and sentiment-based intelligence. They are unlike traditional information analysts and data stewards or report writers, who had distinct roles and responsibilities in the data and information domains.

 

 

 

 

 

The wielder, not the axe, propels plunder aplenty

By Mervyn Mooi, director at Knowledge Integration Dynamics (KID)

Business intelligence is a fairly hot topic today – good news for me and my ilk – but that doesn’t mean everything about it is new and exciting. The rise and rise of BI has seen a maturation of the technologies, derived from a sweeping round of acquisitions and consolidations in the industry just a few years ago, that have created something of a standardisation of tools.

We have dashboards and scorecards, data warehouses and all the old Scandinavian-sounding LAPs: ROLAP, MOLAP, OLAP and possibly a Ragnar Lothbrok or two. And, like the Vikings knew, without some means to differentiate, everyone in the industry becomes a me-too, which means that’s what their customers ultimately get. And that makes it very hard to win battles.

 

Building new frameworks around tools to achieve some sense of differentiation achieves just that: only a sense of differentiation. In fact, even when it comes to measurements, most measures, indicators and references in BI today are calculated in a common manner across businesses. They typically use financial measures, such as monthly revenues, costs, interest and so on. The real difference, however, comes in preparing the data and the rules that are applied to the function.

 

A basic example that illustrates the point: let’s say the Vikings want to invade England and make off with some loot. Before they can embark on their journey of conquest they need to ascertain a few facts. Do they have enough men to defeat the forces in England? Do they have enough ships to get them there? Do they know how to navigate the ocean? Are their ships capable of safely crossing? Can they carry enough stores to see them through the campaign or will they need to raid settlements for food when they arrive? Would those settlements be available to them? How much booty are they likely to capture? Can they carry it all home? Will it be enough to warrant the cost of the expedition?

 

The simple answer was that the first time they set sail they had absolutely no idea because they had no data. It was a massive risk of the type that most organisations aim to avoid these days. So before they could even begin to analyse the pros and cons they had to get at the raw data itself. And that’s the same issue that most organisations have today. They need the raw data but they don’t need it, in the Viking context, from travellers and mystics, spirits and whispers carried on the wind. It must be good quality data derived from reliable sources and a good geographic cross-section. And it is in preparing their facts, checking they are correct, that they come from reliable sources, that there has been no case of broken telephone, that businesses will truly make a difference. Information is king in war because it allows a much smaller force to figure out where to maximise its impact upon a potentially much larger enemy. The same is true in business today.

 

Before the Vikings could begin to loot and pillage they had to know where they could put ashore quickly to effect a surprise raid with overwhelming odds in their favour. In business you could say that you need to know the basic facts before you drill down for the nuggets that await.

 

The first Viking raids grew to become larger as the information the Vikings had about England grew. Pretty soon they had banded their tribes or groups together, shared their knowledge and were working toward a common goal: getting rich by looting England. In business, too, divisions, units or operating companies may individually gain knowledge that it makes sense to share with the rest to work toward the most sought-after plunder: the overall business strategy.

 

Because the tools and technologies supply common functionality and businesses or implementers can put them together in fairly standard approaches as they choose, the real differentiator for BI is the data itself and how the data is prepared – what rules are applied to it before it enters the BI systems. Preparation is king.

 

These rules ultimately differentiate information based on wind-carried whispers from information based on reliable reports from spies abroad. Which would you prefer with your feet on the deck?

 

Contact

 

Knowledge Integration Dynamics, Mervyn Mooi, (011) 462-1277, mervyn.mooi@kid.co.za

Thought Bubble, Jeanné Swart, 082-539-6835, jeanne@thoughtbubble.co.za