Big data propels exploratory data analysis to the fore: Part II

Data analysis consists of inspecting, cleaning, transforming and modelling data in order to derive useful business conclusions. Exploring the data is a prerequisite, because exploration seeks to discover patterns and links between data sets, and it often relies on visual methods. As John Tukey proposed, it encourages statisticians to explore data with a view to formulating hypotheses that may lead to new data collection and experiments. It is also nothing new: Tukey set out the approach in his 1977 book, Exploratory Data Analysis.


What is new is that data technicians now grapple with the issue in a computing environment shaped by big data, a term that describes not only the size of the data but also the many types of data involved. Therein lies the clue to the capabilities that modern exploratory data tools require. It also points to how one of the benefits of data exploration can be realised: new data collection and experiments with observable business benefits.


To be effective, then, the tools need to be able to search large volumes of data as well as diverse data types. They must also be easy to use, since in many cases it is businesspeople who will use them, yet they must still give technicians the ability to model and query the data. They need to present useful information to people quickly while offering the ability to drill deeper in search of specific information as required. Above all, they must integrate with numerous data stores and repositories, because faster, ubiquitous bandwidth means that systems interact on an unprecedented scale.
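
The pattern of presenting a quick overview and then drilling deeper can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library, not a description of any particular tool: the sample records, the field names (region, product, revenue) and the helper functions summarise and drill_down are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample records; field names and values are illustrative only.
SALES = [
    {"region": "North", "product": "A", "revenue": 120.0},
    {"region": "North", "product": "B", "revenue": 80.0},
    {"region": "South", "product": "A", "revenue": 200.0},
    {"region": "South", "product": "B", "revenue": 40.0},
    {"region": "South", "product": "B", "revenue": 60.0},
]


def summarise(records, key, value="revenue"):
    """Overview step: count, total and mean of `value` for each distinct `key`."""
    groups = defaultdict(list)
    for row in records:
        groups[row[key]].append(row[value])
    return {
        name: {"count": len(vals), "total": sum(vals), "mean": mean(vals)}
        for name, vals in groups.items()
    }


def drill_down(records, **filters):
    """Drill-down step: keep only the rows that match every given filter."""
    return [
        row for row in records
        if all(row[field] == wanted for field, wanted in filters.items())
    ]


# First a high-level view by region, then a closer look at one region.
overview = summarise(SALES, "region")
south_by_product = summarise(drill_down(SALES, region="South"), "product")
```

A businessperson would typically see something like overview first, while a technician might chain further filters onto drill_down to isolate a specific product or period. A real tool would run the same two steps against external data stores rather than an in-memory list.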