Think out of the Box, but not out of the Cube, Series
Online analytical processing, or OLAP, has played a key role in the analyst's toolbox for navigating dimensional data sets and identifying trends, anomalies, and root causes. The technology driving OLAP has evolved considerably since the 1990s, providing greater visualization, scalability, and analytical horsepower. Yet more is on the way, including in-memory analytics, improved visualization, increased scalability and performance, and integration with appliances and very large databases.
The future of data mining lies in predictive analytics. The technology innovations in data mining since 2000 have been truly Darwinian and show promise of consolidating and stabilizing around predictive analytics. Variations, novelties and new candidate features have been expressed in a proliferation of small start-ups that have been ruthlessly culled from the herd by a perfect storm of bad economic news. Nevertheless, the emerging market for predictive analytics has been sustained by professional services, service bureaus (rent a recommendation) and profitable applications in verticals such as retail, consumer finance, telecommunications, travel and leisure, and related analytic applications. Predictive analytics have successfully proliferated into applications to support customer recommendations, customer value and churn management, campaign optimization, and fraud detection. On the product side, success stories in demand planning, just-in-time inventory and market basket optimization are a staple of predictive analytics. Predictive analytics should be used to get to know the customer, to segment and predict customer behavior, and to forecast product demand and related market dynamics. Be realistic about the complex mixture of business acumen, statistical processing and information technology support that is required, as well as the fragility of the resulting predictive model; but make no assumptions about the limits of predictive analytics. Breakthroughs often occur in the application of the tools and methods to new commercial opportunities.
Unfulfilled Expectations: In addition to a perfect storm of tough economic times, now improving measurably, one reason data mining technology has not lived up to its promise is that "data mining" is a vague and ambiguous term. It overlaps with data profiling, data warehousing and even such approaches to data analysis as online analytical processing (OLAP) and enterprise analytic applications.
When high-profile success has occurred (see the front-page article in the Wall Street Journal, "Lucky Numbers: Casino Chain Mines Data on Its Gamblers, And Strikes Pay Dirt" by Christina Binkley, May 4, 2000), this has been a mixed blessing. Such results have attracted a variety of imitators with claims, solutions and products that ultimately fall short of the promises. The promises build on the mining metaphor and typically are made to sound like easy money - "gold in them thar hills." This has resulted in all the usual dilemmas of confused messages from vendors, hyperbole in the press and unfulfilled expectations from end-user enterprises.
Common Goals: The goals of data warehousing, data mining and the emerging trend in predictive analytics overlap. All aim at understanding consumer behavior, forecasting product demand, managing and building the brand, tracking performance of customers or products in the market and driving incremental revenue from transforming data into information and information into knowledge. However, they cannot be substituted for one another. Ultimately, the path to predictive analytics lies through data mining, but data mining is like the parent who must step aside to let the child develop her or his full potential. This is a trends analysis, not a manifesto for predictive analytics. Yet the slogan rings true: "Data mining is dead! Long live predictive analytics!" The center of design for cutting-edge technology and breakthrough commercial business results has shifted from data warehousing and mining to predictive analytics. From a business perspective, the three employ different methods. They are positioned in different places in the technology hierarchy. Finally, they are at different stages of growth in the life cycle of technology innovation.
Technology Cycle: Data warehousing is a mature technology, with approximately 70 percent of Forrester Research survey respondents indicating they have one in production. Data mining has endured significant consolidation of products since 2000, in spite of initial high-profile success stories, and has sought shelter in encapsulating its algorithms in the recommendation engines of marketing and campaign management software. Statistical inference has been transformed into predictive modelling. As we shall see, the emerging trend in predictive analytics has been enabled by the convergence of a variety of factors.
Technology Hierarchy: In the technology hierarchy, data warehousing is generally considered an architecture for data management. Of course, when implemented, a data warehouse is a database providing information about (among many other things) which customers are buying or using which products or services, and when and where they are doing so. Data mining is a process for knowledge discovery, primarily relying on generalizations of the "law of large numbers" and the principles of statistics applied to them. Predictive analytics emerges as an application that both builds on and delimits these two predecessor technologies, exploiting large volumes of data and forward-looking inference engines and, by definition, providing predictions about diverse domains.
Methods: The method of data warehousing is structured query language (SQL) and its various extensions. Data mining employs the "law of large numbers" and the principles of statistics and probability that address the issues around decision making under uncertainty. Predictive analytics carries forward the work of the two predecessor domains. Though not a silver bullet, better algorithms in operations research, risk minimization and parallel processing, when combined with hardware improvements and the lessons of usability testing, have resulted in successful new predictive applications emerging in the market. (Again, see Figure 1 on predictive analytics enabling technologies.) Widely diverging domains such as the behavior of consumers, stocks and bonds, and fraud detection have been attacked with significant success by predictive analytics on a progressively incremental scale and scope. The work of the past decade in building the data warehouse, and especially in its closely related techniques such as parallel processing, is a key enabling factor. Statistical processing has been useful in data preparation, model construction and model validation. However, it is only with predictive analytics that the inference and knowledge are actually encoded into the model that, in turn, is encapsulated in a business application.
This results in the following definition of predictive analytics: Methods of directed and undirected knowledge discovery, relying on statistical algorithms, neural networks and optimization research to prescribe (recommend) and predict (future) actions based on discovering, verifying and applying patterns in data to predict the behavior of customers, products, services, market dynamics and other critical business transactions. In general, tools in predictive analytics employ methods to identify and relate independent and dependent variables - the independent variable being "responsible for" the dependent one and the way in which the variables "relate," providing a pattern and a model for the behavior of the downstream variables.
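To make the idea of relating an independent variable to a dependent one concrete, here is a minimal sketch in Python. It fits a straight line by ordinary least squares, which is only one of many possible ways to express how variables "relate" and is not taken from any particular predictive analytics product. The data, the variable names (ad_spend, units_sold) and the function names are all hypothetical, chosen purely for illustration.

```python
# Illustrative only: the figures and names below are assumptions, not real data.
ad_spend = [1.0, 2.0, 3.0, 4.0]      # independent variable (hypothetical)
units_sold = [3.0, 5.0, 7.0, 9.0]    # dependent variable (hypothetical)


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-style and variance-style sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Apply the fitted pattern to a new value of the independent variable."""
    return intercept + slope * x


slope, intercept = fit_line(ad_spend, units_sold)
forecast = predict(slope, intercept, 5.0)
print(f"slope={slope}, intercept={intercept}, forecast for 5.0 = {forecast}")
```

The fitted slope and intercept are the "pattern"; calling predict with a new input is the step where that pattern becomes a forecast. Real models would use many variables and would need validation against held-out data, reflecting the fragility noted earlier.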
In data warehousing, the analyst asks a question of the data set with a predefined set of conditions and qualifications, and a known output structure. The traditional data cube addresses: Which customers are buying or using which product or service, and when and where are they doing so? Typically, the question is represented in a piece of SQL against a relational database. The business insight needed to craft the question to be answered by the data warehouse remains hidden in a black box - the analyst's head. Data mining gives us tools with which to engage in question formulation based primarily on the "law of large numbers" of classic statistics. Predictive analytics have introduced decision trees, neural networks and other pattern-matching algorithms constrained by data percolation. It is true that in doing so, technologies such as neural networks have themselves become a black box. However, neural networks and related technologies have enabled significant progress in automating, formulating and answering questions not previously envisioned. In science, such a practice is called "hypothesis formation," where the hypothesis is treated as a question to be defined, validated and refuted or confirmed by the data.
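The data cube question described above (who is buying what, when and where) can be sketched as a small SQL query. The example below uses Python's built-in sqlite3 module with an in-memory database. The sales table, its columns and the sample rows are hypothetical, invented only to show the shape of such a query; a real warehouse schema would differ.

```python
import sqlite3

# Hypothetical fact rows: (customer, product, region, month, units).
ROWS = [
    ("alice", "phone", "east", "2024-01", 2),
    ("alice", "phone", "east", "2024-02", 1),
    ("bob", "laptop", "west", "2024-01", 1),
    ("carol", "phone", "west", "2024-01", 3),
]


def units_by_product_region_month(rows):
    """Aggregate units along three dimensions: what, where and when."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales ("
        "customer TEXT, product TEXT, region TEXT, month TEXT, units INTEGER)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", rows)
    # Predefined conditions and a known output structure, as in the text.
    result = conn.execute(
        """
        SELECT product, region, month, SUM(units) AS total_units
        FROM sales
        GROUP BY product, region, month
        ORDER BY product, region, month
        """
    ).fetchall()
    conn.close()
    return result


result = units_by_product_region_month(ROWS)
for row in result:
    print(row)
```

Note that the query only answers the question the analyst already knew to ask: the choice of dimensions in the GROUP BY clause is exactly the business insight that stays in the analyst's head.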