A Light in the Tunnel

Story by Mark Whitehorn, 11-11-2008

It’s all very well to talk about “democratising” business intelligence, and many of today’s tools do make it easy to visualise data, but how many people really know how to get to the bottom of data mining?
Microsoft’s second Business Intelligence conference took place in Seattle in early October and began with a keynote by Stephen Elop (president of the Microsoft Business Division). On a day when the stock markets on both sides of the Atlantic were in free fall, his remarks on new economic challenges will have struck a resounding chord with many of the 2,500 delegates. In addition, either by unnervingly good planning or great fortune, the keynote speaker on the second day was Ben Stein. His name may be familiar as that of the actor-comedian who played the mind-numbingly boring teacher in the 1986 film Ferris Bueller’s Day Off, but he is also a noted economist and presidential advisor. His keynote stood out from many of its kind for being informative, well observed, engaging and humorous.

Added Value
So both keynotes were apposite for BI – in times of economic turmoil the need for organisations to derive more value from their data and to squeeze out a little more competitive advantage is stronger than ever. According to Elop, around 75 per cent of workers have “virtually no access to meaningful BI capabilities” and he spoke of the “democratisation” of BI, giving users at all levels easy access to the data they need to do their jobs through a single integrated environment. Ted Kummert (corporate VP of the Data and Storage Platform Division) then announced a series of new developments towards this goal which will appear in an update of SQL Server due in 2010. Microsoft is at pains to make clear that this update (codenamed “Kilimanjaro”) is not the next major release of SQL Server and that its focus is much more on BI than on the relational engine itself.

One part of this release (codenamed “Gemini”) is designed to make it much easier for end users to mix and match data for analysis, using Excel as the front end. Yes, I know that Excel is not famous for handling large data sets, but the good news is that this technology includes a new in-memory column store, which holds data in memory column by column rather than row by row; the demo I saw manipulated 100 million rows of data almost instantaneously.
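For readers who have not met the idea, here is a toy sketch of what a column store does. It is a minimal illustration under my own assumptions, not how Gemini is actually built: each column is kept as a separate in-memory list, text columns are dictionary-encoded (each distinct value stored once, with rows holding small integer codes), and an aggregate query scans only the columns it needs. The `ColumnStore` class, its `sum_by` method and the sample rows are all hypothetical.

```python
from typing import Dict, List, Sequence


class ColumnStore:
    """Toy in-memory column store (hypothetical; not Microsoft's implementation).

    Every column is held as its own list. Text columns are dictionary-encoded:
    each distinct string is stored once and every row keeps a small integer code.
    Assumes all rows share the same field names.
    """

    def __init__(self, rows: Sequence[Dict[str, object]]) -> None:
        self.columns: Dict[str, List[object]] = {}
        self.dictionaries: Dict[str, List[str]] = {}
        # Pivot the row-oriented input into one list per column.
        for row in rows:
            for name, value in row.items():
                self.columns.setdefault(name, []).append(value)
        # Replace each text column with integer codes into a sorted dictionary.
        for name, values in list(self.columns.items()):
            if values and isinstance(values[0], str):
                distinct = sorted(set(values))
                code_of = {text: code for code, text in enumerate(distinct)}
                self.dictionaries[name] = distinct
                self.columns[name] = [code_of[text] for text in values]

    def sum_by(self, group_col: str, value_col: str) -> Dict[str, float]:
        """Total value_col per group, touching only the two columns involved."""
        labels = self.dictionaries[group_col]
        totals = [0.0] * len(labels)
        for code, value in zip(self.columns[group_col], self.columns[value_col]):
            totals[code] += float(value)
        return dict(zip(labels, totals))


store = ColumnStore([
    {"region": "North", "product": "A", "sales": 120.0},
    {"region": "South", "product": "B", "sales": 80.0},
    {"region": "North", "product": "B", "sales": 45.5},
])
print(store.sum_by("region", "sales"))  # {'North': 165.5, 'South': 80.0}
```

A real engine adds compression, vectorised scans and a great deal more; the point here is only that an aggregate over one column never has to read the others.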

Another major component of Kilimanjaro will be “Madison”. You may remember that Microsoft bought DATAllegro, a data warehouse appliance vendor, earlier this year. Madison is essentially the result of crossing SQL Server with DATAllegro, and the bottom line is that SQL Server should comfortably handle data warehouses in the hundreds of terabytes range.

Tibco Spotfire Latest
It’s obviously a good time of year for announcements because Tibco has made one too: Spotfire is now available as version 2.2. Spotfire has gained an enthusiastic following for its ability to analyse data with remarkable speed and elegance.

The changes from the previous version, 2.1, are significant; in fact, many companies would have announced a new version rather than just a point release. Tibco Spotfire’s Web Player 2.2 has improved performance in many areas: opening a Web Player session is now described as “instant”, the memory footprint has been vastly reduced for large user applications, and certain interactive operations, such as the page change filter, are now between 30 and 200 per cent faster.


Figure 1: Tibco Spotfire’s new 3D data visualisation is demonstrated in an analysis of bore holes

Data visualisation is a vital element in conveying data’s meaning to a wider audience and it’s a major strength of the latest Spotfire product. One of the most important enhancements is in 3D data visualisation; the swirling green graphic in the screenshot shows an analysis of bore holes (Figure 1). Figure 2 illustrates the new ability to map networks, which is especially useful for social network analysis. The API has also been extended to give custom tools and calculations better performance and a more interactive user experience.

Data Mining for All
Those who use BI techniques have happily embraced both relational and multidimensional data structures and deal with the huge demand for reporting without drama – but it appears that we don’t mine our data as much as we could. Data mining is the process by which software can be used to find and illuminate complex correlations, trends and anomalies in large blocks of data. An example is worth reams of explanation, so the boxout describes a data mining project I did for an insurance company. (Usually, if a company gains a competitive advantage, it doesn’t want the results published. However, the company concerned gave permission for publication, as long as I didn’t identify it or the country in which the work was done.)


Figure 2: With Spotfire, you can analyse networks, social and otherwise

When data mining first appeared, 15 to 20 years ago, it was seen as – and indeed was – a very difficult undertaking. Since then it has become much easier. However, during the BI conference mentioned above, it became apparent that it is still a wildly underused technology.

As an example, I went to an excellent talk by the highly intelligent and entertaining Donald Farmer. He asked the attendees how many had ever tried data mining. About half of the roughly 200 people in the audience put their hands up. Then he asked how many had a data mining system in production. Six hands. Finally, he asked how many people considered themselves data mining experts. Just three. Bear in mind that this was not a random sample of people, or even of database enthusiasts, but the audience at a fully blown BI conference; against that background, the results surely suggest that the technology is underused.

If you use SQL Server, the tools are ready and waiting in the box as part of Analysis Services; they have been made easy to use and are even wizard-driven. Mining clean data can yield hugely useful information, so why not fetch your canary and give it a try?

DATA MINING IN THE REAL WORLD
Towards the end of a policy, someone from a call centre rings the policy holder and tries to persuade them to renew. The insurance company knows a great deal about the policy holder (name, address, gender, age, etc) and about the caller (an employee of the company) and about the call itself (time of initiation, length, date, day of week, and so on).

The insurance company was interested in the factors that led to a single binary result – renewal or non-renewal. So the data was run through a data mining algorithm which looked at every known factor and cross-correlated it with every other factor.
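The article does not say which mining algorithm was used, so what follows is a swapped-in illustration rather than the method applied in this project: a minimal sketch that scores each recorded factor by its mutual information with the renewal outcome and ranks the factors, strongest first. It considers factors one at a time only, not combinations. The function names, field names and the four records are invented for the example.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def mutual_information(xs: Sequence[object], ys: Sequence[bool]) -> float:
    """Mutual information (in bits) between a categorical factor and a binary outcome."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


def rank_factors(records: List[Dict[str, object]], outcome: str) -> List[Tuple[str, float]]:
    """Score every factor against the outcome and sort, strongest first."""
    ys = [bool(r[outcome]) for r in records]
    factors = [name for name in records[0] if name != outcome]
    scores = [(f, mutual_information([r[f] for r in records], ys)) for f in factors]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Invented records, used only to exercise the functions.
records = [
    {"call_day": "Mon", "price_change": "up", "renewed": False},
    {"call_day": "Tue", "price_change": "up", "renewed": False},
    {"call_day": "Mon", "price_change": "flat", "renewed": True},
    {"call_day": "Tue", "price_change": "flat", "renewed": True},
]
print(rank_factors(records, "renewed"))
# [('price_change', 1.0), ('call_day', 0.0)]
```

Here `price_change` determines the outcome completely (1 bit), while `call_day` tells you nothing about it (0 bits).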

About 50 factors were recorded, so the number of pairwise correlations was very high (49 + 48 + 47 + 46 + ... + 1 = 1,225). However, the algorithm didn’t just look at them in pairs; it looked at all possible combinations of factors, which yielded a much bigger number.
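The arithmetic is easy to check. A minimal sketch, assuming that “all possible combinations” means every subset of two or more of the 50 factors (practical algorithms prune this search rather than enumerating it):

```python
from math import comb

FACTORS = 50

# Pairs of factors: 49 + 48 + ... + 1, which equals "50 choose 2".
pairs = sum(range(1, FACTORS))
assert pairs == comb(FACTORS, 2)

# Every subset of two or more factors: all 2**50 subsets,
# less the empty set and the 50 single factors.
combinations = sum(comb(FACTORS, k) for k in range(2, FACTORS + 1))
assert combinations == 2**FACTORS - 1 - FACTORS

print(f"{pairs:,} pairs")                # 1,225 pairs
print(f"{combinations:,} combinations")  # 1,125,899,906,842,573 combinations
```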

Now, to us humans, some of the correlations that the data mining algorithm flagged as highly significant were obvious. The top one was the cost of the policy compared with its cost in the previous year (no surprises there). But we weren’t interested in those; we were interested in the significant ones that weren’t obvious. While the algorithm was running, we tried to guess what the most important non-obvious correlation would be. My guess, working on the assumption that people like talking to someone with the same accent, was the tie-up between the birthplace of the caller and the birthplace of the customer. I was wrong. In fact, none of us guessed the right answer. Now is the time to stop reading and make a guess…

It turned out that the most important factor by far, apart from the obvious ones, was how close in age the two people on the call were. As is often the case, as soon as you hear the answer, you can back-extrapolate to the possible cause. Not only can there be a lack of trust between age groups, but there are also subtle differences in language use and cultural references.

Even more interestingly, the relationship was asymmetrical. Older customers were more distrustful of young callers than the other way around. But it was still clear that the best results occurred when the ages were close.
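The age finding suggests a derived feature: the difference between the customer’s age and the caller’s. A minimal sketch, assuming a hypothetical `Call` record and an arbitrary 10-year threshold for “close”, groups calls into three bands (close in age, customer older, caller older) so that both the closeness effect and the direction of any asymmetry can be seen. The six calls below are invented solely to exercise the code; they are not the insurance company’s data.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Call:
    """One renewal call (hypothetical record layout)."""
    customer_age: int
    caller_age: int
    renewed: bool


def gap_band(call: Call, threshold: int = 10) -> str:
    """Classify a call as close in age, customer older or caller older.

    'close' means the ages differ by less than `threshold` years.
    """
    gap = call.customer_age - call.caller_age
    if abs(gap) < threshold:
        return "close"
    return "customer older" if gap > 0 else "caller older"


def renewal_rate_by_band(calls: List[Call], threshold: int = 10) -> Dict[str, float]:
    """Fraction of calls in each band that ended in a renewal."""
    totals: Dict[str, int] = defaultdict(int)
    renewals: Dict[str, int] = defaultdict(int)
    for call in calls:
        band = gap_band(call, threshold)
        totals[band] += 1
        renewals[band] += int(call.renewed)
    return {band: renewals[band] / totals[band] for band in totals}


# Invented calls, used only to exercise the functions.
calls = [
    Call(60, 58, True),
    Call(30, 33, True),
    Call(65, 25, False),
    Call(70, 30, False),
    Call(25, 55, True),
    Call(28, 60, False),
]
print(renewal_rate_by_band(calls))
# {'close': 1.0, 'customer older': 0.0, 'caller older': 0.5}
```

With real data, the threshold is worth varying, or replacing with the raw signed gap, since the choice of bands can hide or exaggerate an effect.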

Of course, other factors also came into play. These included gender and, way down the list but still significant, regional accent. So my guess wasn’t entirely wrong.
