Web Mining or quant analytics: trend or fad?
Web Mining (WM) techniques have been developed to analyze Big Data spread across the internet. WM remains a mysterious, almost occult field for any neophyte. In the popular imagination it is often perceived as an entanglement of algorithms, artificial intelligence, machine learning and sophisticated statistics whose depth and scope one does not always grasp, but which one thinks may be useful for finding “a diamond in a coal pile”. Paradoxically, in an increasingly rationalized world, technology appears as a blue chip.
WM is appealing because it seems so complex and cabbalistic that it must surely be a reliable method to obtain relevant results. WM appears as a recipe that extracts totally new, ready-to-use knowledge from databases. In the words of Greening, “WM is simultaneously a minefield and a gold mine”, depending on how the organization implements it.
How will WM evolve? Is it here to stay, and is it useful at all? What would its benefits be for businesses?
Server logs are quite structured, internal data sinks. Such a doughnut of data does not make it Big, in my opinion. Call detail records and dialer stats from inbound/outbound call centres fall into the same category, but are a lot more exciting and personalized in content and more bricks-and-mortar in nature. Without recognizing the intrinsic complexity of the data itself, we are short-changing the concept of Big Data. Consider 120 million misspelt and polluted address records in a country with 11 official languages, where only 2 in 32 individuals are capable of producing their own address correctly. Unstructured, external data makes big Big.
Because server logs are structured and internal, mining them is a fad and cannot be distinguished from mining call detail records, which have been around forever! What cannot be a fad is working with somewhat less data whose resolution to a correct form has an NP-complete character.
OK, server logs (log files) are generally rather structured data, used for analyzing usage patterns and users’ behaviors on the web. Analyzing log files is one task of web data mining, called web usage mining. But it is also possible to mine web page contents such as text, images, videos and audio files (web content mining), or to mine hyperlink structures (web structure mining), the former being by definition far less structured than the latter, not to mention blog mining, collaborative filtering, etc.
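To make web usage mining a little more concrete, here is a minimal Python sketch that counts successful page requests per path. It assumes the log lines follow the widely used Common Log Format; the sample lines, the `LOG_PATTERN` regex and the `page_hits` helper are made up for illustration and do not come from any real site or tool.

```python
import re
from collections import Counter

# Assumed layout (Common Log Format):
# host ident user [time] "method path protocol" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def page_hits(lines):
    """Count requests with a 2xx status per requested path."""
    hits = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        # Skip lines that do not parse and requests that did not succeed.
        if match and match.group("status").startswith("2"):
            hits[match.group("path")] += 1
    return hits


# Hypothetical sample data, invented for this example.
sample_log = [
    '192.0.2.1 - - [10/Oct/2011:13:55:36 +0000] "GET /products HTTP/1.1" 200 2326',
    '192.0.2.2 - - [10/Oct/2011:13:56:01 +0000] "GET /products HTTP/1.1" 200 2326',
    '192.0.2.1 - - [10/Oct/2011:13:57:12 +0000] "GET /checkout HTTP/1.1" 200 512',
    '192.0.2.3 - - [10/Oct/2011:13:58:40 +0000] "GET /missing HTTP/1.1" 404 0',
]

if __name__ == "__main__":
    for path, count in page_hits(sample_log).most_common():
        print(path, count)
```

Real usage mining goes well beyond counting (sessions, click paths, referrers), but even this simple tally shows why log files count as the structured end of web mining: one regular expression is enough to turn each line into fields.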
I see WM as a definite trend over in the UK, especially within the high street banks. The volumes are challenging for retail banks (although nothing compared to their capital market brothers); one bank has an increment of just over 1 TB per month. Analytics are normally predictive, with KXEN as the most common toolset.
@Frans – Great CDR insight. I recall last decade, when I spent most of my time building out data warehouses for the likes of Vodafone across the globe and Turkcell. Too much time was spent on party analysis to the detriment of packet data. Packet data is the Big Data and akin to web logs…
At iCrossing a major part of our business is web mining combined with strategic marketing consulting based on log analysis. Nothing tells you more about your customer than web logs. What they are looking for, why they are at your site, how much they spent on what… There is really nothing more essential for any Internet-based business than web mining and analytics.