Clarification on the Difference Between Clustering and Classification?
A basic question: can someone please elaborate on the main difference between clustering and classification in data mining, and on how each is achieved?
I wonder why people compare these two methods. Is it because both words start with C? The two methods are completely different. When each observation belongs to one of a known set of classes, classification methods find rules that identify the class. A training data set (where the class is known) is required to extract the rules, which are then used to classify observations whose class is not known.
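To make the "learn rules from labelled data, then apply them" idea concrete, here is a minimal sketch of a nearest-centroid classifier. The toy points, the labels "small" and "large", and the function names are all made up for illustration; real classifiers are usually more elaborate.

```python
from math import dist

def fit_centroids(X, y):
    """Learn one centroid (mean point) per known class label."""
    sums, counts = {}, {}
    for point, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(point))
        for i, value in enumerate(point):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(centroids, point):
    """Assign the class whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: dist(centroids[label], point))

# Hypothetical training set: the class of every observation is known.
train_X = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
train_y = ["small", "small", "large", "large"]

centroids = fit_centroids(train_X, train_y)
new_label = predict(centroids, (2.0, 1.0))  # class of an unseen observation
```

The point to notice is that `train_y` has to exist before anything can be learned; that is what makes the method supervised.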
On the other hand, a clustering algorithm identifies groups or segments based on the characteristics of the observations. The main goal is that observations within a group are similar to each other but different from observations in other groups.
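As a rough sketch of grouping without any labels, here is a bare-bones k-means in plain Python. The data, the choice of k = 2, and seeding the centroids with the first k points are assumptions made to keep the example short and deterministic; library implementations use better initialisation.

```python
from math import dist

def kmeans(points, k, iterations=20):
    """Very small k-means: returns a cluster index per point and the centroids."""
    # Assumption: seed with the first k points so the result is reproducible.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(centroids[c], p)) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical unlabelled observations.
data = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
labels, centroids = kmeans(data, k=2)
```

No class column is passed in anywhere: the cluster indices come out of the data alone, and they carry no meaning beyond "these points belong together".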
Clustering falls under unsupervised analysis: you are searching for patterns in the data, so it is a form of exploratory data analysis.
Classification, by contrast, falls under supervised analysis, where you know the classes or groups in advance and try to assign data to them. There is plenty of information on this that you can find with a quick search.
As was previously mentioned, cluster analysis is a technique that lets us group observations based on several characteristics, putting together individuals whose values on these variables behave similarly. It is appropriate when you are not certain about the number of groups, or even whether there are groups within your observations at all. It is based on distance/dissimilarity measures between observations, which are used to determine proximity between objects. Cluster analysis can also be used to group variables. There may be more than 100 variants of this technique, based on different dissimilarity measures and grouping algorithms.
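To illustrate the role of the dissimilarity measure, and the case where the number of groups is not fixed in advance, here is a small sketch of single-linkage agglomerative clustering that stops merging at a distance threshold. The points, the threshold of 2, and the function names are hypothetical; this is just one of the many variants mentioned above.

```python
from itertools import combinations
from math import dist

def manhattan(p, q):
    """City-block distance, as an alternative dissimilarity measure."""
    return sum(abs(a - b) for a, b in zip(p, q))

def single_linkage(points, threshold, metric=dist):
    """Merge the two closest clusters until the closest pair exceeds the threshold."""
    clusters = [[i] for i in range(len(points))]  # start with one cluster per point
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Single linkage: cluster distance is the closest pair of members.
            d = min(metric(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break  # remaining clusters are too far apart to merge
        clusters[a].extend(clusters[b])
        del clusters[b]  # safe because a < b
    return [sorted(c) for c in clusters]

# Hypothetical observations; the number of groups is not given.
pts = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
groups_euclid = single_linkage(pts, threshold=2)
groups_manhattan = single_linkage(pts, threshold=2, metric=manhattan)
```

Here the number of groups falls out of the threshold rather than being specified, and swapping `metric` changes what "close" means without touching the grouping algorithm.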
Classification analysis includes a series of techniques designed to predict the membership of an object among a set of known, defined groups. It can be used to determine the group of observations that have not yet been classified. There are several classification techniques, including discriminant analysis, logistic regression, and classification trees (there are more; I just think these are the best known). Discriminant analysis is model-based; it is in fact the “opposite” of a MANOVA. Logistic regression is also model-based, and is usually estimated by maximum likelihood. Classification trees are procedures that partition the observations according to criteria that help group similar individuals.
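As one example of the model-based route, here is a minimal sketch of logistic regression with a single predictor, fitted by maximising the log-likelihood with plain gradient ascent. The data, learning rate, and step count are assumptions chosen for illustration; statistical packages normally use faster optimisers.

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Gradient ascent on the average log-likelihood of P(y=1|x) = sigmoid(w*x + b)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the log-likelihood: sum of (observed - predicted) terms.
        residuals = [y - sigmoid(w * x + b) for x, y in zip(xs, ys)]
        grad_w = sum(r * x for r, x in zip(residuals, xs)) / n
        grad_b = sum(residuals) / n
        w += lr * grad_w
        b += lr * grad_b
    return w, b

# Hypothetical labelled data: class 0 for small x, class 1 for large x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0, 0, 0, 1, 1, 1]

w, b = fit_logistic(xs, ys)
p_low = sigmoid(w * 0.5 + b)   # estimated probability of class 1 at x = 0.5
p_high = sigmoid(w * 4.5 + b)  # estimated probability of class 1 at x = 4.5
```

A new observation would be assigned to class 1 when its estimated probability exceeds some cutoff, commonly 0.5.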
Classification, in its simplest rule-based form, is a deterministic method of sorting records or events according to predefined qualities and values. It is an old idea. If there is an error in the data, it will cause an error in the results. In that sense, classification is a method of sorting standardized data tables.
Clustering takes a large number of parameters into account at once. It deals with multivariate space and with fuzzy and stochastic data. It takes the best relation (not necessarily 100% right) and can therefore tolerate some data errors. There are numerous clustering methods. The more automatic (self-diagnostic) the method, the better.