Data mining is the science of uncovering hidden patterns and relationships in your data to help you make better business decisions. It can help you spot sales trends, develop marketing campaigns, or predict customer churn. It has many applications in today's businesses, which have to deal with large amounts of data. Data mining is often run against a data warehouse, although that is not the only possible source of data. The principles of data mining have been around for a while, but they have gained prominence with the advent of Big Data and data analytics.
Data mining can be performed on varied data sets, from conventional relational databases (RDBMS) to raw text data, key-value stores, and document databases. Distributed platforms such as Hadoop help in data mining by providing a way to store and access large volumes of data that do not fit the traditional structured model.
Several techniques are used in data mining, most of which are based on some form of statistical model. The most commonly used techniques are described below.
1. Association
Association is one of the most common and easiest-to-understand data mining techniques. Here you discover an association between two or more related items in order to identify patterns. In retail, by analyzing customers' buying patterns, you may find that a customer who buys item X generally also buys item Y. This analysis lets you recommend item Y when a customer buys item X. The association technique helps increase sales by suggesting another relevant item when an item is purchased.
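The strength of such an association is commonly measured with support (how often X and Y appear together) and confidence (how often Y appears in baskets that contain X). The following is a minimal sketch of that idea for item pairs only; the function name `pair_rules`, the `min_support` threshold, and the sample baskets are illustrative assumptions, not part of any particular library.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.4):
    """Return {(x, y): confidence} for item pairs that meet min_support.

    A key (x, y) reads as "customers who buy x also buy y".
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))  # ignore duplicates within a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = {}
    for (a, b), together in pair_counts.items():
        if together / n >= min_support:  # support of the pair
            rules[(a, b)] = together / item_counts[a]  # confidence a -> b
            rules[(b, a)] = together / item_counts[b]  # confidence b -> a
    return rules


# Hypothetical shopping baskets, made up for illustration.
baskets = [
    ["bread", "butter"],
    ["bread", "butter", "jam"],
    ["bread", "butter"],
    ["milk", "bread"],
    ["milk"],
]
rules = pair_rules(baskets, min_support=0.4)
for (x, y), confidence in sorted(rules.items()):
    print(f"{x} -> {y}: confidence {confidence:.2f}")
```

With this sample data, every basket that contains butter also contains bread, so recommending bread to a butter buyer is the stronger rule of the two.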
2. Classification
Classification groups customers, items, or objects based on matching attributes or characteristics. As an example, you can classify cars by type, such as sedans, SUVs, 4×4s, convertibles, etc. Classification is a supervised machine learning technique: it needs a training set of observations whose classes are already known. The classification technique can be used to create targeted marketing campaigns for customers based on the class they fall into.
Decision trees and k-nearest neighbor classifiers are among the most popular classification techniques used in industry.
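As an illustration of how a k-nearest neighbor classifier uses a labeled training set, here is a small sketch. It assumes each car is described by two numeric features (length and height in meters); the measurements, the function name `knn_predict`, and the choice of k = 3 are made up for the example.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k closest training rows.

    `train` is a list of (features, label) pairs.
    """
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training set: (length_m, height_m) -> car type.
cars = [
    ((4.5, 1.45), "sedan"),
    ((4.7, 1.42), "sedan"),
    ((4.6, 1.47), "sedan"),
    ((4.8, 1.75), "SUV"),
    ((4.9, 1.80), "SUV"),
    ((4.6, 1.70), "SUV"),
]

tall_car = knn_predict(cars, (4.7, 1.72))
low_car = knn_predict(cars, (4.6, 1.44))
print(tall_car, low_car)
```

In practice the features would usually be scaled first, so that one attribute with a large numeric range does not dominate the distance.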
3. Clustering
Clustering is a method of grouping objects so that objects with similar features end up together and objects with dissimilar features end up apart. It is a common statistical data analysis technique used in data mining.
Clustering is an unsupervised machine learning technique. At a simple level, it uses one or more attributes to identify clusters of similar results. Clustering can also be used to verify your assumptions about clusters that you think exist.
A telecom company may use clustering to decide where to erect its towers so that its users get optimum signal strength. A health care company may decide on the location of emergency care wards based on the most accident-prone areas in a region.
K-means clustering and hierarchical clustering are two common clustering algorithms used in data mining.
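To make the tower-placement example concrete, here is a minimal k-means sketch: it alternates between assigning each user to the nearest centroid and moving each centroid to the mean of its assigned users. The user coordinates and the starting centroids are hypothetical, and passing the starting centroids in explicitly is a simplification chosen to keep the result repeatable.

```python
import math


def k_means(points, centroids, max_iter=100):
    """Cluster `points` around the given starting `centroids`.

    Returns (final_centroids, clusters), where clusters[i] holds the
    points assigned to final_centroids[i].
    """
    centroids = [tuple(c) for c in centroids]
    clusters = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]  # keep an empty cluster's centroid
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # nothing moved, so we are done
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical user locations on a grid (arbitrary units).
users = [(1, 1), (1, 2), (2, 1), (2, 2), (8, 8), (8, 9), (9, 8), (9, 9)]
towers, groups = k_means(users, centroids=[users[0], users[4]])
print(towers)
```

The returned centroids can be read as candidate tower sites, one per group of users. Real implementations typically choose the starting centroids at random and rerun several times, because the result depends on the starting positions.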
4. Prediction
The prediction technique is used to predict future outcomes based on past data. It has many diverse applications, ranging from predicting the failure of a machine component, to detecting fraud, to forecasting sales and profits.
Regression analysis is a statistical methodology that is most often used for numeric prediction.
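As a sketch of numeric prediction with regression, the code below fits a straight line to past monthly sales by ordinary least squares and extends it one month ahead. The sales figures and the helper name `fit_line` are invented for illustration, and a straight-line trend is an assumption that real data may not satisfy.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical sales history: month number -> units sold.
months = [1, 2, 3, 4, 5]
sales = [112, 118, 131, 139, 150]

slope, intercept = fit_line(months, sales)
forecast = slope * 6 + intercept  # predicted sales for month 6
print(f"trend: {slope:.1f} units/month, month 6 forecast: {forecast:.1f}")
```

Here the slope is the estimated change in sales per month, and the forecast is simply the fitted line evaluated at the next month.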
5. Decision tree
The decision tree technique is often used together with classification and prediction techniques. It breaks a dataset down into smaller and smaller subsets while an associated decision tree is incrementally developed. The result is a tree with decision nodes and leaf nodes. A decision node has two or more branches, one for each possible outcome of a test, and a leaf node represents a classification or decision. The root node is the starting point of a decision tree.
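The recursive splitting described above can be sketched as follows. This version assumes numeric features, binary splits of the form "feature <= threshold", and Gini impurity as the measure of how mixed a subset is; these are common choices but only one of several possibilities. The customer data and all function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 when every label in the subset is the same."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return the (feature, threshold) that most reduces impurity, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [l for r, l in zip(rows, labels) if r[f] <= t]
            right = [l for r, l in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left)
                     + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels):
    """Build a nested dict of decision nodes; leaves are plain labels."""
    split = best_split(rows, labels)
    if split is None:  # no useful split left: make a leaf node
        return Counter(labels).most_common(1)[0][0]
    f, t = split
    left = [(r, l) for r, l in zip(rows, labels) if r[f] <= t]
    right = [(r, l) for r, l in zip(rows, labels) if r[f] > t]
    return {"feature": f, "threshold": t,
            "left": build_tree(*zip(*left)),
            "right": build_tree(*zip(*right))}


def predict(tree, row):
    """Walk from the root node down to a leaf and return its label."""
    while isinstance(tree, dict):
        go_left = row[tree["feature"]] <= tree["threshold"]
        tree = tree["left"] if go_left else tree["right"]
    return tree


# Hypothetical customers: (support_calls, tenure_months) -> outcome.
rows = [(5, 3), (4, 6), (6, 20), (1, 4), (0, 30), (2, 12)]
labels = ["churn", "churn", "churn", "stay", "stay", "stay"]

tree = build_tree(rows, labels)
print(tree)
print(predict(tree, (3, 10)))
```

For this sample data a single decision node at the root (support calls <= 2) is enough to separate the two outcomes, so both of its branches end in leaf nodes.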
Data mining is a field that offers businesses of every kind substantial opportunities for expansion and growth.