Data mining is the process of systematically examining and analyzing large amounts of data to identify patterns, trends, and relationships. Because the data volumes involved are typically very large (Big Data), statistical and computer-aided methods are used to process the data and extract insights.
The term "mining" (extraction) signals that the focus is on gaining insights: the aim is to generate knowledge from a large stock of structured and unstructured data. Data collection is an important prerequisite, but the actual analysis is the core of data mining, and it is carried out iteratively.
Step by step, the data mining process runs as follows. First, data is collected and selected. Second, this data is cleaned, i.e. incomplete data records are completed or deleted. Third, the data is prepared for the analysis, for example by converting it into the required format. Fourth comes data mining proper, the actual analysis of the data, which draws on methods such as multivariate statistics, cluster analysis, association analysis, regression analysis, text mining, or outlier detection (identifying errors and inconsistent data records). Finally, an expert checks whether the desired goals have been achieved and evaluates the detected patterns. The process is then repeated, and the results usually become more accurate with each run.
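The steps above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not a reference implementation: the records, the function names (`clean`, `prepare`, `detect_outliers`, `run_once`), and the threshold of two standard deviations are all hypothetical, and simple outlier detection stands in for the wider range of analysis methods mentioned.

```python
from statistics import mean, stdev

# Hypothetical raw records (customer id, monthly spend); None marks a missing value.
RAW_RECORDS = [
    ("c1", 120.0), ("c2", None), ("c3", 95.0), ("c4", 110.0),
    ("c5", 105.0), ("c6", 2500.0), ("c7", 98.0), ("c8", 115.0),
]


def clean(records):
    """Cleaning: delete incomplete records (completing them would be the alternative)."""
    return [(key, value) for key, value in records if value is not None]


def prepare(records):
    """Preparation: convert the records into the format the analysis expects."""
    return {key: float(value) for key, value in records}


def detect_outliers(data, threshold=2.0):
    """Analysis: flag values more than `threshold` sample standard deviations from the mean."""
    values = list(data.values())
    mu, sigma = mean(values), stdev(values)
    return sorted(key for key, value in data.items() if abs(value - mu) > threshold * sigma)


def run_once(records):
    """One pass through the process: clean, prepare, analyze."""
    data = prepare(clean(records))
    return data, detect_outliers(data)


# First run over the selected data.
data, first_outliers = run_once(RAW_RECORDS)

# Evaluation: we simply assume the expert confirms the flagged records as errors
# and removes them before the next run.
confirmed_errors = set(first_outliers)
reviewed = [(key, value) for key, value in RAW_RECORDS if key not in confirmed_errors]

# Second run on the reviewed data.
_, second_outliers = run_once(reviewed)

print(first_outliers)
print(second_outliers)
```

In this toy data set the first run flags the one extreme record, and after the assumed expert review the second run finds nothing further to flag, which mirrors the idea that results become more reliable from run to run.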
Data mining, machine learning, and Big Data are often used as buzzwords, so their actual meanings become blurred. At their core, all three terms relate to the same goal: extracting knowledge from data and making it usable. Big Data describes data volumes that are too large to be examined with classic analysis methods, which is why data mining and machine learning, among other approaches, are used. Data mining is the appropriate term when statistical methods are used to read patterns out of large data volumes and to recognize correlations. Machine learning is the appropriate term when algorithms recognize such patterns automatically and use this knowledge to solve problems independently.
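The difference in emphasis can be illustrated with a small, hedged example. The data below is invented, and the boundary between the two terms is fluid in practice (regression, for instance, is used in both). The sketch only contrasts describing a relationship in existing data with applying a fitted model to an input it has not seen.

```python
from math import sqrt

# Hypothetical observations: advertising spend (xs) and units sold (ys).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]

n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

# Data mining emphasis: quantify a relationship that is already present in the data
# (here: the Pearson correlation coefficient).
pearson_r = sxy / sqrt(sxx * syy)

# Machine learning emphasis: fit a model (here: a least-squares line) and let it
# produce an answer for an input that was not part of the data.
slope = sxy / sxx
intercept = mean_y - slope * mean_x


def predict(x):
    """Apply the fitted line to a new input value."""
    return intercept + slope * x


print(round(pearson_r, 3), round(predict(6.0), 2))
```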