Big data analysis technology covers several lines of work: improving existing data mining and machine learning techniques; developing new data mining techniques such as data network mining, special group mining, and graph mining; achieving breakthroughs in big data fusion techniques such as object-based data linking and similarity linking; and achieving breakthroughs in domain-oriented big data mining techniques such as user interest analysis, network behavior analysis, and emotional semantic analysis.
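As a rough illustration of what "similarity linking" can mean in data fusion, the sketch below links records whose token sets are similar enough. It assumes Jaccard similarity over lowercase whitespace tokens and a hypothetical `threshold` parameter. The source does not name a specific measure, so this is one common choice, not the method the text refers to.

```python
from __future__ import annotations

from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_links(records: dict[str, str], threshold: float) -> list[tuple[str, str, float]]:
    """Link every pair of records whose token sets reach the (assumed) threshold.

    Brute-force O(n^2) pair comparison: fine for a sketch, but a big data
    system would need blocking or hashing to avoid comparing every pair.
    """
    tokens = {rid: set(text.lower().split()) for rid, text in records.items()}
    links = []
    for (id_a, tok_a), (id_b, tok_b) in combinations(tokens.items(), 2):
        score = jaccard(tok_a, tok_b)
        if score >= threshold:
            links.append((id_a, id_b, score))
    return links


records = {
    "a": "Acme Corp New York",
    "b": "acme corp new york office",
    "c": "Globex Ltd London",
}
print(similarity_links(records, threshold=0.5))  # [('a', 'b', 0.8)]
```

Records "a" and "b" share four of five distinct tokens, so they are linked as likely referring to the same entity, while "c" shares none with either.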
Data mining is the process of extracting implicit, previously unknown, but potentially useful information and knowledge from large volumes of incomplete, noisy, fuzzy, and random real-world application data.
Data mining involves many technical methods. Classified by mining task, they include classification or prediction model discovery, data summarization, clustering, association rule discovery, sequential pattern discovery, dependency or dependency-model discovery, and anomaly and trend discovery.
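To make one of these tasks concrete, here is a minimal sketch of association rule discovery. It only considers single-item rules of the form X → Y and uses the standard support and confidence measures; the thresholds `min_support` and `min_confidence` are illustrative assumptions. It is a simplified pair-counting approach, not a full Apriori implementation.

```python
from __future__ import annotations

from collections import Counter
from itertools import combinations


def pair_rules(transactions: list[set[str]], min_support: float, min_confidence: float):
    """Return (lhs, rhs, support, confidence) for single-item rules lhs -> rhs.

    support(X -> Y)    = fraction of transactions containing both X and Y
    confidence(X -> Y) = support(X and Y) / support(X)
    """
    n = len(transactions)
    # Each transaction is a set, so every item is counted at most once per transaction.
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(pair for t in transactions for pair in combinations(sorted(t), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # A frequent pair yields two candidate rules: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
print(pair_rules(baskets, min_support=0.5, min_confidence=0.7))
# [('butter', 'bread', 0.5, 1.0)]
```

Here every basket containing butter also contains bread, so butter → bread has confidence 1.0, while bread → milk (confidence 2/3) falls below the assumed 0.7 threshold.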
Classified by the object being mined, data mining can target relational databases, object-oriented databases, spatial databases, temporal databases, text data sources, multimedia databases, heterogeneous databases, legacy databases, and the World Wide Web.
From the perspective of mining tasks and mining methods, breakthroughs should focus on the following areas:
① Visual analysis. Data visualization is the most basic requirement for ordinary users and data analysis experts alike. It lets the data speak for itself and lets users grasp the results intuitively.
② Data mining algorithms. Visualization translates machine language into a form people can see, whereas data mining speaks the machine's native language. Segmentation, clustering, outlier analysis, and many other algorithms allow us to refine data and extract its value. These algorithms must be able to handle the volume of big data while still processing it quickly.
③ Predictive analysis. Predictive analysis lets analysts make forward-looking judgments based on the results of visual analysis and data mining.
④ Semantic engine. A semantic engine needs enough artificial intelligence to extract information from data proactively. The relevant language processing technologies include machine translation, sentiment analysis, public opinion analysis, intelligent input, and question-answering systems.
⑤ Data quality and data management. Data quality and data management are established management best practices. Processing data through standardized processes and machines helps ensure that analysis results meet a predefined quality level.
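The clustering and outlier analysis mentioned under the data mining algorithms item can be sketched together: run k-means, then flag points that sit far from their own cluster centroid. This assumes 2-D points, centroids initialised to the first k points, a fixed iteration count, and a hypothetical `max_dist` cutoff for outliers. It is a small in-memory illustration, not an algorithm tuned for big data volumes.

```python
from __future__ import annotations

import math


def kmeans(points: list[tuple[float, float]], k: int, iterations: int = 20):
    """Lloyd's k-means. Centroids start at the first k points (deterministic,
    but sensitive to input order). Returns (centroids, labels)."""
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centroids, labels


def outliers(points, centroids, labels, max_dist: float) -> list[int]:
    """Indices of points farther than max_dist from their own cluster centroid."""
    return [
        i for i, (p, lab) in enumerate(zip(points, labels))
        if math.dist(p, centroids[lab]) > max_dist
    ]


pts = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (1.0, 0.0), (10.0, 11.0), (11.0, 10.0), (10.0, 25.0)]
centroids, labels = kmeans(pts, k=2)
print(labels)                                   # [0, 1, 0, 0, 1, 1, 1]
print(outliers(pts, centroids, labels, 5.0))   # [6]
```

Note that the outlier at (10, 25) still pulls its cluster's centroid to (10.25, 14); more robust schemes compute the centroid without suspected outliers or use a median, at extra cost.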