A search engine is a system that uses dedicated programs to collect information from the Internet according to a defined strategy, organizes and processes that information, provides retrieval services, and presents the relevant results to users. A search engine consists of four parts: a searcher (often called a crawler), an indexer, a retriever, and a user interface. The searcher roams the Internet to discover and collect information. The indexer analyzes the information the searcher has gathered, extracts index terms from it, and uses those terms to represent documents and to build the index tables of the document library. The retriever quickly looks up documents in the index library in response to a user's query, evaluates the relevance of each document to the query, sorts the results to be output, and implements some form of user relevance feedback. The user interface accepts user queries, displays query results, and provides the means for users to give relevance feedback.
The operating mechanism of the search engine:
Information resources on the Internet are complex and diverse, while each user's information need is specific. To find the required information quickly and effectively among a very large number of websites, users rely on search engines to locate the site where that information is held and then visit the site to retrieve it. The operating mechanism of a search engine mainly includes four aspects: collecting pages, analyzing pages, sorting pages, and querying keywords.
1. Collecting pages
Collecting pages refers to the process in which the search engine uses an automated crawling program to fetch relevant pages from the Internet according to certain rules and then stores those pages in a database. It is the basis for all of the search engine's other tasks.
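The collection step can be sketched as a breadth-first crawl. This is a minimal illustration, not how any particular search engine works: the `FAKE_WEB` dictionary, the `crawl` and `fetch_from_fake_web` functions, and the `max_pages` limit are all hypothetical names introduced here, and the in-memory dictionary stands in for real HTTP fetching so that the example runs without network access.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# A real crawler would download and parse pages instead.
FAKE_WEB = {
    "http://a.example/": ("welcome to site a",
                          ["http://a.example/news", "http://b.example/"]),
    "http://a.example/news": ("news about search engines",
                              ["http://a.example/"]),
    "http://b.example/": ("site b indexes pages", ["http://c.example/"]),
    "http://c.example/": ("site c", []),
}


def fetch_from_fake_web(url):
    """Return (text, links) for a URL, or None if the page does not exist."""
    return FAKE_WEB.get(url)


def crawl(seed, fetch, max_pages=10):
    """Breadth-first crawl starting at seed.

    The "rules" here are deliberately simple: visit each URL at most once
    and stop after max_pages pages have been stored.
    Returns a dict {url: text} acting as the page database.
    """
    store = {}
    queue = deque([seed])
    seen = {seed}
    while queue and len(store) < max_pages:
        url = queue.popleft()
        result = fetch(url)
        if result is None:
            continue  # unreachable page: skip it
        text, links = result
        store[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return store


pages = crawl("http://a.example/", fetch_from_fake_web, max_pages=3)
```

With `max_pages=3` the crawl stores the seed page and the two pages it links to, and stops before reaching `http://c.example/`. Production crawlers add many further rules (politeness delays, robots exclusion, revisit scheduling) that are omitted here.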
2. Analyzing pages
Once pages have been collected, the search engine indexes the original pages according to certain requirements so that each collected page can be located. It then extracts and analyzes the text of each page to obtain keywords, which establishes a mapping from pages to keywords. Finally, the search engine reorganizes this mapping into an inverted index (a reverse list that maps each keyword to the pages containing it), so that it can go quickly from a keyword to the corresponding pages.
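The keyword-to-page mapping described above can be sketched as follows. This is a simplified illustration under stated assumptions: `tokenize` and `build_inverted_index` are hypothetical helper names, keyword extraction is reduced to lowercasing and splitting on non-alphanumeric characters, and the sample pages are invented for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Very simple keyword extraction: lowercase alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(pages):
    """Build {term: {url: [positions]}} from {url: text}.

    The forward direction is page -> terms; the result reverses it
    to term -> pages, recording where each term occurs in each page.
    """
    index = defaultdict(dict)
    for url, text in pages.items():
        for position, term in enumerate(tokenize(text)):
            index[term].setdefault(url, []).append(position)
    return index


# Invented sample data for illustration.
sample_pages = {
    "http://a.example/": "Search engines index pages",
    "http://b.example/": "Pages link to other pages",
}
index = build_inverted_index(sample_pages)
```

Looking up `index["pages"]` returns both URLs, with positions `[3]` for the first page and `[0, 4]` for the second. Storing positions is one possible design choice; it supports the later use of keyword location as a ranking factor.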
3. Sorting pages
The search engine combines internal and external factors of each page (its web address, encoding type, the keywords in the page content and their positions, generation time, page size, link relationships with other web pages, and so on), calculates the relevance of the page to a given keyword using a relevance algorithm, and sorts the pages by that relevance value to form a ranked list of pages for the keyword.
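As a rough sketch of relevance-based sorting, the example below scores pages with TF-IDF (term frequency multiplied by inverse document frequency). TF-IDF is used here only as a simple, well-known stand-in: the text does not name a specific relevance algorithm, and real engines combine many more of the factors listed above. The function names `tf_idf_scores` and `rank` and the sample documents are hypothetical.

```python
import math


def tf_idf_scores(term, docs):
    """Score each document that contains term.

    docs maps a page identifier to its list of tokens.
    tf  = occurrences of term / document length
    idf = ln(total documents / documents containing term) + 1
    """
    containing = [url for url, tokens in docs.items() if term in tokens]
    if not containing:
        return {}
    idf = math.log(len(docs) / len(containing)) + 1.0
    return {
        url: docs[url].count(term) / len(docs[url]) * idf
        for url in containing
    }


def rank(term, docs):
    """Return page identifiers sorted by descending relevance to term."""
    scores = tf_idf_scores(term, docs)
    # Ties are broken by identifier so the order is deterministic.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Invented sample data for illustration.
docs = {
    "p1": ["search", "engine", "search"],
    "p2": ["search", "page", "link", "index"],
    "p3": ["page", "link"],
}
ranking = rank("search", docs)
```

Here `p1` ranks above `p2` for the keyword "search" because the term makes up a larger share of that page. A page that does not contain the keyword, such as `p3`, is left out of the list.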
4. Querying keywords
After receiving a user's query request, the search engine segments the query into terms, matches those terms against the keywords of the indexed pages, and returns to the user a ranked list of matching pages, each with its link address, a content summary, and similar details.
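The query step can be sketched as: segment the query, look each term up in the keyword-to-page index, keep the pages that match every term, and return them in ranked order with a link and a short summary. Everything named below (`segment`, `search`, `summary_len`, and the sample `pages`, `index`, and `scores`) is hypothetical. Segmentation is reduced to splitting on non-alphanumeric characters, which is only adequate for languages written with spaces; requiring all terms to match is also just one possible matching policy.

```python
import re


def segment(query):
    """Split a query into lowercase terms (whitespace-delimited languages only)."""
    return re.findall(r"[a-z0-9]+", query.lower())


def search(query, index, pages, scores, summary_len=40):
    """Return ranked results as a list of {"url", "summary"} dicts.

    index  maps term -> set of URLs containing it
    pages  maps URL  -> page text
    scores maps URL  -> precomputed relevance value (higher ranks first)
    """
    terms = segment(query)
    if not terms:
        return []
    postings = [index.get(term, set()) for term in terms]
    matches = set.intersection(*postings)  # pages containing every term
    ordered = sorted(matches, key=lambda url: (-scores.get(url, 0.0), url))
    return [{"url": url, "summary": pages[url][:summary_len]} for url in ordered]


# Invented sample data for illustration.
A = "http://a.example/"
B = "http://b.example/"
pages = {
    A: "Search engines collect and index web pages.",
    B: "Web pages link to other pages.",
}
index = {
    "search": {A}, "index": {A}, "link": {B},
    "web": {A, B}, "pages": {A, B},
}
scores = {A: 0.4, B: 0.9}

results = search("Web pages", index, pages, scores)
```

With this sample data, the query "Web pages" matches both pages and lists `http://b.example/` first because of its higher score, while a query such as "search link" returns no results since no single page contains both terms.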