How does it work? | Search engine

The first computer program for searching the Internet was Archie, created in 1990 by students from Montreal. It downloaded a list of all files from all FTP servers and built a database that could be searched by file name. The first full-text search engine was “WebCrawler”, launched in 1994, which indexed resources using a robot. It allowed users to search for any word found on any web page. In 1998, Larry Page and Sergey Brin created the Google search engine based on their project BackRub. Their innovation was the PageRank algorithm, which ranks a web page based on the number of hyperlinks pointing to it. In 1996, search that takes Russian morphology into account was introduced on the AltaVista search engine, and Rambler and Aport were launched. In September 1997, the Yandex search engine opened. How a search engine works is today's topic.

A search engine starts with two components: a search robot that obtains content and an indexer that builds a searchable index. The search robot, or “crawler”, is a program that automatically follows all the links found on a page and extracts them. Starting from a predetermined list of addresses, it looks for new documents not yet known to the search engine. Each new page it finds is analyzed by the search system for subsequent indexing. This is handled by a special module, the indexer, which first splits the page into parts using lexical and morphological algorithms. Data about web pages is stored in an index database. The index makes it possible to quickly find information in response to user queries.
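The crawling step described above can be illustrated with a minimal sketch. It assumes a tiny in-memory "web" (the hypothetical FAKE_WEB dictionary and example.test addresses) instead of real HTTP requests, and it leaves out everything a production crawler needs, such as politeness delays and robots.txt handling. The names LinkExtractor and crawl are invented for this example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "web": URL -> HTML. A real crawler would fetch over HTTP.
FAKE_WEB = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a> <a href="/c">C</a>',
    "http://example.test/b": "<p>No links here</p>",
    "http://example.test/c": '<a href="/">Home</a>',
}


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seeds, fetch):
    """Breadth-first crawl from the seed URLs; returns {url: html} in discovery order."""
    seen = set(seeds)
    queue = deque(seeds)
    pages = {}
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # page could not be fetched
            continue
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:  # only queue documents not yet known
                seen.add(absolute)
                queue.append(absolute)
    return pages


pages = crawl(["http://example.test/"], FAKE_WEB.get)
print(list(pages))
```

The seen set is what keeps the robot from revisiting pages it already knows, which matters because pages routinely link back to each other.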

The search engine itself, in turn, works with the index that the indexer has built. When a user enters a query, the search engine checks its index and returns a list of the most relevant web pages.
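One common way to organize such an index is an inverted index, which maps each word to the set of documents that contain it. The sketch below is an illustration under simplifying assumptions, not a description of any particular search engine: the DOCS collection is made up, tokenization is a plain lowercase word split, and no morphology is applied, so "page" and "pages" count as different words.

```python
import re
from collections import defaultdict

# Hypothetical document collection: id -> text.
DOCS = {
    1: "Search engines index web pages",
    2: "A crawler follows links between pages",
    3: "The index maps words to pages",
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(docs):
    """Build an inverted index: token -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents containing every query token."""
    tokens = tokenize(query)
    if not tokens:
        return []
    result = None
    for token in tokens:
        postings = index.get(token, set())
        result = postings if result is None else result & postings
    return sorted(result)


index = build_index(DOCS)
print(search(index, "index pages"))
print(search(index, "crawler"))
```

Because the lookup touches only the entries for the query words, answering a query does not require rereading every stored page.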

Query analysis begins with identifying the language, since the same word can mean different things in different languages. To do this, the system looks at the alphabet and at the language of the user interface. The search engine then moves on to morphology and determines which parts of speech the words in the query are. This makes it possible to find documents containing different forms of the same words. The search engine also identifies geographical names, personal names, and organization names in the query and, to cover all possible variants, supplements the query with new wordings that have the same meaning. In addition, the search engine automatically corrects misspelled queries or shows results for both the erroneous and the corrected query.
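The first of these steps, looking at the alphabet, can be sketched with the Python standard library. This is only an illustration: the function name dominant_script is invented here, and the script of a query narrows down the candidate languages rather than identifying one, which is why the text above also mentions the interface language as a second signal.

```python
import unicodedata
from collections import Counter


def dominant_script(query):
    """Return the most frequent script among the letters of the query.

    The script is taken from the first word of each character's Unicode
    name, for example "LATIN" or "CYRILLIC". Returns None if the query
    contains no letters.
    """
    counts = Counter()
    for ch in query:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split(" ")[0]] += 1
    if not counts:
        return None
    return counts.most_common(1)[0][0]


print(dominant_script("search engine"))
print(dominant_script("поисковая система"))
```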

Most search engines use ranking methods and machine learning to bring the “best” results to the top of the list.

In advanced search engines, a neural network converts search queries and web page titles into groups of numbers called semantic vectors. These vectors can be compared with one another, which gives more accurate results.

There are also search algorithms that compare query vectors with vectors of entire web pages, not only their titles. This lets the system understand the meaning of pages and select the right ones when people describe what they are looking for in their own words. To make this possible, a neural network converts the text of pages into semantic vectors in advance, at the indexing stage. When a person submits a query, the algorithm compares the query vector with the already known page vectors.
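The comparison of vectors can be sketched with cosine similarity, a common measure of how closely two vectors point in the same direction. The article does not say which measure a given search engine uses, so this choice is an assumption. The three-dimensional vectors and page names below are hand-written placeholders, not the output of a neural network; real semantic vectors have hundreds of dimensions.

```python
import math

# Hypothetical "semantic vectors" computed in advance, at indexing time.
PAGE_VECTORS = {
    "cat-care.html": [0.9, 0.1, 0.0],
    "car-repair.html": [0.0, 0.2, 0.9],
    "pet-food.html": [0.7, 0.6, 0.1],
}


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_pages(query_vector, page_vectors):
    """Return page names ordered from most to least similar to the query."""
    scored = [(cosine(query_vector, vec), url) for url, vec in page_vectors.items()]
    scored.sort(reverse=True)
    return [url for _, url in scored]


# Placeholder vector for a query; a real system would get it from the same network.
query_vector = [0.8, 0.3, 0.0]
ranking = rank_pages(query_vector, PAGE_VECTORS)
print(ranking)
```

Since the page vectors are already stored, only the query has to be converted at search time, and the rest is arithmetic on numbers.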