Imagine today’s world without a search engine. Everyone would be running around for information, and it would be very difficult to find anything. Search engines play a vital role in today’s world. So let us now see what a search engine actually does when a user enters a search term. The three main steps are:
Crawling means traversing the World Wide Web and collecting URLs and related data from it. Here the text is separated from the HTML content, and the URLs found in each page are extracted and saved in storage for further crawling.
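The separation step above can be sketched with Python's standard-library HTML parser. This is a minimal illustration, not a production crawler; the page snippet and the site `example.com` are made up for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkAndTextExtractor(HTMLParser):
    """Separates the visible text of an HTML page from the URLs it links to."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.text_parts = []  # text content, stripped of tags
        self.links = []       # absolute URLs found in <a href="..."> tags

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL
                    self.links.append(urljoin(self.base_url, value))

    def handle_data(self, data):
        if data.strip():
            self.text_parts.append(data.strip())

page = '<html><body><h1>Hello</h1><a href="/about">About</a></body></html>'
extractor = LinkAndTextExtractor("https://example.com")
extractor.feed(page)
print(extractor.text_parts)  # ['Hello', 'About']
print(extractor.links)       # ['https://example.com/about']
```

The extracted text goes on to indexing, while the extracted URLs are queued up so the crawler can visit those pages next.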
Indexing refers to building a dictionary-like structure that maps a keyword to its related URLs, just as a dictionary maps a word to its meaning. To store this data we can use a binary search tree, an AVL tree, a hash table, or a linked list.
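Taking the hash-table option from the list above, a minimal sketch of such an index in Python could look like this (Python's `dict` is a hash table; the URLs are invented for illustration):

```python
# Inverted index: keyword -> list of URLs that contain that keyword.
index = {}

def add_to_index(index, url, words):
    """Record that `url` contains each word in `words`."""
    for word in words:
        index.setdefault(word.lower(), []).append(url)

add_to_index(index, "https://example.com/python", ["Python", "tutorial"])
add_to_index(index, "https://example.com/java", ["Java", "tutorial"])
print(index["tutorial"])
# ['https://example.com/python', 'https://example.com/java']
```

A tree-based structure (BST or AVL) would offer the same keyword-to-URLs mapping, trading the hash table's average O(1) lookup for ordered traversal of the keywords.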
Retrieving refers to extracting the data based on the keyword entered by the user.
So these are the three main steps a search engine performs.
During crawling, the text extracted from the HTML tags is sent to a parser, where each word is tokenised using an HTML tokenizer. Based on these tokens, keywords are identified for that website, and a rank is assigned based on the frequency of each word.
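A rough sketch of this tokenise-and-count step, assuming a simple frequency-based scheme (real engines use much more sophisticated ranking signals):

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def keyword_frequencies(text):
    """Count how often each token appears; higher counts rank higher here."""
    return Counter(tokenize(text))

page_text = "Search engines index pages. Search results are ranked."
freqs = keyword_frequencies(page_text)
print(freqs.most_common(2))  # the two most frequent keywords on the page
print(freqs["search"])       # 2
```

With the counts in hand, the engine can store each keyword in the index together with a score for that page.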
Finally, when a user enters a search term, it is compared against the keywords in the dictionary, and the respective URLs are shown to the user.
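This lookup step can be sketched as a function over the keyword-to-URLs dictionary built earlier (again, the index contents here are made-up examples):

```python
def search(index, query):
    """Look up each query term in the index and return the matching URLs."""
    results = []
    for term in query.lower().split():
        for url in index.get(term, []):
            if url not in results:  # avoid showing the same URL twice
                results.append(url)
    return results

index = {
    "python": ["https://example.com/python"],
    "tutorial": ["https://example.com/python", "https://example.com/java"],
}
print(search(index, "Python tutorial"))
# ['https://example.com/python', 'https://example.com/java']
```

Because the index is a hash table, each term lookup is an average O(1) operation, which is what makes retrieval fast even over a huge index.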
One important point to remember is that crawling is a continuous process; it never stops, since a search engine should always know about changes on the World Wide Web.
The image below depicts the working of a search engine.
[Image: the usage of data structures and algorithms in search engine development]