How Google Search Works

Google’s search engine is undoubtedly the most widely used search engine. Google was founded by Larry Page and Sergey Brin. It helps to know the basic working and methodology behind the Google search engine. I have explained things in very simple words, so read carefully.

Overview:

Okay, let’s assume you want to design a little search engine that searches for the requested keywords in a few websites (say, 5 websites). What would our approach be? First of all, we would store the contents, that is, the web pages of those 5 websites, in our database. Then we would build an index covering the important parts of these web pages, such as titles, headings, and meta tags. Next, we would make a simple search box where users can enter a search query or keyword. The user’s query is matched against the keywords in the index, and the results are returned accordingly. We return a list of links to the actual websites, ordered by a preference that some ranking algorithm assigns to them. I hope the basic overview of how a search engine works is clear to you.
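To make the overview concrete, here is a minimal sketch of such a toy search engine in Python. Everything in it is an assumption made for illustration: the `PAGES` data, the example URLs, and the ranking rule (count how many query words a page matches). It only indexes titles and headings, and it is not how any real search engine ranks results.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for the stored pages of a few websites.
PAGES = {
    "http://example.com/python": {
        "title": "Python Tutorial",
        "headings": ["Getting Started", "Variables"],
    },
    "http://example.org/search": {
        "title": "How Search Engines Work",
        "headings": ["Crawling", "Indexing"],
    },
    "http://example.net/cooking": {
        "title": "Easy Cooking Tutorial",
        "headings": ["Getting Started"],
    },
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each keyword to the set of URLs whose title or headings contain it."""
    index = defaultdict(set)
    for url, page in pages.items():
        for text in [page["title"], *page["headings"]]:
            for word in tokenize(text):
                index[word].add(url)
    return index


def search(index, query):
    """Return URLs ordered by how many query words they match (most first)."""
    scores = defaultdict(int)
    for word in set(tokenize(query)):
        for url in index.get(word, ()):
            scores[url] += 1
    # Ties are broken alphabetically so the order is predictable.
    return sorted(scores, key=lambda url: (-scores[url], url))


index = build_index(PAGES)
print(search(index, "python tutorial"))
```

The page that matches both query words comes first, followed by the page that matches only one. A query with no matching keywords returns an empty list.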
Now read on for more detail.
A web search engine basically works in three parts:
1. Web Crawling 
2. Indexing 
3. Query processing or searching
1. The first step is web crawling. A web crawler, or web spider, is a piece of software that travels across the World Wide Web, downloading and saving web pages. A web crawler is fed the URLs of websites and starts from there: it downloads and saves the web pages associated with those websites. Want to get a feel for a web crawler? Download one, feed it links to websites, and it will start downloading the web pages, images, etc. associated with those websites. Google’s web crawler is called Googlebot. Want to see the copies of web pages saved in Google’s database? (Well, not exactly.)
Let’s take any website as an example, say http://www.wikipedia.org

Do this:

Go to Google and search for ‘wikipedia’. Hopefully you will get this link at the top.
Click on the ‘Cached’ link.
OR
Directly search for ‘cache:wikipedia.org’.
Then read the lines at the top of the page you get, and things will be clear to you.
2. After Googlebot has saved the pages, it submits them to the Google indexer. Indexing means extracting words from titles, headings, meta tags, etc. The indexed pages are stored in Google’s index database, whose contents are similar to the index at the back of a book. Google ignores common or insignificant words like as, for, the, is, or, on (called stop words), which appear in almost every web page. Indexing is done basically to improve the speed of searching.
3. The third part is query processing, or searching. It includes the search box where we enter the search query or keyword we are looking for. When the user enters a search query, Google matches the entered keywords against the pages saved in the index database and returns links to the actual web pages from which those pages were retrieved. Priority is given to the best-matching results. Google uses a patented algorithm called PageRank that helps rank the web pages that match a given search string.
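To give a rough idea of what link-based ranking looks like, here is a sketch of the simplified textbook form of PageRank, computed by repeated iteration over a tiny link graph. The graph `LINKS`, the page names, and the damping value 0.85 are assumptions chosen for illustration. This is not a description of what Google actually runs.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with every page equally important.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small base share regardless of its links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical link graph: "c" is linked to by three other pages.
LINKS = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}

ranks = pagerank(LINKS)
best = max(ranks, key=ranks.get)
print(best)
```

In this small graph the page with the most incoming links ends up with the highest score, and the page nobody links to ends up with the lowest.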
The above three steps are followed not only by Google Search but by most web search engines. Of course there are many variations, but the methodology is the same.
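The crawling step can also be sketched in a few lines of Python. To keep the example runnable without a network connection, the "web" here is a hypothetical in-memory dictionary (`FAKE_WEB`) and `fetch` is a stand-in for a real HTTP download. A real crawler would also need politeness delays, error handling, and much more.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical in-memory "web" so the sketch runs offline.
FAKE_WEB = {
    "http://site.test/": '<a href="/about">About</a> <a href="/blog">Blog</a>',
    "http://site.test/about": '<a href="/">Home</a>',
    "http://site.test/blog": '<a href="/about">About</a> <a href="/missing">Old</a>',
}


def fetch(url):
    """Stand-in for an HTTP download; returns None for unknown pages."""
    return FAKE_WEB.get(url)


def crawl(seed_urls, max_pages=100):
    """Breadth-first crawl: download each page, save it, and queue its links."""
    saved = {}
    queue = deque(seed_urls)
    seen = set(seed_urls)
    while queue and len(saved) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # page could not be downloaded
        saved[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return saved


saved_pages = crawl(["http://site.test/"])
print(sorted(saved_pages))
```

Starting from a single seed URL, the crawler discovers and saves the other two pages by following links, and skips the link that leads nowhere.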
What is robots.txt?
Web administrators do not want web crawlers (web spiders) to fetch every page or file of a website and show the links in search results. robots.txt is a simple text file, placed in the top-level directory of the website, that lists the paths web administrators do not want web crawlers to fetch. The first step of a web crawler is to check the contents of robots.txt.

Example contents of robots.txt:
User-agent: * # applies to the web crawlers of all search engines
Disallow: /directory_name/file_name # a specific file in a particular directory
Disallow: /directory_name/ # all files in a particular directory

You can view the robots.txt of a website (if it exists). Example: http://www.microsoft.com/robots.txt
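If you want to see how a crawler might honour these rules, Python’s standard library includes `urllib.robotparser`. The sketch below parses a small robots.txt from a string instead of downloading one. The rules, the example.com URLs, and the crawler name `ExampleBot` are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, mirroring the kind of rules shown above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /directory_name/file_name
Disallow: /directory_name/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(url, user_agent="ExampleBot"):
    """Return True if the rules permit this crawler to fetch the URL."""
    return parser.can_fetch(user_agent, url)


print(allowed("http://example.com/index.html"))
print(allowed("http://example.com/directory_name/page.html"))
```

A well-behaved crawler would run a check like this before downloading each page and skip any URL for which it returns False.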


The Biggest Problem with Google+

It’s about Google+.

It’s been around 2 weeks since Google+ launched, and the battle over who is better, who is more secure, blah blah, has started. Well, it’s a never-ending debate whether Google is better than Facebook or Microsoft or any other company. Recently my friend Harneet Singh posted a status on his Facebook wall saying that Google+ is better and it’s now time to shift. I jumped in, commenting against him and favouring Facebook. I was quickly analysing his comments and building up mine. During this discussion I found an interesting thing that can be a negative point about Google+ Circles, which lie at the core of this new social app.
Let’s start with how Google defines its Circles: “You share different things with different people. But sharing the right things with the right people shouldn’t be a hassle. Circles makes it easy to put your friends from Saturday night in one circle, your parents in another and your boss in a circle all on his own – just like in real life.” An impressive and cool definition. In simple words, they are trying to build the concept of groups through circles. Just as there are groups in Facebook, G+ has circles. But the issue is that circles are “one way”. You don’t know which circle I have added you to, and I don’t know which circle you have added me to. But the actual meaning of a group is that every person in the group is aware of every other person within it. They can see what other members have shared. So I feel a drawback lies here, though this issue can be resolved easily if you have played around enough with G+.
I don’t know whether Google has any plans to launch another group feature in the near future, but I feel they are failing at the task of presenting symmetric grouping through circles.

Google Chrome 9 Released for Developers

Google has officially released version 9 of its Chrome web browser in the Developer Channel for the Windows, Mac OS X, and Linux platforms. The new Google Chrome 9.0.570.0 build is meant for the developer channel and arrives with a number of changes.

This new version comes with some security fixes and several minor changes to make it run faster. No new features have been added to the experimental Chrome Labs at this time.

Last month, the Chrome 8.0.552.xx beta version was released in the developer channel. With the arrival of Chrome 9, we believe that a stable build of Chrome 8 will be released in the coming weeks.

Google is aiming to release Chrome 9 by November 29, while Opera 11 Alpha is already available to users for testing new changes such as Opera Extensions and HTML5 server-sent events. Hopefully, by the end of this year, we might get to use release candidates of Firefox 4.