Web search engines are the keys to an immense treasure of information. Dependence on search engines is increasing drastically for both personal and professional use, so it has become essential for users to understand the differences between them in order to obtain more satisfying results. A great assortment of search engines offers various options to the web user. It is therefore important to evaluate and compare search engines in the quest for a single search engine that satisfies all the needs of the user.
The main problem people face is deciding which search engine is most useful for finding reliable, relevant and fresh results. Plenty of search engines are available, but this dissertation covers the technology running behind the Google and Yahoo! search engines.
Google and Yahoo! each have their own algorithm for indexing websites.
In simple words, a search engine is software that searches through a database of web pages or web resources for a piece of information, keywords, concepts and so on.
Many different search engines are available on the market, for example MSN, Google, Yahoo! and Ask.
Figure 1: Different search engines
To define the concept more precisely, a search engine is a computer program that searches for documents containing words or phrases of interest to users. The search engine itself runs on a powerful workstation-class machine that searches a database of information collected from the Internet. Software programs called robots or spiders crawl through the files on the Internet and download them into a searchable database, which works as an index to the literature available on the network. In the context of the Internet, search engines usually refer to the World Wide Web and not to other protocols or areas.
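The idea of a searchable database built from downloaded pages can be illustrated with a small inverted index. This is only a minimal sketch under stated assumptions: the page URLs and texts are hypothetical stand-ins for what a spider would download, and the function names `build_index` and `search` are illustrative, not part of any real search engine.

```python
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical pages standing in for documents downloaded by a spider.
pages = {
    "a.example/physics": "physics research in india",
    "b.example/news": "india news today",
}
index = build_index(pages)
print(search(index, "physics india"))
```

A real engine would add ranking, phrase handling and storage on disk; the sketch only shows why a pre-built index makes keyword lookup fast.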
A search engine is helpful for identifying sources, establishing notability, checking facts, and discussing what names to use for different things.
A number of search engines are available on the web. Most of them provide website reviews and homepage services in addition to keyword searches.
In the present study, however, the two most popular search engines, Google and Yahoo!, are studied in terms of the web resources they return for the search term “Physics India”.
Google is a web search engine. When a user wants to find something from around the world using the Internet, Google comes into the picture. Google Search is the most-used search engine on the World Wide Web. It provides information based on the keyword that the user enters into the search box: when the user enters a keyword, Google displays all the results related to that keyword.
Yahoo! is one of the best known and most popular Internet portals. Originally a subject directory of sites, it is now a search engine, directory, and portal. To go to the Yahoo! portal and main starting point, use www.yahoo.com.
For direct access to the search engine, use search.yahoo.com, and for the directory use www.dir.yahoo.com. This review primarily covers the search engine features.
Search engine technology has had to scale dramatically to keep up with the growth of the web. In 1994, one of the first web search engines, the World Wide Web Worm (WWWW) [McBryan 94] had an index of 110,000 web pages and web accessible documents. As of November, 1997, the top search engines claim to index from 2 million (WebCrawler) to 100 million web documents (from Search Engine Watch). It is foreseeable that by the year 2000, a comprehensive index of the Web will contain over a billion documents. At the same time, the number of queries search engines handle has grown incredibly too. [1]
In March and April 1994, the World Wide Web Worm received an average of about 1500 queries per day. In November 1997, AltaVista claimed it handled roughly 20 million queries per day. With the increasing number of users on the web, and automated systems which query search engines, it is likely that top search engines will handle hundreds of millions of queries per day by the year 2000. The goal of our system is to address many of the problems, both in quality and scalability, introduced by scaling search engine technology to such extraordinary numbers.[2]
With the explosive growth of the World Wide Web (WWW), publishing documents on the Internet has become more popular. But how to locate what we need in the ocean of information is an increasingly important and urgent problem. To simplify the problem of getting relevant results based on a search query, Internet search engines were created that allow a large amount of information to be searched on the World Wide Web in the form of Web pages [3].
Search engines are among the most successful applications on the Web today. They act as a system for searching the information available on the Web by automatically searching the contents of other systems and creating a database of the results [4].
The most famous search engines include AltaVista, Infoseek, Google, and MSN. They provide good searching ability by indexing more pages on the Web and maintaining updated indices in their databases. Although so many search engines are available to help users find the information they are interested in, searching the Web is not an easy task. The problem is due to the vast amount of data on the Web and its rapid updating and growth [5].
The first Web search engine was “Wandex”, an index built by the World Wide Web Wanderer, a web crawler developed in 1993. Another very early search engine, Aliweb, also appeared in 1993 and still runs today. One of the first engines to later become a major commercial endeavor was Lycos, which started at Carnegie Mellon University as a research project in 1994.
Soon after, many search engines appeared and vied for popularity. These included WebCrawler, HotBot, Excite, Infoseek, Inktomi, and AltaVista. In some ways they competed with popular directories such as Yahoo!. Later, the directories integrated or added on search engine technology for greater functionality.
In 2002, Yahoo! acquired Inktomi, and in 2003 it acquired Overture, which owned AlltheWeb and AltaVista. In 2004, Yahoo! launched its own search engine based on the combined technologies of its acquisitions, providing a service that gave pre-eminence to the Web search engine over the directory.
Before the advent of the Web, there were search engines for other protocols or uses, such as the Archie search engine for anonymous FTP sites and the Veronica search engine for the Gopher protocol.
Recent additions to the list of search engines include a9.com, AlltheWeb, Ask Jeeves, Clusty, Gigablast, Ez2Find, Teoma, WiseNut, GoHook, Walhello, Kartoo, Snap and Mamma.
Market coverage of different search engines:
Figure 2: Search engine market
As Figure 2 shows, Google and Yahoo! cover most of the world market. Both are more popular than the other search engines, and Google is ahead of Yahoo! to some extent.
Google was co-founded in 1998 by Larry Page and Sergey Brin while they were doing their Ph.D. at Stanford University, and was officially launched in the fall of 1999. It is a straightforward engine that does not require advanced search syntax, which makes it very easy to use. It retrieves pages ranked on the basis of the number of sites linking to them and how often they are visited, which indicates their popularity (ibid). It claims that 97% of users find what they are looking for.
Figure 3: Google home page overview
Its success was based in part on the concept of link popularity and PageRank. How many other web sites and web pages link to a given page is taken into consideration with PageRank, on the premise that good or desirable pages are linked to more than others. The PageRank of linking pages and the number of links on these pages contribute to the PageRank of the linked page. This makes it possible for Google to order its results by how many web sites link to each found page. Google’s minimalist user interface was very popular with users, and has since spawned a number of imitators.
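The way the rank of linking pages and their number of outgoing links feed into the rank of a linked page can be sketched with a simple iterative calculation. This is a simplified illustration, not Google's production algorithm: the damping factor of 0.85, the fixed number of iterations, the handling of pages without outgoing links, and the three-page link graph are all assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute a simplified PageRank.

    links maps each page to the list of pages it links to.
    damping and iterations are illustrative assumptions.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page divides its rank equally among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph: A and B both link to C, and C links back to A.
links = {"A": ["C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph, page C ends up with the highest score because two pages link to it, and A ranks above B because it is linked from the highly ranked C while nothing links to B.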
Google has been estimated to run over one million servers in data centers around the world, and to process over one billion search requests and about twenty-four petabytes of user-generated data every day.
Instead of ranking pages, this technology uses an algorithm that follows links on a webpage to find other pages that link back to the first one and so on from page to page.
Google includes the following important features:
Cached page archives.
Results clustered by indentation.
Results-per-page option, from 10 to 100.
“Google Search” supports:
Implied Boolean (+) sign and (-) sign.
Double quotes (“”) for phrases.
Stop words.
“I’m Feeling Lucky” (goes directly to the top-ranked site for the query)
“Google Scout” (brings up a list of related sites)
“Uncle Sam” (searches government and military sites)
“Search within results” option
Field searching with ‘link’ only.
Yahoo! was co-founded by Stanford University graduate students Jerry Yang and David Filo in January 1994. Yahoo! is a subject directory, and also a commercial portal, compiled by humans. It is the oldest as well as the largest directory on the web.
Figure 4: Yahoo! search engine
Yahoo! allows the user to enter a search query, but its strength lies in its categories, each of which can lead a user step by step to the desired subject category.
Yahoo! is organized hierarchically as a subject catalogue or directory of the web, which is both browsable and searchable.
Links to various services are collected in two ways: through users’ submissions and through robots that retrieve new links from known pages.
Yahoo! indexes web pages, Usenet and e-mail addresses.
Topic and region specific “Yahoo!”
Automatic truncation.
No case sensitivity.
The syntax that Yahoo! follows for searching is fairly standard among search engines.
Users can browse Yahoo! simply by clicking on the various categories listed on each page, or can search Yahoo! by entering a word into the search box that appears on every page in the directory. One can also combine the two strategies and “browse and then search” or “search and then browse.”
Yahoo! News
Users may combine any of the query syntax as long as the syntax is combined in the proper order, which is +, -, t: “”, and *.
If Yahoo! does not find any entries matching a query in its main database, the query is automatically transferred to the Inktomi database, a search engine that automatically ‘crawls’ the text of the entire web. The Inktomi database contains results for literally millions of individual web pages.
Yahoo! thus looks for information in:
Yahoo! categories.
Websites listed in Yahoo!.
Web pages indexed by Inktomi.
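The lookup order listed above can be sketched as a simple fallback chain. This is only an illustration of the idea under stated assumptions: the function name `lookup`, the substring matching, and the sample entries are all hypothetical and do not reflect how Yahoo! or Inktomi were actually implemented.

```python
def lookup(query, categories, listed_sites, crawled_pages):
    """Try each source in order; return (source_name, matches) for the first hit.

    Matching here is a plain case-insensitive substring test, which is an
    assumption made to keep the example short.
    """
    term = query.lower()
    sources = [
        ("categories", categories),
        ("listed sites", listed_sites),
        ("crawled pages", crawled_pages),  # stands in for the crawler database
    ]
    for name, entries in sources:
        matches = [entry for entry in entries if term in entry.lower()]
        if matches:
            return name, matches
    return None, []


# Hypothetical data for the three sources.
categories = ["Science > Physics", "Regional > India"]
listed_sites = ["Indian Physics Association"]
crawled_pages = ["Quantum optics lecture notes", "Physics seminar schedule"]

print(lookup("physics", categories, listed_sites, crawled_pages))
print(lookup("quantum", categories, listed_sites, crawled_pages))
```

The second query finds nothing in the categories or listed sites, so it falls through to the crawled pages, which mirrors the hand-over to the crawler database described in the text.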
Figure 5: How the Google search engine works
Google Search (or Google Web Search) is a web search engine owned by Google Inc. Google Search is the most-used search engine on the World Wide Web, receiving several hundred million queries each day through its various services.
The order of search results on Google’s search-results pages is based, in part, on a priority rank called a “PageRank”. Google Search provides many options for customized search, using Boolean operators such as: exclusion (“-xx”), alternatives (“xx OR yy”), and wildcards (“x * x”).
The main purpose of Google Search is to hunt for text in Web pages, as opposed to other data, such as with Google Image Search. Google Search provides at least 22 special features beyond the original word-search capability. These include synonyms, weather forecasts, time zones, stock quotes, maps, earthquake data, movie showtimes, airports, home listings, and sports scores. There are special features for dates, including range, prices, temperatures, money/unit conversions, calculation, package tracking, patents, area codes and language translation of displayed pages.
Data about the frequency of use of search terms on Google (available through Google Adwords, Google Trends, and Google Insights for Search) have been shown to correlate with flu outbreaks and unemployment levels and provide the information faster than traditional reporting methods and government surveys.
Google’s rise to success was in large part due to a patented algorithm called PageRank that helps rank web pages that match a given search string. When Google was a Stanford research project, it was nicknamed BackRub because the technology checks backlinks to determine a site’s importance. Previous keyword-based methods of ranking search results, used by many search engines that were once more popular than Google, would rank pages by how often the search terms occurred in the page, or how strongly associated the search terms were within each resulting page.
The PageRank algorithm instead analyzes human-generated links assuming that web pages linked from many important pages are themselves likely to be important.
The algorithm computes a recursive score for pages, based on the weighted sum of the PageRanks of the pages linking to them. PageRank is thought to correlate well with human concepts of importance. In addition to PageRank,
The exact percentage of the total of web pages that Google indexes is not known, as it is very difficult to accurately calculate. Google presents a two-line summary and also a preview of each search result, which includes a link to a cached (stored), usually older version of the page.
Google’s cache link in its search results provides a way of retrieving information from websites that have recently gone down and a way of retrieving data more quickly than by clicking the direct link. This feature is still available, but many users are not aware of this because it has been moved to the previews of the search results presented next to these.
Despite its immense index, there is also a considerable amount of data available in online databases which are accessible by means of queries but not by links. This so-called invisible or deep Web is minimally covered by Google and other search engines.
The deep Web contains library catalogs, official legislative documents of governments, phone books, and other content which is dynamically prepared to respond to a query.
Since Google is the most popular search engine, many webmasters have become eager to influence their website’s Google rankings. An industry of consultants has arisen to help websites increase their rankings on Google and on other search engines.
This field, called search engine optimization, attempts to discern patterns in search engine listings, and then develop a methodology for improving rankings to draw more searchers to their client’s sites. Search engine optimization encompasses both “on page” factors and Off Page Optimization factors (like anchor text and PageRank).
The general idea is to affect Google’s relevance algorithm by incorporating the keywords being targeted in various places “on page”, in particular the title element and the body copy (note: the higher up in the page, presumably the better its keyword prominence and thus the ranking). Too many occurrences of the keyword, however, cause the page to look suspect to Google’s spam checking algorithms. Google has published guidelines for website owners who would like to raise their rankings when using legitimate optimization consultants.
Google search consists of a series of localized websites. The largest of those, the Google.com site, is the most-visited website in the world. Some of its features include a definition link for most searches including dictionary words, the number of results returned for a search, links to other searches (e.g. for words that Google believes to be misspelled, it provides a link to the search results using its proposed spelling), and many more.
Google’s search engine normally accepts queries as simple text and breaks up the user’s text into a sequence of search terms, which will usually be words that are to occur in the results. One can also use operators such as quotation marks (“”) for a phrase, a prefix such as “+” or “-” for a qualified term, or one of several advanced operators, such as “site:”. The web pages of “Google Search Basics” describe each of these additional queries and options.
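How a query string might be broken into phrases, qualified terms and a “site:” restriction can be sketched as follows. This is a hedged illustration only: the function name `parse_query` and the returned dictionary layout are assumptions for the example, the parser recognises only straight double quotes, and it does not claim to reproduce Google's actual query parser.

```python
import re


def parse_query(query):
    """Split a query into phrases, required/excluded terms, a site filter and plain terms."""
    parsed = {"phrases": [], "required": [], "excluded": [], "site": None, "terms": []}
    # Either a double-quoted phrase or a run of non-space characters.
    for phrase, token in re.findall(r'"([^"]*)"|(\S+)', query):
        if phrase:
            parsed["phrases"].append(phrase)
        elif token.startswith("site:") and len(token) > 5:
            parsed["site"] = token[5:]
        elif token.startswith("+") and len(token) > 1:
            parsed["required"].append(token[1:])
        elif token.startswith("-") and len(token) > 1:
            parsed["excluded"].append(token[1:])
        elif token:
            parsed["terms"].append(token)
    return parsed


print(parse_query('"swine flu" -bird +unt site:unt.edu vaccine'))
```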
Google applies query expansion to the submitted search query, transforming it into the query that will actually be used to retrieve results. As with page ranking, the exact details of the algorithm Google uses are deliberately obscure, but certainly the following transformations are among those that occur:
Term reordering: in information retrieval this is a standard technique to reduce the work involved in retrieving results.
Stemming: this is used to increase search quality by keeping small syntactic variants of search terms.
Spelling correction: there is a limited facility to fix possible misspellings in queries.
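The three transformations above can be illustrated with a toy query-expansion pipeline. Everything here is an assumption made for the sake of the example: the suffix list in `naive_stem`, the document-frequency table used for reordering, the similarity cutoff of 0.8, and the small vocabulary. The exact methods Google uses are not public, as the text notes.

```python
import difflib


def naive_stem(word):
    """Strip a few common English suffixes (a deliberately crude stemmer)."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def correct(word, vocabulary):
    """Replace a word by its closest vocabulary entry, if one is similar enough."""
    matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=0.8)
    return matches[0] if matches else word


def expand_query(query, doc_freq):
    """Stem each term, then put the rarest terms first so they are looked up first."""
    stems = [naive_stem(word) for word in query.lower().split()]
    return sorted(stems, key=lambda stem: doc_freq.get(stem, 0))


# Hypothetical document frequencies and vocabulary.
doc_freq = {"search": 900, "engine": 40}
vocabulary = ["physics", "india"]

print(expand_query("Searching engines", doc_freq))
print(correct("phisics", vocabulary))
```

Looking up the rarest term first keeps the candidate result set small, which is one plausible reason reordering reduces retrieval work.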
Yahoo! Search is a web search engine owned by Yahoo! Inc. and was, as of 2009, the second largest search directory on the web.
Yahoo! Search, originally referred to as the Yahoo!-provided search interface, would send queries to a searchable index of pages supplemented with its directory of sites.
Yahoo! does not use web crawling of its own to retrieve these results; it uses Inktomi to get results for keywords that are not found by Yahoo! itself.
In 2009, Microsoft and Yahoo! announced a deal in which Bing would power Yahoo! Search.
Seeking to provide its own search engine results, Yahoo! acquired its own search technology.
In 2002, it bought Inktomi, a “behind the scenes” search engine provider whose results are shown on other companies’ websites and which powered Yahoo! in its earlier days. It also purchased Overture Services Inc., which owned the AlltheWeb and AltaVista search engines. Initially, even though Yahoo! owned multiple search engines, it did not use them on the main Yahoo! website, but kept using Google’s search engine for its results.
In 2003, Yahoo! Search became its own web crawler-based search engine, with a reinvented crawler called Yahoo! Slurp. Yahoo! Search combined the capabilities of all the search engine companies it had acquired with its existing research, and put them into a single search engine. The new search engine results were included in all of Yahoo!’s sites that had a web search function. Yahoo! also started to sell the search engine results to other companies, to show on their own web sites.
In 2007, Yahoo! Search was updated with a more modern appearance in line with the redesigned Yahoo! home page. In addition, Search Assist was added, which provides real-time query suggestions and related concepts as queries are typed.
In 2008, Yahoo! Search announced the introduction of a new service called “Build Your Own Search Service,” or BOSS. This service opens the doors for developers to use Yahoo!’s system for indexing information and images and to create their own custom search engines.
The table below describes the Web resources on “Physics India” retrieved through Google Search, out of 100 links.
The following figure shows the graphical representation of the Web resources.
The analysis of the data in the table shows that most of the Web resources retrieved under the search term “Physics India” are pointer pages (links to websites on the same subject), which account for 67% of all resources.
Journal articles come second, at 26% of the retrieved output. The lowest percentages of search results are for research news, news clips, databases and conference papers. The figure shows the graphical representation of the output retrieved through Google.
Table: Web resources versus frequency of their occurrence per search
Figure 6: Number of Google search results
Figure 7: Web resources versus frequency distribution for Google
The table below shows the ratio of Web resources on “Physics India” retrieved through Yahoo! Search.
The figure provides the graphical representation of the frequency of occurrence of the various kinds of Web resources.
The analysis of the data in the table shows that pointer pages make up the largest share of the retrieved results, at 27%, followed by web directories at 18%, while journal articles have the lowest retrieval rate.
Table: Web resources versus frequency of their occurrence per search
Figure 8: Number of Yahoo! search results
Figure 9: Web resources versus frequency distribution for Yahoo!
Similar to the above classification of the sources by kind and frequency of occurrence, the table shows the major domains and the frequency of occurrence of the resources on those domains.
Figure 10: Serial number of search of Google
The data from the above table show that most of the resources on physics are available in commercial domains, followed by organizational domains of India. The lowest percentage belongs to government sites.
The figure provides the graphical representation of the frequency of occurrences.
Similar to the above classification of the sources by kind and frequency of occurrence, Table 6.4 shows the major domains and the frequency of occurrence of the resources on those domains.
Figure 11: Serial number of search of Yahoo!
The data from the above table show that most of the resources on physics are available in commercial domains, followed by organizational domains of India. The lowest percentage belongs to government sites. The figure provides the graphical representation of the frequency of occurrences.
Figure 12: Domain frequency of Yahoo!
While carrying out the study, we found that almost all of the resources on physics retrieved through Google and Yahoo! are available on the web in two main file formats. The table indicates the file formats and the frequency of the resources in each, and the figure shows the graphical representation of the frequency distribution.
Figure 13: File format and frequency distribution of Google
The above data indicate that most of the resources on physics retrieved through Google are available in PDF (Portable Document Format).
The table indicates the file formats and the frequency of the resources in each, and the figure shows the graphical representation of the frequency distribution.
Figure 14: Yahoo! search serial number
Figure 15: Yahoo! search engine frequency
From the above data it is clear that most of the web resources on Physics India retrieved through Yahoo! search are in HTML format.
Attribute | Google | Yahoo!
Rating | 3.9/5 (232 votes) | 3.6/5 (200 votes)
 | Yes (unlimited storage) | Yes (unlimited storage)
Stock price | $677.14 (17th Aug ’12) | $16.03.4 (17th Aug ’12)
Search | Yes | Yes
Slogan | “Don’t be evil” | “Do you Yahoo!?”
Website | www.Google.com | www.search.Yahoo!.com
Founded | 1998 | 1995
About | Google is an American public corporation that specializes in search, and today it is the world’s no. 1 search engine. | Yahoo! is an American public corporation and an Internet service provider for news, e-mail, the Yahoo! directory, a search engine, etc.
Founder | Google was co-founded by Larry Page and Sergey Brin while they were doing their Ph.D. at Stanford University | Yahoo! was founded by Stanford University graduate students Jerry Yang and David Filo in January of 1994
CEO | Larry Page | Marissa Mayer
Industry | Internet, Computer Software | Internet, Computer Software
Search Engine Ranking | No. 1 in US (with market share of 58.5% in Oct ’07 as per comScore research) | No. 2 in US (with market share of 23% in Oct ’07 as per comScore research)
User generated video | Yes (Google Video and YouTube) | No
Products | Google AdWords, Google Search engine, YouTube video service, Google forum, Gmail, Orkut, Google Earth, Google Labs, Google Maps, Picasa, Google Books, Google Scholar, Google Docs, etc. | Yahoo! Mail, Yahoo! directory, Yahoo! Answers, Yahoo! Search, Yahoo! Messenger, Yahoo! 360°, Yahoo! Sports, Yahoo! Finance, Flickr, Yahoo! Cricket, Yahoo! News
Site | No | Yes (Hot Jobs)
Headquarters | Mountain View, California, USA | 701 First Avenue, Sunnyvale, California, USA
Table 1: Google & Yahoo! comparison
Several similarities between the Google and Yahoo! search engines were found after visiting both websites and testing them with a query.
First of all, both search engines offer users a clear advantage: they provide optional tips or techniques to help users search efficiently. Some of these tips are similar.
The first technique is to use specific and unique words to describe what we are looking for. If the keywords are general or ambiguous, a large number of irrelevant documents will be retrieved.
Another technique is to put quotation marks around keywords so that searchers can find the exact words and narrow the number of search results.
There are several techniques to narrow the search results, including
(1) limiting sites/domains to .com, .edu, or .gov,
(2) restricting file types to .htm/.html, .pdf, .doc, or .txt,
(3) using the operator (-) before a word that should not appear in the search results,
(4) using the operator (+) before a word that must appear in the search results, and
(5) using Boolean operators like AND, OR, and NOT to combine search terms.
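The narrowing techniques listed above can be combined mechanically into one query string. The helper below is a minimal sketch: the function name `build_query` and its parameters are hypothetical, and the `site:` and `filetype:` prefixes follow Google's documented operator style, which is assumed here rather than taken from the text; other engines may spell these restrictions differently.

```python
def build_query(terms, phrase=None, site=None, filetype=None, exclude=()):
    """Assemble a narrowed query string from plain terms and optional restrictions."""
    parts = []
    if phrase:
        parts.append(f'"{phrase}"')                 # exact phrase in quotation marks
    parts.extend(terms)                             # ordinary keywords
    parts.extend(f"-{word}" for word in exclude)    # words that must not appear
    if site:
        parts.append(f"site:{site}")                # limit to a site or domain
    if filetype:
        parts.append(f"filetype:{filetype}")        # limit to one file type
    return " ".join(parts)


print(build_query(["physics", "india"], site=".edu", filetype="pdf", exclude=["jobs"]))
```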
Other techniques use additional options to get more relevant search results. For instance, users can restrict results by date of update, country, language, and number of results per page.
These techniques are very helpful for retrieving more precise results from both search engines. For example, given the query “swine flu” + unt, both search engines returned many web pages about swine flu containing the word “unt”, which stands for the University of North Texas. This reduces not only the processing time of the search engines, but also the time needed to find the most precise search results.
The second similarity is that both search engines provide many categories for the search results, such as web, images, videos, shopping, news, and sports. By selecting a specific category such as images, a user can retrieve only images from the web. Users can also specify a file type such as .pdf, .doc, or .jpg, which helps them get more precise results and reduces processing time.
Third, as the user types keywords, Yahoo! suggests complete queries, just as Google does. This helps users select the full query as quickly as possible. Suggesting the full query is a smart feature of both search engines: it is an artificial intelligence component that tries to guess the next word of the user’s query. For example, when “swine” was typed, both search engines suggested “flu” as the next word.
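One simple way to produce such suggestions is to rank previously seen queries that start with what the user has typed so far. The sketch below assumes this prefix-and-popularity approach purely for illustration; the function name `suggest` and the sample query log are hypothetical, and neither Google nor Yahoo! is known from the text to work exactly this way.

```python
from collections import Counter


def suggest(prefix, query_log, limit=3):
    """Return the most frequent past queries that start with the typed prefix."""
    prefix = prefix.lower()
    counts = Counter(
        query.lower() for query in query_log if query.lower().startswith(prefix)
    )
    return [query for query, _ in counts.most_common(limit)]


# Hypothetical log of earlier queries.
query_log = ["swine flu", "swine flu", "swine flu symptoms", "sweet potato"]
print(suggest("swine", query_log))
```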
Next, the search results of both engines follow quite similar patterns: the title on the first line, a brief description of the web page on the next several lines, and the URL or address of the web page. This snippet gives enough information for a user to scan each result quickly and move on to other web pages in the ranked results.
Last but not least, after a query was submitted, both search engines immediately returned the relevant results, along with the large total number of websites retrieved. This shows the ability and efficiency of both search engines.
With these advanced tips provided on their web interfaces, both search engines have become the two most popular in the world. They are the search engines that people use to find the enormous amount of information on the Internet, since they provide ways to retrieve more relevant search results.
The most obvious difference between these two Web sites lies in the interface, layout, and design of the pages.
Google offers a very clean and simple interface, whereas Yahoo!’s is busy and cluttered.
Simple design should always be used to avoid complexity and confusion for the user. When users are not confronted with excessive text and images, they feel more at ease and comfortable while using a Web site.
Ultimately, a simplified interface means that the Web site will be accessible to all types of people with different skill levels, thus increasing the potential for a significant number of returning users.
Another key aspect of Design and usability is defining or understanding the purpose of the Website. Due to the simplistic nature of the Google Web site, it is quickly apparent to the user that the