Two months ago, Google’s 63-point patent application outlining how it examines historical data associated with the websites and documents in its index was published. Since that time, we have witnessed changes in the algorithm that have become known as the Bourbon Update. StepForth has written a whitepaper that studies the patent document, giving an overview of its contents and how they may affect Google’s organic search results. It is important to note that this document does not cover subsequent announcements from Google, such as Google Sitemaps. The following are a few paragraphs from the whitepaper, “Historic Data and Google Ranking: An analysis of the March 31st, 2005 Patent”
Google’s Growth and Reasons for Change
Google is no longer a search engine in the purest sense of the word. As a search tool, Google is a multimedia information retrieval machine that is capable of assumption and suggestion. Growing misuse of simple optimization and linking techniques, combined with advances in both user- and Internet-based technologies, is forcing change at Google. StepForth’s most recent whitepaper examines the general search engine rankings generated by Google, with a particular emphasis on the ideas, concepts and sorting techniques noted in the March 31st, 2005 patent application “Information retrieval based on historical data”, filed by Google engineers Anurag Acharya, Matt Cutts, Jeffrey Dean, Paul Haahr, Monika Henzinger, Urs Hoelzle, Steve Lawrence, Karl Pfleger, Olcan Sercinoglu, and Simon Tong.
Google is now seven years old. Each year, Google grows a bit bigger. Big, in the context of Google, means bigger than any other public search resource in the world. Google’s stated mission is “to organize the world’s information and make it universally accessible and useful.” To do that, it not only needs to gather the world’s information, it needs to sort it and return relevant results based on the words or phrases its users enter. Therein lies the key to understanding Google’s greatest challenges, its search features and how they work together.
Google is in the midst of sweeping changes to the way it operates as a search entity. It isn’t a pure search engine. It isn’t really a portal either. Google has become more of an institution, the ultimate private-public partnership. Referring to itself as a media company, Google is now a multi-faceted information, advertising and active media delivery system that is accessed primarily through its simple, well-known interface found at www.google.com.
In relation to other large Internet businesses, Google has matured at its own speed, often acting as a defining voice of the search-sphere. Google has been the favoured search tool of the current generation of Internet users, a status that has helped sustain and motivate its ability to innovate, and those innovations have pushed other search firms to bolster their own search products. Innovation on basic concepts in search has been necessitated by two major factors.
The first is the basic concept of Moore’s Law. Advances in technology offer both home and business users more powerful tools with which to create basic and advanced documents. The web has moved away from basic HTML-based sites to include sites that feature video, audio, shopping carts, massive databases and dynamically generated content. The description of a website as a collection of unique web pages housed under the same URL or domain is somewhat limited, in that these “pages” may actually contain video or audio content, or might not exist at all until compiled “on the fly” using information supplied by the site visitor.
The second major factor is that Google needs to discourage search-marketing consultants (SEO/SEM) from abusing the obvious exploits found in its core method of sorting and ranking sites, PageRank. While parts of the PageRank formula have changed over the years, the base concept that a link equals a vote has remained the backbone of Google’s ranking algorithm since day one. The simple logic behind PageRank produced highly relevant search engine listings, but it also made those listings fairly easy to manipulate. In order to prevent gross commercial manipulation, Google has had to add several weights and measures to its evaluation of incoming links, a process that is obviously easier said than done. It also has to tie together as many of its features and services as possible in order to present the best search listings it can.
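For readers unfamiliar with the mechanics, the following is a minimal sketch of the classic PageRank power iteration as described in Brin and Page’s original paper. Google’s production formula has long since diverged from this, and the toy graph and parameter values here are purely illustrative.

```python
def pagerank(link_graph, damping=0.85, iterations=50):
    """link_graph: dict mapping each page to a list of pages it links to."""
    pages = set(link_graph) | {t for targets in link_graph.values() for t in targets}
    # Ensure every page has an outgoing-link list, even if it is empty.
    links = {page: link_graph.get(page, []) for page in pages}
    n = len(pages)
    rank = dict.fromkeys(pages, 1.0 / n)  # start with rank spread evenly

    for _ in range(iterations):
        new_rank = dict.fromkeys(pages, (1.0 - damping) / n)
        for page, targets in links.items():
            if not targets:
                # A dangling page spreads its rank evenly across the web.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
            else:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    # Each outgoing link "votes" for the page it points to.
                    new_rank[target] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # A toy three-page web: A and C both link to B, and B links back to A.
    toy_web = {"A": ["B"], "B": ["A"], "C": ["B"]}
    for page, score in sorted(pagerank(toy_web).items()):
        print(page, round(score, 3))
```

The exploit the patent targets is visible even in this toy model: adding pages whose only purpose is to link to a chosen target inflates that target’s score, which is exactly why the evaluation of incoming links has required so many additional weights and measures.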
As Google grew over the years, subtle shifts in its algorithm were seen and recorded by the search marketing industry. There have been a few times, most notably in mid-November 2003, when Google made massive changes to its algorithms, thus causing massive changes to the search engine results pages generated when users entered a simple query. These changes precipitated a cat-and-mouse relationship between Google and many search marketers. Every change Google makes is deconstructed and debated across more than a dozen search-related web forums. Google is known for its from-the-hip style of innovation. While the face is familiar, the brains behind it are growing and changing rapidly. Four major factors (technology, revenue, user demand and competition) influence and drive these changes. Where Microsoft dithers and .dll’s over its software for years before introduction, Google encourages its staff to spend up to 20% of their time tripping their way up the stairs of invention. Sometimes they produce ideas that don’t work out as expected, as was the case with Orkut, and sometimes they produce spectacular results, as with Google News. The sum total of what works and what doesn’t has served to inform Google of what its users want in a search engine. After all, where the users go, the advertising dollars must follow. Such is the way of the Internet.
Users continue to flock to Google, and Google continues to build itself into an indispensable information resource. Users trust Google as a resource, and their underlying faith is the foundation upon which its highly profitable paid-advertising delivery platform is built. Google, in turn, needs to provide its users with increasing levels of filtering to prevent its search results from being manipulated and subsequently degraded.
This is what Google has attempted to do in its recent 63-point patent filing. Careful reading leaves the impression that the ideas behind the patent are designed to build a firewall against link spam and other forms of obvious ranking manipulation. It also reveals a few new levels of sophistication in Google’s recording and ranking algorithms. As the web grows larger and more complicated, Google’s engineers face the problem of judging an ever-wider variety of documents against relevant keyword queries.
The whitepaper details changes to the way Google examines relevancy between linked documents, as outlined in a patent filed on December 31, 2003 and published on March 31, 2005. Over the years, Google has modified and improved its ranking algorithm in an attempt to prevent manipulation of its listings; however, every change or innovation has been treated as a challenge by the search-marketing sector. The patent document this whitepaper is based on presents several fundamental changes in the way Google examines and evaluates information contained in documents found in its index and in documents that are linked together. The most important piece of information stemming from the patent is that Google compiles a document profile on every item in its vast index. This profile contains information on the history of a document and of the URL or domain it originates from. This historical profile is used to judge the relevancy and/or validity of information contained in the document being profiled, as well as information about documents linking to it.
The contents of a document profile can be placed under a few simple headings based on the area or element of a document being examined: on-site elements, on-site links, incoming links, and elements found on pages or documents linking to the document being evaluated. A rough sketch of such a profile follows.
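The patent does not describe a concrete schema, but as a thought experiment, the headings above might be grouped along the following lines. Every class and field name in this sketch is an assumption made purely for illustration.

```python
# Hypothetical sketch of a "document profile" grouped under the four headings
# above. The patent describes the kinds of historical data recorded, not a
# concrete data structure; nothing here should be read as Google's actual schema.

from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class LinkRecord:
    source_url: str
    anchor_text: str
    first_seen: date   # when the link was first discovered
    last_seen: date    # when the link was last confirmed to still exist


@dataclass
class DocumentProfile:
    url: str
    domain_registered: date  # history of the originating domain
    first_crawled: date      # history of the document itself
    content_change_dates: List[date] = field(default_factory=list)  # on-site elements
    onsite_links: List[str] = field(default_factory=list)           # on-site links
    incoming_links: List[LinkRecord] = field(default_factory=list)  # incoming links
    linking_page_elements: List[str] = field(default_factory=list)  # elements on linking pages
```

A profile of this kind would let the engine ask history-aware questions, such as whether a document has suddenly acquired a burst of incoming links, without having to re-crawl anything.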
The patent document itself is very long and covers 63 points, most of which cross-reference other points found in the patent. The whitepaper is an attempt to tie these points together into a coherent examination of the changes to the algorithm and what webmasters, SEOs and business owners should watch for.