How does Google Toolbar work? (Written in 2002)
Nowadays, there are many search engines that you can use to
search the web: Yahoo, AltaVista, and Excite, to name just a few.
However, my favorite one (I believe it's also the favorite search
engine of a lot of people) is Google. Not only does it give
you fast and accurate search results, but it also offers a very
handy search toolbar that you can integrate with IE5 or above
to make Google part of your browser.
When you install Google Toolbar, you may opt to use the "advanced
features", which Google indicates will have some privacy implications.
In this article, we are going to discuss what those privacy
implications are and how Google Toolbar works.
Searching the web with Google Toolbar
Alright, let's start with searching the web using Google Toolbar.
Say, we type in "white house" in the toolbar and hit the "Search
the Web" button...
What happens behind the scenes is that the toolbar makes
an HTTP request to www.google.com with "white house"
as the search keywords. This process is no different from
doing a search for "white house" on Google's homepage.
How did I get to know this? HttpRevealer told me the answer:
If we focus on the first line of the HTTP request, we see this:
GET /search?sourceid=navclient&querytime=4KuE&q=white+house HTTP/1.0
Obviously, the q parameter is the query because its value
is "white+house" (the plus sign represents a space after being
URL-encoded). By now, you may have noticed the sourceid
parameter. It carries the value "navclient". Apparently, it
tells the Google search engine that this search request came
from the handy Google Toolbar (i.e. the navclient).
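To double-check this reading of the request line, here is a minimal Python sketch that splits it into its parameters. It uses only the standard library; the variable names are my own and are not part of the Toolbar or of HttpRevealer.

```python
from urllib.parse import urlsplit, parse_qs

# First line of the HTTP request captured above.
request_line = "GET /search?sourceid=navclient&querytime=4KuE&q=white+house HTTP/1.0"

# A request line has three space-separated parts: method, target, version.
method, target, version = request_line.split(" ")

# parse_qs URL-decodes every parameter and turns "+" back into a space.
params = parse_qs(urlsplit(target).query)

print(params["q"][0])         # white house
print(params["sourceid"][0])  # navclient
```

Each value comes back as a list because a query string may repeat a parameter name, hence the `[0]`.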
In response to the HTTP request, Google's search engine simply
returns the search results in HTML to the browser as usual.
The browser displays the results as if the search was done from
Google's home page.
"Privacy Information" being Sent Quietly
The above was surprisingly straightforward and easy, huh? Now
let's turn our attention to another behavior of the toolbar.
We are going to see how it "betrays" you as you are happily
surfing the web.
Say, if you visit the White House's website (www.whitehouse.gov),
your browser will naturally make a number of HTTP requests to
www.whitehouse.gov to retrieve the home page as well
as all the needed images. That's not surprising at all.
However, something else is going on without your noticing (if
you have the Advanced features turned on): the Toolbar
quietly informs the Google server of the URL you are visiting,
and the server in return passes back some information about
the page, such as its ranking and category.
How did I get to know that? Hahaaa, see this:
The above HTTP request/response takes place as soon as you load
the White House homepage. Like I said, the HTTP request was
initiated by the Google Toolbar installed on your PC. Okay, let's
take a closer look at the first line of the request.
GET /search?client=navclient-auto&ch=5248559537
&q=info:http%3A%2F%2Fwww%2Ewhitehouse%2Egov%2F HTTP/1.0
This "GET" line was originally one single long line. I split
it into 2 lines for better readability. It looks a bit complicated,
doesn't it? Don't worry. We will just discuss the important
bits.
This time, the sourceid parameter is absent. Instead,
there is a client parameter whose value is "navclient-auto".
From this alone, you can guess it's telling the www.google.com
server that the HTTP request was made by the Toolbar (i.e. navclient)
of its own accord.
The q parameter has the value "info:http%3A%2F%2Fwww%2Ewhitehouse%2Egov%2F"
(which is the URL-encoded representation of "info:http://www.whitehouse.gov/").
It tells the server that you are visiting www.whitehouse.gov
and more importantly asks it for more information about the
page. We will see what information will be returned by the server
shortly.
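If you want to verify the decoding yourself, the following short Python sketch reverses the URL-encoding of the q value. It relies only on the standard library; the variable names are mine.

```python
from urllib.parse import unquote

# The q value exactly as it appears in the captured request.
encoded = "info:http%3A%2F%2Fwww%2Ewhitehouse%2Egov%2F"

# unquote turns each %XX escape back into its character
# (%3A is ":", %2F is "/", %2E is ".").
decoded = unquote(encoded)
print(decoded)  # info:http://www.whitehouse.gov/

# Split the "info" prefix from the URL at the first colon only.
prefix, _, url = decoded.partition(":")
print(prefix)  # info
print(url)     # http://www.whitehouse.gov/
```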
The above is basically the so-called "privacy information" that
is sent back to Google's server for analysis. At other times,
more information is sent in addition to what we just discussed,
but you now have a rough idea of what type of information is
involved. So, when I said earlier that "the Toolbar betrays you
as you are surfing the web", I was just joking, since the
information sent back by Google Toolbar is not really that
sensitive. Plus, if you want, you can turn off the Advanced
features to prevent your information from being sent back. So,
please don't hold me to that :)
Okay, now that we've seen what information gets sent from the
Toolbar to the server, it's time to see what the server returns.
The server basically returns an XML document that contains information
about the page's ranking (i.e. PageRank) and categorization
among other things. Let's take a look at the XML document:
Note: The XML's DTD can be obtained at www.google.com/google.dtd.
When the Toolbar receives this XML document, it will parse it
and display relevant information graphically and textually.
For instance, the PageRank icon gives you a visual cue as to
how high the page is ranked:
Another piece of information is the category:
Where did the PageRank and category information come from? Hahaa,
if you look at the XML document carefully, you will find the
following tags between the lines :)
........
........
<RK>9</RK>
........
........
<CAT>
<GN>
gwd/Top/Regional/North_America/United_States/.../White_House
</GN>
<FVN>
Top/Regional/North_America/United_States/.../White_House
</FVN>
</CAT>
........
........
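As a rough illustration of what the Toolbar might do with these tags, here is a small Python sketch that pulls the rank and category out of such a fragment. Note the assumptions: the <RESULT> wrapper element is made up so that the fragment parses as a complete document (the real root element is defined by Google's DTD and is not shown here), and the "..." in the category path is the same elision as in the fragment above. Only the RK, CAT, GN and FVN tags come from the captured response.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: <RESULT> is an invented wrapper, not Google's real root tag.
# The "..." inside the paths is the elision from the article, kept as-is.
sample = """
<RESULT>
  <RK>9</RK>
  <CAT>
    <GN>gwd/Top/Regional/North_America/United_States/.../White_House</GN>
    <FVN>Top/Regional/North_America/United_States/.../White_House</FVN>
  </CAT>
</RESULT>
""".strip()

root = ET.fromstring(sample)

# <RK> holds the PageRank value shown by the Toolbar's rank icon.
rank = int(root.findtext(".//RK"))

# <FVN> holds the category path without the "gwd/" prefix seen in <GN>.
category = root.findtext(".//CAT/FVN").strip()

print(rank)      # 9
print(category)  # Top/Regional/North_America/United_States/.../White_House
```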
That's that. I hope you enjoyed the discussion. I found
out the above with HttpRevealer.
You can explore the web yourself too!
Steven Chau
"Google" and "Google Toolbar" are registered trademarks of
Google Inc.
All other company and product names may be trademarks of the
respective companies with which they are associated.