Google’s data itch

March 19, 2008 at 11:17 pm

Google is a data-driven company. That’s kind of obvious when you think about it, but just how true it is became clearer in a blog post by Google’s Chief Economist, Hal Varian, titled “Why data matters”. The first sentence of the post sets the tone: “Better data makes for better science.”

He walks through a history of search to arrive at some critical points: “But in order to come up with new ranking techniques and evaluate if users find them useful, we have to store and analyze search logs… If we don’t keep a history, we have no good way to evaluate our progress and make improvements… the data in our search logs will certainly be a critical component of future breakthroughs.”

Just what information does Google track? They’ve got three videos and a whole sub-site that explain things.

The sub-site contains information most people never read, such as “We may combine personal information collected from you with information from other Google services or third parties to provide a better user experience…” as well as “our servers automatically record the page requests made when users visit our sites. These “server logs” typically include your web request, Internet Protocol address, browser type, browser language, the date and time of your request and one or more cookies that may uniquely identify your browser.”
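To make that list of fields a bit more concrete, here is a rough sketch of what one such log record might look like as a data structure. The pipe-delimited format, the field names, and the sample values are all my own invention for illustration; this is not Google’s actual log format.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Tuple


@dataclass(frozen=True)
class ServerLogEntry:
    """One record holding the fields named in the quoted policy text."""
    request: str              # the web request
    ip_address: str           # Internet Protocol address
    browser_type: str
    browser_language: str
    timestamp: datetime       # date and time of the request
    cookie_ids: Tuple[str, ...]  # cookies that may identify the browser


def parse_entry(line: str) -> ServerLogEntry:
    """Parse one line of a made-up, pipe-delimited log format."""
    request, ip, browser, lang, ts, cookies = line.split("|")
    return ServerLogEntry(
        request=request,
        ip_address=ip,
        browser_type=browser,
        browser_language=lang,
        timestamp=datetime.fromisoformat(ts),
        cookie_ids=tuple(c for c in cookies.split(",") if c),
    )


# Hypothetical sample line; the IP is from a range reserved for documentation.
sample = "GET /search?q=cars|203.0.113.7|Firefox|en|2008-03-19T23:17:00+00:00|abc123"
entry = parse_entry(sample)
print(entry.ip_address, entry.timestamp.date(), entry.cookie_ids)
```

Even in this toy form, the point is visible: the query, the address it came from, the time, and a cookie that persists across visits all sit together in one record.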

What they are not as explicit about is that cookies are typically set to expire in 2038, or that they have never erased a single search query (which, when you consider that about 60% of all web searches go through Google, is a staggering amount). And, just like other companies, they will hand over the data to governments when lawfully required to do so.

Google is a company that counts among its ten key philosophies “You can make money without doing evil.” Making money is of course not evil, but some underhand tactics (automatic matching, broad matching, content networks, the way the toolbar operates, etc.) come pretty close.

Because Google is a data-driven company, I think several corollaries follow that tend to explain some of the things it does:

Corollary 1: Data is valuable, therefore the more the better. So Google collects and stores data about everything and everyone, all the time. Searches are just one of the many, many collection points across its vast reach.

Corollary 2: The richer the data the better. Context and results give better insight, so the more linked data Google can get, the more it can learn. Hence the combination of personal information from multiple services.

Corollary 3: Data is a competitive advantage. Google needs it not only to improve search but also as a core corporate asset that drives advertising revenue.

Corollary 4: Collection of data has to be protected at all costs. Hence Google’s disingenuous arguments about how IP addresses aren’t personally identifiable information (PII).

There’s a whole lot more, but it looks like data is Google’s itch, and the more it scratches, the more privacy advocates feel the pain.


Entry filed under: personal_info, privacy, strategy, video.
