Hey y'all. Come visit me at dkeithrobinson.com
September 28, 2005 | 10 Comments
Yesterday I stumbled across this story in which John Battelle talks about the size of Google’s index.
Under embargo last week, I spoke to Marissa Mayer about Google search. I do this often, as part of the normal news cycle, but this time was different. After clearing her throat with some 7th birthday news, she dropped the other shoe - Google is now claiming that its index is three times bigger than its competition. “Wow!” I said. “How can you tell?” “Tests we’ve done,” Mayer responded. “But…those are the same tests we’ve been debating since August, right? The same tests Yahoo claims are inconclusive and not to be trusted!”
They no longer want to reveal exactly how many pages it is. Seems like there’s some contention there as well. I wonder: who gives a damn?
Three times bigger than MSN. Wow, sounds like a lot of links. But…who cares? As a searcher I’m not really interested in how many pages you have. I’m more interested in the quality of the results you present at the top. I doubt many people ever get past the first few pages of links, and when I’m looking for something I’d really prefer one result: the one that gives me what I’m looking for.
Now, most of what I see is spam, and it usually gets worse as you dig deeper. Why would I even want to go through all of that? No, size isn’t a metric I really care much about.
If you can’t give me the relevance I need at the top, the size of your index is meaningless to me. Maybe Google, Yahoo, MSN and the rest should spend more time working on that and less on indexing everything under the sun. Just an idea.
Filed under: Web General
Keyword Tags: google search+engines
The relevance I find, though (finding that interesting tidbit), is often related to relationships. So why not search through the people you know, like asking a geek friend for advice on computers?
Posted on September 28, 2005 02:26 PM
Nollind: that’s a very worthy point. I find that most of the good stuff I pull out of the ever-growing “long tail” comes from word of mouth.
Posted on September 28, 2005 02:29 PM
… which reminds me of a great exchange between Charlie Rose and Google’s Eric Schmidt a couple of months ago on the Charlie Rose Show:
Eric: Have you ever searched for “Charlie Rose” on our site?
Charlie: Yes.
Eric: Did you get more than one result?
Charlie: Yes.
Eric: That’s a bug.
Posted on September 28, 2005 03:13 PM
Of course, the advantage of having a large index diminishes as the index grows.
If you are searching an index that only contains 10 pages, you are unlikely to find a relevant result. As the index grows, it becomes more likely that you will find what you are looking for. However, as it grows even further, you have to sift through more pages to find the relevant result.
I think all the major search engines are now past that second stage; they need to stop worrying about what they are not indexing and focus on making sure that you see the most relevant result.
Posted on September 28, 2005 03:20 PM
Chris, I find the same thing happens with category tags on a site: the more tags you have, the less useful they become. I’ve decided to make my local site tags focus on the main topics I talk about (so that I can see my stream of thoughts on a certain topic) and use Technorati tags for general global tagging (so others can find my stuff). I noticed that Keith is doing this already (e.g. the Corpse Bride post).
Posted on September 28, 2005 10:48 PM
Suggested search terms are what is required. With the wealth of information that Google has on what people search for you’d think they’d be able to give you a “see also” link which would fire another search with similar terms.
Google-fu is the art of knowing WHAT to Google for, and if you don’t got it, you ain’t gettin’ it!
Note: I’m not talking about the “Did you mean...” stuff that Google does; I’m talking about improving the user assistance. If I search for “Ice Cream”, it could suggest flavours, outlets, manufacturers, temperatures, etc., as sub-searches. I refuse to believe they can’t do this.
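A rough sketch of the “see also” idea, assuming a hypothetical log of past queries: suggest logged queries that contain every term of the current search plus at least one more, ranked by how often people ran them. The function name and the log are made up for illustration, and this is not a description of how Google actually builds its suggestions.

```python
from collections import Counter

def suggest_refinements(query, query_log, limit=5):
    """Return up to `limit` logged queries that refine `query`, most frequent first."""
    terms = set(query.lower().split())
    # Count how often each (normalised) query appears in the log.
    counts = Counter(q.lower().strip() for q in query_log)
    candidates = []
    for logged, n in counts.items():
        # Keep strict refinements only: all of the user's terms plus at least one extra.
        if terms < set(logged.split()):
            candidates.append((logged, n))
    # Most frequent first; alphabetical order breaks ties so results are stable.
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return [logged for logged, _ in candidates[:limit]]

# Hypothetical toy log; a real engine would mine millions of queries.
log = [
    "ice cream flavours",
    "ice cream flavours",
    "ice cream shops near me",
    "ice cream manufacturers",
    "ice cream",
    "ice skating",
    "best ice cream flavours",
]

print(suggest_refinements("Ice Cream", log))
# → ['ice cream flavours', 'best ice cream flavours', 'ice cream manufacturers', 'ice cream shops near me']
```

Each suggestion could then be shown as a “see also” link that fires that query as a sub-search.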
Ohh, and Nollind, tags are all well and good (I use the Technorati tags on my site for the same “general global tagging”; does it work, though?), but they should (must?) be combined with a properly, expertly built index if you really want to offer a complete navigation aid based on terms.
The problem with blogs/websites is the sheer amount of information. I’d hate to index my own site, but it’s a tried and tested method (amongst others). Tags are good, but only part of a solution.
Posted on September 29, 2005 03:08 AM
Without Quantity there can be no Quality
Imagine you’re looking for, say, a heart specialist because you have some ultra-rare defect. Search engine A only searches doctors in your state, so your pool of amazing heart surgeons is, say, 1,000. Search engine B covers the entire U.S., so your pool of amazing heart surgeons is 100,000.
Which search engine do you want to use? Well it depends on how likely you think it is the best heart surgeon is in your state :)
But I agree with Chris: we are past the stage where it’s likely to matter. I’m just saying that the size of the index has an impact on the ability to provide relevance.
My biggest problem with search remains:
If I don’t know what it’s called, how can I find it? For example, I’m in London right now, and it took me almost an evening to figure out that the U.K. equivalent of a coin-op laundry is a “Laundrette”.
Posted on September 29, 2005 03:13 AM
Gordon, that’s what I’m trying to get at. Most of the time I find general global tags useless from my personal perspective with regard to my local site. My primary interest is finding a way to group my thoughts, not by general global terms but by personally meaningful ones. If, to me, the focus of one of my posts is culture, then that is the only tag it gets, plain and simple. Yet from a global “Technorati” tag standpoint, that post may have keywords such as “web 2.0”, “community”, “technology”, “six apart”, and so on. These elements are part of the post, but they aren’t the focus of the ongoing stream of posts that make up the whole “story”, if you want to call it that.
Should you tag your posts with these Technorati tags? It’s up to you. Since my last comment above, I noticed a referrer link on my site from someone who was searching for Web 2.0 content. The post the referrer linked to was not focused on Web 2.0 at all; it just used the term as an example. This raises two points. First, it looks like you don’t need to add the tags manually, because Technorati searches your content for those keywords anyway (I didn’t use Technorati tags on that post). Second, this is a perfect example of a global tag finding a post that “logically” contains Web 2.0 content even though the overall meaning and importance of the post has NOTHING to do with Web 2.0.
It’s the whole localization vs. globalization thing: each community or country may place importance on different things or see specific things in different ways. A cow to us in North America is a cheeseburger. In India, though, it is a sacred and holy object. Therefore, if you want to find things on a specific search term, building that meaning into your search can definitely help (e.g. “cow cattle beef” vs. “cow holy sacred”). This is also why I’d love a search engine that could search through your favorite sites with a variable cluster link depth setting (think six degrees of separation). The reason is that the sites I visit are my favorites because, more often than not, they place meaning, importance, and interest on the same things I do. Thus the links off their sites, what they are pointing at, should hopefully carry that same meaning and interest.
That’s why I’ve said in the past that I like popping back to Keith’s site: he doesn’t just ramble on about design constantly like a lot of other sites do. Instead, he talks about many diverse things, which gives a feeling of balance to him and his content. I keep coming back because diversity and balance are two things that are meaningful and important to me. Well, OK, he’s a cool and righteous dude as well, but I can’t say that because it will just go to his head. :)
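The “favorite sites with a variable cluster link depth” idea from the previous paragraph amounts to a breadth-first walk over the link graph that stops after a chosen number of hops, followed by a search over only the pages it reached. A minimal sketch, assuming a hypothetical in-memory `links` map (page to the pages it links to) standing in for a real crawler; the page names, texts and function names are invented, and matching is a naive substring test.

```python
from collections import deque

def pages_within(start_sites, links, max_depth):
    """Breadth-first walk from the seed sites, following at most max_depth links."""
    seen = set(start_sites)
    queue = deque((site, 0) for site in start_sites)
    while queue:
        page, depth = queue.popleft()
        if depth == max_depth:
            continue  # Reached the depth limit: don't follow this page's links.
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return seen

def search_cluster(query, start_sites, links, texts, max_depth):
    """Return pages in the cluster whose text contains every query term (substring match)."""
    words = query.lower().split()
    hits = [
        page
        for page in pages_within(start_sites, links, max_depth)
        if all(w in texts.get(page, "").lower() for w in words)
    ]
    return sorted(hits)

# Hypothetical link graph and page text.
links = {
    "favorite-a": ["b", "c"],
    "b": ["d"],
    "d": ["e"],
}
texts = {
    "favorite-a": "notes on design and culture",
    "b": "a post about culture and community",
    "c": "recipes",
    "d": "culture in online community",
    "e": "more community culture",
}

print(search_cluster("culture community", ["favorite-a"], links, texts, max_depth=2))
# → ['b', 'd']
```

Raising `max_depth` widens the cluster (page `e` only shows up at depth 3); “six degrees of separation” would correspond to `max_depth=6`.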
Posted on September 29, 2005 09:58 AM
So, building on Yahoo!’s QUANTITY, you can now have more control over the QUALITY if you use Roll Your Own.
Fab.
John: the “don’t know what to call it” issue could again be helped by the search engine. A quick consultation of a noun against a variety of localised (localized) dictionaries could return a set of similar words.
Nollind: couldn’t agree more. See Roll Your Own for a “controlled subset” search engine … almost there, eh!
Posted on September 30, 2005 01:13 AM
Wow, definitely a step in the right direction. Thanks Gordon!
Posted on September 30, 2005 09:52 AM
is a writer, designer, etc. in Seattle, Washington.