Re: Using google to weed out too-obscure answers

From: lone1c <no_reply_at_yahoogroups.com>
Date: Wed, 24 Apr 2002 18:13:08 -0000
Message-ID: <aa6sjk+qf1r_at_...>

--- In quizbowl_at_y..., "davidlevinson" <levin031_at_t...> wrote:
> If we assign "B" to be the baseline measure, and then look at 
> ratios, we could develop something a little bit more stable
> e.g.
> 
> LOG (GoogleCount(B)/GoogleCount(X))
> 
> where X is the word in question
> and B is the baseline word or wordphrase (and preferably B >> 
> any X we are likely to test).
> 
> B should be large (it need not be "the"), but should be 
> something common and unlikely to change relative position (e.g. 
> "George Washington" )  Hits = 1,040,000
> or
> "William Shakespeare" Hits = 406,000
> but not
> "quiz bowl" 22,800 (not counting "quizbowl")
> 
> 
> I am open to what the Baseline word should be

How about that greatest of answers for when you don't have a good
guess--"Smith" [~1.9 x 10^7 hits]?
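
For what it's worth, here is a rough sketch of how the proposed ratio might play out with "Smith" as B. It uses only the hit counts quoted in this thread, hard-coded, with no live lookup. The names `HITS` and `obscurity` are made up for illustration, and base-10 log is an assumption, since the formula above does not say which base is meant.

```python
import math

# Hit counts as quoted in this thread (April 2002 snapshots).
# Hard-coded on purpose: nothing here queries a search engine.
HITS = {
    "Smith": 1.9e7,                  # approximate, proposed baseline B
    "George Washington": 1_040_000,
    "William Shakespeare": 406_000,
    "quiz bowl": 22_800,
}


def obscurity(term, baseline="Smith", hits=HITS):
    """Return log10(hits[baseline] / hits[term]).

    Hypothetical helper: larger values mean the term is rarer relative
    to the baseline. Base 10 is an assumption, not part of the proposal.
    """
    return math.log10(hits[baseline] / hits[term])


if __name__ == "__main__":
    for term in HITS:
        print(f"{term}: {obscurity(term):.2f}")
```

With B this large, every term quoted so far gets a non-negative score, and the baseline itself scores 0, which fits the "B >> any X" preference.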

--AEI