Re: Using google to weed out too-obscure answers

--- In quizbowl_at_y..., "davidlevinson" <levin031_at_t...> wrote:
> If we assign "B" to be the baseline measure, and then look at 
> ratios, we could develop something a little bit more stable
> e.g.
> 
> LOG (GoogleCount(B)/GoogleCount(X))
> 
> where X is the word in question
> and B is the baseline word or phrase (and preferably B >> 
> any X we are likely to test).
> 
> B should be large (it need not be "the"), but it should be 
> something common and unlikely to change relative position, e.g. 
> "George Washington" (1,040,000 hits)
> or
> "William Shakespeare" (406,000 hits)
> but not
> "quiz bowl" (22,800 hits, not counting "quizbowl")
> 
> 
> I am open to suggestions for what the baseline word should be.


How about that greatest of answers for when you don't have a good
guess--"Smith" [~1.9 x 10^7 hits]?
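
For a rough feel of what that baseline would give, here is a minimal sketch of the quoted ratio using only the hit counts mentioned in this thread. The function name obscurity_score is made up for illustration, the counts are hard-coded rather than fetched from Google, and base 10 is an assumption, since the quoted formula just says LOG.

```python
import math

def obscurity_score(baseline_hits, term_hits):
    """Return log10(baseline_hits / term_hits); larger means more obscure.

    Base 10 is an assumption: the quoted formula does not name a base.
    """
    if baseline_hits <= 0 or term_hits <= 0:
        raise ValueError("hit counts must be positive")
    return math.log10(baseline_hits / term_hits)

# Approximate count for the proposed baseline "Smith" (from this message).
BASELINE_HITS = 1.9e7

# Hit counts as quoted in the thread; they were typed in by hand, not queried.
HIT_COUNTS = {
    "George Washington": 1_040_000,
    "William Shakespeare": 406_000,
    "quiz bowl": 22_800,
}

scores = {
    term: obscurity_score(BASELINE_HITS, hits)
    for term, hits in HIT_COUNTS.items()
}

for term, score in scores.items():
    print(f"{term}: {score:.2f}")
```

With these numbers the scores come out at roughly 1.3, 1.7 and 2.9, so each extra point means about ten times fewer hits than the baseline.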

--AEI
