Re: Using google to weed out too-obscure answers

davidlevinson <*levin031_at_davidlevinson.yahoo.invalid*> · Wed, 24 Apr 2002 17:34:04 -0000

At any rate, the Mohs scale of question hardness was actually 
developed in the 1980s (see Buzzer Issue 5, 
http://guir.berkeley.edu/quiz-bowl/resources/newsletters/Buzzer5.html), 
though there was only a Delphi/expert-system process to 
classify questions, along with some question-frequency analysis 
done by Dr. Robert Meredith (much like Eric Hilleman does 
today).

The use of google or a google-like substance along with a log 
formula is an improvement that gets us out of the self-referential 
quizbowl mode.  However, the particular formula can go negative 
for super-common words, and will change over time as the 
Google index becomes larger (without the fact really becoming 
easier or harder).  A good measurement should remain 
constant over time (like the meter).  The previous measure 
would not be approved by the SI people.
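
To make the two complaints concrete, here is a minimal sketch of the 
quoted formula. It assumes "log" means log base 10 (the original 
post does not say), and the function name old_difficulty is made up 
for illustration. The only real number in it is the hit count for 
"the" quoted in this message.

```python
import math

def old_difficulty(hits: int) -> float:
    """Quoted formula: 10 - log(N). Base 10 is an assumption."""
    return 10 - math.log10(hits)

# Hit count for "the" as quoted in this message (April 2002).
the_hits = 2_560_000_000
print(f'"the" today: {old_difficulty(the_hits):.2f}')

# Problem 1: the score goes negative once a term passes 10**10 hits.
print("negative past 10**10 hits:", old_difficulty(10**11) < 0)

# Problem 2: if the index doubles and every count doubles with it,
# every score drops by log10(2), although no fact became easier.
shift = old_difficulty(the_hits) - old_difficulty(2 * the_hits)
print(f"drift from a doubled index: {shift:.3f}")
```

With today's count, "the" still scores slightly above zero (roughly 
0.59), so the negative case is a matter of index growth rather than 
something visible right now.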

I think a good metric is needed here and, with some work (like 
controlling for the size of the Google database), is possible.
I don't know the size of Google's database, but the word "the" 
shows up "2,560,000,000" times, so that is a lower bound.

If we assign "B" to be the baseline measure, and then look at 
ratios, we could develop something a little bit more stable, 
e.g.:

LOG (GoogleCount(B)/GoogleCount(X))

where X is the word in question
and B is the baseline word or phrase (and preferably 
GoogleCount(B) >> GoogleCount(X) for any X we are likely to test).

B should be large (it need not be "the"), but should be 
something common and unlikely to change relative position, e.g.
"George Washington" (Hits = 1,040,000)
or
"William Shakespeare" (Hits = 406,000)
but not
"quiz bowl" (Hits = 22,800, not counting "quizbowl")
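
A rough sketch of the ratio idea, using only the hit counts quoted 
in this message. The function name relative_difficulty and the HITS 
table are hypothetical, log base 10 is assumed, and the counts are 
typed in by hand rather than fetched from Google.

```python
import math

# Hit counts as quoted in this message (April 2002), entered by hand.
HITS = {
    "the": 2_560_000_000,
    "George Washington": 1_040_000,
    "William Shakespeare": 406_000,
    "quiz bowl": 22_800,
}

def relative_difficulty(baseline_hits: int, term_hits: int) -> float:
    """LOG(GoogleCount(B) / GoogleCount(X)), with base 10 assumed."""
    return math.log10(baseline_hits / term_hits)

# Try one candidate baseline; any other entry in HITS could be swapped in.
baseline = HITS["George Washington"]
for term, hits in HITS.items():
    print(f"{term}: {relative_difficulty(baseline, hits):+.2f}")

# If the index grows by a common factor, the ratio (and score) is unchanged.
grown = relative_difficulty(3 * baseline, 3 * HITS["quiz bowl"])
same = relative_difficulty(baseline, HITS["quiz bowl"])
print("stable under index growth:", math.isclose(grown, same))
```

Note that with "George Washington" as the baseline, "the" comes out 
negative, which is why B should be much more common than any X we 
expect to test.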

I am open to what the baseline word should be.

-- David M. Levinson

Wondering why it wouldn't be the Matthews-Levinson (or 
Levinson-Matthews) Scale?

"David+Levinson" 3130 Google Hits 
"David+M.+Levinson" 64 Google Hits 

Hoping that I become Lexiconical if not Canonical.

--- In quizbowl_at_y..., castrioti <no_reply_at_y...> wrote:
> 
> Fluidmosaic6 wrote:
> > The Castrioti Scale of Question Hardness sounds authoritative, 
> > scientific and European.  It's also a Chevy dealership around 
> > here, if you remove "Scale of", "Question", and "Hardness".
> 
> (Really?) For fun, anyway,
> 
> I guess my inclination is to call it the Levinson-Castrioti Scale 
> of Question Difficulty (trying to give credit where credit is due-- 
> DavidLevinson inspired the idea by trying to figure out whether 
> Tonio Kroger or Felix Krull was more difficult using google), yet 
> the equation 10-logN = difficulty (where N = hits on google), as 
> far as I know, is original. Plus, I think it somehow sounds more 
> academic with a hyphen. --This, of course, only if DavidLevinson 
> has no objection...
> 
> --Wesley (not really trying to resurrect an old thread)