Reducing Your Book Buying To Statistically Ridiculous Triviality

from the how-many-words-per-dollar-is-that? dept

Earlier this year, Amazon.com got some press for revealing its "text stats," including "statistically improbably phrases," which list phrases that tend to appear only in that particular book. There were other stats as well, all about equally useless. It appears that the Washington Post has just discovered these silly stats and has written an amusing article about some of the completely useless and trivial stats by which you can now compare different books. The Post's writer seems especially fond of the "words per dollar" feature: "But in its pure form, Text Stats is a triumph of trivialization.... Now you too can sound like a literary insider at Washington cocktail parties. You can throw around statistics and make clever conversation about the hard history books, the long-winded novels, even those thick, heavy, make-you-think philosophy tomes that contain really, really long words. And the beauty of it is, with Amazon's "Search Inside" Text Stats and other features, you won't even have to read them."

Reader Comments

  1. dorpus, 30 Aug 2005 @ 2:10pm

    But can it do Type I Nested F-tests?

    You can have covariates that appear insignificant on their own, but appear significant in the presence of another covariate. Or the opposite can occur. We have condition indexes, variance inflation factors, and type I F-tests to evaluate these phenomena.
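
For readers who have not met these diagnostics: the variance inflation factor for covariate j is VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing covariate j on the other covariates, so it measures how much correlation among covariates inflates that coefficient's variance. Below is a minimal sketch for the special case of exactly two covariates, where R^2 reduces to the squared Pearson correlation. The function names and toy data are hypothetical, chosen only for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def vif_two_covariates(x1, x2):
    """Variance inflation factor when a model has exactly two covariates.

    Regressing x1 on x2 (with an intercept) gives R^2 = r^2, so
    VIF = 1 / (1 - r^2), and the value is the same for both covariates.
    """
    r = pearson_r(x1, x2)
    if abs(r) >= 1.0:
        raise ValueError("covariates are perfectly collinear; VIF is infinite")
    return 1.0 / (1.0 - r * r)


# Hypothetical toy data: two moderately correlated covariates (r^2 = 0.6).
x1 = [1, 2, 3, 4, 5]
x2 = [2, 4, 5, 4, 5]
print(round(vif_two_covariates(x1, x2), 6))  # 2.5
```

With more than two covariates, R_j^2 has to come from a full multiple regression, so a linear-algebra library is the practical route.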

    It might be fun to perform a principal components analysis (PCA) on a book, to find eigenvectors of words that describe a typical page. What if every page in a book is merely a linear combination of eigenvectors? It would probably work really well for Techdirt, with its predictable anti-recording-industry postings, free market dogma, and anti-dorpus rants.
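
dorpus's PCA idea can be tried at toy scale: treat each page as a vector of word counts, center each word's column, and take the leading eigenvector of the covariance matrix, found here by power iteration. Keeping every component, each centered page is trivially a linear combination of the eigenvectors; the interesting question is how few components suffice. The four "pages" and all helper names are hypothetical, and a real book would call for a sparse matrix and a library SVD rather than pure Python.

```python
import random
from collections import Counter
from math import sqrt


def page_term_matrix(pages):
    """Rows are pages; columns are word counts over a shared, sorted vocabulary."""
    tokenized = [p.lower().split() for p in pages]
    vocab = sorted({w for words in tokenized for w in words})
    rows = []
    for words in tokenized:
        counts = Counter(words)
        rows.append([float(counts[w]) for w in vocab])
    return vocab, rows


def center_columns(rows):
    """Subtract each word's mean count across pages."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix between word columns."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_component(cov, iters=500, seed=0):
    """Leading eigenvalue and unit eigenvector of a symmetric PSD matrix.

    Power iteration from a random start, so we do not begin exactly
    orthogonal to the answer (an all-ones start can be, for some data).
    """
    rng = random.Random(seed)
    d = len(cov)
    v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("covariance matrix is zero")
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v gives the eigenvalue for a unit vector v.
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    return eigenvalue, v


# Hypothetical four-"page" book with two recurring themes.
pages = [
    "record industry lawsuit record",
    "free market market competition",
    "record industry piracy",
    "free market innovation",
]
vocab, rows = page_term_matrix(pages)
centered = center_columns(rows)
lam, pc1 = top_component(covariance(centered))
# Each page's coordinate along the first principal component.
scores = [sum(a * b for a, b in zip(row, pc1)) for row in centered]
print(round(lam, 4), [round(s, 3) for s in scores])
```

In this toy run pages 1 and 3 fall on one side of the first component and pages 2 and 4 on the other, so the component separates the recording-industry vocabulary from the free-market vocabulary (the overall sign is arbitrary).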


    link to this | view in thread ]

  2. stochastix, 31 Aug 2005 @ 3:00am

    SIPs and search

    SIPs are useful for certain types of searches (especially technical stuff). A SIP is very rare in the universe of books... so if you can find a small set of books where the phrase occurs several times then there is a strong chance that these books are relevant to the topic of the SIP.
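
The post does not say how Amazon actually computes SIPs, so the following is only a rough sketch under stated assumptions: score each two-word phrase by how much more often it appears in one book than in a reference corpus (with add-one smoothing), then run stochastix's search by finding the few books in which a phrase occurs several times. The bigram-only scoring, the smoothing choice, the function names, and the three-book "library" are all hypothetical.

```python
from collections import Counter


def bigrams(text):
    """Adjacent word pairs, lower-cased and split on whitespace."""
    words = text.lower().split()
    return list(zip(words, words[1:]))


def improbable_phrases(book, corpus, min_count=2, top=3):
    """Rank bigrams by how over-represented they are in `book`.

    Score = (share of the book's bigrams) / (add-one smoothed share of the
    reference corpus's bigrams). Phrases seen fewer than `min_count` times
    in the book are skipped, since a single occurrence says little.
    """
    book_counts = Counter(bigrams(book))
    corpus_counts = Counter(bg for text in corpus for bg in bigrams(text))
    book_total = sum(book_counts.values())
    corpus_total = sum(corpus_counts.values())
    vocab_size = len(set(book_counts) | set(corpus_counts))
    scores = {}
    for bg, count in book_counts.items():
        if count < min_count:
            continue
        p_book = count / book_total
        p_corpus = (corpus_counts[bg] + 1) / (corpus_total + vocab_size)
        scores[" ".join(bg)] = p_book / p_corpus
    return sorted(scores, key=scores.get, reverse=True)[:top]


def books_for_phrase(phrase, library, min_hits=2):
    """(title, count) pairs for books containing `phrase` at least `min_hits` times."""
    target = phrase.lower().split()
    n = len(target)
    hits = {}
    for title, text in library.items():
        words = text.lower().split()
        count = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)
        if count >= min_hits:
            hits[title] = count
    return sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical three-book "library" for illustration only.
library = {
    "Nested Models": "nested f test compares models and a nested f test needs nested models",
    "Cooking": "the soup needs salt and the bread needs butter",
    "Stats Primer": "an f test is one option among many tests",
}
others = [text for title, text in library.items() if title != "Nested Models"]
print(improbable_phrases(library["Nested Models"], others, top=1))  # ['nested f']
print(books_for_phrase("nested f", library))  # [('Nested Models', 2)]
```

"nested f" outranks "f test" because the rest of the corpus never uses it, which is exactly the rarity stochastix relies on when using a SIP as a search query.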

  3. Anonymous Coward, 31 Aug 2005 @ 9:10am

    how ironic

    What phrase is so improbable as "statistically improbably"?

  4. malhombre, 31 Aug 2005 @ 9:25am

    Re: how ironic

    Or "anti-dorpus rant"
