Thursday, May 10, 2007

The Problem with Lists

I love lists. Any website that promises to show "The Top X Best Y of ALL TIME" is alright in my book. However, reading one is frequently an exercise in disappointment. Here are the three biggest problems I see with "All Time Best" lists.

Data Sets
If you asked me to choose the "Top 3 Best Major-Release Comedies of 1995" I could do a fair job at that (1. Tommy Boy, 2. Billy Madison, 3. Judge Dredd). This list has a very limited data set: comedies released in 1995 that I actually liked.

Most lists on the Internet have massive data sets which make them wildly subjective and frequently inaccurate. Here are a few "number ones" I pulled from some Google searches:

Best Album of All Time: Sgt. Pepper's Lonely Hearts Club Band (link)
Best Song of All Time: Like a Rolling Stone (link)
Best Movie of All Time (Tie): The Godfather/The Shawshank Redemption (link)
Best Athlete of All Time: Michael Johnson (link)
Best Car of All Time: McLaren F1 (link)
Best Tech Product of All Time: Netscape Navigator (link)

Many of these lists add explanations (i.e. excuses) for why they chose a particular item for number one. The truth is that, with a slick excuse, they could just as easily have made Imagine (#3) the number one song instead of Like a Rolling Stone. Personally, I am not a fan of any of the songs on this list until you reach #7, Johnny B. Goode.

And how much better is the #1 than the #2 really?

Margins
The inherent problem with large data sets is that the number 1 item is frequently only marginally better than the number 2 item. In fact, the number 1 item might only be marginally better than the number 50 item if there is a huge data set.

For the sake of argument, let's say Rolling Stone had 10,000 songs to choose from for their Top 500 Songs list. The best song would receive the maximum score of 100% and the worst song would receive the minimum score of 0%. If there were an even margin of goodness between each song (.01%), then Like a Rolling Stone (#1) is only .06% better than Johnny B. Goode (#7). That is a pretty insignificant difference. In fact, the difference is negligible.
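For anyone who wants to check my math, here is a quick Python sketch of that back-of-the-envelope calculation. The 10,000 figure and the evenly spaced scores are my made-up assumptions from the paragraph above, not anything Rolling Stone published, and the function names are just ones I picked.

```python
def score_percent(rank, total):
    """Score for a 1-based rank when scores are evenly spaced
    from 100% (rank 1, the best) down to 0% (rank `total`, the worst)."""
    return 100.0 * (total - rank) / (total - 1)


def gap_percent(rank_a, rank_b, total):
    """Difference in score, in percentage points, between two ranks."""
    return abs(score_percent(rank_a, total) - score_percent(rank_b, total))


TOTAL = 10_000  # hypothetical size of the full data set

step = gap_percent(1, 2, TOTAL)         # margin between neighbors
gap_1_vs_7 = gap_percent(1, 7, TOTAL)   # #1 vs. #7
gap_1_vs_50 = gap_percent(1, 50, TOTAL)  # #1 vs. #50

print(f"margin between neighbors: {step:.4f}%")
print(f"#1 vs. #7:  {gap_1_vs_7:.4f}%")
print(f"#1 vs. #50: {gap_1_vs_50:.4f}%")
```

Under those assumptions, six steps down the list costs you about .06%, and even the number 50 item is less than half a percent behind the number 1.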

What I would really like to see is a list that takes into account the entire data set, assigns a percentage rank to each item, and then shows the best (100%), a pretty good one (80%), an average one (50%), and then the worst one (0%). Then you would see some real differences. I have always wanted to see a full-tackle football game between the Chargers and a local JV high school team.
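Here is a rough sketch of how that kind of list could be pulled out of a full ranking. It assumes you already have every item sorted from best to worst and that scores are evenly spaced, which is my simplification. The team names are placeholders, not real rankings.

```python
def pick_at_percent(ranked_best_first, percent):
    """Return the item whose evenly spaced score is closest to `percent`,
    where the first item scores 100% and the last item scores 0%."""
    n = len(ranked_best_first)
    index = round((100 - percent) / 100 * (n - 1))
    return ranked_best_first[index]


# Hypothetical full data set: 11 teams, already sorted best to worst.
teams = [f"Team {i}" for i in range(1, 12)]

# The best, a pretty good one, an average one, and the worst.
samples = {p: pick_at_percent(teams, p) for p in (100, 80, 50, 0)}

for percent, team in samples.items():
    print(f"{percent:>3}%: {team}")
```

With only four items on the list, the gaps between them are finally big enough to mean something.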

Good vs. Important
Is a top-ranked list item actually good or is it just important? Ginette and I both watched Blade Runner for a cinematography class at UCSD. While I can appreciate the influences of this movie, the film noir homage, the brilliance of Philip K. Dick's original work, the swaggering charm of Harrison Ford, and the future career of Ridley Scott, I really don't like this movie. It is dull and I feel like I have seen the same thing a dozen times. Perhaps if I had watched it when it first came out (I was 2 years old) I would feel differently. I also didn't like The Godfather and A Streetcar Named Desire. Sacrilegious, I know.

My point.

On my desk I have a copy of Cryptonomicon by Neal Stephenson. It is a daunting 1152 pages. I bought it based on a Top 100 Sci-Fi Books list I Googled the other day. It was the highest ranked book on the list that was written recently (1999) by an author I had heard of (I read Snow Crash). By my logic, this will be a good book, not simply an important one. However, 1152 pages is a much larger commitment than a 3-minute song, so I am worried.

I'll start it tomorrow.

1 comment:

Rick Rockhill said...

well I do see your point on this...I've received so many e-mails and comments about my pesky Saturday Seven lists that it amazes me. Whenever I publish a list, I keep forgetting to include the following disclaimers:
1. depending on how (dis)organized I was at the time of publication, the list may not be all-inclusive
2. The opinions expressed in this blog list do not reflect the opinions of everyone on the planet
3. Depending on the readers' generation, some lists will mean nothing to some and everything to others
4. This list is just a way for me to procrastinate a bit

That might help the naysayers, eh?

Thanks for stopping by Daniel...my blog is in transition..new look coming soon.