If I like this, why will I love that?
November 25th, 2008 by Phoebe
Never trust a thin cook, an angry clown, or a kindergarten teacher who dislikes glitter glue. This is wisdom I’ve picked up over several decades of life. It might also help explain the challenge facing the teams competing to win $1 million for a 10% improvement in Netflix’s Cinematch recommendation algorithm.
The teams work with two very large sets of user ratings, and the challenge is to predict each user’s ratings in the second set from his or her ratings in the first. In “If You Liked This, You’re Sure to Love That,” a fascinating analysis of the competition in the latest New York Times Magazine, Clive Thompson writes: “Most teams suspect that continuing to tweak existing algorithms won’t be enough to get to 10 percent. They need another breakthrough.”
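For readers new to the contest: the Netflix Prize scored entries by root-mean-square error (RMSE) on withheld ratings, and the 10% target meant a 10% reduction in RMSE relative to Cinematch. The minimal Python sketch below shows that arithmetic. The ratings and predictions are made up for illustration, and the function names are our own, not part of any Netflix tooling.

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error between predicted and actual ratings."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("need two non-empty lists of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def improvement_percent(baseline_rmse, new_rmse):
    """Percentage reduction in RMSE relative to a baseline system."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse

# Made-up withheld ratings (1-5 stars) and two sets of predictions.
actual = [4, 3, 5, 1, 2]
baseline = [3.5, 3.5, 3.5, 3.5, 3.5]   # the same constant guess for every rating
candidate = [4.0, 3.0, 4.5, 2.0, 2.5]

base_err = rmse(baseline, actual)
cand_err = rmse(candidate, actual)
print(f"baseline RMSE {base_err:.3f}, candidate RMSE {cand_err:.3f}")
print(f"improvement: {improvement_percent(base_err, cand_err):.1f}%")
```

With these made-up numbers the constant-guess baseline has an RMSE of exactly 1.5, so a 10% improvement over it would mean getting RMSE down to 1.35.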
We’re inclined to think that what’s missing is the content: understanding it, and the transparency that understanding adds to the process. The article describes how Napoleon Dynamite and a handful of other polarizing, often indie films cause a high percentage of the recommendation algorithm’s errors. At Jinni, we detect that a film is controversial by analyzing its reviews. Take Schindler’s List: many people consider it moving, and many others call it manipulative. Review analysis can extract and present this split, instead of averaging it away into a single number.
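An average can hide exactly the kind of split that makes a film like Napoleon Dynamite hard to predict. Jinni’s approach works on review text, which this sketch does not attempt; it only illustrates, with two invented sets of 1-to-5 star ratings, how two films can share an average while one audience is divided down the middle. The `split_score` measure is hypothetical, made up for this example.

```python
from statistics import mean, pstdev

def split_score(ratings, hate=2, love=4):
    """Hypothetical polarization score in [0, 1]: twice the share of ratings at
    the less-populated extreme. It is 1.0 when half the ratings sit at the
    'hate' end (<= hate) and half at the 'love' end (>= love), and 0.0 when
    either end is empty."""
    n = len(ratings)
    hated = sum(1 for r in ratings if r <= hate) / n
    loved = sum(1 for r in ratings if r >= love) / n
    return 2 * min(hated, loved)

# Two invented films with the same average rating.
lukewarm = [3, 3, 3, 3, 3, 3]    # everyone shrugs
divisive = [1, 1, 1, 5, 5, 5]    # half hate it, half love it

for name, ratings in [("lukewarm", lukewarm), ("divisive", divisive)]:
    print(name, mean(ratings), round(pstdev(ratings), 2), split_score(ratings))
```

Both films average 3 stars, yet only the second one is controversial; a recommender that sees just the mean treats them as interchangeable.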
Singular value decomposition (SVD), the hard task of uncovering hidden “factors” that people like or dislike, leads the Netflix teams to some unexpected correlations among films. These factors correspond to the genes in our Movie Genome, except that we keep our genes explicit, so we can reason about them. We can tell whether two films are really similar and tune recommendations to different people’s preferences, for example those who consider the soundtrack key and those who find it nearly irrelevant. And we can explain in plain English why we made a recommendation.
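To make the idea of explicit, tunable genes concrete, here is a minimal Python sketch under loudly stated assumptions: the gene names, scores, per-viewer weights, and the weighted-difference similarity are all hypothetical, and the post does not describe how the Movie Genome actually scores or compares films. It shows the two properties claimed above: the same pair of films can look more or less similar depending on how much a viewer cares about the soundtrack, and because each gene has a name, the match can be explained in words.

```python
# Hypothetical gene scores in [0, 1]; gene names and values are invented.
FILMS = {
    "Film A": {"soundtrack": 0.9, "dark_humor": 0.2, "slow_pace": 0.7},
    "Film B": {"soundtrack": 0.1, "dark_humor": 0.3, "slow_pace": 0.8},
}

def similarity(a, b, weights):
    """1 minus the weighted mean absolute gene difference (1.0 = identical).
    Genes missing from `weights` get weight 1.0; genes missing from a film count as 0."""
    genes = sorted(set(a) | set(b))
    total = sum(weights.get(g, 1.0) for g in genes)
    diff = sum(weights.get(g, 1.0) * abs(a.get(g, 0.0) - b.get(g, 0.0)) for g in genes)
    return 1.0 - diff / total

def explain(a, b, weights, top=2):
    """Plain-English reasons: the shared genes contributing most to the match."""
    shared = sorted(set(a) & set(b))

    def agreement(g):
        # High when the viewer cares about the gene and the films agree on it.
        return weights.get(g, 1.0) * (1.0 - abs(a[g] - b[g]))

    best = sorted(shared, key=agreement, reverse=True)[:top]
    return [f"both films score similarly on {g.replace('_', ' ')}" for g in best]

# Two hypothetical viewers: one who cares a lot about the soundtrack, one who barely does.
soundtrack_fan = {"soundtrack": 3.0}
indifferent = {"soundtrack": 0.1}

a, b = FILMS["Film A"], FILMS["Film B"]
print(round(similarity(a, b, soundtrack_fan), 2))  # 0.48
print(round(similarity(a, b, indifferent), 2))     # 0.87
print(explain(a, b, indifferent))
```

The design choice is that weights belong to the viewer, not the film, so one genome can serve both the soundtrack lover and the viewer who ignores the music.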
Reed Hastings, CEO of Netflix, “is even considering hiring cinephiles to watch all 100,000 movies in the Netflix library and write up, by hand, pages of adjectives describing each movie.” It sounds promising, but purely manual methods are slow and expensive. They are also inaccurate (people tend to overlook tentative attributes), inconsistent (between different experts, and even for the same person at different times), and hard to redo when the taxonomy changes. We overcame these drawbacks with our automated natural language processing solution. To ensure quality, the automated tagger forwards questionable decisions to human experts and learns from their corrections.
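The post does not say how Jinni’s tagger decides what is questionable. A common pattern is a confidence threshold: decisions the system is unsure about go to a human review queue, and the expert’s answer is fed back in. The sketch below assumes a deliberately simple keyword tagger (the `Tagger` class, its cue words, and the 0.6 threshold are all hypothetical) just to show that routing-and-learning loop.

```python
from dataclasses import dataclass, field

@dataclass
class Tagger:
    """Hypothetical keyword tagger with human-in-the-loop review."""
    cues: dict                      # cue word -> gene tag (hypothetical)
    threshold: float = 0.6          # below this confidence, ask a human
    review_queue: list = field(default_factory=list)

    def tag(self, text):
        """Return a gene tag, or None after queueing the text for an expert."""
        counts = {}
        for word in text.lower().split():
            gene = self.cues.get(word.strip(".,!?"))
            if gene is not None:
                counts[gene] = counts.get(gene, 0) + 1
        if counts:
            best = max(counts, key=counts.get)
            # Confidence: share of cue hits that agree with the winning tag.
            confidence = counts[best] / sum(counts.values())
        else:
            best, confidence = None, 0.0
        if confidence < self.threshold:
            self.review_queue.append(text)   # questionable: forward to a human
            return None
        return best

    def learn(self, cue, gene):
        """Record an expert's correction as a new cue word."""
        self.cues[cue.lower()] = gene

tagger = Tagger(cues={"hilarious": "comedy", "terrifying": "horror"})
print(tagger.tag("A hilarious romp."))           # comedy
print(tagger.tag("Hilarious but terrifying."))   # None: cues conflict, confidence 0.5
print(tagger.tag("Spooky and strange."))         # None: no known cue
tagger.learn("spooky", "horror")                 # the expert's correction
print(tagger.tag("Spooky and strange."))         # horror
```

A production system would more likely retrain a statistical model on the corrected examples than add a single cue word; the point here is only the shape of the loop.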
Serendipity, stumbling on an unexpected film you fall in love with, is tricky to pull off. But like a cook who eats cake, we think a service based on analysis by people who have actually watched the films can hit that sweet spot more often.
Update: ReadWriteWeb has an interesting new analysis of the Netflix competition, including the possible need for extra data and the human touch.



December 6th, 2008 at 2:37 pm
I really think your recommendation system has the most promise. The more keywords/metadata you can assign to a film, the better the matches can be.
For example, Twilight and Underworld are both vampire films, yet they are entirely different AND have very different target audiences.
Also, are you tracking users’ preferences for actors/actresses? That could be a contributing factor. The algorithm could say I should like a film, but if the lead actor is someone I can’t stand (like Adam Sandler), then I’m going to rate the film poorly.
June 3rd, 2009 at 12:38 am
[...] it can consequently create much better results and even solve the problems Netflix has with predicting whether people will like certain indie movies like Napoleon [...]