When I was a young child, my grandfather would take me to the dump with him and let me wander through the piles of rubbish while he unloaded our trash. Of course, we never brought anything but worthless garbage to the dump, but I always found a few useful items in other people's waste to take home with us.
Sometimes surfing the internet feels like going to the dump with my grandfather again. I climb over piles and piles of garbage looking for the good stuff.
The promise and curse of the internet is an abundance of information. So much information is available that it's not uncommon to hear people refer to this wealth as "information overload." Most often, though, we're not overwhelmed by the information itself; we're overwhelmed by sifting through all the junk to find what we want.
One of the early attempts to structure the vast online information landscape was Slashdot (1997), a social bookmarking site that allows users to comment on and vote up popular stories. Sites like StumbleUpon (2001), del.icio.us (2003), Digg (2004) and Reddit (2005) later adopted similar methods.
These sites allow users to see what other people think is good, noteworthy or worthwhile. What they don’t do is personalize their recommendations to each user. Everyone who visits the site sees exactly the same recommendations.
Today, hundreds of cutting-edge applications are taking advantage of recent advances in artificial intelligence and machine learning--as well as a boom in online social networks--to create even more effective ways of curating our information.
Early Approaches
Before we ever had an internet, people were telling us what we would like and what we should do. Reviews of plays, books and movies were popular ways of figuring out what was worthwhile. As America became more mobile, travel guides like Michelin provided advice on where to stay and what to eat while on vacation.
These reviews provided a professional opinion but of the least personalized type. The advice was the same no matter who was reading it and was based on the experiences of a cadre of elite experts. Popular opinion was used when available but few venues provided measurable data. Movie theater receipts and record sales were two notable exceptions.
The explosion of connectedness facilitated by the internet, however, allows for collaboration and democratization as never before. Rather than recommendations from a handful of elites, we can crowdsource recommendations from the billions of people with access to the internet.
We can get restaurant recommendations that aggregate the combined wisdom and experiences of thousands of people with sites like UrbanSpoon, Zagat and the new Dinevore. These sites use user-generated information to make recommendations (though Zagat still uses professional critics too).
Like box office blockbusters and “Top 40” hits, review sites which rely on user interaction tell us what's popular but not necessarily what we are personally going to like. The whole appeal of recommendations is to find more of the good stuff and avoid the garbage. But what if the garbage is popular? In other words, what if we are in the minority? How do we find the things we like?
Predicting Future Happiness
Knowing what we're going to like is a tough nut to crack. Harvard psychologist Dan Gilbert elaborates on how difficult it is in his excellent book, Stumbling on Happiness. I highly recommend this book to anyone interested in understanding happiness.
Dan Gilbert claims everyone is bad at knowing what will make us happy in the future, yet we constantly engage in this predictive activity, not merely for ourselves but for others as well. Whether we're entrepreneurs attempting to make a product people will like or romantic partners trying to endear ourselves to our lovers, we are making predictions about what will please someone else.
Ultimately, all of these recommendation systems are trying to do the same thing: predict future pleasure. When prediction systems work well, we're thrilled with the recommendation. We go to a restaurant we've never been to before and we love it; we see a movie we knew nothing about and it blows us away.
The problem with using aggregated data is that it frequently misses the point. Sometimes we go to a restaurant for all the wrong reasons: to make someone else happy or to send social signals. How many people have eaten at Sardi's for reasons other than the quality of the food or had dinner at Masa as a way of displaying their wealth and status?
What we really need is a structure that takes human bias and subjectivity out of the system, a way to reduce the noise automatically, even mechanically, and see patterns in the way people behave. The computational power and consolidated data available on the internet provide a rich soil for such fruit.
More Complicated Tools
Some recommendations are mechanical in nature. Amazon recommends Miss Peregrine's Home for Peculiar Children because you bought Night Circus and other people who bought Night Circus also bought Miss Peregrine's Home for Peculiar Children.
As opposed to the popularity contest of the "I liked it so you'll like it too" recommendation system, purely mechanical systems follow the "this goes with that" principle. No matter how unpopular your particular niche interest is, people with the same taste reveal implied recommendations through their purchases or viewing habits.
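The "this goes with that" principle can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Amazon's actual method: the order history is made up, and the function names build_co_purchases and also_bought are hypothetical. It simply counts how often two items appear in the same order and recommends the most frequent companions.

```python
from collections import Counter
from itertools import permutations


def build_co_purchases(orders):
    """Count, for each item, how often every other item shares an order with it."""
    co_counts = {}
    for order in orders:
        # set() ignores duplicate copies of an item within one order.
        for item, other in permutations(set(order), 2):
            co_counts.setdefault(item, Counter())[other] += 1
    return co_counts


def also_bought(co_counts, item, top_n=3):
    """Return the items most often bought alongside `item`."""
    companions = co_counts.get(item, Counter())
    return [other for other, _ in companions.most_common(top_n)]


# Hypothetical order history (made-up data for illustration only).
orders = [
    ["Night Circus", "Miss Peregrine's Home for Peculiar Children"],
    ["Night Circus", "Miss Peregrine's Home for Peculiar Children", "Cookbook"],
    ["Night Circus", "Travel Guide"],
    ["Cookbook", "Travel Guide"],
]

co_counts = build_co_purchases(orders)
recommendations = also_bought(co_counts, "Night Circus", top_n=1)
print(recommendations)
```

Notice that nothing in the sketch asks whether anyone liked the book. The recommendation falls out of purchase behavior alone, which is why a niche title can surface just as easily as a bestseller.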
Last.fm is another site that looks at revealed preferences. It recommends music by asking you to pick an artist or song you already like. It matches the music you said you liked with music other people who enjoy that music also like. People who like James Taylor, for example, also like Jim Croce.
This type of algorithm-based recommendation was the first to really make customized recommendations. You may not like Jim Croce, but it only recommended him because you told the algorithm something about yourself--that you liked James Taylor.
Everyone who goes to Amazon or Last.fm gets recommendations specific to their tastes as expressed by some input. But everyone who buys that book or selects that song will get the exact same recommendation. In this way it is somewhat customized but not quite personalized.
Bundle is a recommendation site that lets you get a little more specific. It uses aggregated credit card data along with demographic information about the card holder to build mechanical recommendations for products and services. Users can filter their recommendations by specifying the demographic category they fit into and then seeing what similar people bought.
The Ultimate in Personalization
In order for recommendations to accurately predict what you will like, sites need to know a lot more about you. Dan Gilbert concludes his book with the advice that we will be happiest if we follow the recommendations of people who are similar to us. Most of us surround ourselves with people we share interests with and call them friends.
Not surprisingly, as social networking sites became popular, people began looking to them for recommendations. These sites provided the crowdsourcing benefits of aggregated input with the customization of being pulled from people similar to ourselves. Surely the websites, movies and music being recommended on social platforms like Facebook, Twitter and Google+ would be the most accurate.
Aside from knowing how far your high school buddy ran this morning or what he had for breakfast, one of the benefits of social networks is the ability to reduce the internet to a manageable size. Friends acting as curators of the internet naturally recommend things they find interesting. And because our friends are likely to have the same tastes we do, we'll probably find them interesting too.
This combines the curation and upvoting aspects of social bookmarking sites with the personalization of our specific network of friends, allowing (theoretically) the recommendations we receive to be highly specific to our tastes.
Did I hear a chuckle? Anyone who's been on social media knows it hasn't worked out that way. The logic was sound but didn't take into consideration how our behavior changes online. When friendships were limited to people we had personally met and grew organically from shared interests, our friends were a good reflection of our preferences and interests.
But in the online social world things are different. We "friend" people we knew when we were younger but haven't been in touch with in decades; we connect with people at work for political reasons that have nothing to do with shared interests; we follow people because they follow us and following back seems like the right thing to do.
In other words, online friendships mean little when it comes to analyzing our preferences and social media curation is pretty much a joke.
Even under the best of circumstances I wouldn't recommend the same thing to all my friends just because I liked it. If I know my friends well enough to know their preferences I would only recommend things I thought they would like. And those recommendations would be different for different people.
What are the Best Algorithms Now?
The most personalized recommendations will be those that take into account a whole host of factors: not just what book you bought today, but all the books you've bought in the past and how much you liked them. They will also take into consideration which books people who have preferences similar to yours have liked (whether you are friends with them or not).
That's what GoodReads is doing. You tell the site which books you've read and how much you liked them and it matches your library with libraries similar to yours so you can find books you haven't read that those people liked. It creates a specific profile of you so recommendations are different for everyone based on their specific reading history.
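Matching your library against similar libraries can be sketched as a small user-based collaborative filter. This is a minimal illustration, not GoodReads' actual algorithm: the readers and ratings are invented, cosine similarity is one assumed choice of similarity measure among many, and the names cosine_similarity and recommend are hypothetical.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity of two {title: rating} dicts; unrated books count as 0."""
    if not a or not b:
        return 0.0
    dot = sum(a[title] * b[title] for title in set(a) & set(b))
    norm_a = sqrt(sum(rating ** 2 for rating in a.values()))
    norm_b = sqrt(sum(rating ** 2 for rating in b.values()))
    return dot / (norm_a * norm_b)


def recommend(my_ratings, libraries, top_n=3):
    """Rank unread books by the similarity-weighted average rating of other readers."""
    weighted_sum, weight_total = {}, {}
    for ratings in libraries.values():
        similarity = cosine_similarity(my_ratings, ratings)
        if similarity <= 0:
            continue  # Readers with no overlap tell us nothing.
        for title, rating in ratings.items():
            if title in my_ratings:
                continue  # Only suggest books not yet read.
            weighted_sum[title] = weighted_sum.get(title, 0.0) + similarity * rating
            weight_total[title] = weight_total.get(title, 0.0) + similarity
    scores = {title: weighted_sum[title] / weight_total[title] for title in weighted_sum}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical ratings on a 1-5 scale (made-up data for illustration only).
me = {"Dune": 5, "Neuromancer": 4, "Emma": 1}
libraries = {
    "reader_a": {"Dune": 5, "Neuromancer": 5, "Foundation": 5, "Persuasion": 2},
    "reader_b": {"Emma": 5, "Dune": 1, "Persuasion": 5, "Foundation": 2},
}

suggestions = recommend(me, libraries)
print(suggestions)
```

Because reader_a's ratings track mine far more closely than reader_b's, reader_a's favorites carry more weight in the ranking. Two people with different reading histories would get different lists from the same set of libraries, which is the sense in which this kind of system is personalized rather than merely customized.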
One of the most advanced movie recommendation engines is run by Netflix, which uses factors such as how much your preferences have changed over time and how likely you are to overrate a good movie after seeing several good movies in a row. The downside is that, because these systems are built entirely on your preferences, they need you to rate a lot of books and movies before they can accurately predict what you will like.
Pandora Radio, on the other hand, creates a profile of your preferences and then "learns" what type of music you'll like by seeing which songs you give a "thumbs up" or a "thumbs down" to. Pandora is built on an index of song characteristics called the "Music Genome Project," which has been so successful that many new sites have attempted to map other "genomes."
Jinni recommends movies based on its Movie Genome, and StartupGenome attempts to gauge the viability of a startup company based on certain characteristics of its genome. But in many ways these attempts still rely on what people say they like. The real advancements in recommendations will come when machines can learn what our preferences are without us having to tell them. The biggest source of error in prediction technology is the gap between what people say they like and what they actually do.
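The thumbs-up, thumbs-down idea can be sketched as a profile of trait weights that shifts with each vote. This is a rough illustration, not Pandora's actual algorithm or the real Music Genome attributes: the trait names, the song values and the functions update_profile and score are all assumptions made for the example.

```python
def update_profile(profile, song_traits, thumbs_up, step=1.0):
    """Nudge the listener's trait weights toward (or away from) a rated song."""
    sign = 1.0 if thumbs_up else -1.0
    updated = dict(profile)  # Leave the caller's profile untouched.
    for trait, value in song_traits.items():
        updated[trait] = updated.get(trait, 0.0) + sign * step * value
    return updated


def score(profile, song_traits):
    """Higher scores mean the song's traits line up with the learned profile."""
    return sum(profile.get(trait, 0.0) * value for trait, value in song_traits.items())


# Hypothetical catalog: each song is described by made-up trait values in [0, 1].
catalog = {
    "song_a": {"acoustic": 0.9, "tempo": 0.2, "vocals": 0.8},
    "song_b": {"acoustic": 0.1, "tempo": 0.9, "vocals": 0.3},
    "song_c": {"acoustic": 0.8, "tempo": 0.3, "vocals": 0.7},
    "song_d": {"acoustic": 0.2, "tempo": 0.8, "vocals": 0.2},
}

# The listener likes song_a and dislikes song_b.
votes = {"song_a": True, "song_b": False}
profile = {}
for title, liked in votes.items():
    profile = update_profile(profile, catalog[title], thumbs_up=liked)

# Pick the unheard song whose traits best match the learned profile.
unheard = [title for title in catalog if title not in votes]
next_song = max(unheard, key=lambda title: score(profile, catalog[title]))
print(next_song)
```

The sketch never compares the listener with other listeners. It recommends song_c purely because its traits resemble the song that earned a thumbs up, which is what distinguishes a "genome" approach from the purchase-matching and library-matching systems described earlier.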
Future Advancements
We will get this right. It's only a matter of time before artificial intelligence advances sufficiently to predict our future happiness at least as well as we can. In fact, AI will probably be better at it, since its predictions will be based on the actual outcomes of choices rather than on a biased perception. But it will only be as good as the data it has to work with.
It will require a lot more information than is currently available, including how our personal thought processes work and when we're likely to be lying to ourselves. For example, I may say I'm adventurous when it comes to eating, but every time I go to a Filipino restaurant I get something safe. I can continue to rationalize why I got the crispy pata yet again and probably convince myself I'll try the balut another time, but the data of my historical choices reveals my true preferences. (I've never tried balut.)
This collection of preference data is already taking place online. Every time we look something up on Google, it stores the search. Recommendation engines (if given access) could build a picture of the things we're interested in by using this data. That's why police detectives so often seize suspects' computers and analyze their browsing history. Were the suspects searching for "ways to poison someone with household products" or "how to rob a bank"?
I'm not saying this to be alarmist or to suggest the internet knows too much about us. Quite the contrary, I think it needs to know much more. The more our thought processes get recorded online the better the recommendations will be.
If you're interested in finding out more about AI and Machine Learning, Stanford is offering online classes this fall for free. Go to http://www.ai-class.com to register. But hurry, classes start October 10.

