Collaborative filtering and content-based movie recommendation system
This is a movie recommendation system that provides recommendations based on a subset of the MovieLens dataset.
I have created recommenders able to provide recommendations for specific users, as well as general recommendations based on popularity and movie content.
I also trained a collaborative filtering algorithm that predicts ratings for specific users in the dataset. Check out the full code on my GitHub:
https://github.com/raveenak96/Recommender-System
The data is a subset of a Kaggle dataset containing metadata and ratings from MovieLens, a collection of movie ratings by individual users developed at the University of Minnesota. Because of limited computing power, I used a smaller subset consisting of 100,000 ratings from 700 users on 9,000 movies. The dataset also contains movie metadata such as posters, backdrops, budgets, revenue, release dates, languages, genres, taglines, and cast and crew information. I also used the IMDB ratings included in the dataset to build a popularity-based recommendation system.
The first recommendation system I created provides recommendations based simply on the most popular movies in the dataset. It can also provide recommendations by genre. I created the list of recommendations by building a top chart of the most popular movies according to IMDB ratings. The popularity of a movie was calculated using IMDB's weighted rating formula:
WR = (v / (v + m)) × R + (m / (v + m)) × C
where:
R = the average rating for the movie
v = the number of votes for the movie
m = the minimum number of votes required to be listed in our top chart (I used the 90th percentile)
C = the mean rating across all movies in the dataset
Using this weighted rating, I created a class "SimpleRecommender" with a function that builds a top chart of a user-specified length, sorted by weighted rating. Recommendations can optionally be limited to a specific genre.
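As a rough illustration of the weighted rating and top chart described above, here is a minimal pandas sketch. It is not the actual SimpleRecommender code: the function names, the column names (`vote_average`, `vote_count`, `genres`) and the toy data are assumptions made for the example, and filtering by genre before computing `m` and `C` is just one possible choice.

```python
import pandas as pd

def weighted_rating(df, quantile=0.90):
    """Return qualifying movies with an IMDB-style weighted rating column 'wr'."""
    C = df["vote_average"].mean()             # mean rating across all movies
    m = df["vote_count"].quantile(quantile)   # minimum votes needed to be listed
    qualified = df[df["vote_count"] >= m].copy()
    v = qualified["vote_count"]
    R = qualified["vote_average"]
    # WR = (v / (v + m)) * R + (m / (v + m)) * C
    qualified["wr"] = (v / (v + m)) * R + (m / (v + m)) * C
    return qualified.sort_values("wr", ascending=False)

def top_chart(df, n=5, genre=None, quantile=0.90):
    """Top-n chart by weighted rating, optionally restricted to one genre."""
    if genre is not None:
        df = df[df["genres"].apply(lambda g: genre in g)]
    return weighted_rating(df, quantile).head(n)

# Toy data (hypothetical values, only to exercise the functions).
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "vote_average": [9.0, 8.0, 7.0, 9.5],
    "vote_count": [1000, 5000, 200, 3],
    "genres": [["Drama"], ["Comedy", "Drama"], ["Comedy"], ["Drama"]],
})

print(top_chart(toy, n=2, quantile=0.5)[["title", "wr"]])
print(top_chart(toy, n=1, genre="Comedy", quantile=0.5)[["title", "wr"]])
```

Note how movie "D" has the highest average rating but only 3 votes, so it never reaches the chart. That is the point of the vote threshold `m`.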
For example, I will generate 5 general recommendations using this popularity-based system:
Title | Overview | Genres |
---|---|---|
The Shawshank Redemption | Framed in the 1940s for the double murder of his wife and her lover, upstanding banker Andy Dufresne begins a new life at the Shawshank prison, where he puts... | Drama, Crime |
Dilwale Dulhania Le Jayenge | Raj is a rich, carefree, happy-go-lucky second generation NRI. Simran is the daughter of Chaudhary Baldev Singh, who in spite of being an NRI is very strict about adherence to Indian values... | Comedy, Drama, Romance |
The Godfather | Spanning the years 1945 to 1955, a chronicle of the fictional Italian-American Corleone crime family. When organized crime family patriarch, Vito Corleone barely survives an attempt on his life, his youngest... | Drama, Crime |
The Dark Knight | Batman raises the stakes in his war on crime. With the help of Lt. Jim Gordon and District Attorney Harvey Dent, Batman sets out to dismantle the remaining criminal organizations that plague the streets... | Drama, Action, Crime, Thriller |
Fight Club | A ticking-time-bomb insomniac and a slippery soap salesman channel primal male aggression into a shocking new form of therapy. Their concept catches on, with underground... | Drama |
These are the 5 most popular movies in the dataset. We can also generate recommendations by a specific genre. Let's get 5 recommendations for comedy movies:
Title | Overview | Genres |
---|---|---|
Dilwale Dulhania Le Jayenge | Raj is a rich, carefree, happy-go-lucky second generation NRI. Simran is the daughter of Chaudhary Baldev Singh, who in spite of being an NRI is very strict about adherence to Indian values... | Comedy, Drama, Romance |
Forrest Gump | A man with a low IQ has accomplished great things in his life and been present during significant historic events - in each case, far exceeding what... | Comedy, Drama, Romance |
La La Land | Mia, an aspiring actress, serves lattes to movie stars in between auditions and Sebastian, a jazz musician, scrapes by playing cocktail party gigs in dingy bars, but as success mounts... | Comedy, Drama, Music, Romance |
Amelie | At a tiny Parisian cafe, the adorable yet painfully shy Amelie (Audrey Tautou) accidentally discovers a gift for helping others. Soon Amelie is spending her days as a matchmaker... | Comedy, Romance |
Captain Fantastic | A father living in the forests of the Pacific Northwest with his six young kids tries to assimilate back into society. | Adventure, Comedy, Drama, Romance |
Although this is a good way to find the most popular movies in the dataset, it is providing recommendations based simply on IMDB popularity without considering movie content or individual user preferences. Below I present three other systems that improve on this baseline.
I next created a recommendation system that uses movie metadata, including genres, keywords, and cast and crew information. It provides recommendations through the ContentBasedRecommenders class, which takes a movie title from our dataset and can optionally be limited to a specific genre. Recommendations are produced by finding the movies most similar to the given movie, measured by cosine similarity between their metadata. The CountVectorizer class was applied to the processed metadata to turn each movie into a vector of token counts, and the cosine similarity between movies was calculated from these vectors. The recommendations are then sorted by the weighted rating described earlier, so the most popular of the similar movies are recommended first.
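The metadata similarity step can be sketched roughly as follows with scikit-learn. This is an illustration under assumptions, not the ContentBasedRecommenders implementation: the helper names, the idea of joining each movie's metadata into a single string, and the toy titles and tokens are all hypothetical. The final re-sorting by weighted rating is left out for brevity.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def build_similarity(metadata_strings):
    """Pairwise cosine similarity between token-count vectors of metadata."""
    counts = CountVectorizer().fit_transform(metadata_strings)
    return cosine_similarity(counts)

def similar_movies(title, titles, sim, n=5):
    """Titles of the n movies most similar to `title`, excluding itself."""
    idx = titles.index(title)
    ranked = sorted(range(len(titles)), key=lambda j: sim[idx, j], reverse=True)
    return [titles[j] for j in ranked if j != idx][:n]

# Toy metadata (hypothetical): genres, director and lead actor joined into one
# string per movie, with multi-word names collapsed into single tokens.
titles = ["Heist One", "Heist Two", "Space Romance"]
metadata = [
    "crime thriller heist directorx actora",
    "crime thriller heist directorx actorb",
    "romance scifi space directory actorc",
]

sim = build_similarity(metadata)
print(similar_movies("Heist One", titles, sim, n=2))
```

Collapsing names such as a director's full name into one token keeps the vectorizer from matching two different people who merely share a first name.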
Let's look at the top 5 recommendations for movies similar to The Dark Knight. The dataset's metadata for the movie is shown below:
Tagline: Why So Serious?
Movie Overview: Batman raises the stakes in his war on crime. With the help of Lt. Jim Gordon and District Attorney Harvey Dent,
Batman sets out to dismantle the remaining criminal organizations that plague the streets. The partnership proves to be effective, but they soon find
themselves prey to a reign of chaos unleashed by a rising criminal mastermind known to the terrified citizens of Gotham as the Joker.
Genres: Drama, Action, Crime, Thriller
Keywords: dc comics, crime fighter, secret identity, scarecrow, sadism, chaos, gotham city, etc.
Director: Christopher Nolan
Cast: Christian Bale, Heath Ledger, Michael Caine
Below are the top 5 recommendations for the movies most similar to The Dark Knight, based on their metadata (genres, keywords, director, and cast):
Title | Overview | Genres |
---|---|---|
M | In this classic German thriller, Hans Beckert, a serial killer who preys on children, becomes the focus of a massive Berlin police manhunt... | Drama, Action, Thriller, Crime |
The Dark Knight Rises | Following the death of District Attorney Harvey Dent, Batman assumes responsibility for Dent's crimes... | Drama, Action, Thriller, Crime |
Batman Begins | Driven by tragedy, billionaire Bruce Wayne dedicates his life to uncovering and defeating the corruption that plagues his home... | Drama, Action, Crime |
The Killing | The Killing was Stanley Kubrick's first film with a professional cast and the first time he achieved public recognition as the unconventional director he's now known for. The story is of ex-prisoners who plan to set up a racetrack so they can live a life... | Drama, Action, Thriller, Crime |
The Raid | Deep in the heart of Jakarta's slums lies an impenetrable safe house for the world's most dangerous killers and gangsters. Until now, the run-down apartment block... | Action, Thriller, Crime |
We can see that recommending on movie metadata produces results that closely resemble The Dark Knight: the list consists of other Batman films and crime movies.
Let's look at the top 5 recommendations for comedy movies similar to The Dark Knight, even though comedy is not one of the movie's own genres:
Title | Overview | Genres |
---|---|---|
The Italian Job | Charlie's got a 'job' to do. Having just left prison he finds one of his friends has attempted a high risk job in Torino, Italy, right under... | Action, Crime, Comedy, Thriller |
Chicago | Murderesses Velma Kelly and Roxie Hart find themselves on death row together and fight for the fame that will keep them from the gallows in 1920s Chicago. | Action, Comedy, Crime, Drama, Thriller |
Rush Hour | When Hong Kong Inspector Lee is summoned to Los Angeles to investigate a kidnapping, the FBI doesn't want any outside help and assigns... | Action, Comedy, Crime, Thriller |
The Guard | Two policemen must join forces to take on an international drug-smuggling gang - one, an unorthodox Irish policeman and the other, a straitlaced FBI... | Action, Comedy, Thriller, Crime |
21 Jump Street | In high school, Schmidt was a dork and Jenko was the popular jock. After graduation, both of them joined the police force and ended up as partners riding bicycles... | Action, Comedy, Crime |
We can see that the recommendations given are all comedies, but are focused on action and crime movies similar to The Dark Knight.
I have also created a system that gives recommendations based on movie taglines and descriptions. I applied a TF-IDF vectorizer to these descriptions and performed Latent Semantic Analysis to calculate similarities between movies. Latent Semantic Analysis is a technique that assumes words that are close in meaning occur in similar pieces of text. The system produces a top chart of the movies most similar to a user-provided movie, sorted by the IMDB weighted rating so that the most popular movies are recommended first.
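One common way to implement this kind of description-based similarity in scikit-learn is TF-IDF followed by a truncated SVD, which is how Latent Semantic Analysis is usually done there. The sketch below assumes that approach. The function name, the number of components and the toy descriptions are hypothetical and not taken from the project.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def lsa_similarity(descriptions, n_components=2, seed=0):
    """Cosine similarity between descriptions in a reduced LSA topic space."""
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(descriptions)
    svd = TruncatedSVD(n_components=n_components, random_state=seed)
    topics = svd.fit_transform(tfidf)  # one low-dimensional vector per movie
    return cosine_similarity(topics)

# Toy descriptions (hypothetical): two school stories and two crime stories.
descriptions = [
    "a teacher inspires students at a boarding school",
    "a new student arrives at a boarding school and meets a teacher",
    "a detective hunts a crime lord through the city",
    "a crime boss and a detective clash in the city",
]

sim = lsa_similarity(descriptions)
print(sim.round(2))
```

On real data the number of components would be much larger than 2. It is kept tiny here only because the toy corpus has four documents.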
Let's look at the top 5 recommendations for movies similar to The Dark Knight using this description-based recommender:
Title | Overview | Genres |
---|---|---|
The Usual Suspects | Held in an L.A. interrogation room, Verbal Kint attempts to convince the feds that a mythic crime lord, Keyser Soze, not only exists, but was also... | Drama, Crime, Thriller |
Captain America: The Winter Soldier | After the cataclysmic events in New York with The Avengers, Steve Rogers, aka Captain America is living quietly in Washington, D.C. and trying to adjust to the modern world... | Action, Adventure, Science Fiction |
The Raid 2 | After fighting his way through an apartment building populated by an army of dangerous criminals and escaping with his life, SWAT team member Rama... | Action, Crime, Thriller |
Elite Squad: The Enemy Within | After a bloody invasion of the BOPE in the High-Security Penitentiary Bangu 1 in Rio de Janeiro to control a rebellion of interns, the Lieutenant-Colonel... | Drama, Action, Crime |
The Dirty Dozen | 12 American military prisoners in World War II are ordered to infiltrate a well-guarded enemy chateau and kill the Nazi officers vacationing there. The soldiers, most of whom are facing death sentences for a variety of violent crimes... | Action, Adventure, War |
These recommendations are less similar to The Dark Knight than those from the metadata recommender. Let's look at the recommendations produced for the movie Clueless, a teen romantic comedy released in 1995.
Tagline: Sex. Clothes. Popularity. Is there a problem here?
Movie Overview: Shallow, rich and socially successful Cher is at the top of her Beverly Hills high school's pecking scale. Seeing herself as a matchmaker,
Cher first coaxes two teachers into dating each other. Emboldened by her success, she decides to give hopelessly klutzy new student Tai a makeover.
When Tai becomes more popular than she is, Cher realizes that her disapproving ex-stepbrother was right about how misguided she was -- and falls for him.
Genres: Comedy, Drama, Romance
Keywords: puberty, high school, make a match, spoiled child, etc.
Director: Amy Heckerling
Cast: Alicia Silverstone, Brittany Murphy, Stacey Dash
Title | Overview | Genres |
---|---|---|
Rushmore | When a beautiful first-grade teacher arrives at a prep school, she soon attracts the attention of an ambitious teenager named Max, who quickly falls in love with her... | Comedy, Drama |
The Chorus | Set in 1940's France, a new teacher at a school for disruptive boys gives hope and inspiration. | Drama |
Au Revoir les Enfants | A French boarding school run by priests seems to be a haven from World War II until a new student arrives. He becomes the roommate... | Drama, War |
Kick-Ass | Dave Lizewski is an unnoticed high school student and comic book fan who one day decides to become a super-hero, even though he has no powers... | Action, Crime |
The Children's Hour | A troublemaking student at a girl's school accuses two teachers of being... | Drama |
Through these recommendations we see that the system is focusing on the "school" aspect of the movie, but missing the romantic comedy aspects.
I next created a recommendation system that provides recommendations for specific users in our dataset, based on their movie ratings. An SVD collaborative filtering algorithm was trained (with 91% accuracy) to predict a user's ratings for movies from the ratings that user has already given. The CollaborativeFiltering class contains a function that returns a requested number of recommendations for a given user in our dataset. It provides recommendations in order of which movies are most similar to movies the user has already enjoyed.
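To make the idea of SVD-style rating prediction concrete, here is a small hand-rolled matrix factorization trained with stochastic gradient descent, using only NumPy. It is a sketch under assumptions and not the CollaborativeFiltering class itself, which may well rely on a library implementation. The function name, hyperparameters and toy ratings are all hypothetical.

```python
import numpy as np

def train_svd(ratings, n_users, n_items, n_factors=2,
              lr=0.02, reg=0.02, epochs=300, seed=0):
    """Fit r_ui ~ mu + b_u + b_i + p_u . q_i by SGD; return a predict function."""
    rng = np.random.default_rng(seed)
    mu = np.mean([r for _, _, r in ratings])          # global mean rating
    bu, bi = np.zeros(n_users), np.zeros(n_items)     # user and item biases
    P = rng.normal(0, 0.1, (n_users, n_factors))      # user factors
    Q = rng.normal(0, 0.1, (n_items, n_factors))      # item factors
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - (mu + bu[u] + bi[i] + P[u] @ Q[i])
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            pu_old = P[u].copy()                      # use pre-update value for Q
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * pu_old - reg * Q[i])

    def predict(u, i):
        """Estimated rating, clipped to the 0.5-5.0 rating scale."""
        return float(np.clip(mu + bu[u] + bi[i] + P[u] @ Q[i], 0.5, 5.0))
    return predict

# Toy ratings (hypothetical): users 0-1 like items 0-1, users 2-3 like items 2-3.
# The rating of user 0 for item 1 is held out to see what the model estimates.
ratings = [(u, i, 5.0 if (u < 2) == (i < 2) else 1.0)
           for u in range(4) for i in range(4) if (u, i) != (0, 1)]

predict = train_svd(ratings, n_users=4, n_items=4)
print(predict(0, 1), predict(0, 2))
```

Recommending for a user then amounts to estimating a rating for every movie the user has not rated and returning the movies with the highest estimates.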
Let's look at the top 5 recommendations for user 10 in our dataset:
Title | Overview | Genres |
---|---|---|
Sleepless in Seattle | A young boy who tries to set his dad up on a date after the death of his mother. He calls into a radio station to talk about his dad's loneliness which soon leads the... | Comedy, Drama, Romance |
Dawn of the Dead | During an ever-growing epidemic of zombies that have risen from the dead, two Philadelphia SWAT team members, a traffic reporter, and his television-executive girlfriend... | Horror |
Terminator 3: Rise of the Machines | It's been 10 years since John Connor saved Earth from Judgment Day, and he's now living under the radar, steering clear of using anything Skynet can trace. That is, until... | Action, Thriller, Science Fiction |
Shriek If You Know What I Did Last Friday the Thirteenth | Another spoof of the Scream/I Know What You Did Last Summer horror genre involving a group of popular high school students stalked by a bumbling masked killer... | Comedy |
While You Were Sleeping | A love story built on a misunderstanding. A transit worker pulls commuter Peter off the tracks after he's mugged. But while he's in a coma, his family mistakenly thinks... | Comedy, Drama, Romance |
From these recommendations, we can see that user 10 seems to be a fan of romantic movies as well as dramas and thrillers.
We can also limit our recommendations for this user to romance movies:
Title | Overview | Genres |
---|---|---|
Sleepless in Seattle | A young boy who tries to set his dad up on a date after the death of his mother. He calls into a radio station to talk about his dad's loneliness which soon leads the... | Comedy, Drama, Romance |
While You Were Sleeping | A love story built on a misunderstanding. A transit worker pulls commuter Peter off the tracks after he's mugged. But while he's in a coma, his family mistakenly thinks... | Comedy, Drama, Romance |
The Thomas Crown Affair | Young businessman, Thomas Crown is bored and decides to plan a robbery and assigns a professional agent with the right information to the job. However, Crown is soon betrayed yet cannot blow his cover because he's in love. | Romance, Crime, Thriller, Drama |
Pandora's Box | The rise and inevitable fall of an amoral but naive young woman whose insouciant eroticism inspires lust and violence in those around her. | Drama, Thriller, Romance |
Rumble Fish | Rusty James, an absent-minded street thug struggles to live up to his legendary older brother's reputation, and longs for the days when gang warfare was going on. | Action, Adventure, Crime, Drama, Romance |
Although these are all romantic movies, a lot of them are thrillers or dramas because the user has a preference for those genres.
We can improve our system even more by combining the content-based and user-preference-based recommendation systems. The combined recommender takes a specific user and a specific movie title, finds the movies with the most similar metadata, and sorts them by the rating estimated by our collaborative filtering algorithm.
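The combination described above can be sketched in a few lines of plain Python: shortlist candidates by metadata similarity, then re-rank them by the estimated rating for the user. The function name, the `pool` parameter and the toy similarity matrix and rating estimates are assumptions for illustration. In practice `sim` and `predict` would come from the content-based and collaborative filtering models.

```python
def hybrid_recommend(user_id, title, titles, sim, predict, n=5, pool=25):
    """Shortlist the `pool` most similar movies, then sort by estimated rating."""
    idx = titles.index(title)
    candidates = sorted(
        (j for j in range(len(titles)) if j != idx),
        key=lambda j: sim[idx][j],
        reverse=True,
    )[:pool]
    scored = [(titles[j], predict(user_id, j)) for j in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:n]

# Toy inputs (hypothetical): a 4x4 similarity matrix and fixed rating estimates
# standing in for a trained collaborative filtering model.
titles = ["A", "B", "C", "D"]
sim = [
    [1.0, 0.9, 0.8, 0.1],
    [0.9, 1.0, 0.7, 0.2],
    [0.8, 0.7, 1.0, 0.3],
    [0.1, 0.2, 0.3, 1.0],
]
estimates = {(7, 0): 4.0, (7, 1): 2.5, (7, 2): 4.5, (7, 3): 5.0}

def predict(user_id, movie_idx):
    return estimates[(user_id, movie_idx)]

print(hybrid_recommend(7, "A", titles, sim, predict, n=2, pool=2))
```

Movie "D" has the highest estimated rating for this user but is not similar to "A", so it never enters the shortlist. The size of `pool` controls how much weight similarity gets relative to the user's predicted taste.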
Let's look at the recommendations for movies similar to The Dark Knight for user 12 in the dataset:
Title | Overview | Genres |
---|---|---|
Cutthroat Island | Morgan Adams and her slave, William Shaw, are on a quest to recover the three portions of a treasure map. Unfortunately, the final portion is held by her murderous uncle, Dawg. Her crew is skeptical... | Action, Adventure |
Get Shorty | Chili Palmer is a Miami mobster who gets sent by his boss, the psychopathic "Bones" Barboni, to collect a bad debt from Harry Zimm, a Hollywood producer who... | Comedy, Thriller, Crime |
GoldenEye | James Bond must unmask the mysterious head of the Janus Syndicate and prevent the leader from utilizing the GoldenEye weapons system to inflict devastating revenge on Britain. | Action, Adventure, Thriller |
Toy Story | Led by Woody, Andy's toys live happily in his room until Andy's birthday brings Buzz Lightyear onto the scene. Afraid of losing his place in Andy's heart, Woody plots against Buzz... | Animation, Comedy, Family |
Money Train | A vengeful New York transit cop decides to steal a trainload of subway fares; his foster brother, a fellow cop, tries to protect him. | Action, Comedy, Crime |
It seems that user 12 does not have a strong preference for movies similar to The Dark Knight, as most of the recommendations are for comedy or action movies.
Out of the four recommendation systems, the system that seemed to produce the most accurate recommendations was the content-based recommender that made recommendations based on movie metadata. This seems to accurately reflect the way most users decide on which movies to watch, as most of us tend to watch movies in genres that we like, with directors and actors that we are familiar with. However, the collaborative filtering algorithm provided highly accurate estimated ratings for users that had rated various movies in the dataset, and combining these ratings with the metadata recommendation system produces recommendations for movies very similar to movies the user has already enjoyed.