Muscle Technology

Aug 28, 2021



Muscle is building new technology to understand behavior and create personalized offers for cardholders. We have built several approaches for this, and Artificial Intelligence is one of them. In this article, we want to show you a simple way to get recommendations based on algorithms.

Collaborative filtering is a method of making automatic predictions about the interests of a user by collecting preferences or taste information from many users (collaborating). The underlying assumption of the collaborative filtering approach is that if a person A has the same opinion as a person B on an issue, A is more likely to have B's opinion on a different issue than that of a randomly chosen person.

We will use this to build an algorithm capable of recommending a set of hotels B = {hotel1, hotel2, ..., hoteln} starting from a specific hotel we know (hotel10), based on the Pearson correlation coefficient. The criteria used for comparison are the ratings users gave to the hotels in their reviews (from 1 to 5).

To make this more interesting, we will use some real hotels and ratings published by Datafiniti on the Kaggle machine learning platform (https://www.kaggle.com/datafiniti/hotel-reviews).

Enough! Let's get into the fun part.

Welcome! Let's build a collaborative-filtering algorithm that recommends hotels based on past user ratings. For now, let's load the data that contains what we need to start.
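The loading step might look like the minimal sketch below. The column names (`username`, `hotel`, `rating`) and the sample rows are assumptions made up for illustration, not the real headers or contents of the Kaggle file, and an in-memory string stands in for the downloaded CSV so the snippet runs on its own.

```python
import io

import pandas as pd

# Made-up sample standing in for the downloaded CSV.
# Column names are assumptions, not the dataset's real headers.
csv_text = """username,hotel,rating
ana,Hotel Sol,5
ana,Hotel Luna,4
ben,Hotel Sol,4
ben,Hotel Mar,2
"""

# With the real file, the argument would be the path to the CSV instead.
ratings = pd.read_csv(io.StringIO(csv_text))

print(ratings.head())
```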

As you can see, we have a table with only three columns: the username, the hotel name, and the rating given. We will assume that this data contains only the last rating each user gave to a particular hotel. The data format is similar to what we will find in our database. To build collaborative filtering, we need a matrix where one dimension is the users, the other is the hotel names, and the values are the ratings. This is easily done with Pandas.

Let's build our rating matrix with the Pandas pivot_table() method.
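Here is a small sketch of that pivot, assuming the same hypothetical column names as before (`username`, `hotel`, `rating`) and a handful of made-up ratings. Missing user/hotel pairs are filled with 0, which is one possible way to represent "no rating".

```python
import pandas as pd

# Made-up ratings; the column names are assumptions for illustration.
ratings = pd.DataFrame({
    "username": ["ana", "ana", "ben", "ben", "cleo"],
    "hotel": ["Hotel Sol", "Hotel Luna", "Hotel Sol", "Hotel Mar", "Hotel Luna"],
    "rating": [5, 4, 4, 2, 3],
})

# Rows are users, columns are hotels, values are ratings.
# fill_value=0 marks the hotels a user never rated.
rating_matrix = ratings.pivot_table(
    index="username", columns="hotel", values="rating", fill_value=0
)

print(rating_matrix)
```

Note that pivot_table() averages duplicate user/hotel pairs by default, so the "only the last rating" assumption matters if the raw data contains repeats.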

This matrix is what we need to start. One thing to notice is that the matrix is sparse. There will be a lot of zeros in it because it is unlikely that all the users visited all the hotels; we expect a particular user to have visited just a few.
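If you want to put a number on that sparsity, a quick check like the one below can help. The small matrix is made up for illustration, and it assumes that 0 means "no rating".

```python
import pandas as pd

# Made-up user-by-hotel rating matrix; 0 is assumed to mean "not rated".
rating_matrix = pd.DataFrame(
    [[5, 4, 0, 0],
     [4, 0, 2, 0],
     [0, 3, 0, 0]],
    index=["ana", "ben", "cleo"],
    columns=["Hotel Sol", "Hotel Luna", "Hotel Mar", "Hotel Rio"],
)

# Fraction of cells that are zero: 7 of the 12 cells here.
sparsity = (rating_matrix.to_numpy() == 0).mean()

print(f"{sparsity:.1%} of the matrix is empty")
```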

We want to recommend a set of hotels similar to hotel X (we will choose one). To do so, we will "compress" the user dimension using the Singular Value Decomposition (SVD) transformer. SVD works well with sparse matrices such as ours. We will reduce the users from 6942 to 10.

To do this, we will flip our rating matrix with the transpose operator so that the hotel names are on the Y-axis (rows) and the users are on the X-axis (columns). After doing this, we will apply the SVD transform.

(1670, 10)

Success! We have compressed 6942 users to 10, and we kept all the hotels.
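The transpose-and-reduce step can be sketched as follows, assuming scikit-learn's TruncatedSVD is the SVD transformer in question. The matrix here is random toy data (8 users, 5 hotels), so the resulting shape is much smaller than the real one, and the number of components is reduced to 3 to fit it.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

# Random toy rating matrix: 8 users (rows) x 5 hotels (columns).
# Values 0..5, where 0 is assumed to mean "not rated".
rng = np.random.default_rng(0)
rating_matrix = rng.integers(0, 6, size=(8, 5)).astype(float)

# Transpose so that each row is a hotel and each column is a user.
hotels_by_users = rating_matrix.T

# Compress the user dimension (8 columns) down to 3 components.
svd = TruncatedSVD(n_components=3, random_state=0)
hotel_vectors = svd.fit_transform(hotels_by_users)

print(hotel_vectors.shape)
```

Every hotel keeps its row; only the number of columns changes.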

The Pearson correlation is a measure of linear correlation between two sets of data. This coefficient is the covariance of the two variables divided by the product of their standard deviations. In other words, it measures how "similar" two vectors of the same size are. Now that we have 1670 hotels, each one with a vector of size 10, we can compare each hotel with every other to compute this correlation coefficient. We assume that if the Pearson coefficient calculated from two hotels, say P(hotel1, hotel2), is 0.99, they are highly correlated, so if the user liked hotel1, they will probably like hotel2.
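To make the definition concrete, here is a small sketch that computes the coefficient by hand (covariance divided by the product of the standard deviations) for two made-up vectors and compares it with NumPy's built-in result.

```python
import numpy as np

# Two made-up vectors of the same size.
a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([2.0, 4.0, 6.5, 8.0])

# Covariance of a and b (population form, matching np.std's default).
cov_ab = np.mean((a - a.mean()) * (b - b.mean()))

# Pearson coefficient: covariance over the product of standard deviations.
pearson = cov_ab / (a.std() * b.std())

# NumPy's built-in value for the same pair, for comparison.
pearson_numpy = np.corrcoef(a, b)[0, 1]

print(pearson, pearson_numpy)
```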

NumPy has the np.corrcoef() function to calculate all the Pearson coefficients in a single step. Let's use it to get all the coefficients at once.

(1670, 1670)
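A minimal sketch of that call is shown below, using random toy vectors (5 hotels, 3 components each) instead of the real ones. np.corrcoef() treats each row as one variable, so a matrix with one row per hotel yields a square hotel-by-hotel matrix.

```python
import numpy as np

# Random toy stand-in for the reduced hotel vectors: 5 hotels x 3 components.
rng = np.random.default_rng(0)
hotel_vectors = rng.normal(size=(5, 3))

# Each row is treated as a variable, so the result is hotels x hotels.
corr = np.corrcoef(hotel_vectors)

print(corr.shape)
```

The diagonal holds each hotel's correlation with itself, and the matrix is symmetric because P(hotel1, hotel2) equals P(hotel2, hotel1).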

Good job! We have all the ingredients in place to start getting recommendations. Now, let's create a method to obtain recommendations based on the hotel name.

The recommended (name) method performs the following tasks:

- gets the index of the hotel in the rating matrix (that is, its column number)
- gets ALL the Pearson coefficients of that hotel against all the others
- creates a pandas data frame with the hotel names and Pearson coefficients and sorts it in descending order. The top 10 are the most similar hotels. Yes, the top one, with a Pearson coefficient of 1, should be the hotel itself.
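The three steps above can be sketched as follows. This is an illustrative version under stated assumptions, not the original implementation: the hotel names and vectors are made up, and the `top_n` parameter is a hypothetical addition.

```python
import numpy as np
import pandas as pd

# Made-up hotel names and reduced vectors, for illustration only.
hotel_names = ["Hotel Sol", "Hotel Luna", "Hotel Mar", "Hotel Rio"]
hotel_vectors = np.array([
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.1],
    [3.0, 2.0, 1.0],
    [1.0, 3.0, 2.0],
])
corr = np.corrcoef(hotel_vectors)


def recommended(name, top_n=10):
    """Return the top_n hotels most correlated with the named hotel."""
    # Step 1: position of the hotel among the hotel names.
    idx = hotel_names.index(name)
    # Step 2: all Pearson coefficients of that hotel against the others.
    scores = corr[idx]
    # Step 3: build a data frame and sort by coefficient, descending.
    table = pd.DataFrame({"hotel": hotel_names, "pearson": scores})
    table = table.sort_values("pearson", ascending=False)
    return table.head(top_n).reset_index(drop=True)


result = recommended("Hotel Sol", top_n=3)
print(result)
```

In a real application you would probably drop the first row before showing the list, since it is the hotel the user asked about.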

Nice! You have reached the end of our tour of memory-based collaborative filtering for hotels. We want to finish this post with some things you can do to improve the recommendations and the user experience under real-life conditions:

- **Filter by longitude and latitude**. You can calculate how far the hotel is from the others and sort them by distance. There are some popular distance methods, such as the Euclidean distance or the Haversine formula.
- **Try other distance methods**. We used the Pearson coefficient, but feel free to try Cosine Similarity, Spearman, Mean Square Distance, etc.
- **Update the rating matrix often**. The matrix needs to keep learning, as the newest user ratings should be used to recalculate the correlation coefficients, which can bring new recommendations.
- **Check if recommendations make sense**. Choose a hotel, ask for ten offers, and check whether the resulting list of recommendations makes sense.
- **Try other dim-reduction algorithms**. TruncatedSVD is one way to deal with the sparsity of the rating matrix. Because there are so many zeros in our matrix, using a dimensionality reduction method helps. Using other dimensionality reduction techniques instead of SVD might produce different results. Candidate algorithms to test include, but are not limited to, Isomap Embedding, Spectral Embedding, PCA, and t-SNE.
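As an illustration of the first suggestion, here is a sketch of the Haversine formula for the great-circle distance between two points. The hotel names and coordinates are made up, and the Earth is assumed to be a sphere with a mean radius of 6371 km.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Made-up coordinates for a reference hotel and two candidates.
reference = (9.93, -84.08)
candidates = {
    "Hotel Luna": (9.98, -84.12),
    "Hotel Mar": (10.30, -85.84),
}

# Sort the candidates from nearest to farthest.
by_distance = sorted(
    candidates, key=lambda name: haversine_km(*reference, *candidates[name])
)
print(by_distance)
```

A distance cut-off like this could be applied either before or after ranking by correlation, depending on whether location or similarity matters more for your users.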

With Muscle you will have an Artificial Intelligence solution with personalization, engagement, ordering and filtering features that improve your profits and save you money. If you want to build the future, contact us today.

Juan Zamora-Mora, Ph.D
