Memory-Based Collaborative Filtering: Recommending Hotels

Muscle Technology
Aug 28, 2021
5 min read

Collaborative filtering is a method of making automatic predictions about the interests of a user by collecting preferences or taste information from many users (collaborating).

Muscle is creating new technology to understand cardholder behavior and to create personalized offers. To that end we have built several approaches, and Artificial Intelligence is one of them. In this article we want to show you a simple way to generate recommendations algorithmically.

The underlying assumption of the collaborative filtering approach is that if person A has the same opinion as person B on one issue, A is more likely to share B's opinion on a different issue than the opinion of a randomly chosen person.

We will use this to build an algorithm capable of recommending a set of hotels B = {hotel1, hotel2, ..., hoteln} given a specific hotel we know (say, hotel10), based on the Pearson correlation coefficient. The criterion used for comparison is the rating users gave to each hotel (from 1 to 5).

To make this more interesting, we will use real hotels and ratings published by Datafiniti on the Kaggle machine learning platform (https://www.kaggle.com/datafiniti/hotel-reviews).

Enough! Let's get into the fun part.

Load Hotel Ratings Data

Welcome! Let's build a collaborative-filtering algorithm that recommends hotels based on past user ratings. For now, let's load the data we need to get started.
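A minimal sketch of the loading step, not the original notebook code. It assumes the Kaggle CSV exposes the columns reviews.username, name, and reviews.rating (check these against the downloaded file), and the helper name load_ratings is hypothetical. A small in-memory CSV holding the five sample rows shown in this section stands in for the real file so the snippet runs on its own.

```python
import io

import pandas as pd

# Assumed column names in the Kaggle CSV; verify them against the real file.
COLUMN_MAP = {
    "reviews.username": "user",
    "name": "hotel",
    "reviews.rating": "rating",
}


def load_ratings(source):
    """Read a reviews CSV and keep only the user, hotel, and rating columns."""
    raw = pd.read_csv(source, usecols=list(COLUMN_MAP))
    ratings = raw.rename(columns=COLUMN_MAP)[["user", "hotel", "rating"]]
    # Rows without a user, hotel, or rating cannot be used.
    ratings = ratings.dropna()
    # Keep one rating per (user, hotel) pair. Taking the last row assumes the
    # file is in chronological order, which is an assumption of this sketch.
    ratings = ratings.drop_duplicates(subset=["user", "hotel"], keep="last")
    return ratings.reset_index(drop=True)


# Stand-in for the downloaded file: the five sample rows from this article.
sample_csv = io.StringIO(
    "name,reviews.rating,reviews.username\n"
    "Rancho Valencia Resort Spa,5.0,Paula\n"
    "Rancho Valencia Resort Spa,5.0,D\n"
    "Rancho Valencia Resort Spa,5.0,Ron\n"
    "Aloft Arundel Mills,2.0,jaeem2016\n"
    "Aloft Arundel Mills,5.0,MamaNiaOne\n"
)

ratings = load_ratings(sample_csv)
print(ratings.head())
```

With the real dataset you would pass the path of the downloaded CSV to load_ratings instead of the in-memory sample.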

   user        hotel                       rating
0  Paula       Rancho Valencia Resort Spa  5.0
1  D           Rancho Valencia Resort Spa  5.0
2  Ron         Rancho Valencia Resort Spa  5.0
3  jaeem2016   Aloft Arundel Mills         2.0
4  MamaNiaOne  Aloft Arundel Mills         5.0

As you can see, we have a table with only three columns: the username, the hotel name, and the rating given. We will assume that this data contains only the last rating each user gave to a particular hotel. The data format is similar to what we would find in our database. To build collaborative filtering, we need a matrix where one dimension is the users, the other is the hotel names, and the values are the ratings. This is easily done with Pandas.

Let's build our rating matrix with the Pandas pivot_table() method.

Convert Data to Ratings Matrix
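The pivot can be sketched as follows. The toy ratings below are made up for illustration and are not part of the Datafiniti data. Filling missing user/hotel pairs with zero is an assumption that matches the zeros visible in the matrix printed later in this article.

```python
import pandas as pd

# Made-up toy ratings in the same three-column layout as the loaded data.
ratings = pd.DataFrame(
    {
        "user": ["u1", "u1", "u2", "u3", "u3"],
        "hotel": ["Hotel A", "Hotel B", "Hotel A", "Hotel B", "Hotel C"],
        "rating": [5.0, 3.0, 4.0, 2.0, 5.0],
    }
)

# One row per user, one column per hotel, ratings as values.
# pivot_table averages duplicates by default; pairs with no rating become
# NaN, which we replace with 0 (assumed to mean "no rating given").
rating_matrix = ratings.pivot_table(
    index="user", columns="hotel", values="rating"
).fillna(0)

print(rating_matrix)

# Fraction of cells that hold no rating.
sparsity = (rating_matrix == 0).to_numpy().mean()
print(f"sparsity: {sparsity:.2f}")
```

Note that a zero here means "not rated", not "rated zero"; that is a simplification worth keeping in mind when interpreting the correlations later on.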

This matrix is what we need to start. One thing to note is that the matrix is sparse: it contains a lot of zeros, because it is unlikely that every user visited every hotel. We expect a particular user to have visited just a few of them.

Dimensional Reduction using SVD

We want to recommend a set of hotels similar to hotel X (we will choose one). So we will "compress" the users dimension using the Singular Value Decomposition (SVD) transformer. SVD works great with sparse matrices such as ours. We will reduce the users dimension from 6942 to 10.

To do this, we will flip our rating matrix with the transpose operator so that hotels become the rows and users become the columns. After doing this, we will apply the SVD transform.
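A minimal sketch of the transpose and reduction steps, assuming scikit-learn's TruncatedSVD is the kind of transformer meant here (the original code is not shown, so this choice is an assumption). The small matrix is made up; with the real data the transposed matrix would have 1670 rows and 6942 columns, and n_components would be 10.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

# Made-up toy rating matrix: 6 users (rows) x 4 hotels (columns).
rating_matrix = np.array(
    [
        [5, 0, 0, 4],
        [0, 3, 0, 0],
        [4, 0, 0, 5],
        [0, 0, 2, 0],
        [0, 4, 0, 0],
        [5, 0, 1, 0],
    ],
    dtype=float,
)

# Transpose so that each row describes one hotel across all users.
X = rating_matrix.T  # shape: (4 hotels, 6 users)

# Compress the users dimension to 2 components (10 in the article).
svd = TruncatedSVD(n_components=2, random_state=0)
hotel_vectors = svd.fit_transform(X)

print(hotel_vectors.shape)  # (4, 2): all hotels kept, users compressed
```

Fixing random_state keeps the result reproducible, since TruncatedSVD uses a randomized solver by default.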

user 'Sina Bamtefa 007lele 0501MVKG 0ls0njo 0theHero 103bennier 106PamelaL 108peggyt 112traveler47 121dawne ... yves r yyftam z zackandbritt8611 zamguy2013 zenleanne zfakhavan zhamant zip98221 zumbadiva1
hotel
1906 Lodge At Coronado Beach 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
250 Main Hotel 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
AC Hotel Chicago Downtown 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
AC Hotel Miami Beach 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
AC Hotel by Marriott Boston Downtown 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

5 rows × 6942 columns

(1670, 10)

Success! We have compressed 6942 users to 10, and we kept all the hotels.

Calculate the Pearson Correlation Coefficient

The Pearson correlation is a measure of linear correlation between two sets of data. The coefficient is the covariance of the two variables divided by the product of their standard deviations. In other words, it measures how "similar" two vectors of the same size are. Now that we have 1670 hotels, each one with a vector of size 10, we can compare every pair of hotels to obtain their correlation coefficient. We assume that if the Pearson coefficient calculated from two hotels, say P(hotel1, hotel2), is 0.99, they are highly correlated, so if a user liked hotel1, they will probably like hotel2.

NumPy provides the np.corrcoef() function to calculate all the Pearson coefficients in a single step. Let's use it to get all the coefficients at once.
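As a small illustration with made-up vectors, np.corrcoef() treats each row as one variable by default, so passing an array with one row per hotel returns a square hotel-by-hotel matrix. The name hotel_vectors is an assumption standing in for the SVD output.

```python
import numpy as np

# Made-up stand-in for the reduced hotel vectors (one row per hotel).
hotel_vectors = np.array(
    [
        [1.0, 2.0, 3.0],
        [2.0, 4.0, 6.5],
        [3.0, 1.0, -1.0],
    ]
)

# Rows are variables by default (rowvar=True), so the result is
# an (n_hotels, n_hotels) matrix of pairwise Pearson coefficients.
corr = np.corrcoef(hotel_vectors)

print(corr.shape)  # (3, 3)
print(np.round(corr, 3))
```

The matrix is symmetric with ones on the diagonal, because every hotel correlates perfectly with itself. A hotel whose vector is constant has zero variance and would produce NaN entries, which is worth checking for on real data.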

(1670, 1670)

Recommendations

Good job! We have all the ingredients in place to start getting recommendations. Now, let's create a method that returns recommendations for a given hotel name.

The recommended(name) method takes a hotel name, looks up that hotel's Pearson coefficients against every other hotel, and returns the ten most correlated hotels, as the examples below show.
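A minimal sketch of such a method, inferred from the example output that follows rather than taken from the original code. It finds the position of the hotel, reads that row of the correlation matrix, and returns the rows with the highest coefficients. The names recommend_similar, hotel_names, and corr, as well as the toy vectors, are illustrative assumptions.

```python
import numpy as np
import pandas as pd


def recommend_similar(name, hotel_names, corr, top_n=10):
    """Return the top_n hotels most correlated with `name` (itself included)."""
    hotel_names = list(hotel_names)
    # Position of the hotel in the matrix; raises ValueError if it is unknown.
    idx = hotel_names.index(name)
    # Pair every hotel with its Pearson coefficient against the chosen one.
    table = pd.DataFrame({"pearson": corr[idx], "hotel": hotel_names})
    # Highest coefficients first; the chosen hotel itself comes out on top.
    return table.sort_values("pearson", ascending=False).head(top_n)


# Made-up toy data: four hotels, each with a reduced vector of size 3.
hotel_names = ["Hotel A", "Hotel B", "Hotel C", "Hotel D"]
vectors = np.array(
    [
        [1.0, 2.0, 3.0],
        [1.0, 2.0, 3.5],
        [3.0, 2.0, 1.0],
        [1.0, 3.0, 2.0],
    ]
)
corr = np.corrcoef(vectors)

top = recommend_similar("Hotel A", hotel_names, corr, top_n=3)
print(top)
```

The first row is always the hotel you asked about, with a coefficient of 1. Drop it if you only want the other hotels.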

Example 1: Hotels similar to AC Hotel Miami Beach

pearson hotel
3 1.000000 AC Hotel Miami Beach
784 0.977024 Hampton Inn Suites Lavonia
221 0.951110 Best Western Plus Hotel At The Convention Center
1028 0.940408 Inn At The 5th
1286 0.938370 Ramada Plaza Hawthorne/LAX
1328 0.937191 Residence Inn Annapolis
949 0.934082 Home2 Suites by Hilton Buffalo Airport/Galleri...
1347 0.932799 Residence Inn Phoenix North/Happy Valley
1401 0.932389 Shilo Inn Suites - Coeur d'Alene
537 0.932084 Dillon Motel

Example 2: Hotels similar to Silver Sands Oceanfront Motel

pearson hotel
1407 1.000000 Silver Sands Oceanfront Motel
670 0.970656 Grande Colonial La Jolla
578 0.966207 Element Basalt - Aspen
250 0.963198 Best Western Plus Williston Hotel & Suites
757 0.960163 Hampton Inn Orange City
798 0.958458 Hampton Inn Suites Sioux City/South
455 0.949858 Courtyard Las Vegas Henderson/Green Valley
1644 0.945691 Wildflower Inn
340 0.943473 Carter Iva
1099 0.942650 Lyttleton Inn

Example 3: Hotels similar to Hampton Inn Suites National HarborAlexandria Area

pearson hotel
791 1.000000 Hampton Inn Suites National HarborAlexandria Area
848 0.976166 Hilton Garden Inn Chesapeake/Suffolk
1311 0.975341 Red Roof Inn Hampton Coliseum Convention Center
754 0.974526 Hampton Inn Norfolk/Virginia Beach
604 0.974434 Extended Stay America Washington, D.C. - Sprin...
1599 0.974317 TownePlace Suites by Marriott Suffolk Chesapeake
932 0.974111 Holiday Inn Express and Suites Exmore, Eastern...
600 0.974036 Extended Stay America Hampton - Coliseum
543 0.974036 DoubleTree by Hilton Hotel Orlando at SeaWorld
1040 0.969385 Island View Motel

Improvements to this example!

Nice! You have reached the end of our tour of memory-based collaborative filtering for hotels. We want to finish this post with some things you can do to improve the recommendations and the user experience under real-life conditions:

With Muscle you will have an Artificial Intelligence solution with personalization, engagement, ordering and filtering features that improve your profits and save you money. If you want to build the future, contact us today.

Juan Zamora-Mora, Ph.D