Memory-Based Collaborative Filtering: Recommending Hotels

Muscle Technology

Aug 28, 2021

5 min read

Collaborative filtering is a method of making automatic predictions about the interests of a user by collecting preferences or taste information from many users (collaborating).

Muscle is creating new technology to understand behavior and to create personalized offers for cardholders. To that end we have built several approaches, and Artificial Intelligence is one of them. In this article we want to show you a simple way to get recommendations based on algorithms.

The underlying assumption of the collaborative filtering approach is that if a person A has the same opinion as a person B on an issue, A is more likely to have B's opinion on a different issue than that of a randomly chosen person.

We will use this to build an algorithm capable of recommending a set of hotels B = {hotel1, hotel2, ..., hoteln} similar to a specific hotel we know (say, hotel10), based on the Pearson correlation coefficient. The criteria used for comparison are the ratings users gave to the hotels (from 1 to 5).

To make this more interesting, we will use some real hotels and ratings downloaded from Datafiniti on the Kaggle machine learning platform (https://www.kaggle.com/datafiniti/hotel-reviews).

Enough! Let's get to the fun part.

Load Hotel Ratings Data

Welcome! Let's build a collaborative filtering algorithm that recommends hotels based on past user ratings. First, let's load the data we need to get started.


	user	hotel	rating
0	Paula	Rancho Valencia Resort Spa	5.0
1	D	Rancho Valencia Resort Spa	5.0
2	Ron	Rancho Valencia Resort Spa	5.0
3	jaeem2016	Aloft Arundel Mills	2.0
4	MamaNiaOne	Aloft Arundel Mills	5.0


As you can see, we have a table with only three columns: the username, the hotel name, and the rating given. We will assume that this data contains only the last rating provided by the user to a particular hotel. The data format is similar to what we will find in our database. To build collaborative filtering, we need a matrix where one dimension is the users, the other is the hotel names, and the values are the ratings. This is easily done with Pandas.
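The loading step could look roughly like the sketch below. The column names in COLUMN_MAP are an assumption about the Kaggle CSV and may differ in your copy of the file; the function name load_ratings and the in-memory sample rows are made up for illustration. Keeping the last row per user and hotel only reflects the "last rating" assumption if the file is ordered chronologically.

```python
import io

import pandas as pd

# Assumed column names in the Kaggle CSV; adjust them to match your file.
COLUMN_MAP = {
    "reviews.username": "user",
    "name": "hotel",
    "reviews.rating": "rating",
}


def load_ratings(source):
    """Read reviews and return one (user, hotel, rating) row per user-hotel pair."""
    raw = pd.read_csv(source, usecols=list(COLUMN_MAP))
    ratings = raw.rename(columns=COLUMN_MAP)[["user", "hotel", "rating"]]
    ratings = ratings.dropna()
    # Keep only the last rating a user gave to a particular hotel
    # (assumes rows appear in chronological order).
    ratings = ratings.drop_duplicates(subset=["user", "hotel"], keep="last")
    return ratings.reset_index(drop=True)


# Tiny made-up stand-in for the real file so the sketch runs on its own.
sample_csv = io.StringIO(
    "name,reviews.rating,reviews.username\n"
    "Hotel A,5.0,user_1\n"
    "Hotel A,5.0,user_2\n"
    "Hotel B,2.0,user_3\n"
    "Hotel B,4.0,user_3\n"
)
ratings = load_ratings(sample_csv)
print(ratings)
```

With the real data you would pass the path of the downloaded CSV instead of sample_csv.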

Let's build our rating matrix with the Pandas pivot_table() method.

Convert Data to Ratings Matrix


hotel	1906 Lodge At Coronado Beach	250 Main Hotel	AC Hotel Chicago Downtown	AC Hotel Miami Beach	AC Hotel by Marriott Boston Downtown	ARIA Resort Casino	Acadia Suites	Ace Hotel Chicago	Ace Hotel New Orleans	Admiral Hotel	...	Wyndham Garden Lafayette	Wyndham Garden Pittsburgh Airport	Wyndham Garden San Jose Silicon Valley	Wyndham Garden-amarillo	Wyndham Houston - Medical Center Hotel and Suites	XV Beacon	Yakutat Lodge	dana hotel and spa	hampton inn Springfield southeast	hotel le bleu
user
'Sina Bamtefa	0	0	0.0	0	0	0	0	0.0	0	0	...	0	0.0	0.0	0	0	0	0	0.0	0	0
007lele	0	0	0.0	0	0	0	0	0.0	0	0	...	0	0.0	0.0	0	0	0	0	0.0	0	0
0501MVKG	0	0	0.0	0	0	5	0	0.0	0	0	...	0	0.0	0.0	0	0	0	0	0.0	0	0
0ls0njo	0	0	0.0	0	0	0	0	0.0	0	0	...	0	0.0	0.0	0	0	0	0	0.0	0	0
0theHero	0	0	0.0	0	0	0	0	0.0	0	0	...	0	0.0	0.0	0	0	0	0	0.0	0	0

5 rows × 1670 columns


This matrix is what we need to start. Note that the matrix is sparse: it contains many zeros, because a particular user is expected to have visited only a few hotels, not all of them.
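As a minimal sketch of the pivot step, assuming a ratings table with the user, hotel, and rating columns shown earlier (the rows here are made up), pivot_table() with fill_value=0 produces a user-by-hotel matrix in which missing ratings become zeros:

```python
import pandas as pd

# Made-up ratings in the same three-column layout as the loaded data.
ratings = pd.DataFrame({
    "user": ["ana", "ana", "ben", "cy"],
    "hotel": ["Hotel A", "Hotel B", "Hotel A", "Hotel C"],
    "rating": [5.0, 3.0, 4.0, 2.0],
})

# Rows are users, columns are hotels, values are ratings; unrated cells get 0.
rating_matrix = ratings.pivot_table(
    index="user", columns="hotel", values="rating", fill_value=0
)

# Share of cells without a rating, as a quick measure of sparsity.
sparsity = (rating_matrix.to_numpy() == 0).mean()

print(rating_matrix)
print(f"share of empty cells: {sparsity:.2f}")
```

Note that pivot_table() averages duplicate user-hotel pairs by default, which is harmless when each pair appears only once.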

Dimensional Reduction using SVD

We want to recommend a set of hotels similar to hotel X (we will choose one). To do so, we will "compress" the user dimension with a Singular Value Decomposition (SVD) transformer. SVD works well with sparse matrices such as ours. We will reduce the user dimension from 6942 to 10.

To do this, we first transpose the rating matrix so that hotels become the rows and users become the columns. After that, we apply the SVD transform.
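The transpose-then-reduce step might be sketched as follows, assuming scikit-learn's TruncatedSVD is the transformer in use. The toy matrix, its size, and the random seeds are illustrative assumptions and not the article's data.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import TruncatedSVD

rng = np.random.default_rng(0)
n_users, n_hotels = 50, 30

# Toy sparse ratings: roughly 10% of the cells hold a rating from 1 to 5.
has_rating = rng.random((n_users, n_hotels)) < 0.1
values = rng.integers(1, 6, size=(n_users, n_hotels)) * has_rating
rating_matrix = pd.DataFrame(
    values.astype(float),
    index=[f"user_{i}" for i in range(n_users)],
    columns=[f"hotel_{j}" for j in range(n_hotels)],
)

# Transpose so that rows are hotels and columns are users.
hotels_by_users = rating_matrix.T

# Compress the user dimension down to 10 components per hotel.
svd = TruncatedSVD(n_components=10, random_state=42)
hotel_features = svd.fit_transform(hotels_by_users)

print(hotel_features.shape)  # (n_hotels, 10)
```

The number of components must not exceed the number of users, which is why the toy matrix has more than 10 of them.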


user	'Sina Bamtefa	007lele	0501MVKG	0ls0njo	0theHero	103bennier	106PamelaL	108peggyt	112traveler47	121dawne	...	yves r	yyftam	z	zackandbritt8611	zamguy2013	zenleanne	zfakhavan	zhamant	zip98221	zumbadiva1
hotel
1906 Lodge At Coronado Beach	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
250 Main Hotel	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
AC Hotel Chicago Downtown	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
AC Hotel Miami Beach	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
AC Hotel by Marriott Boston Downtown	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	...	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0

5 rows × 6942 columns


(1670, 10)


Success! We have compressed 6942 users to 10, and we kept all the hotels.


Calculate the Pearson Correlation Coefficient


The Pearson correlation is a measure of linear correlation between two sets of data. The coefficient is the covariance of the two variables divided by the product of their standard deviations. In other words, it measures how "similar" two vectors of the same size are. Now that we have 1670 hotels, each one with a vector of size 10, we can compare every hotel with every other hotel to obtain this correlation coefficient. We assume that if the Pearson coefficient calculated from two hotels, say P(hotel1, hotel2), is 0.99, they are highly correlated, so if the user liked hotel1, they will probably like hotel2.
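To make the definition concrete, here is a small sketch that computes the coefficient directly from centered vectors (covariance over the product of standard deviations, with the normalizing constants cancelling out). The function name pearson and the two sample vectors are made up for illustration.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation of two equal-length vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()  # center both vectors
    yc = y - y.mean()
    # Covariance divided by the product of the standard deviations.
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))


# Two made-up feature vectors standing in for two hotels.
a = [5.0, 3.0, 0.0, 4.0, 0.0]
b = [4.0, 2.0, 0.0, 5.0, 1.0]

print(pearson(a, b))
print(np.corrcoef(a, b)[0, 1])  # NumPy's result for the same pair
```

A vector with no variation (all values equal) has a standard deviation of zero, so the coefficient is undefined for it.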


NumPy's np.corrcoef() function calculates all the Pearson coefficients in a single step, so let's use it to get them all at once.
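A minimal sketch of this step is shown below. The random hotel_features array is only a stand-in for the SVD output, with an assumed size of 30 hotels by 10 components. By default np.corrcoef() treats each row as one variable, so a hotels-by-components input yields a hotels-by-hotels matrix.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for the reduced matrix: one row per hotel, 10 components each.
hotel_features = rng.normal(size=(30, 10))

# Each row is treated as a variable, so the result is hotel-by-hotel.
corr = np.corrcoef(hotel_features)

print(corr.shape)
```

The matrix is symmetric with ones on the diagonal, because every hotel correlates perfectly with itself.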


(1670, 1670)


Recommendations

Good job! We now have all the ingredients in place to start getting recommendations. Let's create a method that returns recommendations based on a hotel name.

The recommended (name) method performs the following tasks:

Gets the index of the hotel in the rating matrix (its column number).
Gets all the Pearson coefficients of that hotel against every other hotel.
Creates a pandas data frame with the hotel names and their Pearson coefficients, sorted in descending order. The top 10 are the most similar hotels. Note that the first result, with a Pearson coefficient of 1, is the query hotel itself.
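The three steps above can be sketched as follows. The function name recommend, its parameters, and the toy feature vectors are illustrative assumptions and not the article's original code.

```python
import numpy as np
import pandas as pd


def recommend(name, hotel_names, corr, top_n=10):
    """Return the top_n hotels most correlated with the given hotel."""
    hotel_names = list(hotel_names)
    # Step 1: position of the hotel (raises ValueError if the name is unknown).
    idx = hotel_names.index(name)
    # Step 2: its Pearson coefficients against all hotels.
    scores = corr[idx]
    # Step 3: table of coefficients and names, most similar first.
    table = pd.DataFrame({"pearson": scores, "hotel": hotel_names})
    return table.sort_values("pearson", ascending=False).head(top_n)


# Made-up feature vectors for four hotels.
hotel_names = ["Hotel A", "Hotel B", "Hotel C", "Hotel D"]
features = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.5],
    [4.0, 3.0, 2.0, 1.0],
    [1.0, 3.0, 2.0, 5.0],
])
corr = np.corrcoef(features)

print(recommend("Hotel A", hotel_names, corr, top_n=3))
```

To hide the query hotel from the results, you could drop the first row of the returned table.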


Example 1: Hotels similar to AC Hotel Miami Beach


	pearson	hotel
3	1.000000	AC Hotel Miami Beach
784	0.977024	Hampton Inn Suites Lavonia
221	0.951110	Best Western Plus Hotel At The Convention Center
1028	0.940408	Inn At The 5th
1286	0.938370	Ramada Plaza Hawthorne/LAX
1328	0.937191	Residence Inn Annapolis
949	0.934082	Home2 Suites by Hilton Buffalo Airport/Galleri...
1347	0.932799	Residence Inn Phoenix North/Happy Valley
1401	0.932389	Shilo Inn Suites - Coeur d'Alene
537	0.932084	Dillon Motel


Example 2: Hotels similar to Silver Sands Oceanfront Motel


	pearson	hotel
1407	1.000000	Silver Sands Oceanfront Motel
670	0.970656	Grande Colonial La Jolla
578	0.966207	Element Basalt - Aspen
250	0.963198	Best Western Plus Williston Hotel & Suites
757	0.960163	Hampton Inn Orange City
798	0.958458	Hampton Inn Suites Sioux City/South
455	0.949858	Courtyard Las Vegas Henderson/Green Valley
1644	0.945691	Wildflower Inn
340	0.943473	Carter Iva
1099	0.942650	Lyttleton Inn


Example 3: Hotels similar to Hampton Inn Suites National HarborAlexandria Area


	pearson	hotel
791	1.000000	Hampton Inn Suites National HarborAlexandria Area
848	0.976166	Hilton Garden Inn Chesapeake/Suffolk
1311	0.975341	Red Roof Inn Hampton Coliseum Convention Center
754	0.974526	Hampton Inn Norfolk/Virginia Beach
604	0.974434	Extended Stay America Washington, D.C. - Sprin...
1599	0.974317	TownePlace Suites by Marriott Suffolk Chesapeake
932	0.974111	Holiday Inn Express and Suites Exmore, Eastern...
600	0.974036	Extended Stay America Hampton - Coliseum
543	0.974036	DoubleTree by Hilton Hotel Orlando at SeaWorld
1040	0.969385	Island View Motel

Improvements to this example!

Nice! You have reached the end of our tour of memory-based collaborative filtering for hotels. To finish this post, here are some things you can do to improve the recommendations and the user experience under real-life conditions:

Filter by longitude and latitude. You can calculate how far each hotel is from the others and sort them by distance. Popular distance measures include Euclidean distance and the Haversine formula.
Try other similarity measures. We used the Pearson coefficient, but feel free to try cosine similarity, Spearman correlation, mean squared difference, etc.
Update the rating matrix often. The newest user ratings should be used to recalculate the correlation coefficients, which can surface new recommendations.
Check if recommendations make sense. Choose a hotel, ask for ten recommendations, and check whether the list makes sense.
Try other dimensionality reduction algorithms. TruncatedSVD is one way to deal with the cold start: because there are so many zeros in our matrix, a dimensionality reduction method helps. Using other techniques instead of SVD might produce different results. Candidate algorithms to test include, but are not limited to, Isomap Embedding, Spectral Embedding, PCA, and t-SNE.
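Two of the suggestions above can be sketched in a few lines: a Haversine distance for filtering by location, and cosine similarity as an alternative to the Pearson coefficient. The function names and the sample vectors are made up for illustration, and the dataset would need latitude and longitude columns for the first helper to be useful.

```python
import math

import numpy as np

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against tiny floating-point overshoots.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def cosine_similarity_matrix(features):
    """Pairwise cosine similarity between the rows of a 2-D array."""
    features = np.asarray(features, dtype=float)
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    # Leave all-zero rows as zeros instead of dividing by zero.
    unit = features / np.where(norms == 0, 1.0, norms)
    return unit @ unit.T


# Made-up feature vectors for three hotels.
sim = cosine_similarity_matrix([[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]])
print(sim)
print(haversine_km(0.0, 0.0, 0.0, 180.0))  # half the Earth's circumference
```

The cosine similarity matrix can be passed to the recommendation step in place of the Pearson matrix, since both are hotel-by-hotel tables of scores.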


With Muscle you will have an Artificial Intelligence solution with personalization, engagement, ordering and filtering features that improve your profits and save you money. If you want to build the future, contact us today.

Juan Zamora-Mora, Ph.D