Rating prediction of peer-to-peer accommodation through attributes and topics from customer review

Subroto, Athor; Christianis, Marcel

doi:10.1186/s40537-020-00395-6

Research
Open access
Published: 06 January 2021

Journal of Big Data volume 8, Article number: 9 (2021)

Abstract

This study aims to predict customers’ behavior in classifying their reviews as high rated or low rated using the attributes and topics found in the review. A better understanding of customer reviewing behavior can help the relevant parties implement successful strategies, such as policies for managing customer reviews that keep satisfaction high. We applied a big data approach to a dataset of 55,377 reviews from Airbnb listings in the 10 most visited cities in Indonesia (based on foreign arrivals data). We used the Classification and Regression Tree model, the Random Forest model, the Least Absolute Shrinkage and Selection Operator (LASSO) logistic regression model, an Artificial Neural Network, and a Multi-Layer Perceptron to perform the classification. These models are used to identify a set of attributes and topics that increase the chance of a review being high rated and a different set of attributes and topics that lead a review to be low rated. The study has two findings. First, the attributes and topics that influence customers' odds of classifying their review as high rated or low rated are consistent with the existing understanding of peer-to-peer accommodation attributes. Second, customer reviews' attributes and topics can be used to predict the classification of ratings in peer-to-peer accommodation. For topics, Random Forest yields 60.09% accuracy, slightly better than the Artificial Neural Network (58.33%) and the Multi-Layer Perceptron (58.8%). However, attributes appear to predict the rating better: the Artificial Neural Network reaches 84.79% accuracy, compared with only 72.35% for the Multi-Layer Perceptron.

Introduction

Peer-to-peer (P2P) accommodation is an emerging model in the tourism industry that has changed and disrupted the way consumers choose accommodation. P2P is a component of a bigger movement known as the “Sharing Economy,” where consumers share and offer underutilized assets to other consumers [1]. P2P accommodation has established itself as a viable alternative to the traditional accommodation model and has taken significant market share in the accommodation industry [2]. This study focuses on Airbnb, a dominant P2P accommodation platform used across the globe. Although recent, the peer-to-peer accommodation model has grown at an unprecedented speed, so understanding of the model lags behind the phenomenon unfolding in the industry. As such, many apply the understanding and perspective of traditional accommodation to make sense of how P2P accommodation works and of the driving forces behind the decision-making process of P2P accommodation customers [3]. Yet there is no denying that P2P accommodation must be viewed as a separate model and from a fresh perspective.

Several prior studies have attempted to identify a separate set of attributes that exclusively describe P2P accommodation without being anchored to traditional accommodation perspectives. Previous research identified a new set of attributes that are commonly found in P2P accommodation and that describe the nuanced aspects of the Airbnb experience not found in traditional accommodation [4, 5]. Bridges and Vásquez looked into how language is used and contrasted in positive and negative reviews by Airbnb customers [6]. However, existing studies mainly focus on identifying these attributes by breaking down customer reviews and have yet to look further into the role of the attributes and how they can predict customer behavior in terms of the final rating given.

Given the importance of P2P accommodation attributes in understanding customers’ decisions, this study sees an opportunity to fill the gap in how customers’ behavior can be predicted from the attributes they use when writing reviews. The study employs a Big Data approach by building a predictive model on attributes extracted from a large set of Airbnb customer reviews. Big Data matters to this study because the views of millions of customers, shared of their own free will, are considerably frank and reflect their experiences as users. Thus, Big Data can offer a realistic picture and robust data. The study builds on the research by [6] by predicting customer behavior in classifying a review as high rated or low rated based on the use of attributes previously identified by [4, 5]. The study uses an array of classification models, namely Classification and Regression Tree (CART), Random Forest (RF), Least Absolute Shrinkage and Selection Operator (LASSO) logistic regression, Artificial Neural Network (ANN), and Multi-Layer Perceptron (MLP), to perform the classification. The study contributes to the P2P accommodation literature by providing a case for understanding how customers behave through the reviews they write. It also contributes methodologically by showing the feasibility of building a predictive model from attributes and topics identified in a collection of texts.

Literature review

Sharing economy and peer-to-peer accommodation

Several researchers have reviewed peer-to-peer accommodation [7,8,9], mostly using Airbnb-related data [10,11,12,13] and through the lens of the sharing economy [14, 15]. The sharing economy refers to a global phenomenon with rapid growth potential [16] in which consumers share and grant each other temporary access to their privately owned goods. In other words, asset owners use digital clearinghouses to make the most of the unused capacity of the things they possess [17]. This behavior is enabled and accelerated by online marketplaces that act as intermediaries between consumers. Usually, the property that consumers offer consists of underutilized physical assets that can be provided to other consumers for monetary benefit [18]. Among existing sharing economy models, peer-to-peer accommodation has emerged as a significant segment that has impacted the tourism industry [19, 20]. One study argues that P2P accommodation may be a form of evolution in the accommodation market driven by hotels’ inability to meet the needs and habits of their guests. Thus, most of its users consider it a hotel substitute [21], although during weekends and holidays [22]. Those needs lie predominantly in the dimensions of uncertainty, localness, communities, and personalization [23]. They also relate to modern internet technologies and to a distinct appeal that centers on cost savings, household amenities, and the potential for more authentic local experiences [24].

P2P accommodation has several distinct differences in comparison to traditional hotels. In the sharing economy model of P2P accommodation, neither supplier nor consumer is affiliated with the platform that enables the transactions. The platform simply facilitates the discovery process and mediates transactions. That contrasts with how traditional hotels work where customers will transact with an established entity that manages the accommodation service [18]. The most significant and most noticeable difference with the sharing economy model is the aspect of risk. Unlike hotels, P2P accommodation has a lack of standards and operating procedures dependent on each host's capabilities. In the traditional hotel model, there is a clear definition of the standards that are reflected in the hotel’s brand, operating procedures, price, and experience.

Meanwhile, in P2P accommodation, these attributes vary from one accommodation to another. Since customers cannot immediately judge the accommodation's quality, there is a degree of caution behind choosing accommodation in the peer-to-peer model. This caution is mitigated by the review system that platforms enforce in the interest of establishing trust between hosts and potential future guests [18]. However, guests' reviews can pose challenges for Airbnb hosts, such as risk, lack of privacy, and emotional stress [25].

Peer-to-peer accommodation attributes and topics

The human element is a crucial characteristic of P2P accommodation that fundamentally changes how accommodation is perceived. Rooted in the behavior of people renting out their property to guests, it creates an authentic experience with the property host that conventional hotels cannot replicate [5]. To understand how P2P accommodation is perceived, previous research has attempted to find which attributes and aspects of P2P accommodation guests care about. One study finds that service quality associated with the website, host, and facility can produce distinctive effects on customer satisfaction [26]. According to [27], the critical differences between Airbnb and hotels are reflected mainly through a wide variety of distinctive and similar attributes. The key differences include bringing pets and the opportunity to encounter hosts’ pets, atmosphere, flexibility, value for money, and quality assurance. Most guests are strongly attracted by Airbnb's practical attributes and somewhat less by its experiential attributes [28]. P2P accommodation has unique motivators that are linked to the characteristics of the human element: environmental responsibility, community, and economic benefits [4]. Financial benefits matter because some people look at the sharing economy as a cheaper alternative to traditional accommodation options. This perspective has been countered by [4], who identified that motivation for P2P accommodation also comes from reasons beyond monetary factors, such as community. Community is a value that leans towards the idea of social relationships and the practice of sharing, openness, and collaboration. It reflects guests’ desire for social interaction and how guests interact with locals, experience local cultures, and indulge in local cuisine. The value of community can also evoke a feeling of homeliness and create a home-like atmosphere [4].

Sustainability is a factor of the sharing economy, where it is believed that environmental harm can be reduced through the use of underutilized assets. By using P2P accommodation, we are not spending more resources but using assets that already exist [4].

Research about the Topics in P2P accommodation identified four critical topics; they are location, amenities, host, and recommendation [5]. The location's theme covers concepts such as geographical location to the point of interest, distance to nearby landmarks, and easiness of access to the accommodation. The location's convenience is essential, which describes the reach and ease of access to major tourist attractions, transportation hubs, and points of interest from the accommodation. The theme of amenity is a broad theme that also discusses the theme of facility and room. Amenity describes the availability of basic facilities such as towels, soap, and breakfast, which guests desire but may or may not be essential to the accommodation. Facilities deal with a broader accommodation category, such as the availability of a garden, pool, or balcony at the accommodation.

Meanwhile, the room's theme describes the environment inside the room, such as space, bed, room design, cleanliness, and other decoration. An essential attribute for a room is privacy and quality of sleep. The host's theme encompasses concepts that describe host's role in facilitating an Airbnb experience to guests [5]. The host is an important theme as the host has a central role in setting the guests’ experience upon their visit. The last strong theme that emerged within reviews that guests made is a recommendation. While the recommendation itself is not an attribute that directly describes P2P accommodation, it is an outcome discussed by guests of P2P accommodation as a result of the other identified themes [5].

Security issues in accommodation, according to [3] refer to guests’ safety during their stay. They defined two sides of security; the first is the guests’ active contribution in creating a safe environment, concerning issues such as illegal drug use, identity theft, and other unlawful activities that guests may do during their stay. Second is the hosts’ ability to create a safe environment at the accommodation, which includes maintaining the accommodation well, preparing safety equipment, and having safety precautions [3].

Positive and negative customer reviews

Guests often seek advice or reviews from other people to justify their purchase decision, and both the review score and negative sentiment have been found to be significant [13]. As such, positive and negative word of mouth is proven to have a strong influence on consumer purchase decisions [29]. However, negative reviews are more authentic and credible than positive reviews on Airbnb. The occurrence of social words is positively related to positive emotion in reviews but negatively related to negative emotion in reviews [30].

P2P platforms such as Airbnb adopted customer reviews as an official feature that facilitates word of mouth within the platform. Since guests who have prior experience are encouraged to share their opinions, customer reviews also become a way for guests to express their satisfaction and dissatisfaction towards the accommodation they chose. These expressions are written in words but also quantified through the usage of the rating system. Hence, ratings and reviews in P2P accommodation also act as a proxy behind guests’ satisfaction and dissatisfaction [29].

The reasoning behind how customers rate on online platforms can be explained by the research of [31], who observed a J-shaped distribution in ratings. They found that the majority of consumers who write customer reviews tend to write positive reviews to express their satisfaction in the form of a four- or five-star rating on a five-point rating scale. Meanwhile, a noticeably smaller number of negative reviews express dissatisfaction in the form of a one- or two-star rating on the same scale. The reason is that people with moderate and undesirable views are less inclined to spend the time and effort to report their ratings than people who have desirable and positive views of their experience. This phenomenon is not unique to Airbnb, as the same pattern was found in a similar study of customer reviews on Amazon, Yelp, and TripAdvisor, where customer reviews are highly skewed towards the positive side.

Prior research has used customer reviews in the accommodation industry to draw meaningful insights on P2P accommodation. One study found that the overall review star rating correlates well with the sentiment scores for both the title and the full content of the online customer review [32]. [33] presented a case of text mining on Airbnb user reviews to analyze and understand various aspects that drive customer satisfaction. In other industries, such as airlines, a text mining approach on Online Customer Reviews (OCRs) can predict airline recommendations by customers with an accuracy of 79.95% [34].

Several studies of online reviews have seen a consistent positivity bias in the writing of reviews [6]. They looked into factors that contribute to the positivity bias behind customer reviews. The first reason is guests’ expectations being lower for individuals' accommodations in comparison to accommodations provided by hotels. Since Airbnb properties are provided by individuals who act as hosts, in addition to the lack of standards that Airbnb properties have, guests tend to be more realistic in their expectations over what they will receive from their Airbnb stays [35]. Second, there also tends to be a more personal and personalized interaction between the host and the guest. Often, guests will be communicating with the person who owns the property directly instead of talking to customer service staff as they do with hotels [36]. Furthermore, reviews in Airbnb are not anonymous, and each review is linked to the reviewers’ profile. This personal experience and lack of anonymity lead to the behavior where posting negative feedback may be difficult and awkward for guests and may lead to guests not writing negative reviews when possible [35].

Lastly, a bias towards positive reviews can be driven by Airbnb’s review guidelines. Although Airbnb doesn’t directly edit and censor reviews, it reserves the right to remove reviews that include personal insults, profanity, or discrimination, or that are generally inappropriate and against its review guidelines. Airbnb's review guidelines also encourage typically positive and constructive reviews, which possibly becomes a community-developed norm that Airbnb users follow [35].

The study by [6] reports three main findings regarding Airbnb customer reviews. First, customer reviews are highly positive and frequently comment on the ease of communication and the accommodation's cleanliness. Second, negative reviews are rare, and when one is found, it is written in a suppressed manner, with complaints sandwiched between positive comments. Finally, when guests do not feel like writing a positive review but want to avoid writing a negative one, they choose to write lukewarm reviews whose content is positive but lacks the enthusiasm that usually comes with a genuinely positive review.

Method

Data collection

Data for this research consist of guest reviews of Airbnb listings from the top 10 cities in Indonesia, ranked by foreign arrivals according to statistical data issued by the Indonesian Central Agency of Statistics [37]. This study focuses on the top 10 cities as a proxy for Indonesia's most popular destinations among foreigners, which is also reflected in the dominant share of English reviews in the dataset. Foreign arrivals and the use of English in reviews reflect the geographical diversity of the data drawn. The research applied a web scraping method using direct HTTP requests with JavaScript to get user review data from Airbnb’s website. We drew a total of 66,630 reviews from 7356 properties listed on the Airbnb website in the ten cities. Of these, 55,377 reviews (83%) are written in English, with an average length of 42 words. The steps of our study, both those mentioned earlier and those explained in the following sections, are visualized as a flowchart in Fig. 1.

Data analysis

Very little previous research can be found on rating prediction using customer reviews. [38] used sentiment analysis of customer reviews to predict hotel ratings and discovered that reviews classified as positive or negative correlate positively with numerical ratings. Other studies used customer reviews to predict the rating of beaches [39], and [40] argued that neuro-fuzzy methods can be utilized for sentiment analysis and review rating prediction tasks.

Another recent prediction study related to peer-to-peer accommodation was done by [41], who found that house popularity can be predicted more effectively using a Dual-Gated Recurrent Unit (DGRU). The predictive model in this study attempts to classify reviews as high rated or low rated, with the associated P2P accommodation attributes as the independent variables. Creating this model takes several steps. The first step is to classify the data as high rated or low rated. In accordance with the literature on positive and negative reviews, we classify reviews into two categories: high rating and low rating. High rating reviews have four- or five-star ratings, and low rating reviews have one- or two-star ratings [31]. Reviews with a three-star rating are excluded from this model. A new column called ‘highrating’ is created and given the value 1 if the review is categorized as a high rating and 0 if it is categorized as a low rating. The prediction aims to classify reviews as either high rating or low rating based on the binary value registered in this column. Further pre-processing limits the data to reviews written before the year 2020 and to reviews of at least 5 characters. This step provides us with 544 negative reviews and 48,435 positive reviews to analyze.
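The labeling rule described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `high_rating_label` and the toy ratings are hypothetical, and only the rule itself (4 or 5 stars map to 1, 1 or 2 stars map to 0, 3 stars are excluded) and the two class counts come from the text.

```python
from collections import Counter
from typing import Optional


def high_rating_label(stars: int) -> Optional[int]:
    """Return 1 for 4-5 stars, 0 for 1-2 stars, None for excluded ratings."""
    if stars in (4, 5):
        return 1
    if stars in (1, 2):
        return 0
    return None  # 3-star reviews (and any other value) are left out


# Hypothetical toy ratings, for illustration only (not the study's data).
toy_ratings = [5, 4, 3, 1, 5, 2, 4]
labels = [high_rating_label(s) for s in toy_ratings]
kept = [label for label in labels if label is not None]
counts = Counter(kept)  # how many reviews fall in each class

# Class balance implied by the counts reported in the text.
n_low, n_high = 544, 48_435
low_share = n_low / (n_low + n_high)
print(dict(counts), f"low-rated share: {low_share:.2%}")
```

With the reported counts, low-rated reviews make up roughly 1.1% of the retained reviews, so the two classes are far from balanced.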

The pseudocode in Table 1 initiates the scripting environment and cleans the reviews dataset by removing rows with a rating of 0, rows with a rating of 3, rows from the year 2020, and comments with a length less than or equal to 5.

Table 1 Pseudocode to initiate the scripting environment and clean the reviews dataset
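As a rough illustration of the cleaning steps described for Table 1, the sketch below applies the same four filters in Python. The `Review` record, its field names, and the toy rows are assumptions made for the example; the length threshold is taken as a character count, and "rows from the year 2020" is implemented as keeping only reviews dated before 2020.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Review:
    # Hypothetical record layout; the field names are assumptions.
    rating: int
    posted: date
    comments: str


def clean_reviews(reviews: List[Review]) -> List[Review]:
    """Drop ratings of 0 or 3, reviews from 2020, and comments of length <= 5."""
    return [
        r for r in reviews
        if r.rating not in (0, 3)   # unrated and 3-star rows are removed
        and r.posted.year < 2020    # keep only reviews written before 2020
        and len(r.comments) > 5     # very short comments are removed
    ]


# Hypothetical toy rows, for illustration only (not the study's data).
toy = [
    Review(5, date(2019, 7, 1), "Lovely villa, helpful host"),
    Review(3, date(2019, 7, 2), "It was fine overall"),
    Review(0, date(2018, 1, 5), "No rating recorded"),
    Review(4, date(2020, 2, 9), "Great location near the beach"),
    Review(1, date(2019, 3, 3), "Bad"),
    Review(2, date(2017, 11, 20), "Room was not clean"),
]
cleaned = clean_reviews(toy)
print(len(cleaned))
```

Only the first and last toy rows pass all four filters; each of the others is removed by exactly one of the rules.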
