Mining aspects of customer’s review on the social network

Abstract

This study presents an efficient method for extracting product aspects from customer reviews and gives solutions for inferring aspect ratings and aspect weights. Aspect ratings reflect the user’s satisfaction with individual aspects of a product, and aspect weights reflect the degree of importance the user places on those aspects. These tasks therefore play an important role in helping manufacturers better understand their customers’ opinions of their products and services. The study addresses the problem of aspect extraction by using aspect words based on conditional probability combined with the bootstrap technique. To infer the user’s rating for aspects, a supervised approach, the Naïve Bayes classification method, is used to learn the aspect ratings, with sentiment words as features. The weight of an aspect is estimated by leveraging the frequencies of aspect words within each review and the aspect consistency across all reviews. Experimental results show that the proposed method performs well on real-world datasets in comparison with other state-of-the-art methods.

Introduction

In recent years, many people have expressed their opinions about products and services on social networks and e-commerce web sites. These opinions, or reviews, often play a significant role in improving the quality of products and services. However, the huge number of reviews makes it challenging to mine useful information about a product or a service efficiently. To deal with this problem, much work has been introduced, including summarizing users’ opinions [1], extracting information from reviews [2,3,4,5], and analyzing user sentiments [6,7,8,9]. In this paper, we focus on the problem of extracting information from reviews. More specifically, this study aims at developing efficient methods for three tasks: extracting the aspects mentioned in the reviews of a product, inferring the user’s rating for each identified aspect, and estimating the weight the users place on each aspect.

A user review often mentions different aspects, which are attributes or components of a product. An aspect is usually a concept on which the user’s opinion is expressed at different levels of positivity or negativity. For example, in the review given in Fig. 1, the user likes the coffee, as shown by a 5-star overall rating. In addition, positive opinions about the body, taste, aroma, and acidity aspects of the coffee are given. The task of aspect extraction is to identify all such aspects from the review. A challenge here is that some aspects are explicitly mentioned and some are not. For instance, in the review given in Fig. 1, the taste and acidity of the coffee are explicitly mentioned, but body and aroma are not. Some previous work dealt with identifying explicit aspects only, for example [10]. In our paper, both explicit and implicit aspects are identified. Another difficulty of the aspect extraction task is that it may generate a lot of noise in the form of non-aspect concepts. How to minimize noise while still being able to identify rare and important aspects is also one of our concerns in this paper.

Fig. 1 Comment of Trung Nguyen coffee

Most of the earliest work on aspect identification is unsupervised and model-based [11], relying on statistics of relevant words. These methods do not require labeled training data and have low cost. For example, frequency-based methods [10, 12, 13] consider high-frequency nouns or noun phrases as aspect candidates. However, frequency-based approaches may miss low-frequency aspects. Several complex filter-based approaches have been applied to solve this problem; however, the results are not as good as expected because some aspects are still missed [14, 15]. Moreover, these methods have difficulty identifying implicit aspects. To overcome these problems, some supervised learning techniques, such as the Hidden Markov Model (HMM) and Conditional Random Field (CRF), have been proposed. These techniques, however, require a set of manually labeled data for training the model and thus can be costly.

In this paper, the problem of aspect extraction is solved by using aspect words based on conditional probability combined with the bootstrap technique. It is assumed that the universal set of all possible aspects for each product is readily available, together with aspect words called core terms (terms that describe aspects). This assumption is practical because the number of important aspects is often small and the aspects can be easily obtained from domain experts. The aspect extraction task then becomes how to correctly assign existing aspects to sentences in the review. The main challenge here is that in many reviews, sentences do not contain enough core terms, or do not contain any core term at all, and thus may be assigned the wrong aspects. This problem is solved by repeatedly updating and enlarging the set of core terms into a set of aspect words using the conditional probability technique combined with the bootstrap technique. This method leads to better aspect extraction results, as shown in the “Results and discussion” section.

After the aspects are identified, inferring the user’s rating for them can give a more thorough understanding of the user’s satisfaction. A user usually gives an overall rating, which expresses a general impression of a product. The overall rating is not always informative enough. However, it can be assumed that the overall rating of a product is a weighted sum of the user’s specific ratings on multiple aspects of the product, where the weights measure the degree of importance of the aspects. Some previous work [16, 17] infers the user’s rating for aspects and estimates the weight of aspects simultaneously, based on regression methods and using only the review content and the associated overall rating. We apply a different approach to infer the rating and weight of aspects. More specifically, the weight of an aspect is calculated by leveraging the frequency of aspect words within the review and the aspect consistency across all reviews. Then, a supervised approach, the Naïve Bayes classification method, is used to infer the user’s rating for aspects. Although the solution is relatively simple, its tested accuracy on different real-life datasets is comparable to that of much more sophisticated state-of-the-art approaches, as shown in the “Results and discussion” section.

Figure 2 summarizes the three tasks mentioned above. The methods for solving these tasks are discussed in detail in the “Method” section of this paper.

Fig. 2 An example of aspect extracting, aspect inferring, and aspect weighting tasks

The rest of this paper is structured as follows. The “Related work” section introduces related work. The “Problem definition” and “Method” sections present the proposed methodology. The “Results and discussion” section shows the experiments and the evaluation of the proposed method. Finally, the “Conclusion” section concludes the paper and gives some future research directions.

Related work

During the last decade, much research has been carried out in the opinion mining area. Researchers are paying increasing attention to methods of extracting information from reviews that indicates users’ opinions of aspects of products. A survey on opinion mining and sentiment analysis [18] shows that two important tasks of aspect-based opinion mining are aspect identification and aspect-based rating inference. The survey also mentions several families of methods for these tasks, including frequency-based, lexicon-based, machine learning, and topic modeling methods.

Most of the earliest studies on aspect identification are frequency-based [11]. In these approaches, nouns and noun phrases are considered as aspect candidates [10, 12,13,14,15]. Hu and Liu [10] use a data mining algorithm to identify nouns and noun phrases, which are labeled by a part-of-speech (POS) tagger [19]. Their occurrence frequencies are counted, and only the frequent ones are kept. The frequency threshold can be determined experimentally. In spite of its simplicity, this method is quite effective. Some commercial companies use this method, with some improvements, in their business [11]. However, a limitation of these methods is that they produce “non-aspects”, because some nouns or noun phrases with high frequency are not really aspects.

To solve these problems, some improved variants of this filtering approach have been proposed. The work in [15] augments the frequency-based approach with additional pattern-based filters to remove some non-aspect terms. Similarly, [14] extracts aspects (nouns) based on frequency and information distance. First, seed words for each aspect are found using the frequency-based method. Second, the information distance in [20] is used to find other words related to aspects; e.g., for the aspect price, it may find “$” and “dollars”. However, the frequency-based and rule-based approaches require manual effort to tune various parameters, which limits their generalization in practice.

To deal with the limitations of frequency-based methods, topic modeling has emerged in recent years as a principled method for discovering topics from a large collection of texts. This line of research is primarily based on two basic models, pLSA (Probabilistic Latent Semantic Analysis) [21] and LDA (Latent Dirichlet Allocation) [22]. In [4, 15, 23,24,25], the authors apply topic modeling to learn latent topics that correlate directly with aspects. The work in [23] proposes a topic model for mining aspects: aspects are first identified using topic modeling, and aspect-specific sentiment words are then identified by considering adjectives only. Lin et al. [4] propose Joint Sentiment-Topic (JST) and Reverse-JST (RJST), both based on a modified LDA. These models can extract sentiment as well as positive and negative topics from text. Both JST and RJST yield an accuracy of 76.6% on the Pang and Lee [7] dataset. While topic-modeling approaches learn distributions of words used to describe each aspect, the authors of [24] separate words that describe an aspect from words that describe sentiment about an aspect. To do so, they use two parameter vectors to encode these two properties, respectively. A weighted bipartite graph is then constructed for each review, matching sentences in the review to aspects. Aspect labels and parameters are learned with no supervision (i.e., using only aspect ratings), weak supervision (using a small number of manually labeled sentences in addition to unlabeled data), or full supervision (using only manually labeled data). Moghaddam and Ester [15] devised factorized LDA (FLDA) to extract aspects and estimate aspect ratings. The FLDA method assumes that each user (and item) has a set of distributions over aspects and aspect-based ratings. Their work on multi-domain reviews reaches 74% for review rating on the TripAdvisor data set.

In [26], the authors propose a method called the Aspect Identification and Rating (AIR) model for mining textual reviews and overall ratings. Within the AIR model, an aspect rating is allowed to influence the sampling of the word distribution of the aspect for each review. This approach is based on the LDA model. However, unlike in traditional topic models, the extraction of aspects (topics) and the sampling of words for each aspect are affected by the sampled latent aspect ratings, which depend on the overall ratings given by reviewers. The authors further enhance the AIR model to handle the highly unbalanced mention of aspects in short reviews.

Although topic modeling is based on probabilistic inference and can be extended to many types of information, it has some limitations that restrict its use in real-life sentiment analysis applications. For example, it requires a huge amount of data and a significant amount of tuning in order to achieve reasonable results. It is easy to find general and frequent topics or aspects in a large document collection, but hard to find aspects that are locally frequent but globally infrequent. Such locally frequent aspects are often the most useful ones for applications because they are likely to be most relevant to the specific entities that the user is interested in. In short, the results from current topic modeling methods are usually not relevant or specific enough for many practical sentiment analysis applications [11].

In addition, some lexicon-based methods, which are also unsupervised, have been proposed. Opinions are extracted with respect to each feature using a dictionary-based approach, which also yields polarity and strength. These methods use a dictionary of sentiment words and phrases with their associated orientations and strengths, combined with intensification and negation, to compute a sentiment score for each document [8]. Xiaowen Ding and Minqing Hu use sentence-level and aspect-level sentiment classification [10, 27, 28]. Yan et al. [29] propose a method called EXPRS (an Extended PageRank algorithm enhanced by a Synonym lexicon) to extract product features. They first extract nouns and noun phrases and then extract dependency relations between the nouns or noun phrases and the associated sentiment words. The dependency relations include subject-predicate relations, adjectival modifying relations, relative clause modifying relations, and verb-object relations. The list of product features is extended using synonyms. Non-feature nouns, such as proper nouns, brand names, verbal nouns, and personal nouns, are removed. Peñalver-Martinez et al. [30] developed a methodology to perform aspect-based sentiment analysis of movie reviews. To extract the movie features from the reviews, they build a domain ontology (Movie Ontology). SentiWordNet is used to calculate the sentiment score. However, the critical issue here is how to construct such a sentiment lexicon, given the time and money needed to build such dictionaries.

Sentiment classification can be performed using machine learning approaches, which often yield higher accuracy. Machine learning methods can be further divided into supervised and unsupervised ones. Supervised methods need two sets of annotated data, one for training and the other for testing. Some of the commonly applied classifiers for supervised learning are Decision Tree (DT), SVM, Neural Network (NN), Naïve Bayes, and Maximum Entropy (ME). Asha et al. [31] propose a Gini Index based feature selection method with a Support Vector Machine (SVM) classifier for sentiment classification on a large movie review data set; the Gini Index method for feature selection improved the accuracy. Duc-Hong Pham and Anh-Cuong Le [32] design a multiple-layer architecture of knowledge representation for representing the different sentiment levels of an input text. This representation is then integrated into a neural network to form a model for predicting overall product ratings. These techniques, however, require a set of manually labeled data for training the model and thus can be costly.

Problem definition

A user review i on some product is assumed to contain two parts: the review’s text, denoted by di, and the review’s overall rating, denoted by yi. Each review’s text di can contain multiple sentences. Furthermore, each sentence contains multiple words coming from the universal set of all possible words \({\mathcal{V}} = \{ {\text{w}}_{\text{k}} |{\text{ k}}{\mkern 1mu} = {\mkern 1mu} \overline{{1,{\text{P}}}} \}\), called a word dictionary.

It is assumed further that for a product, the set of all possible K aspects is already known together with topic words, called core terms that describe each aspect of the product.

Definition 1. Aspect

An aspect is a feature (an attribute or a component) of a product. For example, taste, aroma, and body are some possible aspects of the product “coffee”. We assume that there are K aspects mentioned in all reviews, denoted by \({\mathcal{A}} = \{ a_{j} |j\, = \,\overline{{1,{\text{K}}}} \}\). An aspect is represented by a set of words and denoted by \(a_{j} \, = \,\{ w|w \in {\mathcal{V}},A(w)\, = \,j\}\), where aj is the name of the aspect, w is a word from the set \({\mathcal{V}}\), and A(.) is an operator that maps a word to the aspect. For example, words such as “taste”, “aftertaste”, and “mouth feel” can characterize the taste aspect of the product coffee.

Definition 2. Aspect rating

Given a review i, a K-dimensional vector \({\mathbf{r}}_{\text{i}} \in {\mathbb{R}}^{\text{K}}\) is used to represent the rating of K aspects in the review’s text di, denoted by \({\mathbf{r}}_{\text{i}} \, = \,(r_{{i_{1} }} ,r_{{i_{2} }} , \ldots ,r_{{i_{K} }} )\), where \(r_{{i_{j} }}\) is a number indicating the user’s opinion assessment on aspect aj, and \(r_{{i_{j} }} \in [r_{ \text{min} } ,r_{ \text{max} } ]\) (e.g., the range of \(r_{{i_{j} }}\) can be from 1 to 5).

Definition 3. Aspect weight

Given a review i, a K-dimensional vector \(\varvec{\alpha}_{i} \in {\mathbb{R}}^{\text{K}}\) is used to represent the weights of the K aspects. The vector is denoted as \(\varvec{\alpha}_{\text{i}} \, = \,\left( {\alpha_{{i_{1} }} ,\alpha_{{i_{2} }} , \ldots ,\alpha_{{i_{K} }} } \right)\), where \(\alpha_{{i_{j} }}\) is a number measuring the degree of importance of aspect aj posed by the user, \(\alpha_{{i_{j} }} \in [0, 1]\), and \(\sum\nolimits_{j = 1}^{K} {\alpha_{{i_{j} }} } = 1\). A higher weight means more emphasis is put on the corresponding aspect.

Definition 4. Aspect core terms

Given an aspect aj, the set of associated core terms for aj is denoted by \({\mathcal{C}}_{\text{j}} \, = \left\{ {w_{j1} , \, w_{j2} , \ldots ,w_{jN} } \right\}\) where wjk is a word that describes the aspect aj. The core terms can be provided by the user or by some field experts.

Major notations used throughout the paper are given in Table 1.

Table 1 Notations used in this paper

Extracting aspect

The goal of this task is to extract aspects mentioned in a review. It is assumed that each aspect is a probability distribution over words. It is also assumed that each sentence in a review’s text can mention more than one aspect. Therefore, our method to extract aspects is based on conditional probability of words such that each sentence can be assigned with multiple labels.

Inferring aspect rating

This task is to infer the vector ri of aspect ratings (defined in Definition 2) given a review di. The rating of an aspect reflects the user’s sentiment on the aspect, which is often expressed in positive or negative words. The more positive words the user uses, the higher the rating he/she wants to give the aspect. This research adopts a supervised learning method, the Naïve Bayes method, to learn the aspect ratings, with sentiment words as features.

Estimating aspect weight

This task is to estimate the non-negative weights αi that a user places on the aspects \(a_{j}\) in review i. The weight of an aspect essentially measures the degree of importance the user places on the aspect. It is observed that, within a single review, people often talk more about the aspects they are interested in. Besides, the view that an aspect is important is often shared by many other people. Based on these observations, a formula is devised to calculate the aspect weight. The formula takes into account the occurrences of words discussing the aspect within a review and the frequency of sentences discussing the same aspect across all reviews.

Method

Extracting aspect

The goal of this task is to assign a subset of aspect labels from the universal set of all aspect labels of a product to every sentence in a review. An aspect label is determined based on a set of relevant words called aspect words or terms. Each aspect in the universal label set is provided with some initial core terms. The main challenge here is that many reviews contain very few core terms or do not contain any at all, which results in incorrect labels being assigned to sentences. Therefore, the core terms need to be expanded to a richer set of aspect words based on the given data (the reviews). In some existing methods, the set of aspect words is built using Bayes or Hidden Markov Models. Our method uses a conditional probabilistic model [33] combined with the bootstrap technique to generate aspect words. Figure 3 illustrates four aspects of a coffee product represented by their corresponding aspect words, in which the symbol O represents core terms and the symbol X represents words appearing in the corpus. For this coffee product, the four aspects body, taste, aroma, and acidity are already known. The sets of core terms corresponding to these aspects are {body}, {taste, aftertaste, finishing, mouthfeel}, {aroma, smell, flavor}, and {acid, acidity}, respectively. The sets of core terms are then enlarged by adding words that have a high probability of appearing in the same sentences as the core terms. The sets of aspect words are represented by the four circles. These circles may overlap, indicating that some aspect words may belong to different aspects.

Fig. 3 Core terms with aspects

Suppose that \({\mathcal{A}} = \left\{ {a_{1} , \, a_{2} , \ldots ,a_{K} } \right\}\) is the set of K aspects. For each \(a_{j}\), we obtain the set of words that appear in sentences labeled with aspect aj and whose occurrences exceed a given threshold. The word sets of two aspects can overlap, so some terms may belong to multiple aspects. First, sentences that contain at least one word in the original core terms of the aspect are located. Then, all words (nouns, noun phrases, adjectives, and adverbs) that appear in these sentences are collected. Words that occur more often than a given threshold θ are inserted into the set of aspect words. Words with the maximum number of occurrences in the set of newly found aspect words are added to the set of core terms. The new set of aspect words, with core terms excluded, is used to find new sentences. This process is repeated until no more new words are found.

The procedure for updating aspect words for an aspect \(a_{j}\) is given below.

figurea

A bootstrapping algorithm to assign labels to sentences in the reviews is given below.

figureb

The proposed Aspect Extraction Algorithm works as follows. First, all reviews’ texts are split into sentences (step 2). Then, aspect labels from the set \({\mathcal{A}}\) of all labels are assigned to every sentence of the set \({\mathcal{D}}\) of reviews’ texts based on the initial aspect core terms (step 3). Based on this initial aspect labeling, the set of aspect core terms and the set of aspect words for every aspect are updated (step 4). The labels for all sentences are updated using the new core terms and aspect word sets (step 5). Steps 4 and 5 are repeated until no new aspect words are found or the number of iterations exceeds a given threshold.
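
The loop described above can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the authors' algorithm: sentences are assumed to be already tokenized and filtered to candidate words (nouns, adjectives, adverbs), the threshold θ is interpreted as a bound on the fraction of aspect-labeled sentences that contain a word, and the separate promotion of the most frequent new words to core terms is omitted. The names `label_sentences` and `bootstrap_aspects` are hypothetical.

```python
from collections import Counter


def label_sentences(sentences, aspect_words):
    """Give each sentence every aspect whose word set it overlaps."""
    return [{a for a, words in aspect_words.items() if words & set(s)}
            for s in sentences]


def bootstrap_aspects(sentences, core_terms, theta=0.15, max_iter=10):
    """Expand core terms into aspect words and relabel sentences.

    sentences:  list of token lists (assumed pre-filtered to candidate words)
    core_terms: dict mapping aspect name -> set of seed words
    theta:      assumed threshold on the estimated P(word | aspect)
    """
    aspect_words = {a: set(w) for a, w in core_terms.items()}
    labels = label_sentences(sentences, aspect_words)
    for _ in range(max_iter):
        changed = False
        for a in aspect_words:
            hits = [s for s, lab in zip(sentences, labels) if a in lab]
            if not hits:
                continue
            # Count in how many aspect-labeled sentences each word occurs.
            counts = Counter(w for s in hits for w in set(s))
            for w, c in counts.items():
                if c / len(hits) > theta and w not in aspect_words[a]:
                    aspect_words[a].add(w)
                    changed = True
        labels = label_sentences(sentences, aspect_words)
        if not changed:  # no new aspect words: stop early
            break
    return aspect_words, labels
```

With seed sets such as `{"taste": {"taste"}, "aroma": {"aroma"}}`, a sentence that contains no seed word can still receive a label once a co-occurring word has been added to the aspect's word set.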

Inferring aspect rating and estimating aspect weight

Aspect ratings often reflect the user’s satisfaction on aspects of a product. Meanwhile, aspect weights measure the degree of importance of the aspects posed by the user. Given the overall rating on a product, it is assumed that the overall rating is the weighted sum of rating on multiple aspects of the product. Following this assumption, some regression-based methods [16, 17, 34] have been proposed to estimate the two parameters by solving the following equation:

$$y_{i} = \mathop \sum \limits_{j = 1}^{K} r_{ij} \alpha_{ij}$$
(1)

where rij and αij are the rating and the weight of the j-th aspect of review i, respectively.
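
As a small numeric illustration of Eq. (1), the sketch below computes an overall rating from made-up aspect ratings and weights. The function name `overall_rating` and the example values are hypothetical and are not taken from the paper's data.

```python
def overall_rating(ratings, weights):
    """Weighted sum of aspect ratings, as in Eq. (1)."""
    if len(ratings) != len(weights):
        raise ValueError("ratings and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("aspect weights are expected to sum to 1")
    return sum(r * a for r, a in zip(ratings, weights))


# Hypothetical review with three aspects: 5*0.5 + 4*0.3 + 3*0.2 = 4.3
y = overall_rating([5, 4, 3], [0.5, 0.3, 0.2])
```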

There are linear regression methods [35] which estimate only the aspect weight and require that the aspect ratings are available. Some other methods [17, 34] estimate both aspect’s rating and weight at the same time. The key point of these methods is to use sentiment words, more specifically the polarity of sentiment words, to calculate ratings and weights. Even though sentiment words can usually correctly reflect the user’s rating for each aspect, they do not always reflect the user’s opinion about an aspect’s weight.

Aspect rating and aspect weight of an aspect are estimated separately. An important point in our method is that aspect rating and aspect weight are calculated based on the review content only, without the requirement of knowing the user’s overall rating. However, in “Results and discussion” section, Eq. (1) is still used to test our method. It is shown experimentally that our results conform well to the assumption that the overall rating is the weighted sum of rating on multiple aspects.

The aspect rating problem is treated as a multi-class classification problem, in which ratings (from 1 to 5) are considered as labels and sentiment words are used as features. In most sentiment analysis work, adjectives and adverbs are used as candidate sentiment words. Adjectives and adverbs are detected using the well-known part-of-speech (POS) tagging technique. Some phrases can also express sentiment, depending on the context. For example, in the two sentences “we have big problem with staff” and “we have a big room”, the noun phrases “big problem” and “big room” convey opposite sentiments, negative vs. positive, although both contain the same adjective “big”. We therefore use some fixed syntactic patterns from [9] to extract phrases as sentiment word features. Only fixed patterns of two consecutive words, in which one word is an adjective or an adverb and the other provides context, are considered.

Two consecutive words are extracted if their POS tags conform to any of the rules in Table 2 in which JJ tags are adjectives, NN tags are nouns, RB tags are adverbs, and VB tags are verbs. For example, rule 2 in this table means that two consecutive words are extracted if the first word is an adverb, the second word is an adjective, and the third word (which is not extracted) is not a noun. As an example, in the sentence “Quite dry, with a good grassy note”, two patterns “quite dry” and “good grassy” are extracted as they satisfy the second and the third rules, respectively. Then, conditional probability of word features in the corpus is determined. Label (scoring) for each aspect is predicted based on Naïve Bayes method.
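
To make the pattern matching concrete, the sketch below implements only rule 2 as it is described in the text (adverb, then adjective, with a third word that is not a noun). It is an illustration under assumptions: the input is assumed to be already POS-tagged with Penn Treebank style tags, the tags in the example are assigned by hand, and the function name `extract_rb_jj` is hypothetical. The remaining rules of Table 2 are not reproduced here.

```python
def extract_rb_jj(tagged):
    """Extract (adverb, adjective) pairs not followed by a noun.

    tagged: list of (word, tag) pairs for one sentence.
    """
    phrases = []
    for i in range(len(tagged) - 1):
        (w1, t1), (w2, t2) = tagged[i], tagged[i + 1]
        # A missing third word (end of sentence) counts as "not a noun".
        t3 = tagged[i + 2][1] if i + 2 < len(tagged) else ""
        if t1.startswith("RB") and t2.startswith("JJ") and not t3.startswith("NN"):
            phrases.append(f"{w1.lower()} {w2.lower()}")
    return phrases


# Hand-tagged version of the example sentence (the tags are an assumption).
sentence = [("Quite", "RB"), ("dry", "JJ"), (",", ","), ("with", "IN"),
            ("a", "DT"), ("good", "JJ"), ("grassy", "JJ"), ("note", "NN")]
features = extract_rb_jj(sentence)
```

Here only “quite dry” matches, since “good grassy” starts with an adjective rather than an adverb and would fall under a different rule.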

Table 2 POS labeled rules [9]

Given a review’s text di, the rating of an aspect aj with q extracted features is inferred from the probability that the rating \(r_{{i_{j} }}\) belongs to class \(c \in C = \{1, 2, 3, 4, 5\}\). By Bayes’ rule, this probability is:

$$P\left( {r_{{i_{j} }} \in c|f_{1} , \ldots , f_{q} } \right) = \frac{{P\left( {f_{1} , \ldots , f_{q} |r_{{i_{j} }} \in c} \right)P\left( {r_{{i_{j} }} \in c} \right)}}{{P\left( {f_{1} , \ldots , f_{q} } \right)}}$$
(2)

Assuming that the features are independent, (2) is transformed into:

$$P\left( {r_{{i_{j} }} \in c|f_{1} , \ldots ,f_{q} } \right) = \frac{{\mathop \prod \nolimits_{k = 1}^{q} P(f_{k} |r_{{i_{j} }} \in c)P\left( {r_{{i_{j} }} \in c} \right)}}{{\mathop \prod \nolimits_{k = 1}^{q} P\left( {f_{k} } \right)}}$$
(3)

in which: \(P\left( {f_{k} |r_{{i_{j} }} \in c} \right) = n_{aj} \left( {f_{k} ,c} \right)/n_{aj} \left( c \right)\) is the probability that feature fk belongs to the class c, naj(fk, c) is the number of sentences labeled as c of the aspect aj which contains the feature fk, and naj(c) is the number of all sentences containing the aspect aj and has class label c,

\(P(r_{{i_{j} }} \in c)\)= naj(c)/naj is the probability that the rating \(r_{{i_{j} }}\) belongs to the class c, naj(c) is the number of sentences labeled as c of aspect aj, and naj is the number of all sentences containing the aspect aj,

P(fk) is the probability of feature fk.

To smooth the estimates used in (3), Laplace smoothing is applied. We get:

$$P\left( {f_{k} |r_{{i_{j} }} \in c} \right) = \frac{{n_{aj} \left( {f_{k} , c} \right) + 1}}{{n_{aj} \left( c \right) + \left| V \right| + 1}}$$
(4)

in which, |V| is number of word features regarding the aspect aj.

The rating \(r_{ij}\) is the label c that maximizes \(P(r_{ij} \in c|f_{1} , \ldots , f_{q} )\). Because the denominator of (3) does not depend on c, it can be dropped:

$$\hat{c} = argmax_{c \in C} \mathop \prod \limits_{k = 1}^{q} P(f_{k} |r_{{i_{j} }} \in c)P\left( {r_{{i_{j} }} \in c} \right).$$
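
The classification step can be sketched as below. This is a minimal illustration, assuming one model per aspect, sentence-level training samples given as (features, rating) pairs, and log-probabilities to avoid numerical underflow; the function names `train` and `predict` and the toy samples are hypothetical. The smoothed likelihood follows Eq. (4), including the extra +1 in its denominator.

```python
import math
from collections import Counter, defaultdict


def train(samples):
    """Count statistics for one aspect.

    samples: list of (features, rating) pairs, one per labeled sentence.
    """
    class_count = Counter()            # n_aj(c)
    feat_count = defaultdict(Counter)  # n_aj(f_k, c)
    vocab = set()                      # word features of the aspect
    for feats, c in samples:
        class_count[c] += 1
        for f in set(feats):           # count sentences, not repetitions
            feat_count[c][f] += 1
            vocab.add(f)
    return class_count, feat_count, vocab


def predict(feats, model):
    """Return the rating class with the highest posterior score."""
    class_count, feat_count, vocab = model
    n = sum(class_count.values())      # n_aj
    best, best_score = None, -math.inf
    for c, n_c in class_count.items():
        score = math.log(n_c / n)      # log prior
        for f in feats:
            # Smoothed likelihood, following Eq. (4).
            score += math.log((feat_count[c][f] + 1) / (n_c + len(vocab) + 1))
        if score > best_score:
            best, best_score = c, score
    return best


# Toy training data for a single aspect (made up for illustration).
model = train([
    (["very good", "quite smooth"], 5),
    (["very good"], 5),
    (["too bitter"], 1),
    (["too bitter", "quite dry"], 1),
])
```

Ties between classes are broken here in favor of the class seen first during training, which is an arbitrary choice of this sketch.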

We now describe the method for estimating aspect weights. A careful examination of the reviews shows that if a user cares more about an aspect (that is, the aspect is important to the user), he/she will mention it more often in the review. Moreover, the view that an aspect is important is often shared by many other users. Following this observation, we estimate aspect weights from two components: the weight measure of aspect aj within the review’s text di, denoted by EDij, and the weight measure of the aspect across all reviews, denoted by ECj. Note that, unlike some other approaches, this method does not use the polarity of sentiment words. Instead, it considers probability measures of words and sentences regarding an aspect in the review and in the corpus. To some extent, this idea is similar to using tf/idf to measure word importance.

Given a review i, the weight component of the aspect aj, EDij, is calculated as:

$$ED_{ij} = \frac{{\mathop \sum \nolimits_{k = 1}^{{N_{i} }} w_{ijk} }}{{N_{i} }}.$$
(5)

In which: \(w_{ijk}\) is the k-th word in the aspect words of aspect \(a_{j}\), and \(N_{i}\) is the number of aspect words that occur in the review’s text di for all aspects.

The weight component ECj is calculated as:

$$EC_{j} = \frac{{\mathop \sum \nolimits_{k = 1}^{M} s_{jk} }}{M}$$
(6)

In which: \(s_{jk}\) is the k-th sentence in the corpus labeled by the aspect \(a_{j}\), and M is the number of all sentences in the corpus.

Finally, the weight \(\alpha_{{i_{j} }}\) for an aspect aj of review i is calculated as:

$$\alpha_{{i_{j} }} = \frac{{ED_{ij} *EC_{j} }}{{\mathop \sum \nolimits_{j = 1}^{K} ED_{ij} *EC_{j} }}$$
(7)

The denominator \(\sum\nolimits_{j = 1}^{K} {ED_{ij} EC_{j} }\) is to normalize the value of αij to the range [0,1].
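
Equations (5) to (7) can be sketched as below under one reading of the definitions, which is an assumption of this example: \(w_{ijk}\) is treated as an indicator that the k-th aspect-word occurrence in di belongs to aspect aj, and \(s_{jk}\) as an indicator that the k-th sentence of the corpus is labeled with aj. The function name `aspect_weights` and the fallback to uniform weights are hypothetical.

```python
def aspect_weights(review_tokens, aspect_words, corpus_labels):
    """Estimate the aspect weights of one review following Eqs. (5)-(7).

    review_tokens: tokens of the review's text d_i
    aspect_words:  non-empty dict mapping aspect -> set of aspect words
    corpus_labels: one set of aspect labels per sentence in the corpus
    """
    aspects = list(aspect_words)
    uniform = {a: 1.0 / len(aspects) for a in aspects}
    # Occurrences of each aspect's words in this review.
    hits = {a: sum(t in aspect_words[a] for t in review_tokens) for a in aspects}
    n_i = sum(hits.values())   # N_i: aspect-word occurrences over all aspects
    m = len(corpus_labels)     # M: number of sentences in the corpus
    if n_i == 0 or m == 0:
        return uniform         # assumption: nothing observed, use uniform weights
    ed = {a: hits[a] / n_i for a in aspects}                                # Eq. (5)
    ec = {a: sum(a in lab for lab in corpus_labels) / m for a in aspects}   # Eq. (6)
    total = sum(ed[a] * ec[a] for a in aspects)
    if total == 0:
        return uniform
    return {a: ed[a] * ec[a] / total for a in aspects}                      # Eq. (7)


# Made-up example: ED = (2/3, 1/3), EC = (1/2, 3/4), so weights are (4/7, 3/7).
weights = aspect_weights(
    ["taste", "bitter", "aroma", "cheap"],
    {"taste": {"taste", "bitter"}, "aroma": {"aroma", "floral"}},
    [{"taste"}, {"taste", "aroma"}, {"aroma"}, {"aroma"}],
)
```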

Results and discussion

In this section, experiments to evaluate the proposed methods are conducted.

Data set

The experiments are carried out using three different data sets: a data set of hotel reviews collected from Tripadvisor.com [17], a data set of beer reviews used in [24], and a data set of Trung Nguyen coffee reviews that we collected ourselves from the Amazon web site.

The hotel data set contains seven different aspects: value, room, location, cleanliness, check-in/front desk, service, and business services. The beer data set has five distinct aspects: aroma (or smell), palate (or feel), taste, appearance (or look), and overall. This data set is quite big, with millions of reviews; a subset of 50,000 beer reviews is used in the experiment. The coffee data set contains 1200 reviews belonging to 17 different kinds of coffee. Table 3 gives some statistics of the three data sets.

Table 3 Summary of the Data Set

Aspect extraction task

Note that each review may be assigned several different labels, so the evaluation is carried out at the sentence level rather than the review level. Testing sets of 2500, 2000, and 500 sentences are selected randomly from the hotel, beer, and coffee data sets, respectively. The remaining sentences are used as the training sets.

Table 4 gives initial core terms for the three data sets.

Table 4 Seed word for main aspects

The precision measure is used to evaluate the experimental results:

$$P = \frac{{\left| {Extracted\,Aspect \cap True\,Aspect} \right|}}{{\left| {Extracted\,Aspect} \right|}}$$
(8)
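The precision measure can be sketched as a set computation. Representing each aspect assignment as a (sentence id, aspect) pair is an assumption made here to reflect the sentence-level evaluation; the function name and the sample data are hypothetical.

```python
def precision(extracted, true):
    """Share of extracted aspect assignments that are also in the gold set."""
    extracted, true = set(extracted), set(true)
    if not extracted:
        # Assumption: precision is reported as 0 when nothing was extracted.
        return 0.0
    return len(extracted & true) / len(extracted)


# Hypothetical (sentence id, aspect) assignments for four sentences.
extracted = [(0, "room"), (1, "service"), (2, "value"), (3, "room")]
gold = [(0, "room"), (1, "service"), (2, "location"), (3, "room")]
print(precision(extracted, gold))  # 0.75
```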

Table 5 shows the performance of our method on the three data sets for the aspect extraction task. Our method achieves an average precision of 0.786, 0.803, and 0.653 on the hotel, beer, and coffee data sets, respectively. The performance is good on the hotel and beer data sets but lower than expected on the coffee data set. This is because users in the coffee data set often give only a general view of a product, and because its reviews are mostly very short, with an average of 4.5 sentences per review, compared with 10 and 9 for the hotel and beer data sets.

Table 5 Aspect Identification results

Our method is then compared with other works. First, it is compared with the frequency-based method of Long et al. [14] on the hotel data set. Figure 4 shows that our method outperforms Long’s on the room (R), service (S), and cleanliness (C) aspects, whereas Long’s method outperforms ours in detecting the value (V) aspect.

Fig. 4 The results of our method and the method of Long et al.

Our method is also compared with two topic modeling-based methods, given in [22] and [24], on the beer data set. The method in [22] is a semi-supervised method called LDA. In [24], the authors give three different methods: unsupervised, semi-supervised, and fully supervised. As our method can be considered a semi-supervised method, it is compared with both the semi-supervised and the supervised variants of PALE LAGER given in [24].

The results in Fig. 5 show that our method outperforms LDA by a large margin and slightly outperforms both the semi-supervised and the supervised variants of PALE LAGER.

Fig. 5 The results of our method, LDA, and PALE LAGER

We then search for the threshold θ at which our method performs best. As shown in Fig. 6, the best results are obtained with θ of about 0.15.

Fig. 6 Aspect evaluation with θ

Aspect ranking prediction

Unlike the aspect extraction task, which is evaluated at the sentence level, this task is evaluated at the review level.

The mean square error measure (named \(\Delta_{aspect}^{2}\)) is used for evaluating methods of mining aspect rating.

$$\Delta_{aspect}^{2} = \frac{{\mathop \sum \nolimits_{i = 1}^{Q} \mathop \sum \nolimits_{j = 1}^{K} \left( {r_{{i_{j} }} - r_{{i_{j} }}^{*} } \right)^{2} }}{Q\times{K}}$$
(9)

where K is the number of aspects, Q is the number of reviews, and \(r_{{i_{j} }}\) and \(r_{{i_{j} }}^{*}\) are the inferred and the true ratings, respectively, for aspect \(a_{j}\) within the text \(d_{i}\) of review i.
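Eq. (9) amounts to averaging the squared rating errors over all Q × K review-aspect pairs. The sketch below assumes the inferred and true ratings are stored as two Q × K nested lists; the function name and the sample ratings are hypothetical.

```python
def aspect_mse(pred, true):
    """Mean squared error over all Q reviews and K aspects, as in Eq. (9)."""
    q, k = len(pred), len(pred[0])
    squared_error = sum(
        (p - t) ** 2
        for pred_row, true_row in zip(pred, true)
        for p, t in zip(pred_row, true_row)
    )
    return squared_error / (q * k)


# Hypothetical ratings for Q = 2 reviews and K = 2 aspects.
pred = [[4, 3], [5, 2]]
true = [[5, 3], [4, 4]]
print(aspect_mse(pred, true))  # 1.5
```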

To evaluate how well the predicted aspect ratings can preserve their relative order within a review given the true ratings, the aspect correlation measure (named \(\rho_{aspect}\)) is used:

$$\rho_{aspect} = \frac{{\mathop \sum \nolimits_{i = 1}^{Q} \rho_{{\varvec{r}_{i} ,\varvec{r}_{i}^{*} }} }}{Q}$$
(10)

where Q is the number of reviews, and \(\rho_{{\varvec{r}_{i} ,\varvec{r}_{i}^{*} }}\) is the Pearson correlation between two vectors \(\varvec{r}_{i}\) and \(\varvec{r}_{i}^{*}\) of the inferred and the true ratings, respectively.

The two measures above are for evaluating the results for each review. The results on the whole set of reviews are evaluated by using the so called aspect correlation across reviews measure (\(\rho_{review}\)):

$$\rho_{review} = \frac{{\mathop \sum \nolimits_{j = 1}^{K} \rho \left( {\overrightarrow {{r_{j} }} ,\overrightarrow {{r_{j}^{*} }} } \right)}}{K}$$
(11)

where \(\rho \left( {\overrightarrow {{r_{j} }} ,\overrightarrow {{r_{j}^{*} }} } \right)\) is the Pearson correlation between the two vectors \(\overrightarrow {{r_{j} }}\) and \(\overrightarrow {{r_{j}^{*} }}\) of the inferred and the true ratings for aspect \(a_{j}\) across all reviews.
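The two correlation measures in Eqs. (10) and (11) differ only in the direction along which the Pearson correlation is taken: across the aspects of one review, or across the reviews for one aspect. The following sketch makes this explicit on Q × K rating matrices. The helper names, the sample ratings, and the choice to return 0 for a constant vector (where the Pearson correlation is undefined) are assumptions for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Assumption: treat an undefined correlation (constant vector) as 0.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rho_aspect(pred, true):
    """Eq. (10): average over reviews of the correlation across aspects."""
    return sum(pearson(p, t) for p, t in zip(pred, true)) / len(pred)


def rho_review(pred, true):
    """Eq. (11): average over aspects of the correlation across reviews."""
    pred_cols, true_cols = list(zip(*pred)), list(zip(*true))
    return sum(pearson(p, t) for p, t in zip(pred_cols, true_cols)) / len(pred_cols)


# Hypothetical ratings: Q = 2 reviews (rows), K = 2 aspects (columns).
pred = [[1, 2], [3, 4]]
true = [[2, 1], [4, 3]]
print(rho_aspect(pred, true), rho_review(pred, true))
```

In this toy case the predicted order of the aspects is reversed within every review, while the order of the reviews is preserved for every aspect, so the two measures take opposite signs. This is why both are reported.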

Our method is also compared with Long’s [14] and Wang’s [17]. Long proposed two methods based on the SVM classifier and the Bayesian Network classifier. Wang’s method is called Latent Rating Regression (LRR) which infers aspect ratings and aspect weights simultaneously.

The performance results are shown in Table 6. Our method performs much better than Long’s method and Wang’s method on all three measures.

Table 6 Comparison with other models for inferring aspect ratings

Estimating aspect weight

To evaluate the correctness of the weights estimated by our method, the overall rating is computed from them and compared with the true overall rating given by the user. The estimated overall rating is given by the following formula:

$$\hat{y}_{i} = \mathop \sum \limits_{j = 1}^{K} r_{{i_{j} }} \alpha_{{i_{j} }}$$
(12)

where \(r_{{i_{j} }}\) is the rating of the j-th aspect of the review i and \(\alpha_{{i_{j} }}\) is the estimated weight.
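A short sketch of Eq. (12) and of the resulting error on the overall rating follows. The text does not spell out \(\Delta_{overall rating}^{2}\), so the mean squared difference between estimated and true overall ratings used here is an assumption; the function names and sample values are hypothetical as well.

```python
def overall_rating(ratings, weights):
    """Eq. (12): weighted sum of the aspect ratings of one review."""
    return sum(r * a for r, a in zip(ratings, weights))


def overall_mse(estimated, true):
    """Assumed form of the overall-rating error: mean squared difference."""
    return sum((e - t) ** 2 for e, t in zip(estimated, true)) / len(true)


# Hypothetical aspect ratings and weights for two reviews with K = 3 aspects.
reviews = [
    ([4, 5, 3], [0.5, 0.3, 0.2]),
    ([2, 4, 4], [0.25, 0.25, 0.5]),
]
estimated = [overall_rating(r, a) for r, a in reviews]
print(estimated)                       # approximately [4.1, 3.5]
print(overall_mse(estimated, [4, 4]))  # error against assumed true ratings
```

Since the weights of a review sum to 1, the estimated overall rating always lies between the lowest and the highest aspect rating of that review.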

Our method is compared with Wang’s method [17] based on the \(\Delta_{overall rating}^{2}\). Table 7 presents the mean square errors of overall rating for the three data sets. As can be seen in the table, our results are comparable to Wang’s.

Table 7 MSE of overall rating prediction

Conclusion

This paper dealt with three important sub-tasks of the opinion mining problem, namely (1) extracting the aspects mentioned in the reviews of a product by using the conditional probability of words, (2) inferring the user’s rating for each identified aspect with a Naïve Bayes classifier, and (3) estimating the weight placed on each aspect by the users by using the occurrences of words that discuss the aspect within a review and the frequency of sentences that discuss the same aspect across all reviews.

Our method does not require the overall ratings to be known and is less complicated than some previous methods. Nevertheless, it works well on real-world data sets in comparison with other state-of-the-art methods.

In the future, the problem of aspect mining from unlabeled data will be considered. In addition, the proposed model will be applied to other domains, such as movies and digital cameras, to validate its general effectiveness.

Abbreviations

pLSA: Probabilistic Latent Semantic Analysis
LDA: Latent Dirichlet Allocation
HMM: Hidden Markov Model
CRF: Conditional Random Field

References

  1. Park S, Lee K, Song J. Contrasting opposing views of news articles on contentious issues. In: Proceedings of the 49th annual meeting of the association for computational linguistics (ACL-2011). 2011.
  2. van den Camp M, van den Bosch A. The socialist network. Decis Support Syst. 2012;53:761–9.
  3. Li SK, Guan Z, Tang LY, et al. Exploiting consumer reviews for product feature ranking. J Comput Sci Technol. 2012;27(3):635–49. https://doi.org/10.1007/s11390-012-1250-z.
  4. Lin C, He Y, Everson R, Ruger S. Weakly supervised joint sentiment-topic detection from text. IEEE Trans Knowl Data Eng. 2012;24(6):1134–45.
  5. Zhan J, Loh HT, Liu Y. Gather customer concerns from online product reviews – a text summarization approach. Expert Syst Appl. 2009;36:2107–15.
  6. Dang Y, Zhang Y, Chen H. A lexicon-enhanced method for sentiment classification: an experiment on online product reviews. IEEE Intell Syst. 2010;25(4):46–53.
  7. Pang B, Lee L. A sentiment education: sentiment analysis using subjectivity summarization based on minimum cuts. In: Proceedings of the 42nd annual meeting on association for computational linguistics. 2004. p. 271.
  8. Taboada M, Brooke J, Tofiloski M, Voll K, Stede M. Lexicon-based methods for sentiment analysis. Comput Linguistics. 2011;37(2):267–307.
  9. Turney PD. Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In: ACL ’02 Proceedings of the 40th annual meeting on association for computational linguistics. p. 417–24.
  10. Hu M, Liu B. Mining and summarizing customer reviews. In: Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining, KDD’04, New York: ACM; 2004. p. 168–77.
  11. Liu B. Sentiment analysis and opinion mining. Synth Lect Human Lang Technol. 2012;5(1):1–67.
  12. Popescu AM, Etzioni O. Extracting product features and opinions from reviews. In: HLT ’05 Proceedings of the conference on human language technology and empirical methods in natural language processing. p. 339–46.
  13. Zhu J, Wang H, Tsou BK, Zhu M. Multi-aspect opinion polling from textual reviews. In: Proceedings of ACM international conference on information and knowledge management (CIKM-2009). 2009.
  14. Long C, Zhang J, Zhu X. A review selection approach for accurate feature rating estimation. In: Proceedings of Coling 2010: Poster volume. 2010.
  15. Moghaddam S, Ester M. Opinion digger: an unsupervised opinion miner from unstructured product reviews. In: Proceedings of the ACM conference on information and knowledge management (CIKM-2010). 2010.
  16. Chen L, Wang F. Preference-based clustering reviews for augmenting e-commerce recommendation. Knowl Based Syst. 2013;50:44–59.
  17. Wang H, Lu Y, Zhai C. Latent aspect rating analysis on review text data: a rating regression approach. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining, KDD’10, New York: ACM; 2010. p. 783–92.
  18. Ravi K, Ravi V. A survey on opinion mining and sentiment analysis: tasks, approaches and applications. Knowl Based Syst. 2015;89:14–46.
  19. Santorini B. Part-of-speech tagging guidelines for the Penn Treebank Project. University of Pennsylvania, School of Engineering and Applied Science, Dept. of Computer and Information Science. 1990.
  20. Cilibrasi RL, Vitanyi PMB. The Google similarity distance. IEEE Trans Knowl Data Eng. 2007:370–83.
  21. Hofmann T. Probabilistic latent semantic indexing. In: Proceedings of the 22nd annual international ACM SIGIR conference on research and development in information retrieval, SIGIR’99. New York: ACM; 1999. p. 50–7.
  22. Blei DM, Ng AY, Jordan MI. Latent Dirichlet allocation. J Mach Learn Res. 2003;3:993–1022.
  23. Brody S, Elhadad N. An unsupervised aspect-sentiment model for online reviews. In: Human language technologies: the annual conference of the North American chapter of the association for computational linguistics, HLT’10, Stroudsburg; 2010. p. 804–12.
  24. McAuley J, Leskovec J, Jurafsky D. Learning attitudes and attributes from multi-aspect reviews. In: International conference on data mining (ICDM). 2012.
  25. Sauper C, Barzilay R. Automatic aggregation by joint modeling of aspects and values. J Artif Int Res. 2013;46(1):89–127.
  26. Li H, Lin R, Hong R, Ge Y. Generative models for mining latent aspects and their ratings from short reviews. In: 2015 IEEE international conference on data mining. p. 241–50.
  27. Ding X, Liu B, Yu PS. A holistic lexicon-based approach to opinion mining. In: Proceedings of the conference on web search and web data mining (WSDM-2008). 2008.
  28. Kim SM, Hovy E. Determining the sentiment of opinions. In: Proceedings of international conference on computational linguistics (COLING-2004).
  29. Yan Z, Xing M, Zhang D, Ma B. EXPRS: an extended pagerank method for product feature extraction from online consumer reviews. Inf Manage. 2015.
  30. Penalver-Martinez I, Garcia-Sanchez F, Valencia-Garcia R, Rodriguez-Garcia MA, Moreno V, Fraga A, Sanchez-Cervantes JL. Feature-based opinion mining through ontologies. Expert Syst Appl. 2014;41(13):5995–6008.
  31. Manek AS, Shenoy PD, Mohan MC, Venugopal KR. Aspect term extraction for sentiment analysis in large movie reviews using Gini Index feature selection method and SVM classifier. World Wide Web. 2017;20(2):135–54.
  32. Pham DH, Le AC. Learning multiple layers of knowledge representation for aspect based sentiment analysis. Data Knowl Eng. 2017;114:26–39.
  33. Dao TT, Thanh TD, Hai TN, Ngoc VH. Building Vietnamese topic modeling based on core terms and applying in text classification. In: Proceedings of the fifth IEEE international conference on communication systems and network technologies. 2015. p. 1284–88.
  34. Yu J, Zha ZJ, Wang M, Chua TS. Aspect ranking: identifying important product aspects from online consumer reviews. In: Proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies. Volume 1, HLT’11, Stroudsburg, PA, USA, 2011. Association for Computational Linguistics. p. 1496–505.
  35. Archak N, Ghose A, Ipeirotis PG. Show me the money!: Deriving the pricing power of product features by mining consumer reviews. In: Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining, KDD’07, New York: ACM; 2007. p. 56–65.

Authors’ contributions

TNTN proposed the method and performed the experiments. HNTT supervised the programming and wrote the draft manuscript. VAN wrote a part of the manuscript and revised it after receiving the reviews. All authors read and approved the final manuscript.

Acknowledgements

This research is funded by the project “Building a System for Prediction and Management of Information Spreading in Social Networks in Vietnam” under Grant VAST01.01/17-18.

Competing interests

Data mining, big data, machine learning and natural language processing.

Funding

Not applicable.

Availability of data and materials

All data used in this study are publicly available and accessible in the source Tripadvisor.com.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Author information

Correspondence to Tu Nguyen Thi Ngoc.

Appendix

See Tables 8, 9, 10.

Table 8 Aspect word set of Hotel data
Table 9 Aspect word set of Beer data
Table 10 Aspect word set of Coffee data

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Keywords

  • Aspect extraction
  • Aspect rating
  • Aspect weight
  • Conditional probability
  • Core term
  • Naive Bayes