Table 9 Comparison of previous work and results for review spam detection, along with the relative complexity of each approach (including feature extraction and learning methodology)

From: Survey of review spam detection using machine learning techniques

| Paper | Dataset | Features used | Learner | Performance metric | Score | Method complexity |
|---|---|---|---|---|---|---|
| [20] | 5.8 million reviews written by 2.14 million reviewers, crawled from Amazon.com | Review and reviewer features | LR | AUC | 78 % | Low |
| [21] | 5.8 million reviews written by 2.14 million reviewers, crawled from Amazon.com | Review, reviewer, and product features | LR | AUC | 78 % | Medium |
| [21] | 5.8 million reviews written by 2.14 million reviewers, crawled from Amazon.com | Text features | LR | AUC | 63 % | Low |
| [9] | 6000 reviews from Epinions | Review and reviewer features | NB with co-training | F-score | 0.631 | High |
| [3] | Hotel reviews collected through Amazon Mechanical Turk (AMT) by Ott et al. | Bigrams | SVM | Accuracy | 89.6 % | Low |
| [3] | Hotel reviews collected through Amazon Mechanical Turk (AMT) by Ott et al. | LIWC + bigrams | SVM | Accuracy | 89.8 % | Medium |
| [25] | Hotel reviews collected through Amazon Mechanical Turk (AMT) by Ott et al., plus 400 deceptive hotel and doctor reviews gathered from domain experts | LIWC + POS + unigrams | SAGE | Accuracy | 65 % | High |
| [23] | Yelp's real-life data | Behavioral features combined with bigram features | SVM | Accuracy | 86.1 % | Medium |
| [11] | Hotel reviews collected through Amazon Mechanical Turk (AMT) by Ott et al. | Stylometric features | SVM | F-measure | 84 % | Low |
| [12] | Hotel reviews collected through Amazon Mechanical Turk (AMT) by Ott et al. | n-gram features | SVM | Accuracy | 86 % | Low |
| [1] | Dataset collected from Amazon.com | Syntactic, lexical, and stylistic features | SLM | AUC | 0.9986 | High |
| [24] | Arabic reviews crawled by the authors from tripadvisor.com, booking.com, and agoda.ae | Review and reviewer features | NB | F-measure | 0.9959 | Low |