Think Outside the Dataset: Finding Fraudulent Reviews using Cross-Dataset Analysis


While online review services provide a two-way conversation between brands and consumers, malicious actors, including misbehaving businesses, have an equal opportunity to distort reviews for their own gain. We propose OneReview, a method for locating fraudulent reviews by correlating data from multiple crowd-sourced review sites. Our approach utilizes Change Point Analysis to locate points at which a business's reputation shifts. Inconsistent trends in reviews of the same businesses across multiple websites are used to identify suspicious reviews. We then extract an extensive set of textual and contextual features from these suspicious reviews and employ supervised machine learning to detect fraudulent reviews. We evaluated OneReview on about 805K and 462K reviews from Yelp and TripAdvisor, respectively, to identify fraud on Yelp. Supervised machine learning yields excellent results, with 97% accuracy. We applied the resulting model to the suspicious reviews and detected about 62K fraudulent reviews (about 8% of all the Yelp reviews). We further analyzed the detected fraudulent reviews and their authors, and located several spam campaigns in the wild, including campaigns against specific businesses, as well as campaigns consisting of several hundred socially networked untrustworthy accounts.
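The core idea of locating points at which a business's reputation shifts can be illustrated with a toy change-point detector. The sketch below is not the authors' implementation; it is a minimal single-change-point search that splits a rating time series where a least-squares mean-shift fit improves most, assuming a hypothetical series of monthly average ratings.

```python
def detect_change_point(ratings):
    """Return the index that best splits `ratings` into two segments
    with different means, minimizing the total squared error.
    A toy stand-in for Change Point Analysis on a review time series."""
    n = len(ratings)
    best_idx, best_cost = None, float("inf")
    for k in range(1, n):  # candidate split points
        left, right = ratings[:k], ratings[k:]
        mean_l = sum(left) / len(left)
        mean_r = sum(right) / len(right)
        cost = (sum((x - mean_l) ** 2 for x in left)
                + sum((x - mean_r) ** 2 for x in right))
        if cost < best_cost:
            best_idx, best_cost = k, cost
    return best_idx

# Hypothetical monthly average ratings that jump after a burst of reviews
ratings = [3.1, 3.0, 3.2, 3.1, 4.8, 4.9, 4.7, 4.8]
print(detect_change_point(ratings))  # -> 4 (the shift begins at index 4)
```

In the paper's setting, a shift like this that appears on one site (e.g., Yelp) but not on another (e.g., TripAdvisor) for the same business is what flags the corresponding reviews as suspicious.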

In The World Wide Web Conference 2019 (WWW '19)

Cite it:

@inproceedings{nilizadeh2019think,
  title={Think Outside the Dataset: Finding Fraudulent Reviews using Cross-Dataset Analysis},
  author={Nilizadeh, Shirin and Aghakhani, Hojjat and Gustafson, Eric and Kruegel, Christopher and Vigna, Giovanni},
  booktitle={The World Wide Web Conference},
  year={2019}
}