Authors:
(1) David Novoa-Paradela, Universidade da Coruña, CITIC, Campus de Elviña s/n, 15008, A Coruña, Spain, corresponding author (Email: [email protected]);
(2) Oscar Fontenla-Romero, Universidade da Coruña, CITIC, Campus de Elviña s/n, 15008, A Coruña, Spain (Email: [email protected]);
(3) Bertha Guijarro-Berdiñas, Universidade da Coruña, CITIC, Campus de Elviña s/n, 15008, A Coruña, Spain (Email: [email protected]).
This paper presents a pipeline to detect and explain anomalous reviews on online platforms. The pipeline is made up of three modules and detects reviews that generate no value for users, whether due to worthless or malicious composition. Each classification is accompanied by a normality score and an explanation that justifies the decision made. The pipeline’s ability to solve the anomaly detection task was evaluated using different datasets created from a large Amazon database. Additionally, to assess the explainability module, a study comparing three explainability techniques was conducted with 241 participants, measuring both the impact of the explanations on the respondents’ ability to reproduce the classification model and their perceived usefulness. This work can help automate tasks on online review platforms, such as those for electronic commerce, and offers inspiration for addressing similar problems in the field of anomaly detection on textual data. It also contributes a human evaluation of different explainability techniques in a real and infrequent scenario, the detection of anomalous reviews, and reflects on whether it is possible to explain tasks as humanly subjective as this one.
Keywords: Anomaly Detection, Text Reviews, Transformers, Explainability
Nowadays, more than ever, user opinions about products and services have a great impact on the future of the companies that offer them. In such a globalized and highly competitive world, online review platforms, such as those for electronic commerce (e-commerce), play a crucial role in the credibility of products and services. User opinions usually come in the form of text reviews or numerical ratings, accompanied in some cases by images or videos. These reviews provide users with information about the product or service they are considering purchasing, which directly influences the number of sales: most people make purchase decisions based on ratings and reviews from other users [1].
On many platforms, such as Amazon, each product in the store has a list of text reviews published by customers. Users can browse this list of reviews (opinions) to obtain extra information about the product and can mark reviews as helpful, which positions the most-voted reviews at the top of the list. In addition, users can report a review to Amazon if they consider it inappropriate, for example, because its content is incorrect. This procedure of ranking reviews by usefulness and reporting inappropriate ones is carried out manually by the platform's users. As a result, Amazon reported more than 200 million suspected fake reviews in 2020 alone [2]. This problem is not exclusive to Amazon; it affects every platform that allows its users to post reviews. For example, Tripadvisor uses an automatic system capable of distinguishing between normal, suspicious, and inappropriate reviews [3]. Inappropriate reviews are automatically removed (3.1% of review submissions in 2020), while those classified as suspicious are re-examined by a human moderator (5.1% of review submissions in 2020).
In machine learning (ML), anomaly detection (AD) is the branch that builds models capable of differentiating between normal and anomalous data [4]. At first glance, anomaly detection might seem like a classification problem with only two classes. However, anomalies tend to occur infrequently or may be entirely absent, so normal data prevails in these scenarios. Because of this, models are commonly trained only on normal data, with the goal of representing the normal class as well as possible in order to classify new data as normal or anomalous.
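To make this one-class setting concrete, the following minimal sketch (our own illustration, not the paper's method) fits a detector on normal samples only and then labels and scores unseen data; the Isolation Forest used here is just one possible choice of model:

```python
# Minimal one-class anomaly detection sketch: the model sees only normal
# data at training time and must score new samples by their "normality".
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal_train = rng.normal(loc=0.0, scale=1.0, size=(500, 8))  # normal data only

# New data: five points near the normal region, five far away from it.
new_data = np.vstack([rng.normal(0.0, 1.0, size=(5, 8)),
                      rng.normal(6.0, 1.0, size=(5, 8))])

model = IsolationForest(random_state=0).fit(normal_train)
labels = model.predict(new_data)        # +1 = normal, -1 = anomalous
scores = model.score_samples(new_data)  # higher score = more normal
```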
The technological development of recent years has enabled the construction of very powerful models for Natural Language Processing (NLP) [5].
However, unlike other tasks such as Sentiment Analysis [6] or Question Answering [7], the application of anomaly detection to texts is still at an early stage, probably due to lower demand. For this reason, this article presents a pipeline that, given the text reviews of a product (in this case from Amazon), addresses opinion filtering as an anomaly detection problem where:
• Reviews containing representative information about the product are considered the normal class.
• Reviews whose content has nothing to do with the product they should describe are considered the anomalous class.
The proposed pipeline allows us to carry out the following tasks (a toy sketch of this interface follows the list):
• Classify reviews as normal or anomalous, allowing us to locate those that do not describe characteristics of the product to which they are associated, and therefore have no value for the users of the platform.
• Issue a normality score associated with each review.
• Generate an explanation that justifies the classification made for each review by the system.
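The sketch below illustrates these three outputs with a deliberately simple, hypothetical implementation: every name is ours, and the vocabulary-overlap scorer is only a toy stand-in for the pipeline's actual modules described in Section 3.

```python
# Hypothetical interface for the three tasks above. All names are
# illustrative; the scorer is a toy stand-in, not the paper's model.
from dataclasses import dataclass

@dataclass
class ReviewResult:
    is_anomalous: bool       # task 1: normal / anomalous classification
    normality_score: float   # task 2: higher means more normal
    explanation: str         # task 3: justification of the decision

# Toy notion of "representative information": overlap with a product vocabulary.
PRODUCT_VOCAB = {"battery", "screen", "camera", "charger", "phone"}

def analyze_review(text: str, threshold: float = 0.1) -> ReviewResult:
    words = text.lower().split()
    hits = [w for w in words if w in PRODUCT_VOCAB]
    score = len(hits) / max(len(words), 1)
    explanation = (f"Product-related words found: {hits}" if hits
                   else "No product-related words found.")
    return ReviewResult(score < threshold, score, explanation)

print(analyze_review("The battery lasts all day and the screen is bright"))
print(analyze_review("Great seller, fast shipping, would buy again"))
```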
The pipeline’s ability to solve the anomaly detection task was evaluated using different datasets created from a large Amazon database [8]. In addition, to evaluate the explainability module, a study was carried out comparing three explainability techniques, in which a total of 241 people participated. The objective of this study was twofold: to measure the impact of the explanations on the respondents’ ability to reproduce the classification model, and to assess the usefulness of these explanations.
We believe this work can be useful for automating tasks such as those mentioned above on online review platforms and, given the existing interest in applying anomaly detection models to texts, that it can serve as inspiration for solving similar problems. We also consider it valuable to have carried out a human evaluation of different explainability techniques in a real and infrequent scenario such as the detection of anomalous reviews, as well as to reflect on whether it is possible to explain tasks as humanly subjective as this one.
This document is structured as follows. Section 2 contains a brief review of the main works on anomaly detection in texts and, more specifically, in text reviews. Section 3 describes the proposed pipeline and its operation. Section 4 presents the experiments carried out and, finally, conclusions are drawn in Section 5.