Text ranking models based on BERT are now well established for a wide range of passage and document ranking tasks. However, the robustness of BERT-based ranking models under adversarial attack remains under-explored. In this work, we argue that BERT-rankers are vulnerable to adversarial attacks that target the documents retrieved for a query.
We propose algorithms that generate adversarial perturbations of documents, either locally for individual queries or globally across an entire dataset, using gradient-based optimization. The goal of our algorithms is to add a small number of tokens to a highly relevant or non-relevant document so as to cause a significant rank demotion or promotion, respectively.
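As a minimal illustration of the idea behind gradient-guided token selection (not the paper's actual attack), the sketch below uses a toy linear relevance scorer over mean token embeddings: the gradient of the score with respect to an appended token's embedding tells us, to first order, which vocabulary token most demotes or promotes the document. All names (`E`, `score`, `pick_adversarial_token`) are hypothetical and chosen for this example only.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, DIM = 50, 8
E = rng.normal(size=(VOCAB, DIM))   # toy token embedding table
w = rng.normal(size=DIM)            # toy linear relevance scorer weights

def score(token_ids):
    """Toy relevance score: mean token embedding projected onto w."""
    return float(E[token_ids].mean(axis=0) @ w)

def pick_adversarial_token(token_ids, demote=True):
    """Pick one token to append, guided by the score gradient.

    For this linear scorer the gradient of the score w.r.t. an
    appended embedding is w / (n + 1), so appending token t changes
    the score by (E[t] @ w - mean @ w) / (n + 1).  We choose the
    token whose first-order effect most demotes (or promotes) rank.
    """
    n = len(token_ids)
    grad = w / (n + 1)              # d(score) / d(appended embedding)
    deltas = E @ grad               # first-order score change per token
    return int(np.argmin(deltas) if demote else np.argmax(deltas))

doc = list(rng.integers(0, VOCAB, size=10))
before = score(doc)
adv = pick_adversarial_token(doc, demote=True)
after = score(doc + [adv])          # score drops after appending the token
```

In the actual attack setting, the scorer is a full BERT ranker and the gradient is taken through the model, but the selection principle — rank candidate tokens by the inner product of their embeddings with the score gradient — is the same.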
Our experiments show that just a few tokens can already change a document's rank by a large margin. Moreover, we find that BERT-rankers rely heavily on the beginning of a document for relevance prediction, making its initial part especially susceptible to adversarial attacks.
More interestingly, our statistical analysis uncovers a small set of recurring adversarial tokens that, when concatenated to documents, reliably demote relevant documents or promote non-relevant ones. Finally, these adversarial tokens also exhibit distinct topic preferences within and across datasets, exposing potential biases inherited from BERT pre-training or the downstream datasets.