In recent years, deep learning models have become remarkably powerful, even outperforming
humans on a variety of tasks. This enables a growing range of real-world applications,
including sensitive fields such as medical diagnosis or the legal domain. Beyond achieving
sufficiently good performance, the requirement to justify and explain a model's decisions
is becoming increasingly important.
This work aims to enable broader application of an inherently interpretable model class,
namely explain-then-predict models, by reducing the annotation cost of the explanations.
We focus on ExPred as a representative of this model class.
We investigate its dependency on rationale annotations, a special kind of explanation, by
training it with gradually fewer rationale-labeled instances. Furthermore, we explore
different approaches to reducing the number of human-labeled instances required during
training, such as active learning and weak supervision.
Our results show that ExPred still achieves good performance when only a fraction of the
instances in the original dataset are annotated with rationales, reaching at least 95% of
the performance obtained with full rationale annotation. Depending on the dataset, only a
few thousand annotated rationales are required. Using weak supervision, this number can be
reduced further, at least in specific settings: on the Movie Reviews dataset, we achieve
good performance with only 5% of the original rationale labels. The tested off-the-shelf
active learning methods provide no benefit over randomly selecting instances to label.
However, our extensive behavioral analysis enables the future design of active learning
methods tailored to explain-then-predict models. As a first step, we propose an active
learning method that outperforms the random baseline on the Movie Reviews dataset.