Interpreting Text Classification with Human-Understandable Counterfactual Instances

dc.identifier.uri http://dx.doi.org/10.15488/11892 eng
dc.identifier.uri https://www.repo.uni-hannover.de/handle/123456789/11987
dc.contributor.advisor Anand, Avishek
dc.contributor.advisor Lindauer, Marius
dc.contributor.author Li, Teng eng
dc.date.accessioned 2022-03-18T14:22:21Z
dc.date.available 2022-03-18T14:22:21Z
dc.date.issued 2022
dc.identifier.citation Li, Teng: Interpreting Text Classification with Human-Understandable Counterfactual Instances. Hannover : Gottfried Wilhelm Leibniz Universität, Master Thesis, 2022, 26 S. DOI: http://doi.org/10.15488/11892 eng
dc.description.abstract As machine learning models play an increasingly important role in society, powerful interpretation tools are needed to open up their black boxes. Psychological studies also indicate that humans learn new concepts more readily when they are presented with contrastive instances. Interpreting ML models through the contrast between an original data instance and its counterfactuals has therefore become a popular research problem. Traditional counterfactual interpretation approaches tend to generate counterfactuals that are faithful to the ML model, but they place little or no constraint on whether the generated counterfactuals are meaningful. This thesis proposes an approach that generates meaningful counterfactual interpretations of text classification models by constraining token replacements with cosine similarity and the part-of-speech (POS) properties of tokens. The model to be interpreted is a text CNN based on Kim's CNN\cite{KimsCnn} with a fine-tuned Word2Vec embedding layer. For counterfactual generation, I leverage token-level HotFlip\cite{hotflip} and replace tokens under several constraints. Finally, I show with several examples that this approach yields more meaningful counterfactual interpretations than vanilla HotFlip. eng
dc.language.iso eng eng
dc.publisher Hannover : Gottfried Wilhelm Leibniz Universität Hannover
dc.rights CC BY 3.0 DE eng
dc.rights.uri http://creativecommons.org/licenses/by/3.0/de/ eng
dc.subject Artificial Intelligence eng
dc.subject Interpretability eng
dc.subject Machine Learning eng
dc.subject Natural Language Processing eng
dc.subject AI eng
dc.subject NLP eng
dc.subject Künstliche Intelligenz, Interpretierbarkeit, Maschinelles Lernen, Verarbeitung natürlicher Sprache ger
dc.subject.ddc 500 | Natural sciences eng
dc.title Interpreting Text Classification with Human-Understandable Counterfactual Instances eng
dc.type MasterThesis eng
dc.type Text eng
dcterms.extent 26 S.
dc.description.version publishedVersion eng
tib.accessRights freely accessible eng
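
The constrained token replacement summarized in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the toy embeddings, gradient, and POS tags, and the reading of the constraints as "same POS tag" plus "cosine similarity of at least min_sim" are all assumptions made for illustration. Candidates are scored with a first-order estimate of the loss change in the style of HotFlip.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def best_flip(token, grad, vocab, pos_tags, min_sim=0.5):
    """Pick a replacement for `token`, or return None if nothing qualifies.

    The estimated loss increase from swapping embedding e_a for e_b is
    grad . (e_b - e_a). Candidates must share the original token's POS tag
    and reach `min_sim` cosine similarity (both constraints are assumptions).
    """
    e_a = vocab[token]
    best, best_gain = None, 0.0
    for cand, e_b in vocab.items():
        if cand == token or pos_tags[cand] != pos_tags[token]:
            continue  # POS constraint
        if cosine(e_a, e_b) < min_sim:
            continue  # similarity constraint
        gain = sum(g * (b - a) for g, a, b in zip(grad, e_a, e_b))
        if gain > best_gain:
            best, best_gain = cand, gain
    return best

# Hypothetical toy data: 2-d embeddings, POS tags, and a loss gradient
# with respect to the embedding of the token being replaced.
vocab = {
    "good": [1.0, 0.0],
    "fine": [0.8, 0.6],
    "purple": [0.0, 1.0],
    "quickly": [0.6, 0.8],
}
pos_tags = {"good": "ADJ", "fine": "ADJ", "purple": "ADJ", "quickly": "ADV"}
grad = [-1.0, 1.0]

print(best_flip("good", grad, vocab, pos_tags))
```

In this toy setup, relaxing min_sim lets a dissimilar adjective with a larger estimated gain win, which is the kind of less meaningful replacement the constraints are meant to rule out.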

