Characterization and classification of semantic image-text relations

Otto, Christian; Springstein, Matthias; Anand, Avishek; Ewerth, Ralph

dc.identifier.uri	http://dx.doi.org/10.15488/10705
dc.identifier.uri	https://www.repo.uni-hannover.de/handle/123456789/10783
dc.contributor.author	Otto, Christian
dc.contributor.author	Springstein, Matthias
dc.contributor.author	Anand, Avishek
dc.contributor.author	Ewerth, Ralph
dc.date.accessioned	2021-03-30T11:22:30Z
dc.date.available	2021-03-30T11:22:30Z
dc.date.issued	2020
dc.identifier.citation	Otto, C.; Springstein, M.; Anand, A.; Ewerth, R.: Characterization and classification of semantic image-text relations. In: International Journal of Multimedia Information Retrieval 9 (2020), pp. 31-45. DOI: https://doi.org/10.1007/s13735-019-00187-6
dc.description.abstract	The beneficial, complementary nature of visual and textual information to convey information is widely known, for example, in entertainment, news, advertisements, science, or education. While the complex interplay of image and text to form semantic meaning has been thoroughly studied in linguistics and communication sciences for several decades, computer vision and multimedia research remained on the surface of the problem more or less. An exception is previous work that introduced the two metrics Cross-Modal Mutual Information and Semantic Correlation in order to model complex image-text relations. In this paper, we motivate the necessity of an additional metric called Status in order to cover complex image-text relations more completely. This set of metrics enables us to derive a novel categorization of eight semantic image-text classes based on three dimensions. In addition, we demonstrate how to automatically gather and augment a dataset for these classes from the Web. Further, we present a deep learning system to automatically predict either of the three metrics, as well as a system to directly predict the eight image-text classes. Experimental results show the feasibility of the approach, whereby the predict-all approach outperforms the cascaded approach of the metric classifiers. © 2020, The Author(s).	eng
dc.language.iso	eng
dc.publisher	London : Springer
dc.relation.ispartofseries	International Journal of Multimedia Information Retrieval 9 (2020)
dc.rights	CC BY 4.0 International
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.subject	data augmentation	eng
dc.subject	image-text class	eng
dc.subject	multimodality	eng
dc.subject	semantic gap	eng
dc.subject.ddc	004 | Computer science	eng
dc.subject.ddc	020 | Library and information science	eng
dc.title	Characterization and classification of semantic image-text relations
dc.type	Article
dc.type	Text
dc.relation.essn	2192-662X
dc.relation.issn	2192-6611
dc.relation.doi	https://doi.org/10.1007/s13735-019-00187-6
dc.bibliographicCitation.volume	9
dc.bibliographicCitation.firstPage	31
dc.bibliographicCitation.lastPage	45
dc.description.version	publishedVersion
tib.accessRights	freely accessible