A Multimodal Approach for Semantic Patent Image Retrieval

dc.identifier.uri http://dx.doi.org/10.15488/16876
dc.identifier.uri https://www.repo.uni-hannover.de/handle/123456789/17003
dc.contributor.author Pustu-Iren, Kader
dc.contributor.author Bruns, Gerrit
dc.contributor.author Ewerth, Ralph
dc.contributor.editor Krestel, Ralf
dc.contributor.editor Aras, Hidir
dc.contributor.editor Andersson, Linda
dc.contributor.editor Piroi, Florina
dc.contributor.editor Hanbury, Allan
dc.contributor.editor Alderucci, Dean
dc.date.accessioned 2024-04-04T08:54:05Z
dc.date.available 2024-04-04T08:54:05Z
dc.date.issued 2021
dc.identifier.citation Pustu-Iren, K.; Bruns, G.; Ewerth, R.: A Multimodal Approach for Semantic Patent Image Retrieval. In: Krestel, Ralf; Aras, Hidir; Andersson, Linda; Piroi, Florina; Hanbury, Allan; Alderucci, Dean (Eds.): PatentSemTech 2021: Patent Text Mining and Semantic Technologies 2021 : proceedings of the 2nd Workshop on Patent Text Mining and Semantic Technologies (PatentSemTech) 2021, co-located with the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2021). Aachen, Germany : RWTH Aachen, 2021 (CEUR Workshop Proceedings ; 2909), S. 45-49.
dc.description.abstract Patent images such as technical drawings contain valuable information and are frequently used by experts to compare patents. However, current approaches to patent information retrieval largely focus on textual information. Consequently, we review previous work on patent retrieval with a focus on illustrations in figures. In this paper, we report on work in progress on a novel approach for patent image retrieval that uses deep multimodal features. Scene text spotting and optical character recognition are employed to extract reference numerals from an image and subsequently identify the corresponding sentences in the patent document. Furthermore, we use the state-of-the-art neural model CLIP to extract structural features from illustrations and additionally derive textual features from the related patent text using a sentence transformer model. To fuse our multimodal features for similarity search, we apply re-ranking according to averaged or maximum scores. In our experiments, we compare the impact of the different modalities on the task of similarity search for patent images. The experimental results suggest that patent image retrieval can be successfully performed using the proposed feature sets, with the best results achieved when combining the features of both modalities. eng
dc.language.iso eng
dc.publisher Aachen, Germany : RWTH Aachen
dc.relation.ispartof PatentSemTech 2021: Patent Text Mining and Semantic Technologies 2021 : proceedings of the 2nd Workshop on Patent Text Mining and Semantic Technologies (PatentSemTech) 2021, co-located with the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2021)
dc.relation.ispartofseries CEUR Workshop Proceedings ; 2909
dc.relation.uri https://ceur-ws.org/Vol-2909/paper6.pdf
dc.rights CC BY 4.0 International
dc.rights.uri https://creativecommons.org/licenses/by/4.0/
dc.subject Patent Image Similarity Search eng
dc.subject Deep Learning eng
dc.subject Multimodal Feature Representations eng
dc.subject Scene Text Spotting eng
dc.subject.classification Konferenzschrift ger
dc.subject.ddc 004 | Informatik
dc.subject.ddc 020 | Bibliotheks- und Informationswissenschaft
dc.title A Multimodal Approach for Semantic Patent Image Retrieval eng
dc.type BookPart
dc.type Text
dc.relation.essn 1613-0073
dc.bibliographicCitation.volume 2909
dc.bibliographicCitation.firstPage 45
dc.bibliographicCitation.lastPage 49
dc.description.version publishedVersion
tib.accessRights frei zugänglich
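
Note: The abstract above describes extracting CLIP-based visual features and sentence-transformer text features and fusing them for similarity search by re-ranking with averaged or maximum scores. The following minimal Python sketch illustrates only that late-fusion idea; it is not the authors' implementation. The scene text spotting / OCR step that links reference numerals to patent sentences is omitted, and the model names, file names, and toy data are illustrative assumptions.

import numpy as np
from PIL import Image
from sentence_transformers import SentenceTransformer, util

# Visual/structural features from patent drawings (CLIP) and textual features
# from the description sentences the drawings refer to (sentence transformer).
img_model = SentenceTransformer("clip-ViT-B-32")
txt_model = SentenceTransformer("all-MiniLM-L6-v2")

# Hypothetical corpus: one drawing per patent plus its referenced description text.
corpus_images = [Image.open(p) for p in ["patent_a_fig1.png", "patent_b_fig3.png"]]
corpus_texts = ["A valve 12 is connected to a pump 14 via a conduit 16 ...",
                "The housing 3 encloses a rotor 5 mounted on a shaft 7 ..."]

query_image = Image.open("query_fig.png")            # hypothetical query drawing
query_text = "Hydraulic pump with a control valve"   # text linked to the query figure

# Cosine similarity of the query against the corpus, computed per modality.
sim_img = util.cos_sim(img_model.encode(query_image), img_model.encode(corpus_images))[0]
sim_txt = util.cos_sim(txt_model.encode(query_text), txt_model.encode(corpus_texts))[0]

# Late fusion by re-ranking: average the two modality scores, or take their maximum.
scores_avg = (sim_img + sim_txt) / 2
scores_max = np.maximum(sim_img.numpy(), sim_txt.numpy())

print("Ranking by averaged scores:", scores_avg.argsort(descending=True).tolist())
print("Ranking by maximum scores:", np.argsort(-scores_max).tolist())

Averaging treats the visual and textual evidence as equally reliable, while taking the maximum lets the stronger modality decide for each candidate; these correspond to the two re-ranking variants mentioned in the abstract.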

