Paper published in a book (Scientific congresses and symposiums)
Enhancing Remote Sensing Vision-Language Models for Zero-Shot Scene Classification
El Khoury, Karim; Zanella, Maxime; Gérin, Benoît et al.
2025. In Rao, Bhaskar D (Ed.) 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025 - Proceedings
Peer reviewed
 

Files


Full Text
2409.00698v2.pdf
Author postprint (453.79 kB)


Details



Keywords :
remote sensing; scene classification; transductive inference; vision-language models; zero-shot; Software; Signal Processing; Electrical and Electronic Engineering
Abstract :
[en] Vision-Language Models for remote sensing have shown promising uses thanks to their extensive pretraining. However, their conventional usage in zero-shot scene classification methods still involves dividing large images into patches and making independent predictions, i.e., inductive inference, thereby limiting their effectiveness by ignoring valuable contextual information. Our approach tackles this issue by utilizing initial predictions based on text prompting and patch affinity relationships from the image encoder to enhance zero-shot capabilities through transductive inference, all without the need for supervision and at a minor computational cost. Experiments on 10 remote sensing datasets with state-of-the-art Vision-Language Models demonstrate significant accuracy improvements over inductive zero-shot classification. Our source code is publicly available on GitHub: https://github.com/elkhouryk/RS-TransCLIP.
Disciplines :
Computer science
Author, co-author :
El Khoury, Karim; UCLouvain, Belgium
Zanella, Maxime; Université de Mons - UMONS > Faculté Polytechnique > Service Informatique, Logiciel et Intelligence artificielle; UCLouvain, Belgium
Gérin, Benoît; UCLouvain, Belgium
Godelaine, Tiffanie; UCLouvain, Belgium
Macq, Benoît; UCLouvain, Belgium
Mahmoudi, Saïd; Université de Mons - UMONS > Faculté Polytechnique > Service Informatique, Logiciel et Intelligence artificielle
De Vleeschouwer, Christophe; UCLouvain, Belgium
Ayed, Ismail Ben; ÉTS Montreal, Canada
Language :
English
Title :
Enhancing Remote Sensing Vision-Language Models for Zero-Shot Scene Classification
Publication date :
01 January 2025
Event name :
ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Event place :
Hyderabad, India
Event date :
06 April 2025 to 11 April 2025
Audience :
International
Main work title :
2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025 - Proceedings
Editor :
Rao, Bhaskar D
Publisher :
Institute of Electrical and Electronics Engineers Inc.
ISBN/EAN :
9798350368741
Peer review/Selection committee :
Peer reviewed
Research unit :
F114 - Informatique, Logiciel et Intelligence artificielle
Research institute :
R300 - Institut de Recherche en Technologies de l'Information et Sciences de l'Informatique
Funders :
IEEE
IEEE Signal Processing Society
Funding text :
M.Z. and B.G. are funded by the Walloon Region under grant No. 2010235 (ARIAC by DIGITALWALLONIA4.AI). T.G. is funded by MedReSyst, part of the Walloon Region and EU-Wallonie 2021-2027 program.
Available on ORBi UMONS :
since 15 January 2026

Statistics


Number of views
3 (0 by UMONS)
Number of downloads
0 (0 by UMONS)

Scopus citations®
4
Scopus citations® without self-citations
4
OpenCitations
0
OpenAlex citations
3
