Generative models and Deep Reinforcement Learning for Geospatial Computer Vision

Consortium

Presagis Inc

Presagis is a Montreal-based software company that supplies the top 100 defense and aeronautic companies in the world with simulation and graphics software. Over the last decade, Presagis has built a strong reputation for recreating the complexity of the real world in virtual environments. Their deep understanding of the defense and aeronautic industries, combined with expertise in synthetic environments, simulation & visualization, human-machine interfaces, and sensors, positions them to meet today’s goals and prepare for tomorrow’s challenges. Today, Presagis is investing heavily in research and innovation in virtual reality, artificial intelligence, and big data analysis. By leveraging their experience and recognizing emerging trends, their pioneering team of experts, former military personnel, and programmers are challenging the status quo and building tomorrow’s technology — today.

Concordia University, Montreal, Quebec

Immersive & Creative Technologies Lab

The Immersive and Creative Technologies lab (ICT lab) was established in late 2011 as a premier research lab committed to fostering academic excellence, groundbreaking research, and innovative solutions within the field of Computer Science. Our talented team of researchers concentrates on specialized areas such as computer vision, computer graphics, virtual/augmented reality, and creative technologies, while exploring their applications across a diverse array of disciplines. At the ICT Lab, we strive to achieve ambitious long-term objectives centered on the development of highly realistic virtual environments. Our primary objectives are (a) creating virtual worlds that are virtually indistinguishable from the real-world locations they represent, and (b) employing these sophisticated digital twins to produce a wide range of impactful visualizations for various applications. Through our dedication to academic rigor, inventive research, and creative problem-solving, we aim to push the boundaries of technological innovation and contribute to the advancement of human knowledge.

Researchers

People who have worked on the project, sorted by graduation date where applicable:

Harshitha Voleti - MSc

Saikumar Iyer - MSc

Amirhossein Sorour - MSc

Damian Bowness - MSc

Ahmad Shabani - PhD

Amin Karimi - PhD

Naghmeh Shafiee Roudbari - PhD

Bodhiswatta Chatterjee - PhD

Sacha Lepretre - Presagis Inc (CTO)

Charalambos Poullis - Concordia (PI)

Research Objectives

Generative Modeling

Deep Reinforcement Learning

PHOENIX research programme

Publications

SR2024

Transductive meta-learning with enhanced feature ensemble for few-shot semantic segmentation

Amin Karimi, Charalambos Poullis
Scientific Reports, 2024
This paper addresses few-shot semantic segmentation and proposes a novel transductive end-to-end method that overcomes three key problems affecting performance.
First, we present a novel ensemble of visual features learned from pretrained classification and semantic segmentation networks with the same architecture. Our approach leverages the varying discriminative power of these networks, resulting in rich and diverse visual features that are more informative than those of a pretrained classification backbone, which is not optimized for the dense pixel-wise classification required by most state-of-the-art methods. Second, the pretrained semantic segmentation network serves as a base class extractor, which effectively mitigates false positives that occur at inference time and are caused by base objects other than the object of interest. Third, a two-step segmentation approach using transductive meta-learning is presented to address episodes with poor similarity between the support and query images. The proposed transductive meta-learning method first learns the relationship between labeled and unlabeled data points by matching support foreground features to query features (intra-class similarity), and then applies this knowledge to predict on the unlabeled query image (intra-object similarity), simultaneously learning propagation and false positive suppression. To evaluate our method, we performed experiments on benchmark datasets, and the results demonstrate significant improvement with only 2.98M trainable parameters. Specifically, using ResNet-101, we achieve state-of-the-art performance for both 1-shot and 5-shot Pascal-5i, as well as for 1-shot and 5-shot COCO-20i.
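As a rough illustration of the intra-class similarity step described above, the sketch below matches support foreground feature vectors against query features by cosine similarity to form a coarse foreground prior. The function names, the per-pixel feature representation, and the threshold are illustrative assumptions, not the paper's implementation.

```python
import math

def cosine(u, v):
    # Cosine similarity between two feature vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def match_support_to_query(support_fg_feats, query_feats, threshold=0.5):
    """For each query feature (e.g. one per pixel), keep the best cosine
    similarity against the support foreground features; scores above the
    threshold become a coarse foreground prior for the second step."""
    prior = []
    for q in query_feats:
        best = max(cosine(q, s) for s in support_fg_feats)
        prior.append(1 if best >= threshold else 0)
    return prior
```

In the actual method this matching operates on deep feature maps and feeds a learned prediction stage; the sketch only conveys the matching idea.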
ISVC2023

Strategic Incorporation of Synthetic Data for Performance Enhancement in Deep Learning: A Case Study on Object Tracking Tasks

Jatin Katyal, Charalambos Poullis
18th International Symposium on Visual Computing (ISVC), 2023
Obtaining training data for machine learning models can be challenging.
Capturing or gathering the data, followed by its manual labelling, is an expensive and time-consuming process. In cases where there are no publicly accessible datasets, this can significantly hinder progress. In this paper, we analyze the similarity between synthetic and real data. Focusing on an object tracking task, we investigate the quantitative improvement influenced by the concentration of synthetic data and the variation it induces in the distribution of training samples. Through examination of three well-known benchmarks, we derive guidelines that lead to performance gains. We quantify the minimum variation required and demonstrate its efficacy on a prominent object-tracking neural network architecture.
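The notion of synthetic-data concentration above can be sketched as a simple mixing routine that builds a training set with a requested fraction of synthetic samples. `mix_training_set` and its `synthetic_ratio` parameter are hypothetical names for illustration, not from the paper's codebase.

```python
import random

def mix_training_set(real_samples, synthetic_samples, synthetic_ratio, seed=0):
    """Build a training set in which `synthetic_ratio` of the samples are
    synthetic. The real data is kept whole; synthetic samples are drawn
    without replacement to reach the requested concentration."""
    if not 0.0 <= synthetic_ratio < 1.0:
        raise ValueError("synthetic_ratio must be in [0, 1)")
    n_real = len(real_samples)
    # Solve n_syn / (n_real + n_syn) = synthetic_ratio for n_syn.
    n_syn = round(n_real * synthetic_ratio / (1.0 - synthetic_ratio))
    rng = random.Random(seed)
    chosen = rng.sample(synthetic_samples, min(n_syn, len(synthetic_samples)))
    mixed = list(real_samples) + chosen
    rng.shuffle(mixed)
    return mixed
```

Sweeping `synthetic_ratio` over a grid is one straightforward way to probe how concentration affects downstream tracking performance.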
CVIU2023

Tractable large-scale deep reinforcement learning

Nima Sarang, Charalambos Poullis
CVIU, 2023
Reinforcement learning (RL) has emerged as one of the most promising and powerful techniques in deep learning. The training of intelligent agents requires a myriad of training examples which imposes a substantial computational cost.
Consequently, RL is seldom applied to real-world problems and has historically been limited to computer vision tasks, similar to supervised learning. This work proposes an RL framework for complex, partially observable, large-scale environments. We introduce novel techniques for tractable training on commodity GPUs that significantly reduce computational costs. Furthermore, we present a self-supervised loss that improves learning stability in applications with a long time horizon, shortening the training time. We demonstrate the effectiveness of the proposed solution on the application of road extraction from high-resolution satellite images, with experiments on satellite images of fifteen cities demonstrating performance comparable to state-of-the-art methods. To the best of our knowledge, this is the first time RL has been applied to extracting road networks. The code is publicly available at https://github.com/nsarang/road-extraction-rl.
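To make the road-extraction RL setup concrete, here is a toy sketch of a road-tracing environment: an agent moves over a pixel grid and is rewarded for stepping onto unvisited road pixels. `RoadEnv`, the action set, and the reward values are illustrative assumptions; the actual framework operates on high-resolution satellite imagery with partial observability and is far more elaborate.

```python
# Toy road-tracing environment on a small 2D grid.
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

class RoadEnv:
    def __init__(self, road_mask, start):
        self.road = road_mask          # 2D list of 0/1 road labels
        self.pos = start
        self.visited = {start}

    def step(self, action):
        # Move one pixel, clamped to the grid bounds.
        dr, dc = MOVES[action]
        r = min(max(self.pos[0] + dr, 0), len(self.road) - 1)
        c = min(max(self.pos[1] + dc, 0), len(self.road[0]) - 1)
        new_pos = (r, c)
        # +1 for discovering a new road pixel, small penalty otherwise.
        reward = 1.0 if (self.road[r][c] == 1 and new_pos not in self.visited) else -0.1
        self.pos = new_pos
        self.visited.add(new_pos)
        return new_pos, reward
```

An agent trained against such an interface would learn a policy that traces connected road pixels while avoiding background, which is the intuition behind framing road extraction as sequential decision making.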
ISVC2022

Unsupervised Structure-Consistent Image-to-Image Translation

Shima Shahfar, Charalambos Poullis
ISVC, 2022
The Swapping Autoencoder achieved state-of-the-art performance in deep image manipulation and image-to-image translation. We improve this work by introducing a simple yet effective auxiliary module based on gradient reversal layers.
The auxiliary module’s loss forces the generator to learn to reconstruct an image with an all-zero texture code, encouraging better disentanglement between structure and texture information. The proposed attribute-based transfer method enables refined control in style transfer while preserving structural information, without using a semantic mask. To manipulate an image, we encode both the geometry of the objects and the general style of the input images into two latent codes, with an additional constraint that enforces structure consistency. Moreover, due to the auxiliary loss, training time is significantly reduced. The superiority of the proposed model is demonstrated on complex domains such as satellite images, where state-of-the-art methods are known to fail. Lastly, we show that our model improves the quality metrics for a wide range of datasets while achieving results comparable with multi-modal image generation techniques.
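The gradient reversal layer underlying the auxiliary module can be sketched framework-free: the forward pass is the identity, while the backward pass flips (and optionally scales) the incoming gradient, so the encoder ends up maximizing the objective its auxiliary head minimizes. The class name and the `lam` scaling factor below are illustrative, not the paper's code.

```python
# Minimal gradient reversal layer (GRL) sketch without an autograd framework.
class GradReverse:
    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, x):
        return x                      # identity in the forward pass

    def backward(self, grad_output):
        # Negate (and scale) the gradient flowing back to the encoder.
        return [-self.lam * g for g in grad_output]
```

In a real training loop this would be implemented as a custom autograd function placed between the generator's encoder and the auxiliary head.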

Contact

Charalambos Poullis
Immersive and Creative Technologies Lab
Department of Computer Science and Software Engineering
Concordia University
1455 de Maisonneuve Blvd. West, ER 925,
Montréal, Québec,
Canada, H3G 1M8