Generative models and Deep Reinforcement Learning for Geospatial Computer Vision

Consortium

Presagis Inc

Presagis is a Montreal-based software company that supplies simulation and graphics software to the world's top 100 defense and aeronautics companies. Over the last decade, Presagis has built a strong reputation for recreating the complexity of the real world in virtual environments. Its deep understanding of the defense and aeronautics industries, combined with expertise in synthetic environments, simulation & visualization, human-machine interfaces, and sensors, positions the company to meet today's goals and prepare for tomorrow's challenges. Today, Presagis is investing heavily in research and innovation in virtual reality, artificial intelligence, and big data analysis. By leveraging its experience and recognizing emerging trends, its pioneering team of experts, former military personnel, and programmers is challenging the status quo and building tomorrow's technology today.

Concordia University, Montreal, Quebec

Immersive & Creative Technologies Lab

The Immersive and Creative Technologies lab (ICT lab) was established in late 2011 as a premier research lab committed to fostering academic excellence, groundbreaking research, and innovative solutions within the field of Computer Science. Our talented team of researchers concentrates on specialized areas such as computer vision, computer graphics, virtual/augmented reality, and creative technologies, while exploring their applications across a diverse array of disciplines. At the ICT Lab, we pursue ambitious long-term objectives centered on the development of highly realistic virtual environments. Our primary objectives are (a) creating virtual worlds that are virtually indistinguishable from the real-world locations they represent, and (b) employing these sophisticated digital twins to produce a wide range of impactful visualizations for various applications. Through our dedication to academic rigor, inventive research, and creative problem-solving, we aim to push the boundaries of technological innovation and contribute to the advancement of human knowledge.

Researchers

People who have worked on the project, sorted by graduation date where applicable:

Harshitha Voleti - MSc

Saikumar Iyer - MSc

Damian Bowness - MSc

Jatin Katyal - MSc - [graduated]

Shubham Rajeev Punekar - PhD

Ahmad Shabani - PhD

Amin Karimi - PhD

Naghmeh Shafiee Roudbari - PhD

Bodhiswatta Chatterjee - PhD

Sacha Lepretre - Presagis Inc (CTO)

Charalambos Poullis - Concordia (PI)

Research Objectives

Generative Modeling

Deep Reinforcement Learning

PHOENIX research programme

Publications

SR2024

Transductive meta-learning with enhanced feature ensemble for few-shot semantic segmentation

Amin Karimi, Charalambos Poullis
Scientific Reports, 2024
This paper addresses few-shot semantic segmentation and proposes a novel transductive end-to-end method that overcomes three key problems affecting performance.
First, we present a novel ensemble of visual features learned from pretrained classification and semantic segmentation networks with the same architecture. Our approach leverages the varying discriminative power of these networks, resulting in rich and diverse visual features that are more informative than those of a pretrained classification backbone, which is not optimized for dense pixel-wise prediction yet is used in most state-of-the-art methods. Second, the pretrained semantic segmentation network serves as a base class extractor, which effectively mitigates the false positives that occur at inference time and are caused by base-class objects other than the object of interest. Third, a two-step segmentation approach using transductive meta-learning is presented to address episodes with poor similarity between the support and query images. The proposed transductive meta-learning method first learns the relationship between labeled and unlabeled data points by matching support foreground features to query features (intra-class similarity), and then applies this knowledge to predict on the unlabeled query image (intra-object similarity), simultaneously learning propagation and false-positive suppression. To evaluate our method, we performed experiments on benchmark datasets, and the results demonstrate significant improvement with only 2.98M trainable parameters. Specifically, using ResNet-101, we achieve state-of-the-art performance for both 1-shot and 5-shot Pascal-5i, as well as for 1-shot and 5-shot COCO-20i.
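
To make the ensemble idea concrete, the sketch below pairs torchvision's ImageNet-pretrained ResNet-101 classifier with an FCN semantic segmentation model built on the same ResNet-101 architecture, and concatenates their features. This is a minimal illustration under our own assumptions (layer choices, bilinear upsampling, channel-wise concatenation), not the authors' exact design.

    import torch
    import torchvision.models as models
    import torchvision.models.segmentation as seg_models

    class FeatureEnsemble(torch.nn.Module):
        """Concatenate features from two pretrained networks sharing ResNet-101."""

        def __init__(self):
            super().__init__()
            # Classification backbone: drop the average pool and classifier head.
            cls_net = models.resnet101(weights=models.ResNet101_Weights.DEFAULT)
            self.cls_features = torch.nn.Sequential(*list(cls_net.children())[:-2])
            # Segmentation network pretrained for dense pixel-wise prediction.
            seg_net = seg_models.fcn_resnet101(
                weights=seg_models.FCN_ResNet101_Weights.DEFAULT)
            self.seg_features = seg_net.backbone

        @torch.no_grad()
        def forward(self, x):
            f_cls = self.cls_features(x)         # (B, 2048, H/32, W/32)
            f_seg = self.seg_features(x)["out"]  # (B, 2048, H/8, W/8), dilated backbone
            # Bring both feature maps to the same resolution, then concatenate.
            f_cls = torch.nn.functional.interpolate(
                f_cls, size=f_seg.shape[-2:], mode="bilinear", align_corners=False)
            return torch.cat([f_cls, f_seg], dim=1)
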
ISVC2023

Strategic Incorporation of Synthetic Data for Performance Enhancement in Deep Learning: A Case Study on Object Tracking Tasks

Jatin Katyal, Charalambos Poullis
18th International Symposium on Visual Computing (ISVC), 2023
Obtaining training data for machine learning models can be challenging.
Capturing or gathering the data, followed by its manual labelling, is an expensive and time-consuming process. In cases where there are no publicly accessible datasets, this can significantly hinder progress. In this paper, we analyze the similarity between synthetic and real data. Focusing on an object tracking task, we investigate how performance is affected by the concentration of synthetic data in the training set and by the variation it induces in the distribution of training samples. Through an examination of three well-known benchmarks, we reveal guidelines that lead to performance gains. We quantify the minimum variation required and demonstrate its efficacy on a prominent object-tracking neural network architecture.
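
As a rough sketch of the kind of experiment described above, the hypothetical helper below builds a training set with a controlled concentration of synthetic samples. The synthetic_fraction parameter and the uniform subsampling are our own illustrative assumptions, not the paper's exact protocol.

    import random
    from torch.utils.data import ConcatDataset, Subset

    def mix_datasets(real_ds, synthetic_ds, synthetic_fraction=0.3, seed=0):
        """Combine datasets so that synthetic_fraction of samples are synthetic."""
        assert 0.0 <= synthetic_fraction < 1.0
        # Solve n_syn / (n_real + n_syn) = synthetic_fraction for n_syn.
        n_real = len(real_ds)
        n_syn = int(n_real * synthetic_fraction / (1.0 - synthetic_fraction))
        n_syn = min(n_syn, len(synthetic_ds))
        # Draw a reproducible random subset of the synthetic data.
        syn_indices = random.Random(seed).sample(range(len(synthetic_ds)), n_syn)
        return ConcatDataset([real_ds, Subset(synthetic_ds, syn_indices)])
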
CVIU2023

Tractable large-scale deep reinforcement learning

Nima Sarang, Charalambos Poullis
Computer Vision and Image Understanding (CVIU), 2023
Reinforcement learning (RL) has emerged as one of the most promising and powerful techniques in deep learning. The training of intelligent agents requires a myriad of training examples, which imposes a substantial computational cost.
Consequently, RL is seldom applied to real-world problems, and its adoption in computer vision tasks has historically been limited compared to supervised learning. This work proposes an RL framework for complex, partially observable, large-scale environments. We introduce novel techniques for tractable training on commodity GPUs and significantly reduce computational costs. Furthermore, we present a self-supervised loss that improves learning stability in applications with a long time horizon, shortening the training time. We demonstrate the effectiveness of the proposed solution on road extraction from high-resolution satellite images, presenting experiments on satellite images of fifteen cities that show performance comparable to state-of-the-art methods. To the best of our knowledge, this is the first time RL has been applied to extracting road networks. The code is publicly available at https://github.com/nsarang/road-extraction-rl.
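
For intuition only, here is a toy, heavily simplified environment in which an agent traces a road network by stepping across a satellite image patch. The observation, action set, and reward shaping below are our own assumptions; the actual framework is in the repository linked above.

    import numpy as np

    class ToyRoadTracingEnv:
        """Agent moves one pixel per step and is rewarded for discovering road."""

        # 8-connected moves (dy, dx).
        MOVES = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                 (0, 1), (1, -1), (1, 0), (1, 1)]

        def __init__(self, image, road_mask, max_steps=500):
            self.image = image     # (H, W, C) satellite patch
            self.mask = road_mask  # (H, W) binary ground-truth road mask
            self.max_steps = max_steps

        def reset(self):
            self.pred = np.zeros_like(self.mask)
            self.pos = tuple(np.argwhere(self.mask)[0])  # start on a road pixel
            self.steps = 0
            return self._obs()

        def step(self, action):
            dy, dx = self.MOVES[action]
            h, w = self.mask.shape
            y = int(np.clip(self.pos[0] + dy, 0, h - 1))
            x = int(np.clip(self.pos[1] + dx, 0, w - 1))
            self.pos = (y, x)
            # Reward newly discovered road pixels; small penalty otherwise.
            reward = 1.0 if (self.mask[y, x] == 1 and self.pred[y, x] == 0) else -0.1
            self.pred[y, x] = 1
            self.steps += 1
            return self._obs(), reward, self.steps >= self.max_steps

        def _obs(self):
            # Partial observation: the image plus the agent's own predictions so far.
            return self.image, self.pred.copy(), self.pos
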
ISVC2022

Unsupervised Structure-Consistent Image-to-Image Translation

Shima Shahfar, Charalambos Poullis
ISVC, 2022
The Swapping Autoencoder achieved state-of-the-art performance in deep image manipulation and image-to-image translation. We improve this work by introducing a simple yet effective auxiliary module based on gradient reversal layers.
The auxiliary module’s loss forces the generator to learn to reconstruct an image with an all-zero texture code, encouraging better disentanglement between structure and texture information. The proposed attribute-based transfer method enables refined control in style transfer while preserving structural information, without using a semantic mask. To manipulate an image, we encode both the geometry of the objects and the general style of the input images into two latent codes, with an additional constraint that enforces structure consistency. Moreover, due to the auxiliary loss, training time is significantly reduced. The superiority of the proposed model is demonstrated in complex domains, such as satellite images, where state-of-the-art methods are known to fail. Lastly, we show that our model improves quality metrics for a wide range of datasets while achieving results comparable to multi-modal image generation techniques.
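
The gradient reversal layer at the heart of the auxiliary module is a standard construction (identity on the forward pass, negated gradient on the backward pass). A minimal PyTorch sketch follows; how it is wired to the all-zero texture reconstruction is our own illustrative assumption rather than the paper's exact design, and the generator signature and code size are hypothetical.

    import torch

    class GradReverse(torch.autograd.Function):
        """Identity forward; scales the gradient by -lambd on the way back."""

        @staticmethod
        def forward(ctx, x, lambd):
            ctx.lambd = lambd
            return x.view_as(x)

        @staticmethod
        def backward(ctx, grad_output):
            return -ctx.lambd * grad_output, None

    def grad_reverse(x, lambd=1.0):
        return GradReverse.apply(x, lambd)

    def auxiliary_loss(generator, structure_code, images):
        # Reconstruct from the structure code and an all-zero texture code; the
        # reversal layer pushes texture information out of the structure branch.
        zero_texture = torch.zeros(images.size(0), 512, device=images.device)  # assumed code size
        recon = generator(grad_reverse(structure_code), zero_texture)          # assumed signature
        return torch.nn.functional.l1_loss(recon, images)
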

Contact

Charalambos Poullis
Immersive and Creative Technologies Lab
Department of Computer Science and Software Engineering
Concordia University
1455 de Maisonneuve Blvd. West, ER 925,
Montréal, Québec,
Canada, H3G 1M8