DAEDALUS: Massive-scale Urban Reconstruction, Classification, and Rendering from Remote Sensor Imagery

Consortium

Canada DRDC

Defence Research and Development Canada (DRDC) is the national leader in defence and security science and technology. As an agency of Canada’s Department of National Defence (DND), DRDC provides DND, the Canadian Armed Forces and other government departments as well as the public safety and national security communities, the knowledge and technological advantage needed to defend and protect Canada’s interests at home and abroad.

Presagis Inc

Presagis is a Montreal-based software company that supplies the top 100 defense and aeronautic companies in the world with simulation and graphics software. Over the last decade, Presagis has built a strong reputation in helping create the complexity of the real world in a virtual one. Their deep understanding of the defense and aeronautic industries combined with expertise in synthetic environments, simulation & visualization, human-machine interfaces, and sensors positions them to meet today’s goals and prepare for tomorrow’s challenges. Today, Presagis is heavily investing into the research and innovation of virtual reality, artificial intelligence, and big data analysis. By leveraging their experience and recognizing emerging trends, their pioneering team of experts, former military personnel, and programmers are challenging the status quo and building tomorrow’s technology — today.

Immersive & Creative Technologies Lab

The Immersive and Creative Technologies lab (ICT lab) was established in late 2011 as a premier research lab, committed to fostering academic excellence, groundbreaking research, and innovative solutions within the field of Computer Science. Our talented team of researchers concentrate on specialized areas such as computer vision, computer graphics, virtual/augmented reality, and creative technologies, while exploring their applications across a diverse array of disciplines. At the ICT Lab, we strive to achieve ambitious long-term objectives that are centered around the development of highly realistic virtual environments. Our primary objectives include (a) creating virtual worlds that are virtually indistinguishable from the real-world locations they represent, and (b) employing these sophisticated digital twins to produce a wide range of impactful visualizations for various applications. Through our dedication to academic rigor, inventive research, and creative problem-solving, we aim to propel the boundaries of technological innovation and contribute to the advancement of human knowledge.

Publications

Single-shot Dense Reconstruction with Epic-flow

Chen Qiao, Charalambos Poullis

IEEE 3DTV-CON, 2018

In this paper we present a novel method for generating dense reconstructions by applying only structure-from-motion(SfM) on large-scale datasets without the need for multi-view stereo as a post-processing step.

A state-of-the-art optical flow technique is used to generate dense matches. The matches are encoded such that verification for correctness becomes possible, and are stored in a database on-disk. The use of this out-of-core approach transfers the requirement for large memory space to disk, therefore allowing for the processing of even larger-scale datasets than before. We compare our approach with the state-of-the-art and present the results which verify our claims.

Deep Autoencoders with Aggregated Residual Transformations for Urban Reconstruction from Remote Sensing Data

Timothy Forbes, Charalambos Poullis

15th Conference on Computer and Robot Vision, 2018

In this work we investigate urban reconstruction and propose a complete and automatic framework for reconstructing urban areas from remote sensing data.

Firstly, we address the complex problem of semantic labeling and propose a novel network architecture named SegNeXT which combines the strengths of deep-autoencoders with feed-forward links in generating smooth predictions and reducing the number of learning parameters, with the effectiveness which cardinality-enabled residual-based building blocks have shown in improving prediction accuracy and outperforming deeper/wider network architectures with a smaller number of learning parameters. The network is trained with benchmark datasets and the reported results show that it can provide at least similar and in some cases better classification than state-of-the-art. Secondly, we address the problem of urban reconstruction and propose a complete pipeline for automatically converting semantic labels into virtual representations of the urban areas. An agglomerative clustering is performed on the points according to their classification and results in a set of contiguous and disjoint clusters. Finally, each cluster is processed according to the class it belongs: tree clusters are substituted with procedural models, cars are replaced with simplified CAD models, buildings' boundaries are extruded to form 3D models, and road, low vegetation, and clutter clusters are triangulated and simplified. The result is a complete virtual representation of the urban area. The proposed framework has been extensively tested on large-scale benchmark datasets and the semantic labeling and reconstruction results are reported.

Large-scale Urban Reconstruction with Tensor Clustering and Global Boundary Refinement

Charalambos Poullis

IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019

Accurate and efficient methods for large-scale urban reconstruction are of significant importance to the computer vision and computer graphics communities.

Although rapid acquisition techniques such as airborne LiDAR have been around for many years, creating a useful and functional virtual environment from such data remains difficult and labor intensive. This is due largely to the necessity in present solutions for data dependent user defined parameters. In this paper we present a new solution for automatically converting large LiDAR data pointcloud into simplified polygonal 3D models. The data is first divided into smaller components which are processed independently and concurrently to extract various metrics about the points. Next, the extracted information is converted into tensors. A robust agglomerate clustering algorithm is proposed to segment the tensors into clusters representing geospatial objects e.g. roads, buildings, etc. Unlike previous methods, the proposed tensor clustering process has no data dependencies and does not require any user-defined parameter. The required parameters are adaptively computed assuming a Weibull distribution for similarity distances. Lastly, to extract boundaries from the clusters a new multi-stage boundary refinement process is developed by reformulating this extraction as a global optimization problem. We have extensively tested our methods on several pointcloud datasets of different resolutions which exhibit significant variability in geospatial characteristics e.g. ground surface inclination, building density, etc and the results are reported. The source code for both tensor clustering and global boundary refinement will be made publicly available with the publication on the author’s website.

On Building Classification from Remote Sensor Imagery Using Deep Neural Networks and the Relation Between Classification and Reconstruction Accuracy Using Border Localization as Proxy

Bodhiswatta Chatterjee, Charalambos Poullis

16th Conference on Computer and Robot Vision, 2019

Convolutional neural networks have been shown to have a very high accuracy when applied to certain visual tasks and in particular semantic segmentation.

Convolutional neural networks have been shown to have a very high accuracy when applied to certain visual tasks and in particular semantic segmentation. In this paper we address the problem of semantic segmentation of buildings from remote sensor imagery. We present ICT-Net: a novel network with the underlying architecture of a fully convolutional network, infused with feature re-calibrated Dense blocks at each layer. Uniquely, the proposed network combines the localization accuracy and use of context of the U-Net network architecture, the compact internal representations and reduced feature redundancy of the Dense blocks, and the dynamic channel-wise feature re-weighting of the Squeeze-and-Excitation(SE) blocks. The proposed network has been tested on INRIA's benchmark dataset and is shown to outperform all other state-of-the-art by more than 1.5% on the Jaccard index. Furthermore, as the building classification is typically the first step of the reconstruction process, in the latter part of the paper we investigate the relationship of the classification accuracy to the reconstruction accuracy. A comparative quantitative analysis of reconstruction accuracies corresponding to different classification accuracies confirms the strong correlation between the two. We present the results which show a consistent and considerable reduction in the reconstruction accuracy. The source code and supplemental material is publicly available at http://www.theICTlab.org/lp/2019ICTNet/

Delineation of Road Networks Using Deep Residual Neural Networks and Iterative Hough Transform

Pinjing Xu, Charalambos Poullis

International Symposium on Visual Computing, 2019

In this paper we present a complete pipeline for extracting road network vector data from satellite RGB orthophotos of urban areas.

Firstly, a network based on the SegNeXt architecture with a novel loss function is employed for the semantic segmentation of the roads. Results show that the proposed network produces on average better results than other state-of-the-art semantic segmentation techniques. Secondly, we propose a fast post-processing technique for vectorizing the rasterized segmentation result, removing erroneous lines, and refining the road network. The result is a set of vectors representing the road network. We have extensively tested the proposed pipeline and provide quantitative and qualitative comparisons with other state-of-the-art based on a number of known metrics.

Semantic Segmentation from Remote Sensor Data and the Exploitation of Latent Learning for Classification of Auxiliary Tasks

Bodhiswatta Chatterjee, Charalambos Poullis

Computer Vision and Image Understanding, 2021

In this paper we address three different aspects of semantic segmentation from remote sensor data using deep neural networks.

Firstly, we focus on the semantic segmentation of buildings from remote sensor data and propose ICT-Net: a novel network with the underlying architecture of a fully convolutional network, infused with feature re-calibrated Dense blocks at each layer. Uniquely, the proposed network combines the localization accuracy and use of context of the U-Net network architecture, the compact internal representations and reduced feature redundancy of the Dense blocks, and the dynamic channel-wise feature re-weighting of the Squeeze-and-Excitation(SE) blocks. The proposed network has been tested on the INRIA and AIRS benchmark datasets and is shown to outperform other state of the art. Secondly, as the building classification is typically the first step of the reconstruction process, we investigate the relationship of the classification accuracy to the reconstruction accuracy. Due to the lack of (1) scene depth information, and (2) ground-truth (blueprints) for large urban-areas, the evaluation of the 3D reconstructions is not possible. Thus, we use boundary localization as a proxy to reconstruction accuracy and perform the evaluation in 2D. A comparative quantitative analysis of reconstruction accuracies corresponding to different classification accuracies confirms the strong correlation between the two. We present the results which show a consistent and considerable reduction in the reconstruction accuracy. Finally, we present the simple yet compelling concept of latent learning and the implications it carries within the context of deep learning. We posit that a network trained on a primary task (i.e. building classification) is unintentionally learning about auxiliary tasks (e.g. the classification of road, tree, etc) which are complementary to the primary task. Although embedded in a trained network, this latent knowledge relating to the auxiliary tasks is never externalized or immediately expressed but instead only knowledge relating to the primary task is ever output by the network. We experimentally prove this occurrence of incidental learning on the pre-trained ICT-Net and show how sub-classification of the negative label is possible without further training/fine-tuning. We present the results of our experiments and explain how knowledge about auxiliary and complementary tasks - for which the network was never trained - can be retrieved and utilized for further classification. We extensively tested the proposed technique on the ISPRS benchmark dataset which contains multi-label ground truth, and report an average classification accuracy (F1 score) of 54.29% (SD=17.03) for roads, 10.15% (SD=2.54) for cars, 24.11% (SD=5.25) for trees, 42.74% (SD=6.62) for low vegetation, and 18.30% (SD=16.08) for clutter.

Predicting Surface Reflectance Properties of Outdoor Scenes Under Unknown Natural Illumination

Farhan Rahman Wasee, Alen Joy, Charalambos Poullis

IEEE Computer Graphics & Applications, 2022

This paper proposes a complete framework to predict surface reflectance properties of outdoor scenes under unknown natural illumination.

Uniquely, we recast the problem into its two constituent components involving the BRDF incoming light and outgoing view directions: (i) surface points' radiance captured in the images, and outgoing view directions are aggregated and encoded into reflectance maps, and (ii) a neural network trained on reflectance maps of renders of a unit sphere under arbitrary light directions infers a low-parameter reflection model representing the reflectance properties at each surface in the scene. Our model is based on a combination of phenomenological and physics-based scattering models and can relight the scenes from novel viewpoints. We present experiments that show that rendering with the predicted reflectance properties results in a visually similar appearance to using textures that cannot otherwise be disentangled from the reflectance properties.

Adaptive Memory Management for Video Object Segmentation

Ali Pourganjalikhan, Charalambos Poullis

19th Conference on Robots and Vision (CRV), 2022

Matching-based networks have achieved state-of-the-art performance for video object segmentation (VOS) tasks by storing every-k frames in an external memory bank for future inference.

Storing the intermediate frames' predictions provides the network with richer cues for segmenting an object in the current frame. However, the size of the memory bank gradually increases with the length of the video, which slows down inference speed and makes it impractical to handle arbitrary length videos. This paper proposes an adaptive memory bank strategy for matching-based networks for semi-supervised video object segmentation (VOS) that can handle videos of arbitrary length by discarding obsolete features. Features are indexed based on their importance in the segmentation of the objects in previous frames. Based on the index, we discard unimportant features to accommodate new features. We present our experiments on DAVIS 2016, DAVIS 2017, and Youtube-VOS that demonstrate that our method outperforms state-of-the-art that employ first-and-latest strategy with fixed-sized memory banks and achieves comparable performance to the every-k strategy with increasing-sized memory banks. Furthermore, experiments show that our method increases inference speed by up to 80% over the every-k and 35% over first-and-latest strategies.

Unsupervised Structure-Consistent Image-to-Image Translation

Shima Shahfar, Charalambos Poullis

17th International Symposium on Visual Computing (ISVC), 2022

The Swapping Autoencoder achieved state-of-the-art performance in deep image manipulation and image-to-image translation.

We improve this work by introducing a simple yet effective auxiliary module based on gradient reversal layers. The auxiliary module's loss forces the generator to learn to reconstruct an image with an all-zero texture code, encouraging better disentanglement between the structure and texture information. The proposed attribute-based transfer method enables refined control in style transfer while preserving structural information without using a semantic mask. To manipulate an image, we encode both the geometry of the objects and the general style of the input images into two latent codes with an additional constraint that enforces structure consistency. Moreover, due to the auxiliary loss, training time is significantly reduced. The superiority of the proposed model is demonstrated in complex domains such as satellite images where state-of-the-art are known to fail. Lastly, we show that our model improves the quality metrics for a wide range of datasets while achieving comparable results with multi-modal image generation techniques.

Motion Estimation for Large Displacements and Deformations

Qiao Chen, Charalambos Poullis

Scientific Reports, 2022

Large displacement optical flow is an integral part of many computer vision tasks. Variational optical flow techniques based on a coarse-to-fine scheme interpolate sparse matches and locally optimize an energy model conditioned on colour, gradient and smoothness,

making them sensitive to noise in the sparse matches, deformations, and arbitrarily large displacements. This paper addresses this problem and presents HybridFlow, a variational motion estimation framework for large displacements and deformations. A multi-scale hybrid matching approach is performed on the image pairs. Coarse-scale clusters formed by classifying pixels according to their feature descriptors are matched using the clusters' context descriptors. We apply a multi-scale graph matching on the finer-scale superpixels contained within each matched pair of coarse-scale clusters. Small clusters that cannot be further subdivided are matched using localized feature matching. Together, these initial matches form the flow, which is propagated by an edge-preserving interpolation and variational refinement. Our approach does not require training and is robust to substantial displacements and rigid and non-rigid transformations due to motion in the scene, making it ideal for large-scale imagery such as Wide-Area Motion Imagery (WAMI). More notably, HybridFlow works on directed graphs of arbitrary topology representing perceptual groups, which improves motion estimation in the presence of significant deformations. We demonstrate HybridFlow's superior performance to state-of-the-art variational techniques on two benchmark datasets and report comparable results with state-of-the-art deep-learning-based techniques.

Contact

Charalambos Poullis
Immersive and Creative Technologies Lab
Department of Computer Science and Software Engineering
Concordia University
1455 de Maisonneuve Blvd. West, ER 925,
Montréal, Québec,
Canada, H3G 1M8

charalambos [at] poullis [dot] org

research [at] theICTlab [dot] org

+1 514 848 2424 x3019

Consortium

Researchers

People who have worked or are working on the project; sorted according to graduation date where applicable:

Research

Research objectives

Publications

Chen Qiao, Charalambos Poullis

IEEE 3DTV-CON, 2018

Timothy Forbes, Charalambos Poullis

15th Conference on Computer and Robot Vision, 2018

Charalambos Poullis

IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019

Bodhiswatta Chatterjee, Charalambos Poullis

16th Conference on Computer and Robot Vision, 2019

Pinjing Xu, Charalambos Poullis

International Symposium on Visual Computing, 2019

Bodhiswatta Chatterjee, Charalambos Poullis

Computer Vision and Image Understanding, 2021

Farhan Rahman Wasee, Alen Joy, Charalambos Poullis

IEEE Computer Graphics & Applications, 2022

Ali Pourganjalikhan, Charalambos Poullis

19th Conference on Robots and Vision (CRV), 2022

Shima Shahfar, Charalambos Poullis

17th International Symposium on Visual Computing (ISVC), 2022

Qiao Chen, Charalambos Poullis

Scientific Reports, 2022

Contact