MASSIVE-SCALE URBAN RECONSTRUCTION, CLASSIFICATION, AND RENDERING FROM REMOTE SENSOR IMAGERY

Consortium

Canada DRDC

Defence Research and Development Canada (DRDC) is the national leader in defence and security science and technology. As an agency of Canada’s Department of National Defence (DND), DRDC provides DND, the Canadian Armed Forces, other government departments, and the public safety and national security communities with the knowledge and technological advantage needed to defend and protect Canada’s interests at home and abroad.

Presagis Inc

Presagis is a Montreal-based software company that supplies the top 100 defense and aeronautic companies in the world with simulation and graphics software. Over the last decade, Presagis has built a strong reputation for helping recreate the complexity of the real world in a virtual one. Their deep understanding of the defense and aeronautic industries, combined with expertise in synthetic environments, simulation and visualization, human-machine interfaces, and sensors, positions them to meet today’s goals and prepare for tomorrow’s challenges. Today, Presagis is investing heavily in research and innovation in virtual reality, artificial intelligence, and big data analysis. By leveraging their experience and recognizing emerging trends, their team of experts, former military personnel, and programmers is challenging the status quo and building tomorrow’s technology today.

Concordia University, Montreal, Quebec

Immersive and Creative Technologies Lab

The Immersive and Creative Technologies (ICT) lab was founded in late 2011 and has since focused on fundamental and applied research in computer vision, computer graphics, virtual/augmented reality, and creative technologies, and on their application in a wide range of fields. More specifically, the long-term objectives of the research at the ICT Lab are to create (a) virtual worlds which are indistinguishable [in all aspects] from the real-world areas they represent, and (b) visualizations employing these realistic virtual worlds for a wide range of applications.
The ICT lab is part of the Department of Computer Science and Software Engineering at the Faculty of Engineering and Computer Science at Concordia University.

DAEDALUS researchers
The orthophoto RGB image used to generate this image is courtesy of Defence Research and Development Canada and Thales Canada.

Researchers

People who have worked or are working on the project; sorted according to graduation date where applicable:

Jatin Katyal - MSc

Bodhiswatta Chatterjee - PhD

Amin Karimi - PhD

Chen Qiao - PhD

Nima Sarang - MSc - [graduated]

Shima Shahfar - MSc - [graduated]

Alen Joy - MSc - [graduated]

Ali Pourganjalikhan - MSc - [graduated]

Farhan Rahman Wasee - MSc - [graduated]

Bodhiswatta Chatterjee - MSc - [graduated]

Pinjing Xu - MSc - [graduated]

Adnan Utayim - NSERC USRA - [graduated]

Timothy Forbes - MSc - [graduated]

Sacha Leprêtre - Presagis (CTO)

Charalambos Poullis - Concordia (PI)

Research

Research objectives

Image-based Modeling

Classification of geospatial features and road extraction

Photorealistic rendering

DAEDALUS research programme

Publications

IEEE 3DTV-CON 2018

Single-shot Dense Reconstruction with Epic-flow

Chen Qiao, Charalambos Poullis
IEEE 3DTV-CON, 2018
In this paper we present a novel method for generating dense reconstructions by applying only structure-from-motion (SfM) on large-scale datasets without the need for multi-view stereo as a post-processing step.
A state-of-the-art optical flow technique is used to generate dense matches. The matches are encoded such that verification for correctness becomes possible, and are stored in a database on-disk. The use of this out-of-core approach transfers the requirement for large memory space to disk, therefore allowing for the processing of even larger-scale datasets than before. We compare our approach with the state-of-the-art and present the results which verify our claims.

CRV 2018

Deep Autoencoders with Aggregated Residual Transformations for Urban Reconstruction from Remote Sensing Data

Timothy Forbes, Charalambos Poullis
15th Conference on Computer and Robot Vision, 2018
In this work we investigate urban reconstruction and propose a complete and automatic framework for reconstructing urban areas from remote sensing data.
Firstly, we address the complex problem of semantic labeling and propose a novel network architecture named SegNeXT. It combines the strengths of deep autoencoders with feed-forward links, which generate smooth predictions and reduce the number of learning parameters, with the effectiveness that cardinality-enabled residual-based building blocks have shown in improving prediction accuracy and outperforming deeper/wider network architectures with a smaller number of learning parameters. The network is trained with benchmark datasets and the reported results show that it can provide at least similar and in some cases better classification than state-of-the-art methods. Secondly, we address the problem of urban reconstruction and propose a complete pipeline for automatically converting semantic labels into virtual representations of the urban areas. An agglomerative clustering is performed on the points according to their classification and results in a set of contiguous and disjoint clusters. Finally, each cluster is processed according to the class it belongs to: tree clusters are substituted with procedural models, cars are replaced with simplified CAD models, buildings' boundaries are extruded to form 3D models, and road, low vegetation, and clutter clusters are triangulated and simplified. The result is a complete virtual representation of the urban area. The proposed framework has been extensively tested on large-scale benchmark datasets and the semantic labeling and reconstruction results are reported.
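
As a rough illustration of grouping classified points into contiguous, disjoint clusters, the sketch below flood-fills a small grid of class labels. It is a simplification, not the paper's method: connected-component labeling on a regular grid stands in for the agglomerative clustering, and the function name `cluster_by_label`, the 4-neighbour connectivity, and the toy grid are all assumptions made for this example.

```python
from collections import deque


def cluster_by_label(grid):
    """Group 4-connected cells that share a class label into clusters.

    grid: list of equal-length rows, each cell holding a class label.
    Returns a list of (label, [(row, col), ...]) tuples, one per cluster.
    """
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    clusters = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            label = grid[r][c]
            seen[r][c] = True
            queue = deque([(r, c)])
            cells = []
            # Breadth-first flood fill over neighbours with the same label.
            while queue:
                y, x = queue.popleft()
                cells.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and not seen[ny][nx] and grid[ny][nx] == label:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            clusters.append((label, cells))
    return clusters


# Toy label map (hypothetical data).
labels = [
    ["building", "building", "road"],
    ["tree", "building", "road"],
    ["tree", "road", "road"],
]
clusters = cluster_by_label(labels)
```

Each resulting cluster could then be dispatched to a class-specific step such as extrusion for buildings or triangulation for roads, as the abstract describes.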

TPAMI 2019

Large-scale Urban Reconstruction with Tensor Clustering and Global Boundary Refinement

Charalambos Poullis
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019
Accurate and efficient methods for large-scale urban reconstruction are of significant importance to the computer vision and computer graphics communities.
Although rapid acquisition techniques such as airborne LiDAR have been around for many years, creating a useful and functional virtual environment from such data remains difficult and labor intensive. This is largely because present solutions require data-dependent, user-defined parameters. In this paper we present a new solution for automatically converting large LiDAR point clouds into simplified polygonal 3D models. The data is first divided into smaller components which are processed independently and concurrently to extract various metrics about the points. Next, the extracted information is converted into tensors. A robust agglomerative clustering algorithm is proposed to segment the tensors into clusters representing geospatial objects, e.g. roads, buildings, etc. Unlike previous methods, the proposed tensor clustering process has no data dependencies and does not require any user-defined parameter. The required parameters are adaptively computed assuming a Weibull distribution for similarity distances. Lastly, to extract boundaries from the clusters, a new multi-stage boundary refinement process is developed by reformulating this extraction as a global optimization problem. We have extensively tested our methods on several point cloud datasets of different resolutions which exhibit significant variability in geospatial characteristics, e.g. ground surface inclination, building density, etc., and the results are reported. The source code for both tensor clustering and global boundary refinement will be made publicly available with the publication on the author’s website.

CRV 2019

On Building Classification from Remote Sensor Imagery Using Deep Neural Networks and the Relation Between Classification and Reconstruction Accuracy Using Border Localization as Proxy

Bodhiswatta Chatterjee, Charalambos Poullis
16th Conference on Computer and Robot Vision, 2019
Convolutional neural networks have been shown to have a very high accuracy when applied to certain visual tasks and in particular semantic segmentation.
In this paper we address the problem of semantic segmentation of buildings from remote sensor imagery. We present ICT-Net: a novel network with the underlying architecture of a fully convolutional network, infused with feature re-calibrated Dense blocks at each layer. Uniquely, the proposed network combines the localization accuracy and use of context of the U-Net network architecture, the compact internal representations and reduced feature redundancy of the Dense blocks, and the dynamic channel-wise feature re-weighting of the Squeeze-and-Excitation (SE) blocks. The proposed network has been tested on INRIA's benchmark dataset and is shown to outperform all other state-of-the-art methods by more than 1.5% on the Jaccard index. Furthermore, as the building classification is typically the first step of the reconstruction process, in the latter part of the paper we investigate the relationship of the classification accuracy to the reconstruction accuracy. A comparative quantitative analysis of reconstruction accuracies corresponding to different classification accuracies confirms the strong correlation between the two. We present the results which show a consistent and considerable reduction in the reconstruction accuracy. The source code and supplemental material are publicly available at http://www.theICTlab.org/lp/2019ICTNet/
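
For readers unfamiliar with the Jaccard index used to report the result above, the following minimal sketch computes it (intersection over union) for two binary building masks. The function name `jaccard_index`, the nested-list mask format, and the sample masks are assumptions made for this example and are not taken from the ICT-Net code.

```python
def jaccard_index(pred, truth):
    """Intersection over union of two equally sized binary masks.

    Masks are nested lists of 0/1 values (1 = building pixel).
    """
    intersection = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Convention assumed here: two empty masks agree perfectly.
    return intersection / union if union else 1.0


# Hypothetical 2x3 masks: 2 pixels overlap, 4 pixels are set in either mask.
pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
score = jaccard_index(pred, truth)
```

A gain of 1.5% on this metric means the ratio of correctly overlapping building pixels to all predicted-or-true building pixels rises by 0.015.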

ISVC 2019

Delineation of Road Networks Using Deep Residual Neural Networks and Iterative Hough Transform

Pinjing Xu, Charalambos Poullis
International Symposium on Visual Computing, 2019
In this paper we present a complete pipeline for extracting road network vector data from satellite RGB orthophotos of urban areas.
Firstly, a network based on the SegNeXT architecture with a novel loss function is employed for the semantic segmentation of the roads. Results show that the proposed network produces on average better results than other state-of-the-art semantic segmentation techniques. Secondly, we propose a fast post-processing technique for vectorizing the rasterized segmentation result, removing erroneous lines, and refining the road network. The result is a set of vectors representing the road network. We have extensively tested the proposed pipeline and provide quantitative and qualitative comparisons with other state-of-the-art methods based on a number of known metrics.
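
To give a feel for how a Hough transform turns segmented road pixels into line parameters, the sketch below implements a single, basic voting pass. It is only an illustration under stated assumptions: the function name `hough_strongest_line`, the bin sizes, and the sample points are hypothetical, and the iterative procedure and line-filtering steps used in the paper are not reproduced here.

```python
import math
from collections import Counter


def hough_strongest_line(points, theta_steps=180, rho_step=1.0):
    """Return ((rho, theta_degrees), votes) for the best-supported line.

    Lines are parameterised as rho = x*cos(theta) + y*sin(theta).
    """
    votes = Counter()
    for x, y in points:
        # Each point votes for every (rho, theta) line passing through it.
        for i in range(theta_steps):
            theta = math.pi * i / theta_steps
            rho = x * math.cos(theta) + y * math.sin(theta)
            votes[(round(rho / rho_step), i)] += 1
    (rho_bin, i), count = votes.most_common(1)[0]
    return (rho_bin * rho_step, 180.0 * i / theta_steps), count


# Hypothetical road pixels along y = 5, plus two stray pixels.
road_pixels = [(x, 5) for x in range(0, 100, 2)] + [(3, 40), (70, 22)]
line, support = hough_strongest_line(road_pixels)
```

Here the strongest accumulator cell corresponds to theta = 90 degrees and rho = 5, i.e. the horizontal line y = 5, supported by the 50 collinear pixels.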

CVIU 2021

Semantic Segmentation from Remote Sensor Data and the Exploitation of Latent Learning for Classification of Auxiliary Tasks

Bodhiswatta Chatterjee, Charalambos Poullis
Computer Vision and Image Understanding, 2021
In this paper we address three different aspects of semantic segmentation from remote sensor data using deep neural networks.
Firstly, we focus on the semantic segmentation of buildings from remote sensor data and propose ICT-Net: a novel network with the underlying architecture of a fully convolutional network, infused with feature re-calibrated Dense blocks at each layer. Uniquely, the proposed network combines the localization accuracy and use of context of the U-Net network architecture, the compact internal representations and reduced feature redundancy of the Dense blocks, and the dynamic channel-wise feature re-weighting of the Squeeze-and-Excitation (SE) blocks. The proposed network has been tested on the INRIA and AIRS benchmark datasets and is shown to outperform other state-of-the-art methods. Secondly, as the building classification is typically the first step of the reconstruction process, we investigate the relationship of the classification accuracy to the reconstruction accuracy. Due to the lack of (1) scene depth information, and (2) ground-truth (blueprints) for large urban areas, the evaluation of the 3D reconstructions is not possible. Thus, we use boundary localization as a proxy to reconstruction accuracy and perform the evaluation in 2D. A comparative quantitative analysis of reconstruction accuracies corresponding to different classification accuracies confirms the strong correlation between the two. We present the results which show a consistent and considerable reduction in the reconstruction accuracy. Finally, we present the simple yet compelling concept of latent learning and the implications it carries within the context of deep learning. We posit that a network trained on a primary task (i.e. building classification) is unintentionally learning about auxiliary tasks (e.g. the classification of road, tree, etc.) which are complementary to the primary task.
Although embedded in a trained network, this latent knowledge relating to the auxiliary tasks is never externalized or immediately expressed but instead only knowledge relating to the primary task is ever output by the network. We experimentally prove this occurrence of incidental learning on the pre-trained ICT-Net and show how sub-classification of the negative label is possible without further training/fine-tuning. We present the results of our experiments and explain how knowledge about auxiliary and complementary tasks - for which the network was never trained - can be retrieved and utilized for further classification. We extensively tested the proposed technique on the ISPRS benchmark dataset which contains multi-label ground truth, and report an average classification accuracy (F1 score) of 54.29% (SD=17.03) for roads, 10.15% (SD=2.54) for cars, 24.11% (SD=5.25) for trees, 42.74% (SD=6.62) for low vegetation, and 18.30% (SD=16.08) for clutter.

IEEE CGA 2022

Predicting Surface Reflectance Properties of Outdoor Scenes Under Unknown Natural Illumination

Farhan Rahman Wasee, Alen Joy, Charalambos Poullis
IEEE Computer Graphics & Applications, 2022
This paper proposes a complete framework to predict surface reflectance properties of outdoor scenes under unknown natural illumination.
Uniquely, we recast the problem into its two constituent components involving the BRDF incoming light and outgoing view directions: (i) surface points' radiance captured in the images, and outgoing view directions are aggregated and encoded into reflectance maps, and (ii) a neural network trained on reflectance maps of renders of a unit sphere under arbitrary light directions infers a low-parameter reflection model representing the reflectance properties at each surface in the scene. Our model is based on a combination of phenomenological and physics-based scattering models and can relight the scenes from novel viewpoints. We present experiments that show that rendering with the predicted reflectance properties results in a visually similar appearance to using textures that cannot otherwise be disentangled from the reflectance properties.

CRV 2022

Adaptive Memory Management for Video Object Segmentation

Ali Pourganjalikhan, Charalambos Poullis
19th Conference on Robots and Vision (CRV), 2022
Matching-based networks have achieved state-of-the-art performance for video object segmentation (VOS) tasks by storing every-k frames in an external memory bank for future inference.
Storing the intermediate frames' predictions provides the network with richer cues for segmenting an object in the current frame. However, the size of the memory bank gradually increases with the length of the video, which slows down inference speed and makes it impractical to handle arbitrary-length videos. This paper proposes an adaptive memory bank strategy for matching-based networks for semi-supervised video object segmentation (VOS) that can handle videos of arbitrary length by discarding obsolete features. Features are indexed based on their importance in the segmentation of the objects in previous frames. Based on the index, we discard unimportant features to accommodate new features. We present our experiments on DAVIS 2016, DAVIS 2017, and YouTube-VOS that demonstrate that our method outperforms state-of-the-art methods that employ the first-and-latest strategy with fixed-sized memory banks and achieves comparable performance to the every-k strategy with increasing-sized memory banks. Furthermore, experiments show that our method increases inference speed by up to 80% over the every-k and 35% over the first-and-latest strategies.
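
The idea of discarding unimportant features to make room for new ones can be sketched as a fixed-capacity store with importance-based eviction. This is a minimal sketch, not the paper's implementation: the class name `AdaptiveMemoryBank`, the additive `reward` update, and the string keys are assumptions made for this example, and the paper's actual importance index is not reproduced.

```python
class AdaptiveMemoryBank:
    """Fixed-capacity feature store that evicts the least important entry."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.features = {}    # key -> stored feature
        self.importance = {}  # key -> accumulated importance score

    def add(self, key, feature):
        # Make room first so the bank never exceeds its capacity.
        if key not in self.features and len(self.features) >= self.capacity:
            victim = min(self.importance, key=self.importance.get)
            del self.features[victim]
            del self.importance[victim]
        self.features[key] = feature
        self.importance.setdefault(key, 0.0)

    def reward(self, key, amount=1.0):
        # Hypothetical update: call when a stored feature helped segment a frame.
        if key in self.importance:
            self.importance[key] += amount


bank = AdaptiveMemoryBank(capacity=2)
bank.add("frame0", [0.1, 0.2])
bank.add("frame5", [0.3, 0.4])
bank.reward("frame0", 2.0)
bank.add("frame10", [0.5, 0.6])  # evicts "frame5", the least important entry
```

Because the capacity is fixed, memory use and matching cost stay bounded regardless of video length. In this toy version a newly added feature starts with zero importance, so a practical scheme would need some protection for recent entries.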

Contact

Charalambos Poullis
Immersive and Creative Technologies Lab
Department of Computer Science and Software Engineering
Concordia University
1455 de Maisonneuve Blvd. West, ER 925,
Montréal, Québec,
Canada, H3G 1M8