Abstract
The availability of GIS (Geographic Information System) databases for many urban areas provides a valuable source of information for improving the performance of many computer vision tasks. In this paper, we propose a method which leverages information acquired from GIS databases to perform semantic segmentation of the image, alongside geo-referencing each semantic segment with its address and geo-location. First, the image is segmented into a set of initial super-pixels. Then, by projecting the information from GIS databases, a set of priors is obtained about the approximate locations of semantic entities, such as buildings and streets, in the image plane. However, there are significant inaccuracies (misalignments) in the projections, mainly due to inaccurate GPS tags and camera parameters. To address this misalignment issue, we perform data fusion that improves the accuracy of the segmentation and the GIS projections simultaneously through an iterative approach. At each iteration, the projections are evaluated and weighted in terms of reliability, and then fused with the super-pixel segmentation. First, segmentation is performed using random walks, based on the GIS projections. Then, the global transformation which best aligns the projections to their corresponding semantic entities is computed and applied to the projections to further align them with the content of the image. The iterative approach continues until the projections and segments are well aligned.
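The iterative fuse-and-align loop described above can be sketched as follows. This is a minimal illustration only, not the paper's implementation: it assumes correspondences between projections and segment centroids are already known, and it reduces the global transformation to a least-squares 2-D translation, omitting the random-walk re-segmentation and reliability weighting performed at each iteration. All function and variable names are hypothetical.

```python
import numpy as np

def align_projections(projections, segment_centroids, n_iters=10, tol=1e-6):
    """Iteratively align GIS projections to image segment centroids.

    Illustrative sketch: each iteration estimates the global transformation
    that best aligns the projections to their corresponding semantic
    entities (here simplified to a mean-offset translation) and applies it,
    stopping once projections and segments are well aligned.
    """
    proj = np.asarray(projections, dtype=float)
    cent = np.asarray(segment_centroids, dtype=float)
    for _ in range(n_iters):
        # Least-squares translation = mean residual between corresponding points.
        shift = (cent - proj).mean(axis=0)
        proj = proj + shift
        if np.linalg.norm(shift) < tol:  # converged: projections match segments
            break
    return proj
```

For example, if the projections are offset from the true segment centroids by a constant misalignment (as with a biased GPS tag), the loop recovers the centroids in one step, since a pure translation is fully captured by the mean residual.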