2.4. Number of Floors Detector

On a randomly selected set of in-the-wild building images from New Jersey's Bergen, Middlesex, and Morris Counties, the model attains an F1-score of 86%. Here, in-the-wild building images are defined as street-level photos that may contain multiple buildings and are captured with arbitrary camera properties. Fig. 2.4.1 shows the confusion matrix of the model inferences on this in-the-wild test set.

Confusion matrix (in-the-wild dataset)

Fig. 2.4.1 Confusion matrix of the pretrained model on the in-the-wild test set

If the test images are constrained such that each image contains a single building, the building is viewed with minimal obstructions, and the image plane is nearly parallel to the frontal plane of the building facade, the model attains an F1-score of 94.7%. Fig. 2.4.2 shows the confusion matrix for the pretrained model on a test set generated according to these constraints.

Confusion matrix (clean dataset)

Fig. 2.4.2 Confusion matrix of the pretrained model on the dataset containing lightly distorted/obstructed images of individual buildings
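For readers who want to relate a confusion matrix to a single F1-score, the following is a minimal sketch in plain Python. It is not the evaluation code used for the pretrained model. The function names, the row/column convention (rows as true classes, columns as predicted classes), and the counts in `example_cm` are all assumptions made for illustration. The text above does not state which averaging scheme was used, so both the macro and the support-weighted averages are shown.

```python
def f1_per_class(cm):
    """Per-class F1 from a square confusion matrix.

    Assumed convention: cm[i][j] counts samples of true class i
    that were predicted as class j.
    """
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, true class differs
        fn = sum(cm[k]) - tp                       # true k, predicted class differs
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def macro_f1(cm):
    """Unweighted mean of the per-class F1 values."""
    scores = f1_per_class(cm)
    return sum(scores) / len(scores)


def weighted_f1(cm):
    """Mean of the per-class F1 values weighted by true-class sample count."""
    scores = f1_per_class(cm)
    support = [sum(row) for row in cm]
    return sum(s * w for s, w in zip(scores, support)) / sum(support)


# Hypothetical counts for a three-class problem (e.g. 1, 2, and 3 floors).
# These numbers are made up and are NOT taken from Fig. 2.4.1 or Fig. 2.4.2.
example_cm = [
    [50, 5, 0],
    [4, 40, 6],
    [1, 3, 20],
]

if __name__ == "__main__":
    print("per-class F1:", f1_per_class(example_cm))
    print("macro F1:    ", macro_f1(example_cm))
    print("weighted F1: ", weighted_f1(example_cm))
```

The per-class F1 value is unchanged if the matrix is transposed, so the macro average does not depend on the assumed row/column convention. The weighted average does, because the class supports are taken from the row sums.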

Table 2.4.1 shows a sample of the images removed from the in-the-wild test set because they lacked the visual cues necessary for a valid number-of-floors prediction.

Table 2.4.1 In-the-wild street-level imagery removed as part of dataset cleaning

Fig. 2.4.3 Heavily occluded building facade


Fig. 2.4.4 Closely spaced buildings: obscure prediction target


Fig. 2.4.5 Significant perspective distortions


Fig. 2.4.6 Heavily occluded building facade