.. _lbl-occupancyClassifier-vnv:

Occupancy Classifier
========================

The Occupancy Classifier's methodology is presented in :ref:`occupancyTheory`, and examples showing how to use it can be found in :ref:`lbl-occupancyClassifier`.

This section presents its validation against two datasets.

Dataset 1: Compare with OpenStreetMap Labels
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The trained classifier is tested on a ground-truth dataset hosted on Zenodo; the script below downloads it.

We first obtained a set of randomly selected buildings in the United States that have occupancy tags on OpenStreetMap.
We then downloaded the street view images from Google Street View for each building and removed the images in which no building is clearly visible.
The dataset contains 98 single-family buildings (RES1), 97 multi-family buildings (RES3), and 98 commercial buildings (COM).
Examples of these street view images can be found in :ref:`lbl-occupancyClassifier`.

Run the following Python script to test the classifier on this dataset.

.. code-block:: python

    # download the testing dataset
    import wget
    import zipfile

    wget.download('https://zenodo.org/record/4553803/files/occupancy_validation_images.zip')
    with zipfile.ZipFile('occupancy_validation_images.zip', 'r') as zip_ref:
        zip_ref.extractall('.')

    # prepare the image lists: one sub-folder per class
    from glob import glob

    class_names = ['RES3', 'COM', 'RES1']
    labels = []
    images = []
    for clas in class_names:
        imgs = glob(f'occupancy_validation_images/{clas}/*.jpg')
        for img in imgs:
            labels.append(clas)
            images.append(img)

    # import the module
    from brails.modules import OccupancyClassifier

    # initialize the classifier
    occupancyModel = OccupancyClassifier()

    # use the model to predict
    pred = occupancyModel.predict(images)
    predictions = pred['prediction'].tolist()

    # plot results
    from brails.utils.plotUtils import plot_confusion_matrix
    from sklearn.metrics import confusion_matrix
    from sklearn.metrics import accuracy_score

    print(' Accuracy is : {}, Random guess is 0.33'.format(accuracy_score(predictions, labels)))

    # pass labels=class_names so the matrix rows/columns follow the same
    # class order as the tick labels used in the plot
    cnf_matrix = confusion_matrix(predictions, labels, labels=class_names)
    plot_confusion_matrix(cnf_matrix, classes=class_names, title='Confusion matrix',
                          normalize=False, xlabel='Labels', ylabel='Predictions')

The confusion matrix obtained on this dataset is shown in :numref:`fig_confusion_occupancyv2`.

.. _fig_confusion_occupancyv2:

.. figure:: ../../images/technical/confusion_occupancy_v2.png
   :width: 40%
   :alt: Confusion matrix occupancy class

   Confusion matrix - Occupancy Class classifier

The accuracy and F1 score for the three classes are:

* RES3: Accuracy = 0.97, F1 = 0.97
* COM: Accuracy = 0.97, F1 = 0.98
* RES1: Accuracy = 0.99, F1 = 0.97

Dataset 2: Compare with NJDEP Dataset
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The second validation dataset is from the New Jersey Department of Environmental Protection (NJDEP).
NJDEP developed a building inventory for flood hazard and risk analysis as part of its flood control and resilience mission.
In this dataset, building footprints are labelled with their occupancy types.
We randomly selected a subset of those records and, for each, downloaded a street view image from the Google Maps Static API.
Examples of these street view images can be found in :ref:`occupancyTheory`.

The NJDEP occupancy data includes the following labels (with record counts):

* RES1 26574
* RES3A 1714
* COM1 1110
* RES3B 1016
* RES3C 779
* RES3D 566
* COM8 187
* AGR1 113
* RES4 111
* COM4 100
* GOV1 90
* IND2 83
* COM3 74
* REL1 67
* RES3E 52
* EDU1 48
* IND3 37
* GOV2 24
* COM7 16
* RES3F 15
* IND1 13
* EDU2 11
* IND4 11
* IND5 6
* COM2 3
* COM10 3
* COM6 2
* IND6 2
* COM5 1

The BRAILS occupancy system includes the following classes:

* RES1
* RES3
* COM

To compare these two systems, we renamed some NJDEP labels:

* RES1 -> RES1
* RES3A -> RES3
* RES3B -> RES3
* RES3C -> RES3
* RES3D -> RES3
* RES3E -> RES3
* RES3F -> RES3
* COM1 -> COM
* COM2 -> COM
* COM3 -> COM
* COM4 -> COM
* COM5 -> COM
* COM6 -> COM
* COM7 -> COM
* COM8 -> COM
* COM10 -> COM

From the relabelled records, we selected the following for validation:

* RES1, 1,000 randomly selected from RES1
* RES3, 1,000 randomly selected from RES3
* COM, 1,000 randomly selected from COM

You can download the labels, images, and scripts for this validation from `here `_.

The following script runs this validation. At the end, it plots a confusion matrix and prints the per-class accuracy and F1 score.

.. code-block:: python

    import os
    import shutil
    import sys

    import pandas as pd

    data = pd.read_csv("AtlanticCountyBuildingInventory.csv")
    print(data.describe())

    def getCls(x):
        """Map an NJDEP occupancy label to a BRAILS class."""
        if 'RES1' in x:
            return 'RES1'
        elif 'RES3' in x:
            return 'RES3'
        elif 'COM' in x:
            return 'COM'
        else:
            return 'remove'

    data['occupancy'] = data['OccupancyClass'].apply(getCls)
    #data=data[data['occupancy']!='remove']

    # draw 1,000 records per class with a fixed seed for reproducibility
    RES1 = data[data['occupancy']=='RES1'].sample(n=1000, random_state=1993)
    RES3 = data[data['occupancy']=='RES3'].sample(n=1000, random_state=1993)
    COM = data[data['occupancy']=='COM'].sample(n=1000, random_state=1993)
    data = pd.concat([RES1, RES3, COM])

    # ### Use BRAILS to download street view images
    # machine-specific path; adjust it or remove it if BRAILS is installed
    sys.path.append("/Users/simcenter/Codes/SimCenter/BIM.AI")
    from brails.workflow.Images import getGoogleImagesByAddrOrCoord

    addrs = list(data[['Longitude','Latitude']].to_numpy())
    getGoogleImagesByAddrOrCoord(Addrs=addrs, GoogleMapAPIKey='Your-Key',
                                 imageTypes=['StreetView'], imgDir='tmp/images', ncpu=2,
                                 fov=60, pitch=0, reDownloadImgs=False)

    data['StreetViewImg'] = data.apply(lambda row: f"tmp/images/StreetView/StreetViewx{'%.6f'%row['Longitude']}x{'%.6f'%row['Latitude']}.png", axis=1)

    # Remove missing or empty images (files of 9 KB or less are treated as empty)
    data = data[data['StreetViewImg'].apply(lambda x: os.path.exists(x) and os.path.getsize(x)/1024 > 9)]

    # Remove duplicates
    data.drop_duplicates(subset=['StreetViewImg'], inplace=True)

    # ### Predict
    from brails.modules import OccupancyClassifier

    occupancyModel = OccupancyClassifier()
    occupancyPreds = occupancyModel.predict(list(data['StreetViewImg']))
    data['Occupancy(BRAILS)'] = list(occupancyPreds['prediction'])
    data['prob_Occupancy(BRAILS)'] = list(occupancyPreds['probability'])

    # ### Plot confusion matrix
    import matplotlib.pyplot as plt
    # Jupyter only; remove this line when running as a plain script
    get_ipython().run_line_magic('matplotlib', 'inline')
    sys.path.append(".")
    from plotUtils import plot_confusion_matrix
    from sklearn.metrics import confusion_matrix

    # fixed class order, so the result does not depend on which classes were predicted
    class_names = ['RES1', 'RES3', 'COM']
    predictions = data['Occupancy(BRAILS)']
    labels = data['occupancy']

    # rows: NJDEP labels, columns: BRAILS predictions
    cnf_matrix = confusion_matrix(labels, predictions, labels=class_names)
    plot_confusion_matrix(cnf_matrix, classes=class_names, normalize=True,
                          xlabel='BRAILS', ylabel='NJDEP')

    for i, cname in enumerate(class_names):
        TP = cnf_matrix[i, i]
        FP = cnf_matrix[:, i].sum() - TP
        FN = cnf_matrix[i, :].sum() - TP
        # per-class accuracy: share of records labelled cname that are predicted as cname
        accuracy = '%.2f' % (TP / cnf_matrix[i, :].sum())
        F1 = '%.2f' % (TP / (TP + 0.5*(FP + FN)))
        print(f'{cname}: Accuracy = {accuracy}, F1 = {F1}')

    # ### Copy images to directories {label}-{prediction} for inspection
    predDir = 'tmp/images/occupancy_predictions'
    os.makedirs(predDir, exist_ok=True)

    def copyfiles(bim):
        for ind, row in bim.iterrows():
            label = row['occupancy']
            pred = row['Occupancy(BRAILS)']
            lon, lat = '%.6f'%row['Longitude'], '%.6f'%row['Latitude']
            oldfile = f'tmp/images/StreetView/StreetViewx{lon}x{lat}.png'
            thisFileDir = f'{predDir}/{label}-{pred}/'
            newfile = f'{thisFileDir}StreetViewx{lon}x{lat}.png'
            os.makedirs(thisFileDir, exist_ok=True)
            try:
                shutil.copyfile(oldfile, newfile)
            except OSError:
                # report images that could not be copied
                print(oldfile)

    copyfiles(data)

The files you downloaded contain folders with names like RES3-COM, which hold the images that are labelled 'RES3' in the NJDEP dataset but predicted as 'COM'.
You can browse through those images to investigate the misclassifications further.

The confusion matrix obtained on this dataset is shown in :numref:`fig_confusion_occupancy_njdep_v2`.

.. _fig_confusion_occupancy_njdep_v2:

.. figure:: ../../images/technical/njdep/fig_confusion_occupancy_njdep_v2.png
   :width: 40%
   :alt: Confusion matrix occupancy NJDEP

   Confusion matrix - Occupancy type classification for NJDEP

The accuracy and F1 score for the three classes are:

* RES1: Accuracy = 0.89, F1 = 0.86
* RES3: Accuracy = 0.92, F1 = 0.83
* COM: Accuracy = 0.67, F1 = 0.79

Examples of false predictions are shown in :numref:`atlantic_occupancy_examples_njdep_falsev2`.

.. _atlantic_occupancy_examples_njdep_falsev2:

.. list-table:: Examples of false predictions

    * - .. figure:: ../../images/technical/njdep/false/RES1-RES3/StreetViewx-74.366315x39.422974.png

           Label: RES1, BRAILS Prediction: RES3

      - .. figure:: ../../images/technical/njdep/false/RES1-RES3/StreetViewx-74.366873x39.420778.png

           Label: RES1, BRAILS Prediction: RES3

      - .. figure:: ../../images/technical/njdep/false/RES3-COM/StreetViewx-74.417573x39.372665.png

           Label: RES3, BRAILS Prediction: COM

      - .. figure:: ../../images/technical/njdep/false/RES3-COM/StreetViewx-74.418175x39.369580.png

           Label: RES3, BRAILS Prediction: COM

.. note::

    Bias in datasets is very common. This validation does not account for possible bias in the labels (examples can be found in :numref:`njdep_occupancy_examples_biasv2`), which also lowers the measured accuracy.

.. _njdep_occupancy_examples_biasv2:

.. list-table:: Examples of street view images: Bias in the labels

    * - .. figure:: ../../images/technical/njdep/RES1-COM/StreetViewx-74.544719x39.459546.png

           Label: RES1, BRAILS Prediction: COM

      - .. figure:: ../../images/technical/njdep/RES1-RES3/StreetViewx-74.358387x39.411702.png

           Label: RES1, BRAILS Prediction: RES3

      - .. figure:: ../../images/technical/njdep/RES3-COM/StreetViewx-74.412689x39.368096.png

           Label: RES3, BRAILS Prediction: COM

      - .. figure:: ../../images/technical/njdep/RES3-RES1/StreetViewx-74.406136x39.382882.png

           Label: RES3, BRAILS Prediction: RES1
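
The per-class accuracy and F1 values reported in this section can be derived from a confusion matrix alone. The following is a minimal sketch of that calculation, assuming rows hold the reference labels and columns hold the predictions, as in the NJDEP script. The function name ``per_class_metrics`` and the example counts are illustrative assumptions; they are not part of BRAILS and are not the validation results.

.. code-block:: python

    import numpy as np

    def per_class_metrics(cnf_matrix, class_names):
        """Return {class: (accuracy, f1)} from a confusion matrix.

        Assumes rows are reference labels and columns are predictions.
        'accuracy' is the per-class value: diagonal entry / row sum.
        """
        cm = np.asarray(cnf_matrix, dtype=float)
        metrics = {}
        for i, name in enumerate(class_names):
            tp = cm[i, i]
            fp = cm[:, i].sum() - tp  # predicted as this class, labelled otherwise
            fn = cm[i, :].sum() - tp  # labelled as this class, predicted otherwise
            accuracy = tp / (tp + fn) if (tp + fn) > 0 else 0.0
            f1 = tp / (tp + 0.5 * (fp + fn)) if (tp + fp + fn) > 0 else 0.0
            metrics[name] = (accuracy, f1)
        return metrics

    # Made-up counts for illustration only (hypothetical, not measured data).
    example = [[90, 8, 2],
               [5, 92, 3],
               [10, 20, 70]]
    for name, (acc, f1) in per_class_metrics(example, ['RES1', 'RES3', 'COM']).items():
        print(f'{name}: Accuracy = {acc:.2f}, F1 = {f1:.2f}')

Note that the per-class accuracy computed this way is the fraction of records with a given label that receive the matching prediction, so a class can have a high accuracy and a lower F1 score when other classes are often mistaken for it.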