A quick application of deep learning for image classification

Data Prep / loading

Before creating an image classifier, you need a dataset of images representing the categories you want to classify. For my example, I used a browser extension to download images as I browsed real-estate websites (Zillow, Immowelt, SUUMO, etc.). I then sorted the dataset with a notebook I made: https://github.com/pickpj/img-labeler-nb/blob/main/sep.ipynb .

Next we import the libraries, check the folder for images that can’t be opened, and load the images into a DataBlock. The DataBlock handles the train/validation split (splitter) and the transformations (item_tfms). We will still need a separate test dataset, which can be created later. The command du --inodes -d 2 counts the files in each folder (about 490 images in total); from this we can see that the data is reasonably balanced between the two categories (57.8% vs. 42.2%).

from fastai.data.all import *
from fastai.vision.all import *

fp = Path("./Data")

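# verify_images returns the paths of files that cannot be opened as images
failed = verify_images(get_image_files(fp))
# Delete each broken file; Path.unlink is called once per failed path
failed.map(Path.unlink)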

dblock = DataBlock(
    blocks=(ImageBlock, CategoryBlock), 
    get_items=get_image_files, 
    get_y=parent_label,
    splitter=RandomSplitter(valid_pct=0.25),
    item_tfms=[Resize(192, method="squish")]
).dataloaders(fp)

os.system("(cd Data && du --inodes -d 2)")
dblock.show_batch(max_n=9)

283 ./Interior
207 ./Exterior
491 .
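
The same balance check can be done in Python instead of shelling out to du. The following is a minimal sketch, assuming one sub-folder per class under ./Data as in the example above; the helper name class_balance and the list of extensions are illustrative assumptions, not part of fastai.

```python
from pathlib import Path

# Extensions treated as images; an assumption for this sketch, adjust as needed.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp", ".gif", ".bmp"}


def class_balance(root):
    """Return {class_name: (count, share)} for each sub-folder of root."""
    counts = {}
    for folder in sorted(Path(root).iterdir()):
        if not folder.is_dir():
            continue
        # Count image files anywhere below the class folder.
        n = sum(
            1
            for f in folder.rglob("*")
            if f.is_file() and f.suffix.lower() in IMAGE_EXTS
        )
        counts[folder.name] = n
    total = sum(counts.values())
    # Guard against an empty dataset to avoid dividing by zero.
    return {k: (n, n / total if total else 0.0) for k, n in counts.items()}


if __name__ == "__main__":
    data_dir = Path("./Data")
    if data_dir.is_dir():
        for name, (n, share) in class_balance(data_dir).items():
            print(f"{name}: {n} images ({share:.1%})")
```

Unlike du --inodes, which also counts directory entries, this counts only files with the listed extensions, so its totals may differ slightly from the output above.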

Training / Fine-tuning

We then take an existing architecture with pre-trained weights (arch) and fine-tune it on our dataset. Resnet18 is a small model that can be trained quickly and still performs well. There are stronger models such as swin_t, but the gains are marginal and training takes longer. More SotA models can be found on the PyTorch site: https://pytorch.org/vision/stable/models.html#classification .

vismodel = vision_learner(dblock, resnet18, metrics=error_rate)
vismodel.fine_tune(3)

epoch	train_loss	valid_loss	error_rate	time
0	0.893850	0.085667	0.032787	00:26

epoch	train_loss	valid_loss	error_rate	time
0	0.096202	0.013392	0.008197	00:36
1	0.058436	0.004852	0.000000	00:35
2	0.043074	0.006278	0.000000	00:38
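
To put the error rates in perspective, they can be converted back into a number of misclassified validation images. This is a rough sketch that assumes the validation set holds int(0.25 * 490) = 122 images. That size is inferred from valid_pct=0.25 and the file count above; the training run does not report it.

```python
# Assumed sizes: ~490 images with valid_pct=0.25 (an inference, not logged output).
n_images = 490
valid_pct = 0.25
n_valid = int(valid_pct * n_images)

# error_rate values copied from the training tables above.
error_rates = {
    "frozen epoch 0": 0.032787,
    "fine-tune epoch 0": 0.008197,
    "fine-tune epoch 1": 0.000000,
    "fine-tune epoch 2": 0.000000,
}

# error_rate is the fraction of validation images that were misclassified.
misclassified = {k: round(v * n_valid) for k, v in error_rates.items()}

for stage, n_wrong in misclassified.items():
    print(f"{stage}: about {n_wrong} of {n_valid} validation images wrong")
```

With a validation set this small, a single image moves the error rate by almost a full percentage point, so the 0% figure is encouraging but is best confirmed on a separate test set.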

Results

From there the model can be exported/saved. A saved model can be loaded with load_learner and used through its .predict() method.

vismodel.export("model.pkl")
vm = load_learner("model.pkl")

The predict function returns the predicted label, the index of that label, and a tensor containing the probability of each category.

print(vm.predict("interior.webp"))
print(vm.predict("difficult-interior.webp"))

('Interior', tensor(1), tensor([5.3082e-05, 9.9995e-01]))
('Interior', tensor(1), tensor([0.0440, 0.9560]))

As the output shows, the model identifies both interior images with high confidence. Even the “difficult-interior” image is classified correctly, with ~96% confidence.
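
The tuple returned by predict can be unpacked into a label, a class index, and per-class probabilities. The sketch below uses plain Python lists in place of tensors so that it runs without a trained model. The helper name describe_prediction is hypothetical, and the class order ['Exterior', 'Interior'] is an assumption inferred from the class index 1 in the output above.

```python
def describe_prediction(label, index, probs, vocab):
    """Pair each class name with its probability and report the winner."""
    per_class = {name: float(p) for name, p in zip(vocab, probs)}
    confidence = per_class[vocab[int(index)]]
    return {"label": label, "confidence": confidence, "per_class": per_class}


# Assumed class order (index 0 = Exterior, index 1 = Interior).
vocab = ["Exterior", "Interior"]

# Values copied from the second prediction above, as plain floats.
result = describe_prediction("Interior", 1, [0.0440, 0.9560], vocab)
print(f"{result['label']}: {result['confidence']:.1%}")

# With a loaded learner, the equivalent call would look like:
#   label, index, probs = vm.predict("difficult-interior.webp")
#   describe_prediction(label, index, probs, vm.dls.vocab)
```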