Readme

ML for Sustainibility

Data augmentation and transforms to prevent overfitting with e.g., random flip.
Image size we resizing is (224,224,3).
Normalizing to a mean and standard deviation. I specifically choose this mean because it’s the mean and std from the ImageNet dataset and a good representative for image datasets (source).

For every epoch, it first trains on train_loader and does validation on val_loader.
Using tqdm for a progress bar.
Default hyperparameters:
- Epochs per fold: 3
- Learning rate: 1e-4
- Loss function: CrossEntropyLoss
- Optimizer: AdamW with lr = 1e-4 and defaults betas (0.9,0.99) with 0.01 default weight decay.

Optimizations like:
- Deleting model at the end and saving best weights.
- Deleting images and labels sent to GPU at the end.
- Half precision for validation, since we don’t have to update weights here, using half precision makes it fast for fast evaluating.

Here we will select some class and then perform binary classification on it, meaning we assign 1 label to the selected class and 0 label to other classes and then take loss by CrossEntropy and perform weight update.
For every data class, we will make a train and Validation Dataloader for 3 folds and collect data like loss and accuracy to plot.

Taking images from a particular data index class and other data classes, and for binary classification, we set data index class labels as 1 and other labels as 0.
Here take is a parameter which defines how much to take from other classes. Currently, we take 10% of other classes and it works.
Setting number of workers 2 and pin memory for faster data caching.
SubsetRandomSampler for selecting indices.

Batch size is 32, since GPU we are using P100 has enough memory and bottleneck is not GPU but CPU for data caching.

We used model with AttentionBlock to emphasize the importance of image patches, to focus on important part of images and discard irrelevant parts. Although it is helpful in medical datasets but in animal datasets maybe there is a big landscape with animal in small pixels for e.g., with birds, there attention will be useful.

Convolution Blocks

There are 5 Conv Blocks each constituting:
- conv2d, batchnorm, relu, conv2d, batchnorm, relu and maxpool at the end.
These convblocks are commonly used.
BatchNorm is used to stabilize training from covariate shift (source).
ReLU is used as an activation function as used by ImageNet paper.

Hyperparameters

Attention Block

The intermediate features is the output of pool-3 or pool-4 and the global feature vector (output of pool-5) is fed as input to the attention layer.
Feature upsampling is done via bilinear interpolation to make intermediate and global feature vector same shape.
After that an element-wise sum is done followed by a convolution operation that just reduces the 256 channels to 1.
This is then fed into a Softmax layer, which gives us a normalized Attention map (A). Each scalar element in A represents the degree of attention to the corresponding spatial feature vector in F.
The new feature vector 𝐹̂ is then computed by pixel-wise multiplication. That is, each feature vector f is multiplied by the attention element a
So, the attention map A and the new feature vector 𝐹̂ are the outputs of the Attention Layer.

We initialize with Kaiming normal to prevent gradient vanishing or exploding latter on.
- Conv2d Layers: initializes the weights using the Kaiming normal.
- BatchNorm2d Layers: it sets the weights to 1 and biases to 0.
- Linear Layers: initializes the weights from a normal distribution with mean 0 and standard deviation 0.01. It sets the biases to 0.

since it has dataset requirements it can be run in kaggle
download [https://github.com/Autobot37/imageclassify/blob/main/nbs/ml-for-sustainability-srip-2024.ipynb] and import notebook there and run it.