PyTorch Implementation of Semantic Segmentation for a Single Class from Scratch

Semantic segmentation can be thought of as classification at the pixel level: more precisely, it is the process of linking each pixel in an image to a class label. We are trying to answer both what is in the image and where it is. Semantic segmentation differs from instance segmentation, where objects of the same class receive distinct labels (car1, car2) shown in different colours.

There are tons of repositories where semantic segmentation is implemented in very complex forms for multiple classes. In this blog I have tried to implement semantic segmentation from scratch for a single class; with a few tweaks, the same approach should apply to multiple classes.

Given a grayscale (H, W, 1) or RGB (H, W, 3) image, we want to generate a segmentation mask of the same spatial dimensions as the image, consisting of categorical values from 1 to N (where N is the number of classes).

Categorical to one-hot encoded labels

Semantic labels consisting of categorical values can also be one-hot encoded, as shown in the image on the left.

After taking the argmax across the classes and overlaying the result on the original image, we end up with an image like the one on the left.

16 orientations for a single car image

As for frameworks, we will mainly use PyTorch, plus sklearn (for the train/val split).

The implementation is subdivided into 4 pipelines:

1. Preprocessing pipeline
2. Dataloaders pipeline
3. Scores pipeline
4. Training pipeline

For the code snippets in this blog, I have tried to add comments wherever they were needed. We will start by importing all the required libraries.

1. Preprocessing Pipeline

First we convert the train masks from .gif to .png, then we resize the train and mask images to [128, 128]. Here we use ThreadPoolExecutor for parallel file operations.
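The original preprocessing snippet is not shown here, so below is a minimal sketch of the step just described. The function names (`preprocess`, `preprocess_all`) and directory arguments are my own placeholders, not the blog's code.

```python
import os
from concurrent.futures import ThreadPoolExecutor

from PIL import Image

def preprocess(src_path, dst_path, size=(128, 128)):
    """Open one image/mask, resize it, and save it as PNG."""
    img = Image.open(src_path)
    # NEAREST avoids interpolated in-between values, which would corrupt mask labels
    img = img.resize(size, Image.NEAREST)
    img.save(dst_path)

def preprocess_all(src_dir, dst_dir, size=(128, 128)):
    """Convert every file in src_dir (e.g. .gif masks) to resized PNGs, in parallel."""
    os.makedirs(dst_dir, exist_ok=True)
    with ThreadPoolExecutor() as pool:
        jobs = []
        for name in os.listdir(src_dir):
            stem, _ = os.path.splitext(name)
            jobs.append(pool.submit(
                preprocess,
                os.path.join(src_dir, name),
                os.path.join(dst_dir, stem + ".png"),
                size,
            ))
        for job in jobs:
            job.result()  # re-raise any exception from a worker thread
```

`ThreadPoolExecutor` works well here because the work is I/O-bound (reading and writing image files), so threads overlap nicely despite the GIL.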

Since we will be using a ResNet backbone trained on ImageNet, we set the mean and standard deviation of the ImageNet data for normalization purposes.

2. Dataloaders Pipeline

In this section we implement custom transforms, the dataset, and the dataloaders.
Starting with the transforms, which depend on the phase: for "train" we use a horizontal flip along with Normalize and ToTensor; for "val" we use only Normalize and ToTensor.
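The transform code itself is not reproduced here; the sketch below implements the same phase-dependent behavior with plain NumPy and torch rather than a transform library, so the flip is applied jointly to image and mask. `get_transforms` and the ImageNet constants are assumptions of this sketch.

```python
import random

import numpy as np
import torch

# ImageNet statistics, matching the pretrained ResNet encoder
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def get_transforms(phase):
    """Return a callable applying (flip +) normalize + to-tensor to an (image, mask) pair."""
    def transform(image, mask):
        # image: HxWx3 uint8, mask: HxW with values in {0, 1}
        if phase == "train" and random.random() < 0.5:
            # flip image and mask together so they stay aligned
            image = np.ascontiguousarray(image[:, ::-1])
            mask = np.ascontiguousarray(mask[:, ::-1])
        image = image.astype(np.float32) / 255.0
        image = (image - IMAGENET_MEAN) / IMAGENET_STD
        image_t = torch.from_numpy(image).permute(2, 0, 1)              # [3, H, W]
        mask_t = torch.from_numpy(mask.astype(np.float32)).unsqueeze(0)  # [1, H, W]
        return image_t, mask_t
    return transform
```

The key design point is that augmentations which move pixels (like the flip) must be applied identically to the image and its mask, while normalization applies to the image only.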

After the transforms we create a custom dataset class named CarDataset. Here we fetch the original image and mask using the index from the dataloader, then apply the transforms on top. The output of this class is an image tensor of shape [3,128,128] and a mask tensor of shape [1,128,128]. The mask tensor has only one channel because we are training for a single class.
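A minimal sketch of such a dataset class, assuming lists of file paths and a joint `transform(image, mask)` callable (the constructor signature is my assumption, not the blog's exact code):

```python
import numpy as np
from PIL import Image
from torch.utils.data import Dataset

class CarDataset(Dataset):
    """Fetches an (image, mask) pair by index and applies the phase transform."""

    def __init__(self, image_paths, mask_paths, transform):
        self.image_paths = image_paths
        self.mask_paths = mask_paths
        self.transform = transform  # callable taking (image, mask) arrays

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        image = np.array(Image.open(self.image_paths[idx]).convert("RGB"))
        # binarize the mask: any non-zero pixel belongs to the single foreground class
        mask = (np.array(Image.open(self.mask_paths[idx])) > 0).astype(np.uint8)
        # returns tensors of shape [3, 128, 128] and [1, 128, 128]
        return self.transform(image, mask)
```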

Single-channel mask representation, [1,128,128].

Now, using the CarDataloader function, we split the input dataframe into a train dataframe and a validation dataframe (these hold only the file names). From these dataframes we create the dataloaders for training and validation.
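A sketch of that split-and-wrap step. To keep it self-contained, the dataset construction is passed in as a `make_dataset(frame, phase)` callable (an assumption of this sketch); in the blog it would build the CarDataset from the file names in each dataframe.

```python
from sklearn.model_selection import train_test_split
from torch.utils.data import DataLoader

def car_dataloaders(df, make_dataset, batch_size=16, num_workers=0):
    """Split the filename dataframe 80/20 and wrap each half in a DataLoader.

    make_dataset(frame, phase) is assumed to build the Dataset from a
    dataframe of file names. Bump num_workers for parallel loading.
    """
    train_df, valid_df = train_test_split(df, test_size=0.2, random_state=42)
    return {
        "train": DataLoader(make_dataset(train_df, "train"),
                            batch_size=batch_size, shuffle=True,
                            num_workers=num_workers),
        "val": DataLoader(make_dataset(valid_df, "val"),
                          batch_size=batch_size, shuffle=False,
                          num_workers=num_workers),
    }
```

Shuffling only the training loader is the usual choice: validation order does not affect the metrics, and a fixed order makes validation runs reproducible.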

3. Scores Pipeline

To tackle the problem of class imbalance we use the Soft Dice score instead of pixel-wise cross-entropy loss. For each class, the Soft Dice score is twice the sum of (pred score × target score), divided by the sum of (pred² + target²).

Inside every epoch, for each batch we calculate the dice score and append it to an empty list. At the end of the epoch we take the mean of the dice scores, which represents the dice score for that epoch.
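The two steps above can be sketched as follows. This is my own implementation of the formula as stated (with the squared denominator); applying the sigmoid first is an assumption, made because the model will output raw logits.

```python
import torch

def dice_score(pred, target, smooth=1e-7):
    """Soft Dice per batch: 2*sum(p*t) / (sum(p^2) + sum(t^2))."""
    pred = torch.sigmoid(pred)  # logits -> probabilities in [0, 1]
    pred = pred.reshape(pred.size(0), -1)
    target = target.reshape(target.size(0), -1)
    inter = (pred * target).sum(dim=1)
    denom = (pred ** 2).sum(dim=1) + (target ** 2).sum(dim=1)
    # smooth guards against division by zero for empty masks
    return ((2 * inter + smooth) / (denom + smooth)).mean().item()

def epoch_dice(batch_scores):
    """Mean of the per-batch dice scores collected during one epoch."""
    return sum(batch_scores) / len(batch_scores)
```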

4. Training Pipeline

In this last pipeline we create a Trainer class, initializing most of the values.

In the start method, for every epoch, we first call the iterate method for training, then the iterate method for validation, and then do learning-rate scheduling. If the current validation loss is lower than the previous best, we save the model parameters.
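A free-function sketch of that start method. To keep it standalone, the per-epoch work is passed in as a `run_epoch(phase)` callable standing in for the iterate method from the text; the `fit` name, the checkpoint path, and the choice of `ReduceLROnPlateau` are assumptions of this sketch.

```python
import torch

def fit(model, run_epoch, optimizer, num_epochs=10, ckpt_path="./model.pth"):
    """Train/val each epoch, schedule the LR on val loss, checkpoint the best model.

    run_epoch(phase) runs one epoch for "train" or "val" and returns its loss.
    """
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="min", patience=3)
    best_loss = float("inf")
    for epoch in range(num_epochs):
        run_epoch("train")
        val_loss = run_epoch("val")
        scheduler.step(val_loss)      # reduce LR when val loss plateaus
        if val_loss < best_loss:      # save only on improvement
            best_loss = val_loss
            torch.save(model.state_dict(), ckpt_path)
    return best_loss
```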

In the iterate method we call the forward method, which calculates the loss; the loss is then divided by the accumulation steps and added to the running loss. Meanwhile, each backward call keeps accumulating gradients in the parameters' .grad attributes until the accumulation steps are reached.

Once the accumulation steps are reached, we perform the optimization step and zero the gradients. Finally we compute the epoch loss and dice score, and clear the CUDA cache.
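A sketch of that iterate method as a free function, with the forward pass collapsed into the loop. The function name, signature, and `accumulation_steps` default are my assumptions; the structure follows the description above.

```python
import torch

def iterate(model, loader, criterion, optimizer, phase,
            accumulation_steps=4, device="cpu"):
    """One epoch over loader: forward, scale loss, step every N batches."""
    model.train() if phase == "train" else model.eval()
    running_loss = 0.0
    optimizer.zero_grad()
    for step, (images, masks) in enumerate(loader, start=1):
        images, masks = images.to(device), masks.to(device)
        with torch.set_grad_enabled(phase == "train"):
            outputs = model(images)
            # divide so the accumulated gradient matches one large-batch step
            loss = criterion(outputs, masks) / accumulation_steps
            if phase == "train":
                loss.backward()  # gradients accumulate in the parameters' .grad
                if step % accumulation_steps == 0:
                    optimizer.step()
                    optimizer.zero_grad()
        running_loss += loss.item()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # release cached GPU memory between epochs
    # undo the accumulation scaling to report the mean per-batch loss
    return running_loss * accumulation_steps / len(loader)
```

Gradient accumulation lets a memory-limited GPU imitate a batch size of `batch_size * accumulation_steps` at the cost of a few extra forward passes per optimizer step.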

Inside the forward method we take the original image and target mask, send them to the GPU, and run a forward pass to get the predicted mask. We then calculate the loss using the loss function.

Now it is time to load the UNet architecture from smp (segmentation_models_pytorch), using resnet18 as the backbone. We set the number of classes to 1, since our mask dimension is [1,128,128].

Let the magic begin!

In around 6 minutes we reach a dice score of 98.7, which is impressive. Using the saved weights, we run inference on our validation data with the snippets below.
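The inference snippet itself is not reproduced here; a minimal sketch of the step, with the `predict` helper name and the 0.5 threshold being my assumptions:

```python
import torch

def predict(model, images, device="cpu", threshold=0.5):
    """Run the trained model on a batch and threshold probabilities to a binary mask."""
    model.eval()
    with torch.no_grad():
        probs = torch.sigmoid(model(images.to(device)))  # logits -> [0, 1]
    return (probs > threshold).float()  # binary mask, same shape as the logits
```

In practice you would first restore the checkpointed weights with `model.load_state_dict(torch.load(path))` before calling this on validation batches.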

Left: the predicted mask. Right: the target mask.

It may not be SOTA results, but with just 200 lines of code we get a clear idea of how semantic segmentation works. By tweaking a few lines, the same can be done for multiclass labels. Please share, and leave a comment if you have any questions.
