Here we deal with a single leaf image and have to predict whether the leaf is healthy, has multiple diseases, has rust, or has scab.
So there is one input image and four target columns to predict.
The evaluation section says: "For each image_id in the test set, you must predict a probability for each target variable."
So we'll set this up as a regression problem, with one output per target.
Getting the data:
The data is available here.
from fastai.vision.all import *
import pandas as pd
Download and unzip your data to a folder called plant.
path = 'plant/'
Let's see what is in train.csv:
train = pd.read_csv(path+'train.csv')
train.head()
To train the model, each item needs to be an (x, y) tuple.
So we'll build pairs like this: (image_id, [healthy, multiple_diseases, rust, scab]).
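A plain-Python sketch of that pairing follows. It uses only the standard csv module, and a few made-up rows stand in for train.csv. SAMPLE_CSV and make_pairs are hypothetical names for illustration, not part of the notebook. Each row becomes an (image_id, target list) tuple:

```python
import csv
import io

# Hypothetical rows with the same columns as train.csv (values made up for illustration)
SAMPLE_CSV = """image_id,healthy,multiple_diseases,rust,scab
Train_0,0,0,0,1
Train_1,0,1,0,0
Train_2,1,0,0,0
"""

TARGETS = ["healthy", "multiple_diseases", "rust", "scab"]

def make_pairs(csv_text):
    """Turn each CSV row into an (image_id, [healthy, multiple_diseases, rust, scab]) tuple."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["image_id"], [int(row[t]) for t in TARGETS]) for row in reader]

pairs = make_pairs(SAMPLE_CSV)
print(pairs[0])  # ('Train_0', [0, 0, 0, 1])
```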
Let's create a new column, combined,
which holds the list of dependent variables for each row:
train['combined'] = train[['healthy','multiple_diseases','rust','scab']].values.tolist()
train.head()
For show_batch
to work, we need to give a list the ability to display itself through show_title:
class TitledList(list, ShowTitle):
    _show_args = {'label': 'text'}
    def show(self, ctx=None, **kwargs):
        "Show self"
        return show_title(self, ctx=ctx, **merge(self._show_args, kwargs))
class ToListTensor(DisplayedTransform):
    "Leave the target list unchanged on encode; wrap it in a TitledList on decode so show_batch can display it"
    # order = 10 #Need to run after PIL transforms on the GPU
    _show_args = {'label': 'text'}
    def __init__(self, split_idx=None):
        super().__init__(split_idx=split_idx)
    def encodes(self, o): return o
    def decodes(self, o): return TitledList(o)
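The encode/decode split can be illustrated without fastai. The two classes below are hypothetical stand-ins, not fastai's API. Encoding leaves the target untouched, and decoding wraps it in a list type that knows how to render a title. That is roughly the role TitledList plays for show_batch:

```python
class TitledListSketch(list):
    """Hypothetical stand-in for TitledList: a list that can render itself as a title string."""
    def title(self):
        return " ".join(str(v) for v in self)

class ListTargetSketch:
    """Hypothetical stand-in for ToListTensor: identity on encode, wrap in a titled list on decode."""
    def encodes(self, o):
        return o                      # leave the target list unchanged
    def decodes(self, o):
        return TitledListSketch(o)    # give the decoded target a displayable form

tfm = ListTargetSketch()
y = [0.0, 0.0, 0.0, 1.0]
decoded = tfm.decodes(tfm.encodes(y))
print(decoded.title())  # 0.0 0.0 0.0 1.0
```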
The independent variable is the image, so we'll use an ImageBlock.
For the dependent variable we'll use a RegressionBlock, and here we need to set c_out=4, one output per target.
We also add ToListTensor
to the get_y pipeline.
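get_x and get_y are small functions applied to each DataFrame row. A rough stdlib mimic follows. It assumes a row is a plain dict. col_reader and compose are hypothetical helpers, not fastai's ColReader or Pipeline. get_x builds a file path from image_id with a prefix and suffix, and get_y reads combined and passes it through an identity step:

```python
def col_reader(col, pref="", suff=""):
    """Hypothetical mimic of reading one column: look up the value, optionally add prefix/suffix."""
    def read(row):
        value = row[col]
        if not pref and not suff:
            return value                      # return the raw value (e.g. a list) untouched
        return f"{pref}{value}{suff}"
    return read

def compose(*funcs):
    """Apply funcs left to right, like chaining the steps of get_y."""
    def run(x):
        for f in funcs:
            x = f(x)
        return x
    return run

row = {"image_id": "Train_0", "combined": [0, 0, 0, 1]}   # one hypothetical row
get_x = col_reader("image_id", pref="plant/images/", suff=".jpg")
get_y = compose(col_reader("combined"), lambda o: o)      # identity step, like ToListTensor.encodes
print(get_x(row))  # plant/images/Train_0.jpg
print(get_y(row))  # [0, 0, 0, 1]
```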
blocks = [ImageBlock, RegressionBlock(c_out=4)]
item_tfms = [Resize(150)]  # a bigger size would likely give better results
batch_tfms = [*aug_transforms(flip_vert=True, size=128), Normalize.from_stats(*imagenet_stats)]
splitter = RandomSplitter()
plant = DataBlock(blocks=blocks,
                  get_x=ColReader('image_id', pref=path+'images/', suff='.jpg'),
                  get_y=Pipeline([ColReader('combined'), ToListTensor]),
                  splitter=splitter,
                  item_tfms=item_tfms,
                  batch_tfms=batch_tfms,
                  n_inp=1)
dls = plant.dataloaders(train)
dls.show_batch(nrows=2,ncols=2,figsize=(10,10))
plant.summary(train)
Key things to notice:
[0, 0, 0, 1] becomes tensor([0., 0., 0., 1.]), a float tensor
dls.c = 4, one output per target
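Normalize.from_stats(*imagenet_stats) subtracts a per-channel mean and divides by a per-channel standard deviation. Here is a per-pixel sketch of that arithmetic using the standard ImageNet statistics. The helper names are illustrative, and the pixel is assumed to be already scaled to [0, 1]:

```python
# Standard ImageNet per-channel statistics (the values fastai exposes as imagenet_stats)
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def normalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Normalize one RGB pixel channel by channel: (value - mean) / std."""
    return tuple((v - m) / s for v, m, s in zip(rgb, mean, std))

def denormalize_pixel(norm, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Invert the normalization: value * std + mean."""
    return tuple(v * s + m for v, m, s in zip(norm, mean, std))

pixel = (0.485, 0.456, 0.406)
print(normalize_pixel(pixel))  # (0.0, 0.0, 0.0)
```

Decoding applies the inverse, which is why show_batch can still display the normalized images with their original colors.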
model = resnet18
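Choose an appropriate loss function and metric for a regression problem. Here we train with L1 loss, report mean squared error as the metric, and set y_range=(0,1) so every prediction lands between 0 and 1.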
learn = cnn_learner(dls, model, metrics=[mse], loss_func=L1LossFlat(), y_range=(0,1))
learn.fine_tune(2)
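To see what the loss, metric and y_range compute, here is a pure-Python sketch for a single image. The raw outputs are made up. sigmoid_range follows the idea behind y_range: a sigmoid rescaled to (lo, hi). The function names are illustrative and are not fastai's own implementations:

```python
import math

def sigmoid_range(x, lo, hi):
    """Squash a raw model output into (lo, hi), the idea behind y_range=(0, 1)."""
    return lo + (hi - lo) / (1 + math.exp(-x))

def l1_loss(preds, targs):
    """Mean absolute error, the quantity L1LossFlat averages during training."""
    return sum(abs(p - t) for p, t in zip(preds, targs)) / len(preds)

def mse(preds, targs):
    """Mean squared error, the quantity reported as the metric."""
    return sum((p - t) ** 2 for p, t in zip(preds, targs)) / len(preds)

raw = [-2.0, -1.0, -3.0, 2.5]            # hypothetical raw outputs for the four targets
preds = [sigmoid_range(x, 0, 1) for x in raw]
targs = [0.0, 0.0, 0.0, 1.0]             # a hypothetical "scab" image
print(round(l1_loss(preds, targs), 4), round(mse(preds, targs), 4))
```

L1 penalizes every error linearly, while MSE weights large misses more heavily. Tracking both gives two views of the same predictions.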
test_img = pd.read_csv(path+'test.csv')
dl = learn.dls.test_dl(test_img)
probs,_ = learn.get_preds(dl=dl)
p1 = pd.DataFrame(probs.numpy(), columns=['healthy','multiple_diseases','rust','scab'])
p1['image_id'] = test_img.image_id
cols = ['image_id','healthy','multiple_diseases','rust','scab']
p1[cols].head()
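With pandas, this table could be written out with p1[cols].to_csv('submission.csv', index=False). The file name is an assumption. The stdlib-only sketch below produces the same layout: one row per image_id followed by its four probabilities. submission_csv and the probability values are hypothetical:

```python
import csv
import io

COLS = ["image_id", "healthy", "multiple_diseases", "rust", "scab"]

def submission_csv(image_ids, probs):
    """Pair each test image_id with its four predicted probabilities and return CSV text."""
    if len(image_ids) != len(probs):
        raise ValueError("need one probability row per image_id")
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(COLS)
    for image_id, row in zip(image_ids, probs):
        writer.writerow([image_id, *row])
    return buf.getvalue()

# Hypothetical predictions for two test images
text = submission_csv(["Test_0", "Test_1"], [[0.1, 0.05, 0.8, 0.05], [0.7, 0.1, 0.1, 0.1]])
print(text)
```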