- Keypoint or Pose detection: n keypoints are found using a CNN, where n = the maximum number of keypoints present
from fastai.vision.all import *
Cleaning Some Data:
For our dataset, we will be working from the Kaggle Cats dataset. We are purposefully going to clean the data ourselves beforehand so we understand what that process is like.
url = "https://drive.google.com/uc?id=1ffJr3NrYPqzutcXsYIVNLXzzUaC9RqYM"
!gdown {url}
Now that it's downloaded, let's unzip it using ZipFile
from zipfile import ZipFile
with ZipFile('cat-dataset.zip', 'r') as zip_ref:
    zip_ref.extractall()  # extract everything into the current directory
How is the data stored? Let's take a look by walking through our folders:
import os
[x[0] for x in os.walk('cats')]
We have some duplicate folders. Let's get rid of the top-level CAT_ directories and just work out of the cats directory
import shutil
for i in range(7):
    path = Path(f'CAT_0{i}')
    shutil.rmtree(path)  # delete the duplicate top-level folder
Now we need to move all the files up one level. We can use pathlib
for i in range(7):
    paths = Path(f'cats/CAT_0{i}').ls()
    for path in paths:
        p = Path(path).absolute()
        par = p.parents[1]
        p.rename(par/p.name)  # move the file from cats/CAT_0i/ up into cats/
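To see the "move everything up one level" idea in isolation, here is a minimal sketch that runs against a throwaway temporary folder instead of the real dataset. The helper name `flatten_one_level` and the file names are made up for illustration.

```python
import tempfile
from pathlib import Path

def flatten_one_level(root: Path) -> None:
    "Move every file found in a direct subfolder of `root` up into `root`"
    for sub in [p for p in root.iterdir() if p.is_dir()]:
        for f in sub.iterdir():
            if f.is_file():
                f.rename(root / f.name)  # same file name, parent folder one level up

# Build a tiny fake layout: cats/CAT_00/a.jpg and its label file
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "cats"
    (root / "CAT_00").mkdir(parents=True)
    (root / "CAT_00" / "a.jpg").touch()
    (root / "CAT_00" / "a.jpg.cat").touch()
    flatten_one_level(root)
    moved = sorted(p.name for p in root.iterdir() if p.is_file())

print(moved)
```

Note that this leaves the now-empty subfolders in place; whether to remove them is up to you.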
How is our data labeled? Our keypoints are available via an image's corresponding .cat file. Let's make sure we have an equal number of labels to images
path = Path('cats')
lbls = get_files(path, extensions='.cat')
imgs = get_image_files(path)
test_eq(len(lbls), len(imgs))
We're good to go! Or are we?
Let's first grab a label based on a file name
def img2kpts(f): return f'{str(f)}.cat'
Let's try this out on an image
fname = imgs[0]
img = PILImage.create(fname)
Now let's grab some coordinates!
kpts = np.genfromtxt(img2kpts(fname)); kpts
Wait, those aren't our keypoints. What is this?
They are. If you go back to the Kaggle page, it describes the layout: the first value in our list is the number of points (9 by default), followed by the x and y coordinates of each point in this order:
- Left eye
- Right eye
- Mouth
- Left ear 1
- Left ear 2
- Left ear 3
- Right ear 1
- Right ear 2
- Right ear 3
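To make the layout concrete, here is a small pure-Python sketch that reads one label line and attaches the names from the list above. The function name `parse_cat_label`, the `POINT_NAMES` list, and the sample values are all made up for illustration; they are not part of the dataset or of fastai.

```python
POINT_NAMES = ["left_eye", "right_eye", "mouth",
               "left_ear_1", "left_ear_2", "left_ear_3",
               "right_ear_1", "right_ear_2", "right_ear_3"]

def parse_cat_label(text: str) -> dict:
    "Map each point name to its (x, y) pair from a `.cat`-style line"
    values = [int(v) for v in text.split()]
    n = values[0]                      # first value: number of points
    coords = values[1:1 + 2 * n]       # then n pairs of x, y
    pairs = list(zip(coords[0::2], coords[1::2]))
    return dict(zip(POINT_NAMES, pairs))

# Hypothetical label line, not taken from a real file
sample = "9 175 160 239 162 199 199 149 121 137 78 166 93 281 101 312 96 296 133"
labels = parse_cat_label(sample)
print(labels["left_eye"], labels["mouth"])
```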
Now we need to separate our keypoints into pairs and turn them into a tensor
def sep_points(coords:array):
    "Separate a set of points into (x, y) groups"
    kpts = []
    # coords[0] is the number of points; the x, y values follow in pairs
    for i in range(1, int(coords[0]*2), 2):
        kpts.append([coords[i], coords[i+1]])
    return tensor(kpts)
pnts = sep_points(kpts); pnts
Now let's put it all together. We need to return some TensorPoints to have it work in fastai
First let's take what we did above and make a get_y
def get_y(f:Path):
    "Get keypoints for `f` image"
    pts = np.genfromtxt(img2kpts(f))
    return sep_points(pts)
Now there is one more bit of cleaning we need to do: making sure all of our points are within the bounds of the image. How do we do this? Let's build a list of bad_imgs by running the following test:
- Open an image and the points
- If any point is outside the image, remove the file
- If any point is negative, remove the file
bad_imgs = []
for name in imgs:
    im = PILImage.create(name)
    y = get_y(name)
    for x in y:
        # x is an (x, y) pair; im.size is (width, height)
        if x[0] < 0 or x[0] > im.size[0]:
            bad_imgs.append(name)
        if x[1] < 0 or x[1] > im.size[1]:
            bad_imgs.append(name)
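The test described above can also be written as a small pure-Python helper, which is easy to try on a few hand-made points before running it on the whole dataset. The name `out_of_bounds` and the sample points are hypothetical.

```python
def out_of_bounds(points, width, height):
    "Return True if any (x, y) point is negative or beyond the image size"
    return any(x < 0 or x > width or y < 0 or y > height for x, y in points)

good = [(10, 20), (300, 200)]
bad_negative = [(10, 20), (-5, 40)]     # one negative x
bad_too_far = [(10, 20), (700, 40)]     # one x past the right edge

# Pretend the image is 500 wide and 375 tall
results = [out_of_bounds(p, 500, 375) for p in (good, bad_negative, bad_too_far)]
print(results)
```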
Let's take a look at how many bad images we had!
That's a lot! But couldn't we also get repeats from the above code? Let's check for that
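One way to check for repeats, sketched here on a made-up list of file names rather than the real results, is to count how often each name was flagged and compare against the set of unique names:

```python
from collections import Counter

flagged = ["a.jpg", "b.jpg", "a.jpg", "c.jpg", "a.jpg"]  # hypothetical file names

counts = Counter(flagged)
repeats = {name: n for name, n in counts.items() if n > 1}  # names flagged more than once
unique = set(flagged)

print(len(flagged), len(unique), repeats)
```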
We could. So in total we have 1,062 images whose points go out of bounds. There are a few different ways we can deal with this:
- Remove said image
- Zero those points to (-1,-1) (through a transform)
- Keep the points
Each has its benefits. We'll go with #1
for name in list(set(bad_imgs)):
    name.unlink()  # delete the image file from disk
Now that we've removed all the bad images, let's continue
imgs = get_image_files(path)
fname = imgs[0]
img = PILImage.create(fname)
Now let's get our TensorPoints, just to show an example
def get_ip(img:PILImage, pts:array): return TensorPoint(pts, img_size=img.size)
ip = get_y(fname); ip
tp = get_ip(img, ip)
Now we can visualize our points. We can pass in an axis to overlay them on top of our image
ax = img.show(figsize=(12,12))
tp.show(ctx=ax)  # draw the points on the same axis as the image
Great, let's datablock it now. We'll keep the transforms very simple for our example problem: just a Resize.
We'll also talk about the "Clamping" option. This can be useful for more than our ground truth. Consider this case:
- RandomResizeCrop crops our image and makes the point go off-screen. Do we keep the point and let the model guess, or adjust that point?
Let's build a transform to adjust for this. Our dataset will not need it, as we will use either padding or squishing on our resize, but it serves as an example
class ClampBatch(Transform):
    "Clamp points to a minimum and maximum in a batch"
    order = 4
    def __init__(self, min=-1, max=1, **kwargs):
        super().__init__(**kwargs)
        self.min, self.max = min, max
    def encodes(self, x:(TensorPoint)):
        for i, sets in enumerate(x):          # each item in the batch
            for j, pair in enumerate(sets):   # each (x, y) point of that item
                cpnt = torch.clamp(pair, self.min, self.max)
                # a coordinate on the boundary after clamping was at or past the edge
                if any(cpnt>=self.max) or any(cpnt<=self.min):
                    x[i][j] = tensor([-1,-1])
        return x
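Why -1 and 1? As far as I understand it, fastai rescales point coordinates from pixels to the range -1 to 1 before batch transforms see them, so anything outside that range is off the image. Here is a minimal pure-Python sketch of that rescaling; the helper names `scale_points` and `is_inside` are made up, and this is an illustration of the idea rather than fastai's own code.

```python
def scale_points(points, width, height):
    "Map pixel (x, y) pairs to [-1, 1]: 0 maps to -1, the full size maps to 1"
    return [(2 * x / width - 1, 2 * y / height - 1) for x, y in points]

def is_inside(point, lo=-1.0, hi=1.0):
    "True if both coordinates lie within [lo, hi]"
    return all(lo <= c <= hi for c in point)

# A 448x448 image: corner, center, opposite corner, and one point past the right edge
scaled = scale_points([(0, 0), (224, 224), (448, 448), (500, 100)], 448, 448)
inside = [is_inside(p) for p in scaled]
print(scaled[1], inside)
```

Note that `is_inside` counts points exactly on the border as inside, while the transform above treats them as out of bounds.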
We'll go ahead and run ClampBatch here though, just in case something happens during one of our transforms (though most likely not)
item_tfms = [Resize(448, method='squish')]
batch_tfms = [Flip(), Rotate(), Zoom(), Warp(), ClampBatch()]
All keypoint augmentation available:
- Rotate, FlipItem, DihedralItem
- CropPad, RandomCrop, Resize, RandomResizeCrop
- Zoom, Warp
dblock = DataBlock(blocks=(ImageBlock, PointBlock),
With how our get_y is written, we will want a base path of ''. Let's look at the dblock.summary() output now to see what's going on in our Pipeline
This is cool! It shows us exactly what is happening and when, and if a transform doesn't work we can go through, adjust, and debug it. Since it was successful in building a batch, let's build our DataLoaders
dls = dblock.dataloaders('', path='', bs=bs)
dls.show_batch(max_n=8, figsize=(12,12))
Great, now let's give it a .c attribute so we can create our model's output more easily
dls.c = dls.train.after_item.c
Now let's go through and generate a custom model and head for regression. How do we do this?
If we know our inputs and outputs, we can make use of two functions, create_body and create_head. create_body will chop the top off our pre-trained model for us, and create_head will make a special fastai-configured head that has been shown to work better. What will we need?
- Outputs: 18 (9 pairs of points)
- Inputs: 1024 features (2x the 512 channels of the last ResNet18 layer, since the head's pooling concatenates average and max pooling)
body = create_body(resnet18, pretrained=True)
Now let's make our head
head = create_head(nf=1024, n_out=18); head
Finally, we'll wrap them together
arch = nn.Sequential(body, head)
Now we want to utilize transfer learning as best as we can, so we need discriminative learning rates. How do we do this? We need to define how the model's parameters are split into groups. Here is a split for our ResNet (adjust this at your discretion based on what you find)
def _resnet_split(m): return L(m[0][:6], m[0][6:], m[1:]).map(params)
We can see that the [1:] grabs the head of our model, so we can safely freeze the body
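With three parameter groups (early body, late body, head), discriminative learning rates mean each group gets its own rate, smallest for the earliest layers. A common choice, and as far as I know what fastai does when you pass a `slice(lo, hi)`, is to space the rates geometrically. Here is a minimal sketch of that spacing; `spread_lrs` is a made-up helper name and the rates are only example values.

```python
def spread_lrs(lo, hi, n_groups):
    "Geometrically space `n_groups` learning rates from `lo` to `hi`"
    if n_groups == 1:
        return [hi]
    step = (hi / lo) ** (1 / (n_groups - 1))  # constant multiplier between groups
    return [lo * step ** i for i in range(n_groups)]

# Three groups, as in the split above: early body, late body, head
lrs = spread_lrs(1e-5, 1e-3, 3)
print(lrs)
```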
Lastly we want to initialize our model
apply_init(arch[1], nn.init.kaiming_normal_)
For our loss function, we will use MSELossFlat
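The "Flat" part matters here because our model outputs a flat vector of 18 values per image while the targets are 9 (x, y) pairs. Here is a pure-Python sketch of the idea: flatten both sides, then take the mean squared error. The helpers `flatten` and `mse_flat` are made up for illustration and the numbers are toy values with only two points.

```python
def flatten(nested):
    "Flatten arbitrarily nested lists of numbers into one flat list"
    out = []
    for item in nested:
        if isinstance(item, (list, tuple)):
            out.extend(flatten(item))
        else:
            out.append(item)
    return out

def mse_flat(preds, targs):
    "Mean squared error over all values after flattening both inputs"
    p, t = flatten(preds), flatten(targs)
    assert len(p) == len(t), "predictions and targets must hold the same number of values"
    return sum((a - b) ** 2 for a, b in zip(p, t)) / len(p)

preds = [[0.5, -0.5, 0.0, 1.0]]        # one image, flat model output
targs = [[[0.5, -0.5], [0.0, 0.0]]]    # one image, two (x, y) pairs
loss = mse_flat(preds, targs)
print(loss)
```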
Now that we have all the pieces, let's make a model!
learn = Learner(dls, arch, loss_func=MSELossFlat(), splitter = _resnet_split,
And let's fit!
learn.fit_flat_cos(5, 1e-2)
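What does the "flat cos" schedule look like? Roughly: the learning rate stays flat for most of training, then follows a cosine curve down to a very small value. Here is a minimal sketch, assuming the flat phase covers the first 75% of training and the final rate is the starting rate divided by 1e5 (I believe these match fastai's defaults, but treat them as assumptions). `flat_cos_lr` is a made-up name.

```python
import math

def flat_cos_lr(pos, lr, pct_start=0.75, div_final=1e5):
    "Learning rate at training progress `pos` in [0, 1]"
    if pos < pct_start:
        return lr                                  # flat phase
    end = lr / div_final
    p = (pos - pct_start) / (1 - pct_start)        # progress within the cosine phase
    return lr + (1 + math.cos(math.pi * (1 - p))) * (end - lr) / 2

# Sample the schedule at the start, middle, and through the annealing phase
schedule = [flat_cos_lr(p, 1e-2) for p in (0.0, 0.5, 0.75, 0.875, 1.0)]
print(schedule)
```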
Let's look at some of the initial results
Alright, we're getting there! Let's unfreeze and find a new learning rate
learn.fit_flat_cos(5, 1e-4)
Looks much better! And in only ten epochs! (Almost everything I've seen for this sort of problem in PyTorch trains for 100+ epochs.)