Now let's grab what we need:
from fastai.tabular.all import *
from tsai.all import *
For our data we'll be using the UCR repository, which has 128 univariate and 30 multivariate datasets. With the framework we can quickly grab any dataset we want:
name = 'StarLightCurves'
We can now grab our train and validation sets by calling get_UCR_data:
X_train, y_train, X_valid, y_valid = get_UCR_data(name, verbose=True, on_disk=True)
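As a quick sanity check (a minimal sketch; the exact numbers depend on the dataset you chose), we can print the shapes that came back. Each X is laid out as samples x variables x timesteps, and each y holds one label per sample:
# Inspect the arrays returned by get_UCR_data
print(X_train.shape, y_train.shape)
print(X_valid.shape, y_valid.shape)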
Since the data is already split into train and test sets in the UCR repository, we're going to merge them and create indices that recover the same split. To save memory, the tsai author figured out a way to use the NumPy arrays directly from disk rather than loading everything into RAM. We'll do so as follows:
X = np.concatenate((X_train, X_valid))
y = np.concatenate((y_train, y_valid))
np.save('./data/UCR/StarLightCurves/X.npy', X)
np.save('./data/UCR/StarLightCurves/y.npy', y)
del X, y
Now we can load them back in and make our splits:
X = np.load('./data/UCR/StarLightCurves/X.npy', mmap_mode='r')
y = np.load('./data/UCR/StarLightCurves/y.npy', mmap_mode='r')
splits = (L(np.arange(len(X_train)), use_list=True),
L(np.arange(len(X_train), len(X)), use_list=True))
splits
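We can sanity-check those indices with nothing but what we already have: the first list should line up with the original training set, and together the two lists should cover every sample exactly once (an illustrative check, not part of the original pipeline):
# The two index lists should partition the concatenated array without overlap
assert len(splits[0]) == len(X_train)
assert len(splits[0]) + len(splits[1]) == len(X)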
Since we used mmap_mode, the data is read directly from disk rather than being loaded into memory. To make and use your own data, it needs to be a three-dimensional array with the following format (see the sketch after this list):
- Samples
- Variables
- Length (or timesteps)
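If your own data happens to be a plain 2D array of shape (samples, timesteps), one way to add the missing variables axis is to insert a channel dimension. This is a minimal sketch with a made-up array (my_array is purely hypothetical):
# A hypothetical univariate dataset: 100 samples, 570 timesteps each
my_array = np.random.randn(100, 570)
# Insert a variables (channel) axis so the shape becomes (samples, 1, timesteps)
my_array_3d = my_array[:, None, :]
my_array_3d.shape  # (100, 1, 570)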
To use this we have a special TSTensor built to handle such data:
t = TSTensor(X)
t
Here we can see we have 60 samples, each with one variable and an overall time length of 570.
We can use this directly in our DataBlock API too, with a TSTensorBlock:
splitter = IndexSplitter(splits[1])
getters = [ItemGetter(0), ItemGetter(1)]
dblock = DataBlock(blocks=(TSTensorBlock, CategoryBlock),
getters=getters,
splitter=splitter)
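IndexSplitter simply takes the validation indices and returns a function that, given the items, hands back (train_idxs, valid_idxs). A quick illustrative check using only the objects we already defined:
# Applying the splitter to a dummy collection reproduces our train/valid split
train_idx, valid_idx = splitter(range(len(X)))
len(train_idx), len(valid_idx)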
Next we make a call to itemify, which zips up our x's and our y's:
src = itemify(X, y)
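Under the hood, itemify is roughly list(zip(X, y)) wrapped in a fastcore L, so each element is an (x, y) pair. A small illustrative peek:
# Each item pairs one series with its label
x0, y0 = src[0]
x0.shape, y0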
And now we can make our DataLoaders:
dls = dblock.dataloaders(src, bs=64, val_bs=128)
dls.show_batch(max_n=3)
And there's our data! Are we still reading from disk? Yes:
dls.dataset[0]
Training our Model
The particular model we're using is the InceptionTime model. To build it we need the number of classes and the number of input variables:
dls.c
inp_vars = dls.dataset[0][0].shape[-2]
inp_vars
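Why shape[-2]? Each item's x is a TSTensor shaped (variables, timesteps), so the second-to-last dimension is the number of variables (channels). An illustrative look at a single item:
# For a univariate dataset this shows a shape like (1, seq_len)
x0, y0 = dls.dataset[0]
x0.shape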
net = InceptionTime(inp_vars, dls.c)
learn = Learner(dls, net, loss_func=CrossEntropyLossFlat(), metrics=accuracy, opt_func=ranger)
learn.lr_find()
And now we can fit!
learn.fit_flat_cos(10, 0.025)