Quickstart

The example below shows the basic usage of TSFuse.

Data format

The input of TSFuse is a dataset where each instance is a window that consists of multiple time series and a label.

Time series

Time series are represented using a dictionary where each entry represents a univariate or multivariate time series. As an example, let’s create a dictionary with two univariate time series:

[1]:
from pandas import DataFrame
from tsfuse.data import Collection
X = {
    "x1": Collection(DataFrame({
        "id":   [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3],
        "time": [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2],
        "data": [1, 2, 3, 1, 2, 3, 3, 2, 1, 3, 2, 1],
    })),
    "x2": Collection(DataFrame({
        "id":   [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3],
        "time": [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2],
        "data": [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3],
    })),
}

The two univariate time series are named x1 and x2 and each series is represented as a Collection object. Each Collection is initialized with a DataFrame that has three columns:

  • id which is the identifier of each instance, i.e., each window,

  • time which contains the time stamps,

  • data which contains the time series values themselves.

For multivariate time series data, there can be multiple value columns in place of the single data column. For example, the data of a tri-axial accelerometer would have three columns x, y, z instead of data, because it measures acceleration along the x, y, and z axes simultaneously.
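As an illustration of the multivariate layout described above, here is a minimal sketch of such a DataFrame. The name acc and the accelerometer readings are made up for this example; only the id and time columns and the idea of several value columns come from the text.

```python
from pandas import DataFrame

# Hypothetical accelerometer readings for two windows (id 0 and id 1),
# three time stamps each; the values are invented for illustration.
acc = DataFrame({
    "id":   [0, 0, 0, 1, 1, 1],
    "time": [0, 1, 2, 0, 1, 2],
    "x":    [0.1, 0.2, 0.3, 0.3, 0.2, 0.1],
    "y":    [0.0, 0.1, 0.0, 0.1, 0.0, 0.1],
    "z":    [9.8, 9.7, 9.8, 9.8, 9.9, 9.8],
})

# Every column other than "id" and "time" holds one dimension of the series.
value_columns = [c for c in acc.columns if c not in ("id", "time")]
print(value_columns)
```

Such a DataFrame would presumably be wrapped in a Collection in the same way as the univariate series in cell [1].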

Labels

Each window needs exactly one target value, so we create a Series whose index contains every unique id value of the time series data and whose values are the labels:

[2]:
from pandas import Series
y = Series(index=[0, 1, 2, 3], data=[0, 0, 1, 1])
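Because the labels must line up with the window identifiers, it can help to check this before constructing features. The sketch below uses only pandas; the names x1_frame, missing_labels, extra_labels, and labels_match are introduced here for illustration and are not part of TSFuse.

```python
from pandas import DataFrame, Series

# Same "id" and "time" layout as the x1 series in cell [1].
x1_frame = DataFrame({
    "id":   [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    "time": [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2],
    "data": [1, 2, 3, 1, 2, 3, 3, 2, 1, 3, 2, 1],
})
y = Series(index=[0, 1, 2, 3], data=[0, 0, 1, 1])

window_ids = set(x1_frame["id"].tolist())
label_ids = set(y.index.tolist())

# Windows without a label, and labels without a window.
missing_labels = window_ids - label_ids
extra_labels = label_ids - window_ids

# The labels match when both sets are empty and no id is labeled twice.
labels_match = not missing_labels and not extra_labels and y.index.is_unique
print(labels_match)
```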

Feature construction

To construct features, TSFuse provides a construct() function which takes time series data X and target data y as input, and returns a DataFrame where each column corresponds to a feature. When called with return_graph=True, it also returns a computation graph which contains all transformation steps required to compute the same features for new data:

[3]:
from tsfuse import construct
features, graph = construct(X, y, transformers="minimal", return_graph=True)

The DataFrame with the constructed features looks like this:

[4]:
features
[4]:
   Max(Diff(Input(x1)), axis=time)  Mean(Diff(Input(x1)), axis=time)  Median(Diff(Input(x1)), axis=time)  Min(Diff(Input(x1)), axis=time)  Sum(Diff(Input(x1)), axis=time)
0                              1.0                               1.0                                 1.0                              1.0                              2.0
1                              1.0                               1.0                                 1.0                              1.0                              2.0
2                             -1.0                              -1.0                                -1.0                             -1.0                             -2.0
3                             -1.0                              -1.0                                -1.0                             -1.0                             -2.0
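To make the feature names concrete, the values above can be reproduced with plain pandas. This is only a sketch of what the numbers mean, assuming Diff denotes the first-order difference between consecutive time stamps within a window (which is consistent with the values shown); it is not a description of how TSFuse computes them internally. The names x1, diffs, and summary are introduced for this example.

```python
from pandas import DataFrame

# The x1 series from cell [1] as a plain DataFrame.
x1 = DataFrame({
    "id":   [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    "time": [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2],
    "data": [1, 2, 3, 1, 2, 3, 3, 2, 1, 3, 2, 1],
})

# First-order difference within each window. The first row of every
# window has no predecessor, so its difference is NaN.
diffs = x1.sort_values(["id", "time"]).groupby("id")["data"].diff()

# Aggregate the differences per window; NaN values are skipped by default.
summary = diffs.groupby(x1["id"]).agg(["max", "mean", "median", "min", "sum"])
print(summary)
```

Windows 0 and 1 rise by 1 at every step and windows 2 and 3 fall by 1, which is why the first four statistics are ±1 and the sum is ±2.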

And this is the corresponding computation graph:

[5]:
graph
[5]:
[Figure: rendered computation graph (_images/quickstart_17_0.svg)]

To apply this computation graph to data, call its transform() method with a time series dictionary X as input:

[6]:
graph.transform(X)
[6]:
   Max(Diff(Input(x1)), axis=time)  Mean(Diff(Input(x1)), axis=time)  Median(Diff(Input(x1)), axis=time)  Min(Diff(Input(x1)), axis=time)  Sum(Diff(Input(x1)), axis=time)
0                              1.0                               1.0                                 1.0                              1.0                              2.0
1                              1.0                               1.0                                 1.0                              1.0                              2.0
2                             -1.0                              -1.0                                -1.0                             -1.0                             -2.0
3                             -1.0                              -1.0                                -1.0                             -1.0                             -2.0