Quickstart
The example below shows the basic usage of TSFuse.
Data format
The input of TSFuse is a dataset where each instance is a window that consists of multiple time series and a label.
Time series
Time series are represented using a dictionary where each entry represents a univariate or multivariate time series. As an example, let’s create a dictionary with two univariate time series:
[1]:
from pandas import DataFrame
from tsfuse.data import Collection

X = {
    "x1": Collection(DataFrame({
        "id": [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3],
        "time": [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2],
        "data": [1, 2, 3, 1, 2, 3, 3, 2, 1, 3, 2, 1],
    })),
    "x2": Collection(DataFrame({
        "id": [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3],
        "time": [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2],
        "data": [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3],
    })),
}
The two univariate time series are named x1 and x2, and each series is represented as a Collection object. Each Collection is initialized with a DataFrame that has three columns:
id which is the identifier of each instance, i.e., each window,
time which contains the time stamps,
data which contains the time series data itself.
For multivariate time series data, the single data column is replaced by one column per variable. For example, the data of a tri-axial accelerometer would have three columns x, y, and z instead of data, since it measures the acceleration along all three axes simultaneously.
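As a sketch of that multivariate layout, the columns for a tri-axial accelerometer could look as follows. The name acc_columns and all values are made up for illustration, and plain Python lists are used so the snippet runs without TSFuse. In practice the same columns would be passed to DataFrame and wrapped in a Collection, as in the example above.

```python
# Hypothetical tri-axial accelerometer data for two windows (ids 0 and 1)
# with three time stamps each. All values are illustrative only.
acc_columns = {
    "id":   [0, 0, 0, 1, 1, 1],
    "time": [0, 1, 2, 0, 1, 2],
    "x":    [0.1, 0.2, 0.3, 0.3, 0.2, 0.1],
    "y":    [0.0, 0.1, 0.0, 0.1, 0.0, 0.1],
    "z":    [9.8, 9.7, 9.8, 9.8, 9.9, 9.8],
}

# Every column needs exactly one value per (id, time) row.
n_rows = len(acc_columns["id"])
assert all(len(values) == n_rows for values in acc_columns.values())

# The value columns are all columns other than id and time.
value_columns = [name for name in acc_columns if name not in ("id", "time")]
print(value_columns)  # ['x', 'y', 'z']
```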
Labels
There should be one target value for each window, so we create a Series where the index contains all unique id values of the time series data and the data consists of the labels:
[2]:
from pandas import Series
y = Series(index=[0, 1, 2, 3], data=[0, 0, 1, 1])
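Before constructing features, it can help to confirm that there is exactly one label per window. The check below is a minimal sketch in plain Python: labels_match_windows is a hypothetical helper, not part of TSFuse, and the labels are held in a dictionary that mirrors the index and data of the Series above.

```python
# The id column of the time series data from the example above.
ids = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]

# Labels keyed by window id, mirroring
# Series(index=[0, 1, 2, 3], data=[0, 0, 1, 1]).
labels = {0: 0, 1: 0, 2: 1, 3: 1}


def labels_match_windows(ids, labels):
    """Return True if the label keys are exactly the unique window ids."""
    return set(ids) == set(labels)


print(labels_match_windows(ids, labels))  # True
```

If a window has no label, or a label refers to an id that does not occur in the data, the check returns False.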
Feature construction
To construct features, TSFuse provides a construct() function, which takes time series data X and target data y as input and returns a DataFrame where each column corresponds to a feature. In addition, this function can return a computation graph, which contains all transformation steps required to compute the features for new data:
[3]:
from tsfuse import construct
features, graph = construct(X, y, transformers="minimal", return_graph=True)
The DataFrame with the constructed features looks like this:
[4]:
features
[4]:
id | Max(Diff(Input(x1)), axis=time) | Mean(Diff(Input(x1)), axis=time) | Median(Diff(Input(x1)), axis=time) | Min(Diff(Input(x1)), axis=time) | Sum(Diff(Input(x1)), axis=time) |
---|---|---|---|---|---|
0 | 1.0 | 1.0 | 1.0 | 1.0 | 2.0 |
1 | 1.0 | 1.0 | 1.0 | 1.0 | 2.0 |
2 | -1.0 | -1.0 | -1.0 | -1.0 | -2.0 |
3 | -1.0 | -1.0 | -1.0 | -1.0 | -2.0 |
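The values in this table can be reproduced by hand, which makes the feature names easier to read. The sketch below uses only the standard library and assumes that Diff is the first-order difference between consecutive values within a window, an interpretation that is consistent with the table but is not taken from the TSFuse implementation.

```python
from statistics import mean, median

# The id and data columns of x1 from the example above.
ids = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]
data = [1, 2, 3, 1, 2, 3, 3, 2, 1, 3, 2, 1]

# Group the values by window id, preserving their time order.
windows = {}
for window_id, value in zip(ids, data):
    windows.setdefault(window_id, []).append(value)


def diff(values):
    """First-order difference between consecutive values (assumed meaning of Diff)."""
    return [b - a for a, b in zip(values, values[1:])]


# Aggregate the differences of each window over the time axis.
features = {}
for window_id, values in windows.items():
    d = diff(values)
    features[window_id] = {
        "Max": max(d),
        "Mean": mean(d),
        "Median": median(d),
        "Min": min(d),
        "Sum": sum(d),
    }

for window_id, row in features.items():
    print(window_id, row)
```

For window 0, the values 1, 2, 3 give the differences 1, 1, so every aggregate is 1 except the sum, which is 2. For windows 2 and 3 the series decreases, so all signs flip.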
And this is the corresponding computation graph:
[5]:
graph
[5]:
To apply this computation graph, simply call transform() with a time series dictionary X as input:
[6]:
graph.transform(X)
[6]:
id | Max(Diff(Input(x1)), axis=time) | Mean(Diff(Input(x1)), axis=time) | Median(Diff(Input(x1)), axis=time) | Min(Diff(Input(x1)), axis=time) | Sum(Diff(Input(x1)), axis=time) |
---|---|---|---|---|---|
0 | 1.0 | 1.0 | 1.0 | 1.0 | 2.0 |
1 | 1.0 | 1.0 | 1.0 | 1.0 | 2.0 |
2 | -1.0 | -1.0 | -1.0 | -1.0 | -2.0 |
3 | -1.0 | -1.0 | -1.0 | -1.0 | -2.0 |