{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Quickstart"
]
},
{
"cell_type": "raw",
"metadata": {
"raw_mimetype": "text/restructuredtext"
},
"source": [
"The example below shows the basic usage of TSFuse."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data format"
]
},
{
"cell_type": "raw",
"metadata": {
"raw_mimetype": "text/restructuredtext"
},
"source": [
"The input of TSFuse is a dataset where each instance is a window that consists of multiple time series and a label."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Time series"
]
},
{
"cell_type": "raw",
"metadata": {
"raw_mimetype": "text/restructuredtext"
},
"source": [
"Time series are represented using a dictionary where each entry represents a univariate or multivariate time series. As an example, let's create a dictionary with two univariate time series:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"ExecuteTime": {
"end_time": "2019-12-09T11:55:30.629106Z",
"start_time": "2019-12-09T11:55:29.365104Z"
}
},
"outputs": [],
"source": [
"from pandas import DataFrame\n",
"from tsfuse.data import Collection\n",
"\n",
"# Four windows (id 0 to 3), each with three time stamps (0, 1, 2)\n",
"X = {\n",
" \"x1\": Collection(DataFrame({\n",
" \"id\": [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3],\n",
" \"time\": [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2],\n",
" \"data\": [1, 2, 3, 1, 2, 3, 3, 2, 1, 3, 2, 1],\n",
" })),\n",
" \"x2\": Collection(DataFrame({\n",
" \"id\": [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3],\n",
" \"time\": [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2],\n",
" \"data\": [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3],\n",
" })),\n",
"}"
]
},
{
"cell_type": "raw",
"metadata": {
"raw_mimetype": "text/restructuredtext"
},
"source": [
"The two univariate time series are named ``x1`` and ``x2``, and each series is represented as a :class:`~tsfuse.data.Collection` object. Each ``Collection`` is initialized with a ``DataFrame`` that has three columns:\n",
"\n",
"- ``id``, which is the identifier of each instance, i.e., each window,\n",
"- ``time``, which contains the time stamps,\n",
"- ``data``, which contains the time series data itself.\n",
"\n",
"For multivariate time series data, there can be multiple columns in place of the single ``data`` column. For example, the data of a tri-axial accelerometer would have three columns ``x``, ``y``, ``z`` instead of ``data``, as it simultaneously measures the acceleration along the ``x``, ``y``, and ``z`` axes."
]
},
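{
"cell_type": "raw",
"metadata": {
"raw_mimetype": "text/restructuredtext"
},
"source": [
"As a minimal sketch of the multivariate layout described above, the cell below builds a ``DataFrame`` for a tri-axial accelerometer. The variable names ``acc_frame`` and ``value_columns`` and all sensor readings are made up for illustration. Such a frame would then be passed to ``Collection`` in the same way as the univariate frames above:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from pandas import DataFrame\n",
"\n",
"# Made-up accelerometer readings: two windows (id 0 and 1), three time stamps each\n",
"acc_frame = DataFrame({\n",
"    \"id\":   [0, 0, 0, 1, 1, 1],\n",
"    \"time\": [0, 1, 2, 0, 1, 2],\n",
"    \"x\":    [0.1, 0.2, 0.1, 0.0, 0.1, 0.0],\n",
"    \"y\":    [0.0, 0.1, 0.0, 0.3, 0.2, 0.3],\n",
"    \"z\":    [9.8, 9.7, 9.8, 9.8, 9.9, 9.8],\n",
"})\n",
"\n",
"# Every column other than id and time holds one dimension of the series\n",
"value_columns = [c for c in acc_frame.columns if c not in (\"id\", \"time\")]\n",
"value_columns"
]
},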
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Labels"
]
},
{
"cell_type": "raw",
"metadata": {
"raw_mimetype": "text/restructuredtext"
},
"source": [
"There should be one target value for each window, so we create a ``Series`` whose index contains all unique ``id`` values of the time series data and whose values are the labels:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"ExecuteTime": {
"end_time": "2019-12-09T11:55:30.634146Z",
"start_time": "2019-12-09T11:55:30.631109Z"
}
},
"outputs": [],
"source": [
"from pandas import Series\n",
"\n",
"# One label per window; the index holds the window ids used in X\n",
"y = Series(index=[0, 1, 2, 3], data=[0, 0, 1, 1])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Feature construction"
]
},
{
"cell_type": "raw",
"metadata": {
"raw_mimetype": "text/restructuredtext"
},
"source": [
"To construct features, TSFuse provides a :func:`~tsfuse.construct` function, which takes time series data ``X`` and target data ``y`` as input and returns a ``DataFrame`` where each column corresponds to a feature. In addition, when called with ``return_graph=True``, this function also returns a computation graph that contains all transformation steps required to compute the features for new data:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"ExecuteTime": {
"end_time": "2019-12-09T11:55:30.726204Z",
"start_time": "2019-12-09T11:55:30.636771Z"
}
},
"outputs": [],
"source": [
"from tsfuse import construct\n",
"features, graph = construct(X, y, transformers=\"minimal\", return_graph=True)"
]
},
{
"cell_type": "raw",
"metadata": {
"raw_mimetype": "text/restructuredtext"
},
"source": [
"The DataFrame with the constructed features looks like this:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"ExecuteTime": {
"end_time": "2019-12-09T11:55:30.749748Z",
"start_time": "2019-12-09T11:55:30.728227Z"
}
},
"outputs": [
{
"data": {
"text/html": [
"