{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Quickstart" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "The example below shows the basic usage of TSFuse." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data format" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "The input of TSFuse is a dataset where each instance is a window that consists of multiple time series and a label." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Time series" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "Time series are represented using a dictionary where each entry represents a univariate or multivariate time series. As an example, let's create a dictionary with two univariate time series:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "ExecuteTime": { "end_time": "2019-12-09T11:55:30.629106Z", "start_time": "2019-12-09T11:55:29.365104Z" } }, "outputs": [], "source": [ "from pandas import DataFrame\n", "from tsfuse.data import Collection\n", "X = {\n", " \"x1\": Collection(DataFrame({\n", " \"id\": [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3],\n", " \"time\": [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2],\n", " \"data\": [1, 2, 3, 1, 2, 3, 3, 2, 1, 3, 2, 1],\n", " })),\n", " \"x2\": Collection(DataFrame({\n", " \"id\": [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3],\n", " \"time\": [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2],\n", " \"data\": [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3],\n", " })),\n", "}" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "The two univariate time series are named `x1` and `x2` and each series is represented as a :class:`~tsfuse.data.Collection` object. 
Each ``Collection`` is initialized with a DataFrame that has three columns:\n", "\n", "- `id`, which identifies each instance, i.e., each window,\n", "- `time`, which contains the time stamps,\n", "- `data`, which contains the time series values.\n", "\n", "For multivariate time series, the single `data` column is replaced by one column per variable. For example, a tri-axial accelerometer simultaneously measures acceleration along three axes, so its data would have three columns `x`, `y`, `z` instead of `data`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Labels" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "There should be one target value per window, so we create a `Series` whose index contains all unique `id` values of the time series data and whose values are the labels:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "ExecuteTime": { "end_time": "2019-12-09T11:55:30.634146Z", "start_time": "2019-12-09T11:55:30.631109Z" } }, "outputs": [], "source": [ "from pandas import Series\n", "y = Series(index=[0, 1, 2, 3], data=[0, 0, 1, 1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Feature construction" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "To construct features, TSFuse provides a :func:`~tsfuse.construct` function which takes time series data `X` and target data `y` as input, and returns a `DataFrame` where each column corresponds to a feature. 
In addition, this function can return a computation graph which contains all transformation steps required to compute the features for new data:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "ExecuteTime": { "end_time": "2019-12-09T11:55:30.726204Z", "start_time": "2019-12-09T11:55:30.636771Z" } }, "outputs": [], "source": [ "from tsfuse import construct\n", "features, graph = construct(X, y, transformers=\"minimal\", return_graph=True)" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "The DataFrame with the constructed features looks like this:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "ExecuteTime": { "end_time": "2019-12-09T11:55:30.749748Z", "start_time": "2019-12-09T11:55:30.728227Z" } }, "outputs": [ { "data": { "text/html": [ "
<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>Max(Diff(Input(x1)), axis=time)</th><th>Mean(Diff(Input(x1)), axis=time)</th><th>Median(Diff(Input(x1)), axis=time)</th><th>Min(Diff(Input(x1)), axis=time)</th><th>Sum(Diff(Input(x1)), axis=time)</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>1.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>2.0</td></tr>\n",
"    <tr><th>1</th><td>1.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>2.0</td></tr>\n",
"    <tr><th>2</th><td>-1.0</td><td>-1.0</td><td>-1.0</td><td>-1.0</td><td>-2.0</td></tr>\n",
"    <tr><th>3</th><td>-1.0</td><td>-1.0</td><td>-1.0</td><td>-1.0</td><td>-2.0</td></tr>\n",
"  </tbody>\n",
"</table>
\n", "" ], "text/plain": [ " Max(Diff(Input(x1)), axis=time) Mean(Diff(Input(x1)), axis=time) \\\n", "0 1.0 1.0 \n", "1 1.0 1.0 \n", "2 -1.0 -1.0 \n", "3 -1.0 -1.0 \n", "\n", " Median(Diff(Input(x1)), axis=time) Min(Diff(Input(x1)), axis=time) \\\n", "0 1.0 1.0 \n", "1 1.0 1.0 \n", "2 -1.0 -1.0 \n", "3 -1.0 -1.0 \n", "\n", " Sum(Diff(Input(x1)), axis=time) \n", "0 2.0 \n", "1 2.0 \n", "2 -2.0 \n", "3 -2.0 " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "features" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "And this is the corresponding computation graph:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "ExecuteTime": { "end_time": "2019-12-09T11:55:30.797028Z", "start_time": "2019-12-09T11:55:30.751377Z" } }, "outputs": [ { "data": { "image/svg+xml": [
"<svg xmlns=\"http://www.w3.org/2000/svg\">\n",
"<title>Computation graph</title>\n",
"<desc>Nodes: x1, x2, Diff, Mean(axis=time), Median(axis=time), Min(axis=time), Max(axis=time), Sum(axis=time). Edges: x1 to Diff; Diff to each of Mean, Median, Min, Max and Sum. The input x2 has no outgoing edges.</desc>\n",
"</svg>\n"
], "text/plain": [ "" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "graph" ] }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/restructuredtext" }, "source": [ "To apply this computation graph to new data, call :meth:`~tsfuse.computation.Graph.transform` with a time series dictionary `X` as input:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "ExecuteTime": { "end_time": "2019-12-09T11:55:30.829811Z", "start_time": "2019-12-09T11:55:30.799057Z" } }, "outputs": [ { "data": { "text/html": [ "
<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>Max(Diff(Input(x1)), axis=time)</th><th>Mean(Diff(Input(x1)), axis=time)</th><th>Median(Diff(Input(x1)), axis=time)</th><th>Min(Diff(Input(x1)), axis=time)</th><th>Sum(Diff(Input(x1)), axis=time)</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>1.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>2.0</td></tr>\n",
"    <tr><th>1</th><td>1.0</td><td>1.0</td><td>1.0</td><td>1.0</td><td>2.0</td></tr>\n",
"    <tr><th>2</th><td>-1.0</td><td>-1.0</td><td>-1.0</td><td>-1.0</td><td>-2.0</td></tr>\n",
"    <tr><th>3</th><td>-1.0</td><td>-1.0</td><td>-1.0</td><td>-1.0</td><td>-2.0</td></tr>\n",
"  </tbody>\n",
"</table>
\n", "
" ], "text/plain": [ " Max(Diff(Input(x1)), axis=time) Mean(Diff(Input(x1)), axis=time) \\\n", "0 1.0 1.0 \n", "1 1.0 1.0 \n", "2 -1.0 -1.0 \n", "3 -1.0 -1.0 \n", "\n", " Median(Diff(Input(x1)), axis=time) Min(Diff(Input(x1)), axis=time) \\\n", "0 1.0 1.0 \n", "1 1.0 1.0 \n", "2 -1.0 -1.0 \n", "3 -1.0 -1.0 \n", "\n", " Sum(Diff(Input(x1)), axis=time) \n", "0 2.0 \n", "1 2.0 \n", "2 -2.0 \n", "3 -2.0 " ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "graph.transform(X)" ] } ], "metadata": { "celltoolbar": "Raw Cell Format", "kernelspec": { "display_name": "Python [conda env:tsfuse]", "language": "python", "name": "conda-env-tsfuse-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": { "height": "calc(100% - 180px)", "left": "10px", "top": "150px", "width": "288px" }, "toc_section_display": true, "toc_window_display": true } }, "nbformat": 4, "nbformat_minor": 2 }