2017-ICDM Generating synthetic time series to augment sparse datasets
Motivation
The paper proposes a new method for generating synthetic time series: an average time series, computed with an extension of DBA (DTW Barycenter Averaging), is used as a synthetic sample.
Methods
The paper first analyses two shortcomings of using the original DBA to generate synthetic samples:
- Modifying the starting time series (that is, the randomly chosen initial average series in DBA) is not sufficient to create enough diversity in the synthesized dataset;
- The raw DBA method, which uses a “uniformly weighted” scheme, may generate undesirable samples when the distribution is not spherical. For a “U-shaped” distribution, for example, computing the center of the U will typically construct very unlikely objects.
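The second shortcoming can be illustrated with a toy 2-D analogy. This is only an assumption made for illustration: the paper deals with time series under DTW, not points in the plane, and the names `points`, `mean`, and `gap` are hypothetical. The sketch places points on a U-shaped arc and shows that their uniformly weighted mean lies well away from every actual point.

```python
import math

# Hypothetical U-shaped cloud: 21 points on the lower half of a unit circle.
points = [(math.cos(t), math.sin(t))
          for t in (math.pi + k * math.pi / 20 for k in range(21))]

# Uniformly weighted mean of all points (every point counts equally).
mean = (sum(x for x, _ in points) / len(points),
        sum(y for _, y in points) / len(points))

# Distance from that mean to the closest real data point.
gap = min(math.hypot(mean[0] - x, mean[1] - y) for x, y in points)

print("mean:", mean)
print("distance to nearest data point:", gap)
```

The mean falls inside the U rather than on it, so it resembles none of the observed points. That is the sense in which a uniform average of a non-spherical class can be an unlikely object.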
The main contribution of this paper is to address the two problems above by designing a weighting scheme (for the second problem) that is applied to a batch of series rather than to all samples as in DBA, which induces diversity (for the first problem). The weighting mechanism assigns each series its own weight in the average instead of weighting all series uniformly.
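A weighted average of this kind can be sketched as below. This is a minimal, unoptimized illustration and not the authors' implementation: the function names `dtw_path` and `weighted_dba`, the squared pointwise cost, the fixed iteration count, and the example series and weights are all assumptions. Each iteration aligns the current average to every series with DTW, then sets each point of the average to the weighted mean of the values aligned to it.

```python
def dtw_path(a, b):
    """Return the DTW alignment path (index pairs) between a and b."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1],
                                 cost[i - 1][j - 1])
    # Backtrack from the end to recover the aligned index pairs.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(cost[i - 1][j - 1], i - 1, j - 1),
                 (cost[i - 1][j], i - 1, j),
                 (cost[i][j - 1], i, j - 1)]
        _, i, j = min(steps)
    return path[::-1]


def weighted_dba(series, weights, init, iterations=10):
    """Refine `init` toward a weighted DTW barycenter of `series`."""
    avg = list(init)
    for _ in range(iterations):
        num = [0.0] * len(avg)  # weighted sum of aligned values
        den = [0.0] * len(avg)  # sum of the corresponding weights
        for s, w in zip(series, weights):
            for i, j in dtw_path(avg, s):
                num[i] += w * s[j]
                den[i] += w
        # Keep the old value where no weight was received.
        avg = [x / d if d > 0 else a for x, d, a in zip(num, den, avg)]
    return avg


# Hypothetical example: three short series, weighted toward the first one.
series = [[0.0, 0.0, 1.0, 2.0, 1.0, 0.0],
          [0.0, 1.0, 2.0, 1.0, 0.0, 0.0],
          [0.0, 0.0, 0.0, 1.0, 2.0, 1.0]]
synthetic = weighted_dba(series, weights=[0.6, 0.3, 0.1], init=series[0])
print(synthetic)
```

Setting all weights equal gives a uniform average in the style of plain DBA. Drawing a different weight vector for each synthetic sample is what produces variety in the output.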
Further, three extensions of DBA built around the “batch” idea are proposed:
- Average All (AA)
- Average Selected (AS)
- Average Selected with Distance (ASD)
ASD adds a density property to the weight distribution, based on the distance to the nearest neighbor.
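One way to make weights depend on the distance to the nearest neighbor is sketched below. The exact formula is an assumption chosen for illustration and is not confirmed by these notes: a reference series gets weight 1, and every other series gets a weight that decays exponentially with its DTW distance, scaled so that the reference's nearest neighbor receives 0.5. The names `dtw`, `distance_weights`, `ref_index`, and `halving` are hypothetical.

```python
import math


def dtw(a, b):
    """Plain DTW distance with squared pointwise cost (no warping window)."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)
    for x in a:
        cur = [inf]
        for j, y in enumerate(b, start=1):
            cur.append((x - y) ** 2 + min(prev[j], cur[j - 1], prev[j - 1]))
        prev = cur
    return math.sqrt(prev[-1])


def distance_weights(series, ref_index, halving=0.5):
    """Weight every series by its DTW distance to the reference series.

    Assumed form: w_i = exp(ln(halving) * d_i / d_nn), where d_nn is the
    distance from the reference to its nearest (non-identical) neighbor.
    Raises ValueError if no other series differs from the reference.
    """
    ref = series[ref_index]
    dists = [dtw(ref, s) for s in series]
    d_nn = min(d for k, d in enumerate(dists) if k != ref_index and d > 0)
    return [math.exp(math.log(halving) * d / d_nn) for d in dists]


# Hypothetical example: one close neighbor and one distant series.
series = [[0.0, 1.0, 2.0, 1.0, 0.0],
          [0.0, 1.0, 2.0, 2.0, 0.0],
          [3.0, 3.0, 0.0, 0.0, 3.0]]
weights = distance_weights(series, ref_index=0)
print(weights)
```

Scaling by the nearest-neighbor distance makes the weights adapt to local density. In a dense region the weights fall off over short distances, while in a sparse region more distant series still contribute.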
Results
1. ASD performs better than AA.
2. Increasing the number of available examples per class can strengthen the model.