
Table 1 Related literature reviews published since 2019

From: Tabular and latent space synthetic data generation: a literature review

| Reference | Data type | ML problem | Domain | Observations |
|---|---|---|---|---|
| [4] | — | Data privacy | Finance | Analysis of applications, motivation and properties of synthetic data for anonymization. |
| [20] | Tabular | Data privacy | Healthcare | Focus on GANs. |
| [21] | Tabular | Data privacy | Statistics | Focus on general definitions such as differential privacy and statistical disclosure control. |
| [22] | Tabular | Imbalanced learning | Various | Focus on oversampling with GANs in cybersecurity and finance. |
| [24] | Text | Classification | — | Divides 100 methods into 12 groups. |
| [25] | Text | Deep learning | — | General overview of text data augmentation. |
| [26] | Text | Few-shot learning | — | Augmentation techniques for machine learning with limited data. |
| [14] | Text | — | — | Overview of augmentation techniques and applications on NLP tasks. |
| [27] | Text | — | Various | Analysis of industry use cases of data augmentation in NLP. Emphasis on input-level data augmentation. |
| [23] | Image | Segmentation | Medicine | Analysis of algorithmic applications on a 2018 brain-tumor segmentation challenge. |
| [28] | Image | Imbalanced learning | — | Emphasis on GANs. |
| [13] | Image | — | Medicine | Emphasis on GANs. |
| [29] | Image | Deep learning | — | Regularization techniques using facial image data. Emphasis on deep learning generative models. |
| [30] | Image | Deep learning | — | Emphasis on data augmentation as a regularization technique. |
| [31] | Image | — | — | Broad overview of image data augmentation. Emphasis on traditional approaches. |
| [32] | Image | — | Various | General overview of image data augmentation and relevant domains of application. |
| [33] | Time series | Classification | — | Defined a taxonomy for time series data augmentation. |
| [34] | Time series | Various | — | Analysis of data augmentation methods for classification, anomaly detection and forecasting. |
| [35] | Graph | Various | — | Graph data augmentation for supervised and self-supervised learning. |

Note: A field containing “—” indicates that the corresponding literature review does not focus on a particular data type, ML problem or domain.