
Table 1 Works Studied

From: Survey on categorical data for neural networks

Description

References: paper, implementation (N/A if no implementation reference)

Automatic techniques

Paper introducing the TensorFlow framework; explains how TensorFlow is designed for large datasets, including distributed representations

[25, 26]

Seminal work defining an automatic embedding technique for natural language processing; transforms natural language words into one-hot encoded vectors and learns lower-dimensional embedded values

[27], N/A

Uses embedding layers for categorical variables to predict taxi destinations

[28, 29]

Entity embedding of categorical data from computing-system events for intrusion detection

[30], N/A

Widely used Keras library; provides an embedding-layer implementation

[20, 31]

BERT leverages WordPiece embeddings; can be used in a feature-based approach to learn representations of natural language; shown to work well for named-entity recognition tasks

[32, 33]

Survey of graph-embedding techniques; Python package with examples of the techniques covered

[16, 34]

Uses the Keras embedding layer for entity embedding of categorical values; won third place in a Kaggle competition; maps one-hot encodings of categorical data to lower-dimensional vectors (a minimal sketch follows this subsection)

[3, 35]

Library for automatic embeddings; part of the fast.ai framework; supports the PyTorch neural-network library

[36, 37]

Concatenates embeddings of categorical data with time-series data as input to a neural network

[38], N/A

Comparison of several pre-computed embedding techniques employing transfer learning; cited in the documentation for the PyTorch embedding layer

[39], N/A

Learns patient representations from EHR data such as ICD-10 codes using a denoising autoencoder; the learned representation is transferred to various tasks

[40], N/A

Survey of deep-learning techniques for bioinformatics use cases; examples include applying graph embeddings of proteins to predict protein-protein interactions, and recurrent and convolutional neural networks for predicting expression from one-hot encoded DNA sequences; the authors provide source code for the techniques they cover

[41, 42]

Word2vec algorithm, a breakthrough automatic technique for natural language processing; transforms natural language data into embedded vectors suitable for transfer learning

[43, 44]

Applies word2vec to a corpus of protein sequences to obtain a lower-dimensional representation for transfer learning in various machine-learning algorithms

[45], N/A

The GloVe natural language processing algorithm; uses a word co-occurrence matrix to embed qualitative data in a lower-dimensional space; suitable for transfer learning

[21, 46]

Transfers GloVe embeddings for sentiment analysis to classify emotion; introduces a weighting scheme for imbalanced data

[47], N/A

Uses GRUs to learn representations of clinical descriptions; the representations are fed into dense layers that act as classifiers to produce ICD-10 codes

[48, 49]

Embeds one-hot encoded vectors of categorical data and feeds them as input to a log-bilinear neural model for unsupervised anomaly detection on 12 datasets

[50], N/A

Uses one-hot encoding of demographic and diagnostic data plus an embedding of medical codes (ICD-9) with Long Short-Term Memory (LSTM) to predict hospital re-admission; the ICD-9 embedding is transferred from previous work that used a Word2vec variant to embed ICD-9 codes

[51, 52]

Embedding technique that Lin et al. leverage in [51]; uses the Word2vec algorithm to encode ICD-9 codes as real-valued vectors

[53, 54]
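To make the entity-embedding pattern shared by several of the automatic techniques above concrete (e.g., the Keras embedding layer [20, 31] and the Kaggle entry [3, 35]), below is a minimal sketch rather than any cited author's exact model; the feature cardinality, embedding size, layer widths, and synthetic data are all illustrative assumptions.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Illustrative assumptions: one categorical feature with 10 levels,
# integer-encoded 0..9, embedded into 4 dimensions.
n_categories = 10
embedding_dim = 4

inputs = keras.Input(shape=(1,), dtype="int32")
# The embedding layer maps each category id to a dense vector that is
# learned jointly with the rest of the network (entity embedding).
x = layers.Embedding(input_dim=n_categories, output_dim=embedding_dim)(inputs)
x = layers.Flatten()(x)
x = layers.Dense(8, activation="relu")(x)
outputs = layers.Dense(1, activation="sigmoid")(x)

model = keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy")

# Synthetic category ids and binary labels, for illustration only.
X = np.random.randint(0, n_categories, size=(256, 1))
y = np.random.randint(0, 2, size=(256, 1))
model.fit(X, y, epochs=2, verbose=0)

# After training, one learned vector per category can be extracted,
# e.g., for transfer learning: shape (n_categories, embedding_dim).
learned_vectors = model.layers[1].get_weights()[0]
```

The extracted weight matrix holds the lower-dimensional representation of each category, which is what the transfer-learning approaches in this subsection reuse downstream.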

Algorithmic techniques

Predicts patient mortality; categorical inputs include ICD-9, Current Procedural Terminology (CPT), and RxNorm codes; value counts are used to encode the data, and the authors claim the techniques are suitable for big data

[55], N/A

GEL, a pre-processing technique for embedding qualitative data for convolutional neural networks; the authors claim it is suitable for any type of categorical variable and for datasets with both numerical and categorical variables

[19, 56]

EDLT, a pre-processing technique leveraging Pearson correlation to convert tabular data to matrix form for convolutional neural networks

[57, 58]

Uses a loss function to optimize the parameters of a hash algorithm that maps one-hot encodings of data to lower-dimensional vectors

[59, 60]

Uses Latent Dirichlet Allocation (LDA) to extract features from the text of automobile insurance claims, then feeds the LDA features as input to a neural network (see the sketch at the end of this subsection)

[18], N/A
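To illustrate the LDA feature-extraction step described in the last row above ([18]), here is a minimal sketch using scikit-learn; the claim texts, topic count, and default vectorizer settings are hypothetical assumptions, not details from the cited work.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical free-text insurance-claim descriptions.
docs = [
    "rear bumper damaged in parking lot collision",
    "windshield cracked by road debris on the highway",
    "interior water damage after street flooding",
]

# Bag-of-words counts are the input to the LDA topic model.
counts = CountVectorizer().fit_transform(docs)

# Each document becomes a topic-mixture vector (here 2 topics); these
# dense, fixed-length vectors can then be fed to a neural network.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topic_features = lda.fit_transform(counts)
print(topic_features.shape)  # (3, 2)
```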

Determined techniques

Experiment comparing encoding techniques; all of the encoding techniques mentioned are available in the scikit-learn-compatible Category Encoders Python library

[12], N/A

Uses one-hot encoding of low-cardinality categorical variables to compute loan risk

[61], N/A

Uses leave-one-out encoding of categorical variables as input to a convolutional neural network for a computer-network intrusion-detection system (see the sketch after this table)

[62], N/A
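Finally, a minimal sketch of the leave-one-out encoding used in [62], via the Category Encoders library noted in [12]; the feature values and target below are invented for illustration.

```python
import pandas as pd
import category_encoders as ce  # the Category Encoders library noted above

# Invented categorical feature (network protocol) with a binary target.
df = pd.DataFrame({"proto": ["tcp", "udp", "tcp", "icmp", "tcp", "udp"]})
y = pd.Series([1, 0, 1, 0, 0, 1])

# Leave-one-out encoding replaces each row's category with the mean
# target over all *other* rows of that category, limiting target leakage.
encoder = ce.LeaveOneOutEncoder(cols=["proto"])
encoded = encoder.fit_transform(df, y)
print(encoded)
```

Low-cardinality variables, as in [61], can instead use plain one-hot encoding (e.g., pandas.get_dummies), since the dimensionality stays manageable.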