Description | Reference, implementation reference |
---|---|
Automatic techniques | |
Paper introducing the TensorFlow framework; explains how TensorFlow is designed for large datasets, including distributed representations | |
Seminal work defining an automatic embedding technique for natural language processing; transforms natural language words to one-hot encoded vectors and learns lower-dimensional embeddings | [27], N/A |
Use embedding layers for categorical variables to predict taxi destination | |
Entity embedding categorical data from computing system events for intrusion detection | [30], N/A |
Widely used Keras library, which provides an embedding layer implementation | |
\(\text {BERT}\) leverages WordPiece embeddings; can be used in a feature-based approach to learn a representation of natural language, shown to work well for named entity recognition tasks | |
Survey of techniques for graph embedding, with a Python package providing examples of the techniques covered | |
Use the Keras embedding layer for entity embedding of categorical values; won third place in a Kaggle competition by mapping one-hot encodings of categorical data to lower-dimensional vectors | |
Library for automatic embeddings, part of the fast.ai framework; supports the PyTorch neural network library | |
Concatenate embeddings of categorical data to time-series data for input to neural network | [38], N/A |
Comparison of several pre-computed embedding techniques employing transfer learning; cited in the documentation for the PyTorch embedding layer | [39], N/A |
Learn patient representation from \(\text {EHR}\) data such as \(\text {ICD-10}\) codes, using denoising autoencoder for transfer learning to use learned representation for various tasks | [40], N/A |
Survey of deep learning techniques for bioinformatics data use cases; examples of applying graph embeddings of proteins to predict protein-protein interactions, and recurrent and convolutional neural networks for predicting expression of one-hot encoded \(\mathrm{DNA}\) sequences; authors provide source code for the techniques they cover | |
Word2vec algorithm, a breakthrough automatic technique for natural language processing; transforms natural language data to embedded vectors suitable for transfer learning | |
Apply Word2vec to a corpus of protein sequences to obtain a lower-dimensional representation for transfer learning in various machine learning algorithms | [45], N/A |
Natural language processing algorithm; uses a word co-occurrence matrix to embed qualitative data in a lower-dimensional space, suitable for transfer learning | |
Transfer \(\text {GloVe}\) embedding for sentiment analysis to classify emotion, introduces weighting scheme for imbalanced data | [47], N/A |
Use GRUs to learn representations of clinical descriptions; representations are fed into dense layers that act as classifiers to produce \(\text {ICD-10}\) codes | |
Embed one-hot encoded vectors for categorical data and feed them as input to a log-bilinear neural model for unsupervised anomaly detection in 12 datasets | [50], N/A |
Use one-hot encoding of demographic and diagnostic data plus an embedding of medical codes (\(\text {ICD-9}\)) with a Long Short-Term Memory (LSTM) network to predict hospital re-admission; the ICD-9 embedding is transferred from previous work using a Word2vec variant to embed ICD-9 codes | |
Embedding technique that Lin et al. leverage in [51]; leverages the Word2vec algorithm to encode \(\text {ICD-9}\) codes as real-valued vectors | |
Algorithmic techniques | |
Predict patient mortality, categorical inputs include ICD-9, Current Procedural Terminology (CPT), and RxNorm Codes; value counts are used to encode data, authors claim techniques are suitable for big data | [55], N/A |
\(\text {GEL}\), a pre-processing technique for embedding qualitative data for convolutional neural networks; authors claim it is suitable for any type of categorical variable and for datasets with both numerical and categorical variables | |
\(\text {EDLT}\), a pre-processing technique leveraging Pearson correlation to convert tabular data to matrix form for convolutional neural networks | |
Use a loss function to optimize the parameters of a hash algorithm that maps one-hot encodings of data to lower-dimensional vectors | |
Use Latent Dirichlet Allocation (\(\text {LDA}\)) to extract features from the text of automobile insurance claims, then use the \(\text {LDA}\) features as input to a neural network | [18], N/A |
Determined techniques | |
Experiment comparing encoding techniques; all encoding techniques mentioned are available in the Python scikit-learn Category Encoders library | [12], N/A |
Use one-hot encoding for low-cardinality categorical variables to compute loan risk | [61], N/A |
Use leave-one-out encoding for categorical variables as input to a convolutional neural network for a computer network intrusion detection system | [62], N/A |
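
Several rows in the table describe the same core operation: mapping one-hot encoded categorical values to lower-dimensional vectors through a learned embedding matrix. A minimal NumPy sketch of that lookup follows; the vocabulary, embedding dimension, and weights are illustrative assumptions, and the matrix is shown as fixed rather than trained by a neural network.

```python
import numpy as np

# Hypothetical vocabulary of categorical values.
categories = ["red", "green", "blue", "yellow"]
cat_to_index = {c: i for i, c in enumerate(categories)}

rng = np.random.default_rng(0)
embedding_dim = 2  # lower-dimensional target space
# Embedding matrix: one row per category (trainable in practice).
W = rng.normal(size=(len(categories), embedding_dim))

def one_hot(value):
    """One-hot encode a single categorical value."""
    v = np.zeros(len(categories))
    v[cat_to_index[value]] = 1.0
    return v

# Multiplying a one-hot vector by W selects one row of W,
# which is exactly the lookup an embedding layer performs.
dense = one_hot("green") @ W
assert np.allclose(dense, W[cat_to_index["green"]])
```

In frameworks such as Keras or PyTorch, the matrix multiply is replaced by an indexed row lookup for efficiency, and `W` is updated by backpropagation.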
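
Among the algorithmic techniques, one row encodes categorical inputs by their value counts. A plain-Python sketch of that idea, using invented diagnosis-style codes as the categorical column:

```python
from collections import Counter

# Illustrative categorical column (hypothetical ICD-style codes).
codes = ["E11", "I10", "E11", "J45", "I10", "E11"]

# Value-count encoding: replace each value with its frequency in the column.
counts = Counter(codes)
encoded = [counts[c] for c in codes]
# encoded == [3, 2, 3, 1, 2, 3]
```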
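
The final row uses leave-one-out encoding, in which each categorical value is replaced by the mean target over the *other* rows sharing that category. A self-contained sketch with invented data follows; the Category Encoders library cited above provides a production implementation.

```python
from collections import defaultdict

def leave_one_out_encode(categories, targets):
    """Encode each value as the mean target of all other rows
    in the same category (leave-one-out target encoding)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for c, t in zip(categories, targets):
        sums[c] += t
        counts[c] += 1
    global_mean = sum(targets) / len(targets)
    encoded = []
    for c, t in zip(categories, targets):
        if counts[c] > 1:
            # Exclude the current row's own target from the mean.
            encoded.append((sums[c] - t) / (counts[c] - 1))
        else:
            # Singleton category: fall back to the global target mean.
            encoded.append(global_mean)
    return encoded

cats = ["tcp", "udp", "tcp", "tcp"]
y = [1.0, 0.0, 0.0, 1.0]
enc = leave_one_out_encode(cats, y)
# First "tcp" row: mean of the other tcp targets (0.0, 1.0) = 0.5
```

Excluding each row's own target reduces the target leakage that plain target-mean encoding introduces, which is why this variant is favored when the encoded column feeds a downstream model.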