Hyperparameter | CRF (final) | CRF (range) | LSTM-CRF (final) | LSTM-CRF (range) |
---|---|---|---|---|
Learning rate | 0.1 | \(\{0.01,0.1\}\) | 0.1 | \(\{0.01,0.1\}\) |
Mini-batch size | 32 | \(\{32,128\}\) | 128 | \(\{32,128\}\) |
Word dropout | 0.05 | – | 0.05 | – |
Variational dropout | 0.5 | – | 0.5 | – |
Type of optimizer | SGD | \(\{\text {Adam}, \text {SGD}\}\) | SGD | \(\{\text {Adam}, \text {SGD}\}\) |
LSTM layers | – | – | 1 | \(\{1,2\}\) |
LSTM state size | – | – | 256 | \(\{128,256\}\) |
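The table lists the value range explored for each hyperparameter and the final value chosen. It does not say how the ranges were searched. The sketch below assumes an exhaustive grid over the listed ranges, with word dropout and variational dropout held fixed. The dictionary keys and the `grid` helper are hypothetical names used only for illustration. It enumerates each model's candidate configurations and checks that the reported final setting is one of them.

```python
from itertools import product

# Settings with no search range in the table: held fixed for both models.
FIXED = {"word_dropout": 0.05, "variational_dropout": 0.5}

# Search ranges taken from the table (key names are hypothetical labels).
SEARCH_SPACE = {
    "CRF": {
        "learning_rate": [0.01, 0.1],
        "batch_size": [32, 128],
        "optimizer": ["Adam", "SGD"],
    },
    "LSTM-CRF": {
        "learning_rate": [0.01, 0.1],
        "batch_size": [32, 128],
        "optimizer": ["Adam", "SGD"],
        "lstm_layers": [1, 2],
        "lstm_state_size": [128, 256],
    },
}

# Final values reported in the table.
FINAL = {
    "CRF": {"learning_rate": 0.1, "batch_size": 32, "optimizer": "SGD"},
    "LSTM-CRF": {
        "learning_rate": 0.1,
        "batch_size": 128,
        "optimizer": "SGD",
        "lstm_layers": 1,
        "lstm_state_size": 256,
    },
}


def grid(model):
    """Yield every full configuration in the model's (assumed) grid."""
    space = SEARCH_SPACE[model]
    keys = list(space)
    # Cartesian product over all searched hyperparameters.
    for values in product(*(space[k] for k in keys)):
        yield {**FIXED, **dict(zip(keys, values))}


if __name__ == "__main__":
    for model in SEARCH_SPACE:
        configs = list(grid(model))
        chosen = {**FIXED, **FINAL[model]}
        # Number of candidate runs, and whether the final setting lies in the grid.
        print(model, len(configs), chosen in configs)
```

Under this assumption, the CRF grid has 2 × 2 × 2 = 8 configurations and the LSTM-CRF grid has 2^5 = 32. A random search or a tuning library could replace the exhaustive product if the grid were larger.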