- Open Access
Ontology boosted deep learning for disease name extraction from Twitter messages
© The Author(s) 2018
- Received: 21 June 2018
- Accepted: 21 August 2018
- Published: 10 September 2018
This paper presents an ontology based deep learning approach for extracting disease names from Twitter messages. The approach relies on simple features obtained via conceptual representations of messages to obtain results that out-perform those from word level models. The significance of this development is that it can potentially reduce the cost of generating named entity recognition models by reducing the cost of annotating training data since ontology creation is a one-time cost as the conceptual level the ontology is meant to be fairly static and reusable. This is of great importance when it comes to social media text like Twitter messages where you have a large, unbounded lexicon with spatial and temporal variations and other inherent biases that make it logistically untenable to annotate a representative amount of text for general purpose models for live applications.
- Sentiment analysis
- Text classification
- Concept ontology
- Data mining
- Knowledge engineering
In previous years there has been massive interest in the use of unstructured web data such as tweets [1–6], web searches  and images for disease surveillance . This interest is spurred by the unprecedented growth of user generated content through social platforms like Twitter and facebook. In the studies cited it has been shown that disease reports made by users in their public social posts correlate with actual disease activity on the ground and can therefore be employed as a basis for so-called internet based syndromic surveillance. The integration of these systems into the public health surveillance process confers several advantages. First of all these systems have global reach for instance facebook, the largest service has 1.47 billion daily active users who generate unsolicited information at a rate of 293,000 status updates, 510,000 comments and 136,000 photo updates a minute  and Twitter which is the subject of this paper has about 330 million daily active users who generate more than 500 million tweets a day .
A system capable of sifting through this deluge of real time user information for mentions of disease occurrence has a very high chance of detecting outbreaks far quicker than traditional surveillance approaches moreover at a fraction of the cost due to the fact that process elements like data collection can essentially be kept external to the surveillance process. Most modern social systems allow some form of access to their data through APIs (Application programming interfaces)1 and barring the costs of API implementation the data is completely free as it is provided by unpaid users motivated by the need to communicate with each other in the form of status updates. In addition social APIs usually provide additional utilities like keyword and location based filtering making a lot easier to obtain fairly dense location specific samples for any topic of interest.
However, the vast majority of systems have been limited in scope to particular diseases such as the flu and dengue fever . The general approach entails the use of large lists of keywords in conjunction with keyword lookup approaches. To monitor multiple diseases the same technique has simply been extended by incorporating multiple definitions into hierarchical taxonomies or ontologies [12, 13]. For both cases systems rely on a priori definitions of disease events that are supplied by system implementers. True general threat detection systems like Promed-Mail and the Global Public Health Intelligence Network (GPHIN) [14, 15] have had to rely on human mediation.
However ongoing advances in machine learning and natural language processing are making it increasingly feasible to deploy fully automated disease surveillance and minimize the need for human mediation. The key merits of an automated approach include the fact that it would have a significantly lower operational cost than a human mediated approach coupled with a very high throughput. Given the interconnected nature of the modern world and the speed at which epidemics may disperse it is important to process as much data as possible and as quickly as possible. However, the need to dictate the epidemiological focus of such systems by predetermining diseases of interest means that it is not possible to accommodate new health concerns such as emerging diseases2 without significant rework. In this paper we demonstrate an ontology boosted neural architecture for generalized automated disease name extraction from Twitter messages.
Previous attempts to detect disease names in natural language text have been made on less noisy technically oriented sources such as discharge summaries, echocardiogram, radiology, and ECG reports  and specialized databases such as DistiLD  and COSMIC (Catalog of Somatic Mutations in Cancer) as described in . In natural language parlance this task is referred to as named entity recognition. For disease named entity recognition techniques such as machine learning methods like Conditional Random Fields (CRFs)  and dictionary lookup techniques  have been employed. In these cases because the text originates from predominantly technical writers semantic annotation is fairly straight forward using any of the several publicly available medical thesauruses like UML (Unified Medical language)—SNOMED-CT.3
Work on tweets is limited, previous work has employed CRFs and log linear models. For most work involving named entity recognition several features are required to obtain good performance. These features include the words themselves, word prefixes, word suffixes, part of speech tags of words, semantic category of word in reference ontology, word surface form etcetera. These approaches, with the exception of work by Magumba et al. , implicitly assume that the lexical composition of training data and target data is similar and have generally employed very small data sets. However, Twitter text is known for its large, unbounded lexicon in addition to other distortions like misspellings and made up words but even in these experiments which assume a fixed lexicon, the performance of automated named entity recognition approaches for disease name extraction significantly lags the state of the art on Twitter for instance Jimeno-Yepes et al.  report an F1 score of only 0.63 on a dataset of only 1300 tweets in which 1200 tweets are used for training and 100 for testing. In addition it is difficult to reliably obtain some of the most informative features such as part of speech tags and chunk tags on Twitter messages due to the afore-mentioned problems.
Our approach builds on previous work by Magumba and Nabende , in which an ontology is used to successfully obtain state of the art performance for classification of messages about diseases using a single model for lexically divergent data. The approach generates conceptual representations which are used to create classification models that generalize better than word based models over the held out test datasets. The training data comprises messages about the flu whereas the test data comprises messages about an arbitrary selection of other diseases. In this paper we build on the observation that neural word embeddings can be learned over these conceptual representations and that similarly to word level models they significantly improve message classification performance. In particular, the Doc2Vec  algorithm is applied which extends the word2vec algorithm  by applying similar reasoning to entire documents. In this work we investigate if similar gains can be obtained for the named entity recognition task given the combination of deep learning and neural embeddings has produced state of the art results in other domains.
As is evident the word “eye” is prominent for conjunctivitis in Fig. 1a as is the word “stomach” for gastroenteritis in Fig. 1c and with some domain knowledge we can easily infer that these symbolize the sites of infection for the respective diseases. Furthermore, these words would appear as distinct elements of information to a word level model but may actually serve an equivalent purpose for a generalized concept-level model. For instance we can create a category for anatomical references and any mentions of “eye” and “stomach” would account as mentions of this category rather than as unique data points allowing models to employ knowledge about the conceptual context of one disease to messages about another disease. Furthermore, even though we do not encounter the word “leg”, as humans we can easily predict that it too is a member of this category and hence extend it.
In addition, even though the term “interstitium” refers to a part of the body it is way too technical and specific to expect it to occur in casual Twitter posts with significant regularity and it is our opinion that we can safely ignore it and simply register it as a noun. We can therefore identify such concept categories and almost exhaustively describe them by creating lists of words that describe that category. In deriving concepts for the ontology we simply repeat this process several times until we obtain a set of concepts that we think are representative of the domain. In general we avoid technical terminology and names of specific diseases and drugs and focus on the basic linguistics of communicating illness by leveraging domain knowledge to identify those words that are likely to be in the same contexts as names of diseases.
An ontology for disease incidence detection on Twitter
Living things comprise the key biological actors such as hosts (the organism that suffers disease which in this case is a person), pathogens (disease causing organisms) and vectors (disease spreading organisms). Inanimate objects are of two major classes namely environments and substances. There are three types of concepts namely relationships and properties and actions. Relationships describe interactions between concept classes in the object hierarchy and correspond to OWL (Web Ontology Language) object properties whereas properties describe object and relationship attributes and correspond to OWL data properties. Actions describe object behavior for certain objects like people where a typical action is to exercise and can be thought of as data properties for which the range and domain are the same.
The concept dictionary is simply a list of words related to the concept. Roughly speaking the members of a given concept dictionary are related by three relationships namely synonymy, hyponymy/hypernymy and meronymy. Synonymy refers to semantically equivalent words such as “skin” and “dermis”, hypernymy is refers to increasing generalization for instance “location” is a hypernym of “house”, hyponymy is the inverse of hypernymy as in “house” would be a hyponym of “location”. The full ontology along with a description is publically available in OWL/XML format at this Github repository.4 Also the process of creating and populating concepts with concept dictionaries is largely independent of the data. At the conceptual level the ontology is intended to be a static and reusable resource.
Recurrent neural networks
The learning task attempts to learn the most suitable values for w given some training data by minimizing E via an iterative procedure with two components namely forward and back propagation. During forward propagation, the value of y is computed for each training example and the error obtained via Eq. 2. During back propagation the error is used to adjust the weight, w, as described by expression 3. The adjustment is scaled by η, the so called learning rate. This is important since in practice, neural networks do not learn the exact target function but rather the best approximation given the provided parameters and data. The error term is computed over all examples as a root mean squared error. It is critical that the learning rate is not too big as this may cause the learning routine to overshoot the optimal solution and hence never “converge” on the other hand too small a learning rate results very large training times. For modern machine learning libraries, this is handled by the backend with optimization methods such as adam.
At each time step, two outputs are computed the “history” and the current prediction. The history is given by Eq. 8 and the current prediction by Eq. 9. The weights Wy, WR, Wl and bias vector bh are shared across the layer. In this case f is referred to as the recurrence formula; it is common to use tanh activation function for f. The output layer employs softmax activations allowing the output to be interpreted as probabilities. The error is interpreted as a cross entropy error, however the total error from all time steps is considered as the loss. Similarly the gradients with respect to Wy, WR, Wl for all time steps are summed up into a single gradient since Wy, WR, Wl are shared across time steps.
Back propagation works similarly to back propagation in feed forward neural networks but is commonly referred to as back propagation through time since gradients are computed with respect to weights at previous time steps in addition to the current time step. It is also common to limit the length of the sequence through which to back propagate for the sake of memory efficiency and in this case it is referred to as truncated back propagation through time.
A key advantage of recurrent neural network is that it can use the full context of the entire sequence. For methods such as CRFs and log linear models the complexity is quadratic in the length of the sequence and length of the label set requiring most implementations to limit the context, however for recurrent neural networks this is not an issue as the entire history at time t is represented in the vector ht.
Vanishing and exploding gradients
Recurrent neural networks can result in arbitrarily long gradient computations particularly in the computation of errors with respect to WR and Wl as these are obtained by the gradient chain rule as opposed to the error with respect to Wy which depends only on values at the current time step. In situations where the result of these computations is a consecutive sequence of numbers greater than or less than 1.0 it can result in arbitrarily large or exceedingly small gradients referred to as exploding and vanishing gradients respectively. In either case the effect is to impede further learning as weight updates become too small or too large. Several solutions exist for this problem including making changes to the architecture, gradient clipping (for exploding gradients) in which minimum and maximum bounds are defined for the gradient a priori and employing special learning units referred to as gates, in particular LSTM (Long Short Term Memory) gates and GRUs (Gated Recurrent Units).
We employ the same dataset used in Magumba et al. . These tweets were obtained via basic Twitter account using a python script through Twitter’s Streaming API . The tweets are marked as public by the authors; which is the default security level, making them available to third party applications like ours. We employ simple keyword filters to extract the desired tweets. These keywords are names of diseases like flu and pink eye.
Not all messages mentioning diseases necessarily talk about them in the context of reporting active cases of disease incidence and since this is the context we are interested in, we see no point in training our model on tweets that talk about diseases in these alternative contexts. Therefore, we further examine the tweets manually to extract only those that report ongoing or recent cases of disease activity within some time window of up to a few weeks where we assume that diseases may still be in their communicable period. Using this approach we generate three datasets: An influenza + common cold + Listeria data set which comprises 73,848 tokens in 7835 tweets which we employ as the training data set, a mumps + measles + pneumonia data set which comprises 3866 tokens in 316 tweets a pink eye data set which comprises 3428 tokens in 267 tweets.
The mumps + measles + pneumonia and the pink eye data sets are used for testing. The former is used to test the models on cases where the disease name is a singular word and the latter to assess the performance where the disease name is a multi-word expression. In addition we eliminate duplicates by removing retweets, (tweets with the “RT” tag) and also manually checking for duplicates that may not be marked as “RT”. Finally we remove all punctuation except the “#” and “@” symbols where they appear at the beginning of tokens where they are used to denote hashgtags and users respectively.
For each word in our data set, we assign a label of ET (Entity True) or PERS for persons or ORG for organizations or PLACE for geo-political entities or MC (Miscellaneous category) for everything else. The Entity True labels refer to words that are part of a disease name.
We transform each tweet into a vector of features as follows. Firstly, we flatten out our ontology into a list of its constituent concepts. As stated in “General rationale” section each concept is associated with a group of words or tokens referred to as the concept dictionary. Each concept is effectively a list of words and the full ontology is basically a list of lists. In this sense it is a heavily redacted English dictionary containing only words we consider to be of epidemiological relevance. To obtain the feature vector we simply tokenize each tweet and for each token we do a dictionary look up in our flattened ontology.
To obtain the embeddings to be used as input for the neural network we use the Doc2vec algorithm over unlabeled examples of our entire corpus in the conceptual representation. From the Doc2Vec model we can extract word2vec embeddings which are able to preserve non trivial semantic relationships between words. For instance where v(King) is the vector for king it has been found that the expression v(King) − v(Man) + v(Woman) returns a vector that is roughly equivalent to v(Queen). We do not train the word2vec algorithm directly but extract out word2vec vectors from the best performing Doc2Vec model in Magumba and Nabende . We do this to demonstrate that a single model may be used for multiple purposes.
We also perform two additional experiments to demonstrate the utility of our concept level embeddings. The first experiment employs word level embeddings from a custom word2vec model trained from our corpus with a vocabulary of 59,000 words. However, we transform our corpus such that we transform any hashtags into “ht”, urls into “url” and user names into “usr” and ignore any tokens occurring less than 2 times thereby truncating our corpus to 9206 tokens. The second employs word level embeddings from Google’s word2vec model trained from a 6 billion word corpus with a vocabulary of 3 million unique words. For these additional experiments we essentially use the same architecture except the input shape is slightly different because the optimal model with the custom word2vec model is obtained with 100 dimensions and Google’s 6 billion word corpus word2vec model employs 300 dimensions.
We do not perform any parameter tuning and employ what we expect to be typical values for dropout, number of layers and number of epochs. Since the goal is to obtain a model that performs reasonably well on multiple test datasets that are lexically divergent we deliberately try to employ a simple architecture. The full vocabulary for the CNF representation is only 166 concepts.
Mumps + measles + pneumonia
CNF + log linear model
CNF + CRF
LSTM + custom Word2Vecmodel
LSTM + generic Word2Vec model
LSTM + CNF Word2Vec
The results show the precision, recall, and F measure of the different methods and representations with respect to the “ET” tag. The LSTM with concept level embeddings clearly outperforms word level embeddings in addition to previous experiments with CNF with CRF. In addition the LSTM model employs fewer features. The results on the CRF and log linear models require part of speech tags and chunk tags in addition to the CNF concepts. These themselves require their own separate pipelines which owing to the issues pointed out in “Related work” section perform unreliably with Twitter text. On the RNN only the concept embedding is required. Assuming the CNF representation can be obtained directly this is probably a more reliable approach and definitely computationally cheaper particularly at run time.
What is remarkable is how poorly word level embeddings with the LSTM perform. This is significantly worse than even the previous work by Jimeno-Yepes et al. . There are two ways to account for this. The first is that as far as we are aware there has not been any work in which there was a deliberate separation of training and test data such that test data and training data contained mentions of different diseases. The typical approach for separation of training and test data has been via a simple train/test split of the same dataset. Therefore in many cases it seems likely that at test time the models would simply be attempting to determine the correct labels for names of diseases that they had already encountered in training which is a far simpler task than if all disease mentions in the test data are previously unseen and the test data potentially has a different lexicon. This we feel is a great omission since the most persuasive argument for applying machine learning based named entity recognition in this scenario is the discovery of previously unseen ailments such as emerging diseases.
The second and more concrete reason why we observe such poor results is the fact that LSTMs are extremely powerful learners. When we couple this with the fact that we have a very large number of features relative to the number of examples particularly for the word level models means that there is a very large chance of over-fitting. For instance our training data comprises 73,848 tokens versus a vocabulary of 59,000 tokens; furthermore nearly 85% of the tokens appear only once. In such a scenario it is very difficult to create representative training data for good general models since data annotation is a manual task which is always resource constrained. Therefore, the more powerful the learner the higher the chance of over-fitting, bearing in mind that ours is a very contrived experiment we would assume it only gets worse for live data. On the other hand the CNF form has only 166 features. Assuming these features to be correct, it would be far easier to generate representative data for concepts than for a vocabulary with millions of tokens. With the CNF representation the length of the word level vocabulary is inconsequential as the procedure always compresses it into a fixed size vocabulary of just 166 concepts.
Although it is important to maximize the performance in terms of precision, recall and F1 score, since it is impossible to obtain a perfect classifier we opine that it is equally important to be able to automatically determine false positives. We posit that given the classifier does not contain any systematic bias, its probability of incorrectly assigning the “ET” tag to tokens that are not names of diseases should be uniformly spread across all non-disease names and it should also assign the “ET” tag to disease names with a significantly higher probability than words referencing other entities.
As far as we can tell the LSTM + Word2Vec + CNF is the best performing approach so far on this specific named entity recognition task in literature and unlike previous approaches we have shown that this method is generalizable to lexically divergent data sets. The typical experimental setup has relied on obtaining some examples, randomizing them to ensure a uniform lexical distribution and then splitting the data into training and test portions. In our setup we train a model on one set of messages talking about a given disease and test the model on data sets containing messages talking about other diseases. We find that the resulting model performs quite well and therefore the ontology boosted approach makes an automated general threat detection system feasible.
Finally, with the appropriate recurrent neural network architecture it is possible to combine the two tasks namely classification and named entity recognition. This would allow an end to end textual analytical pipeline in which we can simultaneously check if a message is relevant and also recover disease mentions in a single step.
Programming libraries that allow external code to make calls to a system.
Infectious diseases that have recently appeared within a population or those whose incidence or geographic range is rapidly increasing or threatens to increase in the near future.
MAM conceptualized the idea, prepared the datasets, designed and run the experiments, analyzed results, drafted and proof-read the manuscript, PN and EM provided general guidance concerning the work flow of the solution, verified datasets, proof-read and revised initial versions of the manuscript, verified the results. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Availability of data and materials
The best performing Keras HDF5 LSTM model and the datasets are available this Google drive folder: https://drive.google.com/open?id=18D4efjV4800IW311OUdDADQn7gum6_3A, Google’s 6 billion word corpus word2vec model can be downloaded here https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit.
Consent for publication
We, the authors, consent to the publication of this manuscript in the Journal of Big Data.
Ethics approval and consent to participate
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
- Culotta A. Towards detecting influenza epidemics by analyzing Twitter messages. In: Proceedings of the first workshop on social media analytics. SOMA’10. ACM, New York, NY, USA. 2010. p. 115–22. http://dx.doi.org/10.1145/1964858.1964874.
- Eiji A, Sachiko M, Mizuki M. Twitter catches the flu: detecting influenza epidemics using Twitter. In: Proceedings of the conference on empirical methods in natural language processing. EMNLP’11. Association for Computational Linguistics, Stroudsburg, PA, USA. 2011. p. 1568–76. http://dl.acm.org/citation.cfm?id=2145432.2145600. Accessed 8 July 2018.
- Paul MJ, Dredze M. A model for mining public health topics from Twitter. Health. 2012;11:6–16.Google Scholar
- Denecke K, Nejdl W. How valuable is medical social media data? Content analysis of the medical web. Inf Sci. 2009;179(12):1870–80.View ArticleGoogle Scholar
- Diaz-Aviles E, Stewart A. Tracking twitter for epidemic intelligence. Case study: EHEC/HUS outbreak in Germany. 2011. p. 82–5.Google Scholar
- YoussefAgha AH, Jayawardene WP, Lohrmann DK. Role of social media in early warning of norovirus outbreaks: a longitudinal Twitter-based infoveillance. In: Proceedings of the international conference on data mining (DMIN). The Steering Committee of The World Congress in Computer Science, Computer Engineering and Applied Computing (WorldComp). 2013. p. 1.Google Scholar
- Polgreen PM, Chen Y, Pennock DM, Nelson FD, Weinstein RA. Using internet searches for influenza surveillance. Clin Infect Dis. 2008;47(11):1443–8. http://dx.doi.org/10.1086/59309810.1086/593098. http://arxiv.org/abs//oup/backfile/contentpublic/journal/cid/47/11/10.1086/593098/2/47–11–1443.pdf.
- Garimella VRK, Alfayad A, Weber I. Social media image analysis for public health. In: Proceedings of the 2016 CHI conference on human factors in computing systems. ACM. 2016. p. 5543–7.Google Scholar
- Zephoria.com. The top 20 valuable Facebook statistics. 2017. https://zephoria.com/top-15-valuable-facebook-statistics/. Updated Nov 2017. Accessed 22 Nov 2017.
- Newberry C. 28 Twitter statistics all marketers need to know in 2018. 2018. https://blog.hootsuite.com/twitter-statistics/. Accessed 8 July 2018.
- Ginsberg J, Mohebbi MH, Patel RS, Brammer L, Smolinski MS, Brilliant L. Detecting influenza epidemics using search engine query data. Nature. 2009;457:1012. https://doi.org/10.1038/nature07634.View ArticleGoogle Scholar
- Freifeld CC, Mandl KD, Reis BY, Brownstein JS. Healthmap: global infectious disease monitoring through automated classification and visualization of internet media reports. J Am Med Inform Assoc. 2008;15(2):150–7. https://doi.org/10.1197/jamia.M254410.1197/jamia.M2544.View ArticleGoogle Scholar
- Collier N, Doan S, Kawazoe A, Goodwin RM, Conway M, Tateno Y, Ngo Q-H, Dien D, Kawtrakul A, Takeuchi K. Biocaster: detecting public health rumors with a Web-based text mining system. Bioinformatics. 2008;24(24):2940–1.View ArticleGoogle Scholar
- 2017. http://www.promedmail.org. Accessed 28 July 2017.
- Mawudeku A, Blench M, Boily L, St John R, Andraghetti R, Ruben M. The global public health intelligence network. In: M’ikanatha NM, Lynfield R, Van Beneden CA, de Valk H, editors. Infectious disease surveillance. 2nd ed. Hoboken: Wiley; 2013. p. 457–69.View ArticleGoogle Scholar
- Ghiasvand O. Disease name extraction in clinical text using conditional random fields. M.sc. (Dissertation) University of Wisconsin-Milwaukee, USA. http://dc.uwm.edu/etd/495/. Accessed 28 July 2017.
- Pallejà A, Horn H, Eliasson S, Jensen LJ. DistiLD database: diseases and traits in linkage disequilibrium blocks. Nucleic Acid Res. 2012;40:1036–40.View ArticleGoogle Scholar
- Forbes SA, Bhamra G, Bamford S, Dawson E, Kok C, Clements J, et al. The catalogue of somatic mutations in cancer (COSMIC)., Current protocols in human geneticsHoboken: Wiley; 2008.View ArticleGoogle Scholar
- Pletscher-Frankilda S, Pallejàa A, Tsafoua K, Binder JX, Jensen J. Diseases: text mining and data integration of disease-gene associations. Methods. 2014;74:83–9.View ArticleGoogle Scholar
- Magumba MA, Nabende P, Mwebaze E. Ontology driven machine learning approach for disease name extraction from twitter messages. In: 2nd IEEE international conference on computational intelligence and applications (ICCIA). 2017. p. 68–73.Google Scholar
- Jimeno-Yepes A, MacKinlay A, Han B, Chen Q. Identifying diseases, drugs, and symptoms in Twitter. Stud Health Technol Inform. 2015;216:643–7.Google Scholar
- Magumba MA, Nabende P. An ontology for generalized disease incidence detection on twitter. In: International conference on hybrid artificial intelligence systems. Springer. 2017. p. 38–51.Google Scholar
- Le Q, Mikolov T. Distributed representations of sentences and documents. JMLR workshop and conference proceedings. Beijing, China. 2014. pp 1188–96.Google Scholar
- Mikolov T, Chen K, Corrado G, Dean J. Efficient estimation of word representations in vector space. Arizona: Google Inc. Curran Associates, Inc.; 2013.Google Scholar
- Hornik K. Approximation capabilities of multilayer feed forward networks. Neural Netw. 1991;4(2):251–7. https://doi.org/10.1016/0893-6080(91)90009-t.View ArticleGoogle Scholar
- Brightplanet.com. Twitter Firehose vs Twitter API: what’s the difference and why you should care?. 2013. https://brightplanet.com/2013/06/twitter-firehose-vs-twitter-api-whats-the-difference-and-why-should-you-care/. Accessed 25 May 2017.
- Cunningham H, Maynard D, Bontcheva K, Tablan V. Gate: an architecture for development of robust HLT applications. ACL. Philadelphia, USA. 2002. p. 168–75.Google Scholar
- Taylor A, Marcus M, Santorini B. The Penn Treebank: an overview. In: Abeillé A, editor. Treebanks., Building and using parsed corporaDordrecht: Springer; 2003. p. 5–22.View ArticleGoogle Scholar