Method | Approach | Dataset | Key contributions | Accuracy |
---|---|---|---|---|
Kim et al. [43] | QA for legal bar exams | 247 questions | Hybrid method combining simple rules and unsupervised learning using deep linguistic features | 61.13% |
Taniguchi et al. [83] | Legal yes/no QA system | COLIEE 2016 [45] | Case-role analysis and an alignment-based approach to matching questions against statutes | Shared first place in Phase Two and achieved third place in Phase Three
Sovrano et al. [79] | Extracting and making sense of complex information in legal documents | PIL dataset (Sovrano et al. [80]) | KG extraction, taxonomy construction, legal ontology design pattern alignment, and KG question answering | Top-5 recall of 34.91%
McElvain et al. [60] | Non-factoid QA for legal domain | Large corpus of question-answer pairs | Machine-learning pipeline combining gazetteer lookup taggers, statistical taggers, and word embedding models | 90% Answered at 3 for correct answers, 1.5% Answered at 3 for incorrect answers
Taniguchi et al. [84] | Legal QA system using FrameNet | COLIEE 2018 [45] | Semantic database based on FrameNet and predicate-argument structure analyzer | 67% average accuracy |
McElvain et al. [60] | Legal QA system using pre-trained models | 22M documents classified into over 120K legal topics | Use of pre-trained models fine-tuned on a legal dataset | 90% Answered at 3 for correct answers, 1.5% Answered at 3 for incorrect answers
Kim et al. [48] | Legal QA using CNN | COLIEE [45] | Exploiting legal information retrieval and textual entailment using CNN | 63.87% |
Kim et al. [48] | Legal information retrieval and QA | Japanese civil law articles and legal bar exams | Combination of a tf-idf ranking and an SVM re-ranking model, with lemmatization and dependency parsing | 62.14% on the dry-run dataset; 55.71% and 55.79% for Phases 2 and 3, respectively
Duong et al. [21] | Vietnamese QA system | Vietnam’s legal documents | Use of Vietnamese resources and tools, a similarity-based model, and a combination of rule-based and machine learning methods | 70% precision
Kim et al. [50] | Textual entailment for legal question answering | COLIEE 2014 training data for training, COLIEE 2015 test data for validation | Legal QA system based on Siamese CNNs: stop-word removal and stemming for preprocessing, a three-layer CNN with max pooling to extract word features, and dropout to prevent overfitting | 64.25%
Martinez-Gil et al. [59] | Analyzing co-occurrence patterns in unstructured text corpora | Legal questions randomly selected from Oxford University Press books | A new method for automatically answering multiple-choice questions in the legal domain; reduces workload for legal professionals and can be extrapolated to other specific domains | 65%
Hoshino et al. [37] | Predicate-argument structure analysis | COLIEE 2018 [45] | Created a legal term dictionary, a synonym dictionary for predicates, a person-estimation feature, and four types of question answering modules | 70%
Alotaibi et al. [6] | Combination of retrieval-based and generative-based techniques, incorporating prior knowledge sources such as previous questions, question categories, and Islamic jurisprudential reference books | Custom dataset | Reduced workload on human experts by providing relevant, high-quality answers to aid Muslims’ daily-life decisions | 0.60 precision, 0.40 recall, 0.48 F1, 0.037 METEOR
Hoppe et al. [36] | Intelligent legal advisor | German legal documents | Semantic document retrieval and QA using state-of-the-art technologies in NLP, semantic search, and knowledge engineering | 0.84 recall, 0.73 MAP
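Several of the systems above (e.g. the tf-idf ranking stage in Kim et al. [48] and the similarity-based model in Duong et al. [21]) share a common first step: score candidate law articles against a question with tf-idf weighted cosine similarity, then pass the top candidates to a heavier re-ranker. The sketch below illustrates only that shared retrieval pattern on a toy corpus — it is not any cited system's actual implementation, and the smoothed idf formula and example articles are illustrative assumptions.

```python
import math
from collections import Counter

def tfidf(tokens, doc_freq, n_docs):
    """Smoothed tf-idf weights for one token list (sklearn-style smoothing)."""
    tf = Counter(tokens)
    return {
        term: (count / len(tokens))
        * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
        for term, count in tf.items()
    }

def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_articles(question, articles):
    """Return article indices sorted by similarity to the question."""
    docs = [a.lower().split() for a in articles]
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document
    n = len(docs)
    vecs = [tfidf(tokens, doc_freq, n) for tokens in docs]
    q_vec = tfidf(question.lower().split(), doc_freq, n)
    scores = [cosine(q_vec, v) for v in vecs]
    return sorted(range(n), key=lambda i: scores[i], reverse=True)

# Toy example: three invented "articles" and one question.
articles = [
    "a contract of sale requires offer acceptance and consideration",
    "a lease transfers possession of property for a fixed term",
    "negligence requires duty breach causation and damages",
]
question = "what elements does a contract of sale require"
ranking = rank_articles(question, articles)
print(ranking)  # the contract article (index 0) ranks first
```

In the surveyed systems this ranking is only the recall stage; an SVM or neural re-ranker then re-scores the top candidates using richer features such as lemmas and dependency relations.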