EXABSUM: a new text summarization approach for generating extractive and abstractive summaries

Due to the exponential growth of online information, the ability to efficiently extract the most informative content and target specific information without extensive reading is becoming increasingly valuable to readers. In this paper, we present 'EXABSUM,' a novel approach to Automatic Text Summarization (ATS), capable of generating the two primary types of summaries: extractive and abstractive. We propose two distinct approaches: (1) an extractive technique (EXABSUMExtractive), which integrates statistical and semantic scoring methods to select and extract relevant, non-repetitive sentences from a text unit, and (2) an abstractive technique (EXABSUMAbstractive), which employs a word graph approach (including compression and fusion stages) and re-ranking based on keyphrases to generate abstractive summaries using the source document as an input. In the evaluation conducted on multi-domain benchmarks, EXABSUM outperformed extractive summarization methods and demonstrated competitiveness against abstractive baselines.


Introduction
Without summaries, human access to the ever-expanding volume of online information would be impeded. Given the extensive nature of textual content, pertinent information can inadvertently evade readers' attention. Consequently, the condensation of critical information into summaries holds significant value. Since the 1950s, researchers have diligently endeavored to enhance text summarization algorithms, with the aim of achieving a level of summarization comparable to human capabilities. Text summarization remains a formidable yet promising challenge within the domain of NLP.
In text summarization, two pivotal questions arise: (i) how to identify pertinent content within a document, and (ii) how to succinctly convey the selected material while minimizing redundancy [1][2][3]. The landscape of ATS approaches falls into three primary categories: extractive, abstractive, and, increasingly, hybrid summarization, a fusion of extractive and abstractive techniques [4][5][6].
Despite the notable advancements in information technology, the domain of summarization remains an area necessitating substantial progress. Within the realm of text summarization, several critical challenges persist:
• Text Relevancy Detection. Conventional methods assume that a word's significance within a text correlates with its frequency of occurrence, with each word representing a distinct concept. However, quantifying concept occurrences is complex due to synonymy and coreferential expressions that contribute to text cohesion. The information flow within a document fluctuates, indicating that specific segments hold greater importance than others. Consequently, effectively discerning the most pertinent details and statistically and semantically distinguishing relevant terms in source documents (e.g., selection predicated on pertinent keywords or keyphrases) remains a pervasive challenge.
• Lack of coherence and redundancy. Extractive summarization faces hurdles of cohesion and coherence in the summaries produced, stemming from redundancy (phrases with comparable meaning), disjointed sentence connections, and unresolved co-reference relationships.
• Abstractive and hybrid summarization. The demand for abstractive or hybrid ATS techniques is apparent, yet this genre of technique remains an evolving and intricate domain. Crafting an efficacious abstractive summary has proven challenging thus far. It is imperative to cultivate overarching guidelines and viable strategies to transition from extractive to abstractive summaries, thereby harnessing the advantages offered by both ATS approaches.
In this paper, we introduce EXABSUM, an ATS system equipped to generate two distinct summary categories. Firstly, extracts (EXABSUM Extractive) are shaped through a strictly extractive methodology, while abstracts (EXABSUM Abstractive) are crafted via an abstractive approach. The outlined approach addresses limitations intrinsic to both extractive and abstractive summarization techniques. Our contributions to the state of the art encompass the following:
• Diverging from certain extant extractive systems that rely solely on statistical scoring mechanisms for verbatim phrase extraction from the source document, our approach introduces a distinctive unsupervised extraction strategy aimed at tackling the challenge of Text Relevancy Detection. This method combines the strengths of both statistical and semantic scoring techniques to discern crucial information, while also proposing a novel scoring metric.
• Unlike certain extant extractive systems, our approach introduces semantic redundancy mitigation, a pivotal concern within ATS. To avoid including semantically and contextually redundant information in final summaries, we advocate the adoption of textual entailment. This serves to mitigate the readability challenges inherent in existing methods, thereby alleviating a drawback commonly associated with the produced text.
• We confront the challenge of generating abstractive summaries by presenting a graph-based summarization model designed to yield resilient abstractive summaries. This model builds upon and extends a pioneering multi-sentence compression and fusion approach, bolstered by a re-ranking method based on keyphrase extraction. Notably, this approach requires no training data and no knowledge of the document's structure or domain.
The paper's structure is delineated as follows. The subsequent section introduces pertinent related works and outlines ATS systems developed to cater to distinct applications. Sect. "EXABSUM ATS Approach" delves into the description of our proposed ATS system, EXABSUM. Within this section, we expound upon its primary stages, recommended architecture, and the two methodologies employed for the creation of extractive and abstractive summaries. In Sect. "Experimental setup", we detail the experimental framework: the datasets utilized, the experiments conducted for parameter tuning, and the evaluation process. The achieved results, compared to other state-of-the-art systems, are presented in the final part of that section. Finally, Sect. "Summary and conclusions" discusses the conclusion and future work.

Related works
The initial efforts in the domain of automatic summarization focused on extractive approaches, which aim to select pertinent existing words, phrases, or sentences directly from a source text to capture its most pivotal content. Extractive Automatic Text Summarization (ATS) approaches are typically carried out in three steps [5]: (1) construct an intermediate representation of the original text (usually involving preprocessing and segmenting the text into paragraphs, phrases, and tokens); (2) score sentences (the score should measure the importance of a sentence to the comprehensive understanding of the text) by attributing scores to the most relevant words, followed by an assessment of sentence characteristics such as position within the document, sentence length, title alignment, and other factors; and (3) select the top-scored sentences to form the summary. Previous research on extractive summarization has predominantly focused on (1) sentence-clustering-based, (2) statistical, (3) graph-based, and (4) optimization-based techniques. In the context of the first approach, the document comprises n sentences, each sharing an identical set of terms. Consequently, the set of terms in the document corresponds to the set of terms in each phrase, and the distance between corresponding sentences can be employed to illustrate the similarity in language patterns [7][8][9][10].
Sentence-clustering algorithms organize related textual units (paragraphs, sentences) into multiple clusters to uncover common themes of information, subsequently selecting text units from these clusters for the final summary. One noteworthy extractive summarization technique is the centroid-based method [11]. An instance of an ATS system employing sentence-clustering algorithms is the MEAD system [12], a bilingual (English and Chinese) summarizer that provides extractive single- and multi-document generic or query-focused summaries. The MEAD system computes centroid topic characterizations for individual documents or provided clusters, leveraging tf-idf-type data. It evaluates candidate summary sentences by weighing sentence scores against the centroid, text position value, and tf-idf title/lead overlap. A summary length threshold governs sentence selection, while cosine similarity analysis against prior phrases curbs redundant new phrases.
Incorporating a summarization technique within a comprehensive retrieval and grouping process, the QCS system [13] generates a single extractive summary for each cluster. This is achieved through a method that combines sentence "trimming" and a hidden Markov model, followed by pivoted QR decomposition. The model identifies sentences with the highest likelihood for inclusion in the summary.
Statistical approaches [14] rely on elementary metrics like TF-IDF scores and word co-occurrence [1,15,16]. Ko and Seo [17] introduced a proficient methodology for text summarization that harnesses contextual insights and statistical methodologies to extract pertinent sentences.
Graph-based approaches [7] depict text as a network of phrases and devise summaries through graph-based scoring mechanisms. An innovative and versatile summarizer, GRAPHSUM, rooted in a graph model, was proposed by Baralis et al. [18]. It captures interrelationships among various elements by uncovering association rules. Parveen and Strube [19] presented an extractive graph-based unsupervised technique for summarizing individual documents that accounts for three critical summary attributes: significance, non-redundancy, and local coherence. Optimization-based methods [20] employ optimization techniques such as integer linear programming [21], constraint optimization [22], and sparse optimization [23].
Other ATS systems, like SummGraph [24], employ graph-based algorithms and knowledge databases to discern the substance of pertinent texts. Notably, this specific system has demonstrated efficacy across domains encompassing news, biomedical research, and tourism. Summaries have also embraced the incorporation of Natural Language Generation (NLG) to introduce fresh terminologies and linguistic structures. Belz [25] presents a text summarization technique grounded in NLG to automatically generate weather forecast reports. Mohammad et al. [26] elucidated a system for the automated creation of technical surveys rooted in citations. More recently, Erera et al. [27] introduced the IBM Science Summarizer, an innovative methodology catering to Computer Science papers. This approach crafts summaries contingent upon user-provided information requisites, be it a natural language inquiry, scientific tasks (e.g., "Machine Translation"), datasets, or scholarly venues.
Although extractive methods can adeptly identify significant information, they may lack the fluidity and precision inherent in human-generated summaries. Consequently, abstractive ATS approaches strive to enhance sentence coherence by diminishing redundancies, elucidating sentence context, and potentially introducing supplementary phrases into the summary. For the synthesis of the final summary, abstractive techniques generally leverage sentence compression, fusion, or modification mechanisms. Barzilay and McKeown [28] pioneered a system wherein dependency trees represent input phrases, and select words are aligned to integrate these trees into a lattice structure. The lattice is subsequently linearized via tree traversal to generate fusion sentences. Filippova and Strube [29] introduced an innovative approach to sentence fusion, framing the fusion task as an optimization problem. This unsupervised technique draws on dependency structure alignment, semantic and syntactically informed phrase aggregation, and pruning strategies. Later, Filippova delved into the challenge of condensing a collection of interconnected sentences into a succinct single sentence, termed multi-sentence compression, and presented a foundational technique based on shortest paths in word graphs [30]. Her method yielded grammatically sound and informative summaries, subsequently finding application in diverse contemporary summarization systems [4,31]. Boudin [32] extended Filippova's approach by addressing Multi-Sentence Compression (MSC) as the task of generating a concise single-sentence summary from a cluster of interconnected sentences. He introduced an N-best reranking algorithm based on the frequency and relevance of keyphrases within the documents, resulting in more informative summaries. Banerjee et al.
[33] devised multi-document abstractive summaries using word graphs and Integer Linear Programming (ILP). They clustered akin sentences among pivotal documents and employed word graphs to identify shortest paths. The ILP model facilitated the identification of sentences with maximal information and readability, effectively reducing redundancy. Nayeem et al. [34] formulated an unsupervised abstractive summarization system. Their innovation was a paraphrastic sentence fusion model amalgamating sentence fusion with paraphrasing at the sentence level through a skip-gram word embedding model. This model augmented information coverage and heightened the abstract nature of the generated phrases. Shang et al. [35] introduced a fully unsupervised graph-based architecture tailored for abstractive summarization of meeting speeches. Their unified framework amalgamated the strengths of six prevailing approaches across three distinct tasks (keyword extraction, multi-sentence compression, and summarization), effectively addressing their respective limitations. Their abstractive summarization approach underwent four key processes: preprocessing, community recognition, multi-sentence compression, and submodular maximization.
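As a toy illustration of the word-graph idea underlying multi-sentence compression, the sketch below merges related sentences into a shared word-adjacency graph and returns the start-to-end path with the fewest words. This is only a simplified stand-in: Filippova's actual method weights edges and constrains candidate paths (e.g., minimum length, verb presence), and all function names here are illustrative.

```python
from collections import defaultdict, deque

def build_word_graph(sentences):
    # Nodes are lowercased words plus -start-/-end- markers; edges
    # follow word adjacency across all input sentences, so identical
    # words from different sentences share a single node.
    graph = defaultdict(set)
    for sent in sentences:
        words = ["-start-"] + sent.lower().split() + ["-end-"]
        for a, b in zip(words, words[1:]):
            graph[a].add(b)
    return graph

def shortest_compression(graph):
    # Breadth-first search from -start- to -end- returns the path
    # with the fewest words; the weighted, filtered shortest path of
    # the real algorithm is replaced by plain BFS for brevity.
    queue = deque([["-start-"]])
    seen = {"-start-"}
    while queue:
        path = queue.popleft()
        if path[-1] == "-end-":
            return " ".join(path[1:-1])
        for nxt in graph[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return ""
```

Because shared words collapse into single nodes, the shortest path can splice fragments from different sentences, which is precisely what makes the output abstractive rather than a verbatim extract.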

Recently, the NLP research community has increasingly directed its attention towards hybrid ATS techniques. In hybrid approaches, extractive methods are harnessed to identify content terms and sentences deemed essential for inclusion in the summary, while simultaneously guiding the development of abstracts [36]. Such methods amalgamate the strengths of both extractive and abstractive ATS techniques. Di Fabbrizio et al. [37] introduced a hybrid approach that crafts summaries for product and service reviews by blending natural language generation with salient sentence selection techniques. Their 'STARLET-H' system operates as a hybrid abstractive/extractive summarizer. It employs extractive summarization techniques to identify significant quotes from input reviews, incorporating them into an automatically generated abstractive summary to provide validation, disclosure, or justification for favorable and/or negative viewpoints. However, the algorithm necessitates a substantial amount of training data to comprehend aspect order. Lloret and Romá-Ferri [38] proposed the COMPENDIUM ATS system for generating research publication abstracts in the biomedical domain. This system produces two distinct types of generic summaries, extractive and abstractive-oriented, via the respective COMPENDIUM variants COMPENDIUM-E and COMPENDIUM-A. The extractive approach selectively picks and extracts the most pertinent sentences, while the abstractive-oriented approach blends extractive and abstractive techniques, incorporating an information compression and fusion stage. Bhat et al.
introduced "SumItUp," a single-document hybrid TS system, in [39]. The hybrid system consists of two phases: (1) extractive sentence selection, which generates the summary using statistical features (sentence length, sentence position, TF-IDF, noun phrases, verb phrases, proper nouns, aggregate cosine similarity, and cue phrases) along with a semantic feature (emotion described in the text), using cosine similarity to eliminate redundant sentences from the extractive summary; and (2) abstractive summary generation, in which the extracted sentences are processed by a language generator (a fusion of WordNet, a part-of-speech tagger, and the Lesk algorithm) to transform the extractive summary into an abstractive rendition.

System's architecture
In this subsection, we explain the two approaches introduced by the EXABSUM ATS system for generating the two types of summaries. It is pertinent to highlight that our proposed ATS architecture comprises two distinct components. The first component, denoted EXABSUM Extractive, represents a purely extractive ATS approach (Sect. "EXABSUMExtractive core stages"), while the second component, EXABSUM Abstractive, encompasses abstractive techniques to yield an abstractive summary (Sect. "EXABSUMAbstractive core stages").

EXABSUM Extractive core stages
The preliminary phase of our methodology is centered on extractive summarization. A conventional approach to extractive summarization treats sentences as individual entities, extracting the most pertinent ones from the text based on specific characteristic features (which gauge the suitability of a sentence for inclusion in the summary). Subsequently, the top N extracted sentences are organized to create the summary. The extraction procedure is compartmentalized into four stages (illustrated in Fig. 1).
The following core stages are covered in detail:

Text pre-processing
First, we initiate the process by conducting fundamental linguistic analysis to prime the text for subsequent stages of processing. This involves the application of text pre-processing (TP) to standardize input files and establish clear sentence boundaries within word sequences. TP encompasses two primary categories: noise removal and normalization. Noise refers to data components that contribute redundancy to the primary text analytics. The manner in which this foundational phase is executed can significantly influence the accuracy of the sentence selection technique; thus, it is imperative to provide explicit details regarding our implementation approach. Depending on the dataset type, each document undergoes the following pre-processing stages:
• Sentence splitting or segmentation: As an initial step routinely conducted on texts prior to subsequent processing, this divides the input text into individual sentences so that pertinent information can be extracted from the text.
• Tokenization: Each sentence undergoes intelligent tokenization, wherein all marks, punctuation, brackets, digits, and special characters are removed, and all words are converted to lowercase. For instance, given the sentence "(text summurizagtnst Bion;,;:,appR;aochAs is; NL = P a*nd I2r s)", the result would be "( Text summarization approach is NLP and IR)". This process allows for the identification of individual words within the document, facilitating subsequent tasks such as calculating word co-occurrences and distinguishing between stop words and non-stop words.
• Part-of-speech tagging: Each word is assigned a morphological category (such as noun, verb, adjective, preposition, adverb, determiner, pronoun, or conjunction) using a part-of-speech tagger. This proves advantageous for discerning between various types of words, as certain categories (e.g., nouns or verbs) hold greater significance than others (e.g., determiners). This tool's application will be evident in the subsequent data compression and fusion phases. Notably, the Stanford POS tagger was utilized for this part-of-speech tagging process.
• Lemmatization: Variations in a term can impact its frequency. Lemmatization involves reducing a word's inflectional forms and derivationally related forms to a standardized base form, referred to as its lemma. Unlike stemming, lemmatization relies on the precise identification of a word's intended part of speech and meaning within a phrase and in the broader context of surrounding sentences or even an entire document. To achieve this, we utilize the Stanford CoreNLP package [40] to lemmatize our statements.
• Stop word identification: Removing stop words reduces the feature space, resulting in decreased time and space complexity. Stop words encompass various prepositions, pronouns, and conjunctions commonly found in sentences. The removal of these terms prior to text analysis ensures that the prevalent words primarily pertain to the context rather than being commonplace throughout the text.
In our process, this step is conducted before computing single keyword relevance, as stop words are excluded from consideration in subsequent phases.
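The lighter pre-processing stages above (sentence splitting, tokenization with lowercasing, stop-word filtering) can be sketched in a few lines of dependency-free Python. This is a minimal illustration, not the pipeline the paper uses: sentence splitting is naive, the stop-word list is a toy subset, and the Stanford-based POS tagging and lemmatization stages are omitted.

```python
import re

# Illustrative subset of stop words; a real system would use a
# much larger list.
STOP_WORDS = {"a", "an", "the", "is", "are", "and", "or", "of",
              "to", "in", "on", "for", "with", "that", "this", "it"}

def split_sentences(text):
    # Naive segmentation on sentence-final punctuation followed by
    # whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    # Keep alphabetic tokens only, lowercased; this drops punctuation,
    # brackets, digits, and special characters as described above.
    return re.findall(r"[a-z]+", sentence.lower())

def content_words(sentence):
    # Tokens that survive stop-word identification.
    return [t for t in tokenize(sentence) if t not in STOP_WORDS]

sentences = split_sentences("Text summarization is an NLP task. "
                            "It condenses a document into a summary.")
```

Each later scoring stage then operates on `content_words` output rather than raw text, so frequent function words cannot dominate the relevance scores.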

Redundancy detection and removal
Redundancy is regarded as an undesirable attribute that affects the quality of summaries. In fact, the identified redundant sentences need to be removed from the texts, preserving only a single collection of non-repetitive sentences to be used as input for the summarization process. Our objective at this point is to identify semantically identical content within the source documents and exclude it from the summary. Textual Entailment (TE) is employed for this precise purpose [41].
The objective of TE is to determine whether the meaning of a text sample, referred to as a hypothesis (H), can be inferred from another text, known as the text (T) [41]. For a pair of sentences, TE involves predicting whether the information presented in the first sentence unquestionably implies the information in the second. It addresses semantic inference as a direct mapping between linguistic expressions and abstracts the typical semantic inferences required for text-oriented NLP applications. TE has found successful application to the general summarization problem [42][43][44], and specifically for identifying duplicate information while addressing summarization [45]. The entailment relationships are computed using the TE method described in [46]. The TE tool relies on lexical (cosine similarity, Levenshtein distance), syntactic (dependency trees), and semantic measures based on WordNet [10].
After eliminating the redundant sentences from the source texts, the remaining non-repetitive sentences are input to the extractive summarization approach, which employs a range of statistical and semantic scoring techniques, to identify pertinent content.
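Since the TE tool itself is an external component [46], a simplified stand-in for its lexical layer can illustrate the redundancy-removal step: a sentence is discarded when it is too similar, by bag-of-words cosine similarity (one of the tool's lexical measures), to a sentence already kept. The 0.8 threshold and the function names are our assumptions, not values from the paper.

```python
import math
from collections import Counter

def cosine(s1, s2):
    # Bag-of-words cosine similarity between two sentences; a crude
    # proxy for the full textual-entailment decision.
    v1, v2 = Counter(s1.lower().split()), Counter(s2.lower().split())
    dot = sum(v1[w] * v2[w] for w in v1)
    n1 = math.sqrt(sum(c * c for c in v1.values()))
    n2 = math.sqrt(sum(c * c for c in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def drop_redundant(sentences, threshold=0.8):
    # Keep a sentence only if it is not too similar to any
    # already-kept sentence (hypothesized threshold).
    kept = []
    for s in sentences:
        if all(cosine(s, k) < threshold for k in kept):
            kept.append(s)
    return kept
```

A real TE tool would additionally flag paraphrases with little lexical overlap (via WordNet and dependency trees), which this surface-level proxy cannot detect.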

Sentence relevance
The significance of a sentence in relation to the overall comprehension of the text should be employed to ascertain its importance. This involves assigning scores to the most pertinent terms and subsequently assessing and computing sentence attributes such as document position, sentence length, and title similarity. These features can be integrated to assess the remaining sentences and select those with the highest scores for inclusion in the summary [47][48][49][50][51].
Sentence salience scoring techniques (or combinations thereof) are employed to assign a score to each sentence based on its significance. In this work, we introduce a hybrid extraction-based model, which integrates statistical, structural, and semantic features. The subsequent subsections offer a concise overview of the methods utilized in this phase:

a. Term relevance-inverse sentence frequency (TR-ISF)
We introduce a novel metric named TR-ISF, derived from the conventional Information Retrieval (IR) technique Term Frequency-Inverse Document Frequency (TF-IDF). This modified version of TF-IDF is tailored for sentence-level text summarization, as opposed to the document-level weighting for which TF-IDF is traditionally used. In this approach, the relevancy TR of a term t is established through its statistical and semantic relationship across the entire document-dataset level. Subsequently, the ISF gauges the descriptiveness of a word, assessing its prevalence or rarity across all sentences. This methodology operates under the assumption that if a term is both relevant and present in a limited number of sentences, it is likely to be included in the summary. In essence, pertinent keywords can be employed to detect or quantify sentence relevance, as well as to pinpoint the most relevant topic or topics within a text.
Initially, we employ a Hybrid Feature Selection Model (HFSM) to compute term relevance using the TR metric. This model integrates both statistical and semantic features. Subsequently, the TR-ISF equation (Eq. (12)) is employed to ascertain the ultimate synthetic score for each term, which is then leveraged to compute the sentence's salience score (Eq. (13)). It is important to note that not all terms are taken into account; to ensure accuracy, stop-word filtering and stemming are applied prior to evaluating a term's relevance.
The chi-square statistic permits testing the statistical independence between a term and a category by contrasting the observed frequency with the expected frequency, calculated under the assumption of their independence. The χ² value is defined as:

χ²(w,c) = Σ_{i,j} (O_{i,j} − E_{i,j})² / E_{i,j}    (1)

where O_{i,j} represents the observed frequency and E_{i,j} the expected frequency, i.e., the count of documents that would fall under category c and contain the term w if term and category were independent. To discern the nature of the dependency when present, Li et al. [52] introduced a novel measure called term category dependency, defined as:

R(w,c) = O(w,c) / E(w,c)    (2)

R(w,c) should be close to 1 if there is no dependency between the term w and the category c (i.e., χ²(w,c) is not statistically significant); it should be larger than 1 if there is a positive dependency, meaning the observed frequency is greater than the expected frequency. Conversely, R(w,c) should be smaller than 1 if there is a negative dependency.
In order to calculate the feature significance of the word w within a corpus containing k categories, Li et al. [52] combine Eqs. (1) and (2), which results in a novel measure known as CHIR, defined as follows:

rχ²(w) = Σ_{j=1}^{k} p(R(w,c_j)) · χ²(w,c_j),  considering only the categories c_j with R(w,c_j) > 1    (3)

where p(R(w,c_j)) is the weight of the chi-square statistic χ²(w,c_j) in the corpus in terms of R(w,c_j). It is defined as:

p(R(w,c_j)) = R(w,c_j) / Σ_{j=1}^{k} R(w,c_j)    (4)

This term-goodness measure, rχ²(w), is the weighted sum of the χ²(w,c_j) statistics over the categories with which the term w has a positive dependency; a larger rχ²(w) (CHIR) value indicates that the term is more relevant.
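Assuming each category contributes a 2x2 contingency table of document counts, the chi-square statistic, the ratio R, and their CHIR combination can be computed as below. Restricting the weighted sum to positively dependent categories follows the description above; the rest (function names, the exact normalization of the weights) is our reading of Li et al. [52], not verbatim from the paper.

```python
def chi_square_and_r(A, B, C, D):
    # 2x2 contingency counts for term w and category c:
    #   A: docs in c containing w      B: docs outside c containing w
    #   C: docs in c without w         D: docs outside c without w
    N = A + B + C + D
    expected = (A + B) * (A + C) / N                  # E(w, c)
    chi2 = N * (A * D - B * C) ** 2 / ((A + B) * (C + D) * (A + C) * (B + D))
    r = A / expected                                  # R(w, c) = O / E
    return chi2, r

def chir(tables):
    # CHIR: weighted sum of chi-square over categories with a
    # positive dependency (R > 1), weights proportional to R.
    stats = [chi_square_and_r(*t) for t in tables]
    positive = [(chi2, r) for chi2, r in stats if r > 1]
    total_r = sum(r for _, r in positive)
    return sum((r / total_r) * chi2 for chi2, r in positive) if positive else 0.0
```

The closed-form chi-square expression used here is algebraically identical to summing (O − E)²/E over the four cells of the 2x2 table.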
We utilized the Mutual Information (SIM) measure, a concept commonly employed in information theory, to enhance the semantic aspect of the chosen words within a specific context. This measure quantifies the significance of words based on their semantic content and serves as a gauge of their value. SIM was introduced as a means of gauging word association, indicating the intensity of the connection between words by contrasting their actual probability of co-occurrence with the probability anticipated by chance. Mutual information indicates the proportionate shift in the likelihood of encountering x when y is present (the amount of information that y provides about x) [8]. It is based on the premise that two words are considered similar if their mutual information with all the words in the vocabulary V is nearly the same [8]. The semantic similarity measure between two terms w₁ and w₂ is therefore defined over the mutual information values I(z_i, w₁) and I(z_i, w₂) computed against every term z_i in the vocabulary V (Eq. (5)), where I(z_i, w₁) is evaluated as:

I(z_i, w₁) = log( P_d(z_i, w₁) / (P(z_i) · P(w₁)) )    (6)

where d represents the size of a sliding window, P_d(z_i, w₁) is the probability that z_i and w₁ occur in succession within a window of (d + 1) words, and P(z_i) is the a priori probability of the term z_i. P_d(z_i, w₁) can be estimated by the ratio of the number of times that z_i is followed by w₁ within the window to the cardinality of the vocabulary.
The similarity between a term w and a document centroid d is defined in [53] as the average of the similarities between the word w and the x words of the document centroid:

sim(w, d) = (1/x) Σ_{i=1}^{x} sim(w, w_i)    (7)

To determine the semantic relevance of a term w in a corpus of k clusters, for each cluster we calculate the weighted sum of its similarities with the document centroid dcen_j of each cluster c_j using the following formula:

SIM(w) = Σ_{j=1}^{k} p(I(w, dcen_j)) · I(w, dcen_j)    (8)

where p(I(w, dcen_j)) is the weight of the similarity between the term w and the document centroid dcen_j, and I(w, dcen_j) is the mutual information between w and dcen_j. Consider the contingency table of a term w and a centroid d, where A is the number of times w and d co-occur (i.e., w occurs in documents that belong to the cluster whose centroid is d), B is the number of times w occurs without d, C is the number of times d occurs without w, and N is the total number of documents. The mutual information criterion between a term w and a centroid dcen_j is defined by:

I(w, dcen_j) = log( P(w, dcen_j) / (P(w) · P(dcen_j)) )    (9)

If there is a strong association between w and dcen_j, then the joint probability P(w, dcen_j) will be larger than P(w)P(dcen_j); consequently I(w, dcen_j) > 0. If w and dcen_j are in complementary distribution, then P(w, dcen_j) will be less than P(w)P(dcen_j), hence I(w, dcen_j) < 0. In the case of poor association between w and dcen_j, P(w, dcen_j) ≈ P(w)P(dcen_j), and consequently I(w, dcen_j) ≈ 0. The weight p(I(w, dcen_j)) is defined as:

p(I(w, dcen_j)) = I(w, dcen_j) / Σ_{j=1}^{k} I(w, dcen_j)    (10)

A term with a high weight in the SIM(w) metric is semantically relevant.
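The contingency-table mutual information just described can be computed directly from the counts A, B, C, and N. The probability estimates follow the table definition above; the log base (base 2) is our choice, since only the sign and relative magnitude matter for Eq. (8).

```python
import math

def mi_term_centroid(A, B, C, N):
    # A: documents where term w and centroid d co-occur;
    # B: w without d; C: d without w; N: total documents.
    # I(w, d) = log2( P(w, d) / (P(w) * P(d)) ): positive for strong
    # association, negative for complementary distribution, near
    # zero for poor association.
    p_wd = A / N
    p_w = (A + B) / N
    p_d = (A + C) / N
    return math.log2(p_wd / (p_w * p_d)) if p_wd > 0 else float("-inf")
```

For example, a term that co-occurs with a centroid more often than independence predicts yields a positive score, while exact independence yields zero.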
We define the feature goodness of a term as a combination of its statistical measure chir(w) and its semantic measure sim(w). The overall measure of a term's relevance, TR(w), is defined as follows:

TR(w) = α · chir(w) + (1 − α) · sim(w)    (11)

where α is a weighting parameter between 0 and 1.
To select the p most pertinent terms, three steps are followed: (1) calculate the hybrid measure TR(w) for each term in the document and the dataset, (2) sort the terms in descending order of their criterion function, and (3) select the top p terms from the sorted list. A threshold δ is set to 0.25 to filter out terms with a low TR(w) value. In other words, the higher the relevancy of a word, the more strongly it indicates the main topic of a document.
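The TR(w) combination and the three-step top-p selection with threshold δ = 0.25 can be sketched as follows; the chir and sim inputs are assumed to have been normalized to comparable scales beforehand, and the function names are illustrative.

```python
def term_relevance(chir_score, sim_score, alpha=0.5):
    # TR(w) = alpha * chir(w) + (1 - alpha) * sim(w); alpha is the
    # weighting parameter between 0 and 1 from the text.
    return alpha * chir_score + (1 - alpha) * sim_score

def select_terms(tr_scores, delta=0.25):
    # Steps from the text: sort terms by TR(w) in descending order,
    # then keep those whose score reaches the threshold delta.
    ranked = sorted(tr_scores.items(), key=lambda kv: kv[1], reverse=True)
    return [w for w, tr in ranked if tr >= delta]
```

The surviving terms are exactly the "relevant words" that the TR-ISF stage counts when scoring sentences.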
Hence, the TR-ISF of a word is computed as shown in Eq. (12), and the salience score of a sentence is calculated as presented in Eq. (13):

TR-ISF(w_i) = TR(w_i) × log(S / S_{w_i})    (12)

Score(s_i) = (1/T) Σ_{w_i ∈ s_i} TR-ISF(w_i)    (13)

where:
• TR returns the relevancy of a term (word) w_i in the document(s),
• T is the total number of terms (words) in s_i,
• S_{w_i} is the number of sentences in which a relevant word w_i appears (relevance determined by Eq. (11)),
• S is the total number of sentences in the document.
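A minimal sketch of TR-ISF and the per-sentence salience, assuming a natural logarithm (the text does not fix the base):

```python
import math

def tr_isf(tr, total_sentences, sentences_with_word):
    # TR-ISF(w) = TR(w) * log(S / S_w): a relevant word that appears
    # in few sentences scores highest, mirroring TF-IDF at the
    # sentence level.
    return tr * math.log(total_sentences / sentences_with_word)

def sentence_salience(tr_isf_scores):
    # Average TR-ISF over the T relevant words of the sentence.
    return sum(tr_isf_scores) / len(tr_isf_scores) if tr_isf_scores else 0.0
```

Note that a word occurring in every sentence contributes nothing (log 1 = 0), which is exactly the "descriptiveness" behavior the ISF factor is meant to capture.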

b. Sentence resemblance to the title
The title of a document often captures the main subjects discussed within it, particularly in news articles and scientific publications. The "sentence resemblance to the title" feature assesses the similarity between the sentences in a document and its title. By employing this technique, we deduce that sentences exhibiting greater similarity to the title signify the primary topic addressed in the document. This feature is computed as illustrated in the following equation:

SenRT(s_i) = |w_{s_i} ∩ w_t| / |w_t|    (14)

where:
• w_{s_i} is the set of relevant words in s_i,
• w_t is the set of words in the title,
• |w_t| is the total number of words in the title.
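With the variables listed above, the title-resemblance score reduces to a set-overlap ratio; the sketch below assumes both inputs are already tokenized and stop-word filtered.

```python
def title_resemblance(sentence_words, title_words):
    # Overlap between the relevant words of the sentence and the
    # title words, normalized by the number of title words.
    title = set(title_words)
    if not title:
        return 0.0
    return len(set(sentence_words) & title) / len(title)
```

A sentence containing every title word scores 1.0; one sharing nothing with the title scores 0.0.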

c. Sentence length
The consideration of sentence length aims to avoid selecting sentences that might be too short to convey the document's key points, as well as sentences that are excessively long and may result in wasted space. Acknowledging the possibility that a sentence could contain essential information in one part and unrelated information in another, this method takes the sentence's word count as its measure of length.
This approach discourages the selection of sentences that are either excessively short or excessively long, as they are not deemed optimal. Initially, sentences that fall below a specific size threshold (fewer than ten non-stop words) or exceed a certain length (more than 50 non-stop words) are filtered out before computing the sentence score. Subsequently, the remaining sentences are assigned scores as depicted in Eq. (15).
In practice, the penalty score is determined by a conditional: where, • L i is the length of sentence i and • C is a certain length defined by user.(14) SenRT
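The filtering step can be sketched as follows. Note that Eq. (15)'s exact penalty is not reproduced in this excerpt, so the linear penalty around a user-defined target length C below is a hypothetical placeholder:

```python
def length_score(n_nonstop_words, min_len=10, max_len=50, target=20):
    # Sentences outside the [min_len, max_len] window are filtered out
    # before scoring, per the text above.
    if n_nonstop_words < min_len or n_nonstop_words > max_len:
        return 0.0
    # Hypothetical penalty: linear decay with distance from target length C.
    return max(0.0, 1.0 - abs(n_nonstop_words - target) / target)
```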

Sentence position
The sentence position heuristic is among the most effective strategies for selecting relevant sentences in automatic text summarization (ATS). This heuristic operates on the assumption that the introductory sentences of a document hold the most crucial information; as the document unfolds, the significance of sentences tends to diminish. In our approach, we therefore prioritize sentences located closer to the beginning of a document.
The score for this feature is calculated using the following formula, where:
• i is the index of the i-th sentence in the document, starting from zero,
• S is the total number of sentences in the document.
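One common positional weighting consistent with the description above is a linear decay from the first sentence; since the formula is not reproduced here, this exact form is an assumption:

```python
def position_score(i, total_sentences):
    # The i-th sentence (zero-based) scores higher the closer it is to the
    # start of the document; the first sentence scores 1.0.
    return (total_sentences - i) / total_sentences
```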

Summary generation
Once the scores for each sentence have been computed, the objective of this stage is to create a summary by ranking sentences according to their relevance scores. The highest-scoring sentences are selected and extracted in the order they appear in the original document, resulting in a meaningful extractive summary. To determine the overall significance of a sentence, we employed the averaged combination approach, which is considered the most effective combination method and often leads to substantial improvements [48, 54]. In this combination, the salience score of a sentence is the average of the individual scores obtained through the N scoring procedures considered.
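The averaged combination and document-order extraction described above can be sketched directly (function and parameter names are illustrative):

```python
def extract_summary(sentences, feature_scores, k=3):
    # feature_scores[i] holds the N individual feature scores of sentence i.
    # Average them, take the top-k by averaged salience, and emit the
    # selected sentences in their original document order.
    avg = [sum(scores) / len(scores) for scores in feature_scores]
    top = sorted(range(len(sentences)), key=lambda i: avg[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```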

EXABSUM Abstractive core stages
This stage aims to create an abstractive summary by generating new text that captures the core content or conceptual elements of the original document. The summary succinctly and coherently communicates the primary information within the document.
For this purpose, we employ a graph-based approach to construct the abstractive summary, followed by a re-ranking stage that relies on keyphrases. The following steps outline the main procedures involved in this stage (Fig. 2: EXABSUM Abstractive stages for generating an abstractive summary):

Word graph generation and re-ranking
To generate a summary containing novel sentences, this stage involves compressing and merging sentences, followed by a re-ranking process based on the quantity and relevance of the keyphrases they contain. This approach has demonstrated its efficacy in producing more informative summaries [30, 32].
A weighted directed word graph is constructed from the input document. Nodes in the graph correspond to words, and edges signify adjacency relationships between pairs of words. Each edge's weight is determined by the reciprocal frequency of co-occurrence of the two words.
Once the document is transformed into a word graph, a set of new sentences is generated by identifying the shortest paths between nodes, starting from the first word of each sentence in the document and spanning its entire content. The methodology is as follows: let G = (V, E) be a directed graph with vertices (nodes) V and directed edges E, where E is a subset of V × V. Given a set of related sentences S = (s_1, s_2, ..., s_n), a word graph is constructed by iteratively adding sentences to it.
Figure 3 illustrates the word graph built from four example sentences; edge weights have been omitted for clarity, and elided sentence fragments are represented by dots. In the first step, the graph represents a single sentence (a sequence of word nodes without punctuation) together with dedicated start and end symbols (depicted in Fig. 3). For each word in the sentence, a corresponding node is added to the graph, and directed edges connect words that are adjacent in the sentence. If a word in a subsequent sentence shares the same lowercase form with an existing node, it is mapped onto that node, provided that no word from the current sentence has been associated with that node before. Incorporating part-of-speech (POS) information reduces the likelihood of conflating verbs with nouns (e.g., "visit"), thus preventing the generation of ungrammatical sequences. In cases where no suitable candidate exists in the graph, a new node is created.
The process of word mapping and creation (adding words to the graph) is carried out in three distinct steps during the second stage:
1. non-stop words for which no candidate exists in the graph, or for which an unambiguous mapping is possible;
2. non-stop words for which there are either several possible candidates in the graph, or which occur more than once in the sentence;
3. stop words.
For the last two groups of words, where the mapping is ambiguous (i.e., there are two or more nodes in the graph that refer to the same word/POS tuple), the immediate context (the preceding and following words in the sentence and the adjacent nodes in the graph) is examined, and the candidate that exhibits the greater overlap in context is selected. Alternatively, the candidate node with the highest frequency (i.e., the node with the most words mapped to it) is chosen. In Fig. 3, for example, when sentence (3) is to be inserted, there are two potential candidate nodes for "last". Stop words are only mapped onto an existing node if they overlap with their non-stop word neighbors; if this condition is not met, a new node is created. We utilize the NLTK stop word list, supplemented with temporal nouns (e.g., Thursday, today). Filippova's method prohibits the inclusion of punctuation marks; Boudin and Morin [32] introduced a fourth step for constructing well-punctuated compressions, which adds punctuation marks to the graph. When ambiguity arises in mapping, the candidate with the same immediate context is preferred. Once the sentence's words have been added to the graph, words that are contiguous in the sentence are connected with directed edges.
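A minimal sketch of the graph construction follows. It deliberately omits the POS tags, ambiguity resolution, and punctuation handling of the full method, keeping only node frequencies and directed adjacency counts:

```python
from collections import defaultdict

def build_word_graph(sentences):
    # Lowercase tokens become nodes; adjacent tokens are linked by directed
    # edges whose count records how often the adjacency occurs. Start/end
    # markers play the role of the graph's start and end symbols.
    node_freq = defaultdict(int)   # number of words mapped to each node
    edge_freq = defaultdict(int)   # adjacency counts per directed edge
    for sentence in sentences:
        tokens = ["-start-"] + [w.lower() for w in sentence.split()] + ["-end-"]
        for token in tokens:
            node_freq[token] += 1
        for a, b in zip(tokens, tokens[1:]):
            edge_freq[(a, b)] += 1
    return node_freq, edge_freq
```

Words shared across sentences collapse onto a single node, which is what later lets shortest paths fuse material from several sentences.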
The weighting function is defined in Eqs. (18) and (19) to compute edge weights and determine the optimal path, representing the most effective compression for the input sentences:

w(i, j) = cohesion(i, j) / (freq(i) × freq(j))  (18)

cohesion(i, j) = (freq(i) + freq(j)) / Σ_{s∈S} d(s, i, j)^(−1)  (19)

where freq(i) is the number of words mapped to the node i, and the function d(s, i, j) refers to the distance between the offset positions of words i and j in sentence s. This function has two objectives: (1) to achieve grammatical compression, it prioritizes connections between words that frequently appear in a particular order (refer to Eq. 19); (2) to generate an informative compression, it promotes paths passing through salient nodes.
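Under the reconstruction of Eqs. (18)-(19) above (which follows the standard word-graph compression weighting), the edge weight can be computed as follows; lower weights indicate stronger links:

```python
def edge_weight(freq_i, freq_j, offset_gaps):
    # offset_gaps holds the positional distances d(s, i, j) for every
    # sentence s that contains both words. Small gaps in many sentences
    # shrink the cohesion term, and frequent (salient) nodes shrink the
    # final weight, favoring paths through them.
    cohesion = (freq_i + freq_j) / sum(1.0 / d for d in offset_gaps)
    return cohesion / (freq_i * freq_j)
```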
The weighting function is then used by a K-shortest path algorithm to identify the shortest paths within the graph from the start node to the end node (Eq. 20). Paths shorter than eight words or lacking a verb are filtered out. The remaining paths are re-ranked by normalizing the cumulative weight of each path over its length; the path with the lowest average edge weight is considered the optimal compression. In our scenario, the initial node corresponds to the first word of each sentence during the generation of new sentences. This ensures that every sentence in the source text yields at least one derived sentence, guaranteeing comprehensive coverage of the document's content.
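The length filter and re-ranking step can be sketched as follows (the verb check is deferred to the path-filtering stage below, and the data layout is an assumption):

```python
def best_compression(candidate_paths, min_words=8):
    # candidate_paths: list of (words, edge_weights) pairs produced by the
    # K-shortest path search. Paths shorter than min_words are discarded;
    # survivors are re-ranked by cumulative edge weight normalized over
    # path length, and the lowest average wins.
    kept = [(sum(w) / len(p), p) for p, w in candidate_paths if len(p) >= min_words]
    return min(kept)[1] if kept else None
```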

Paths filtering
After compiling sentences via the shortest paths, certain sentences may be nonsensical, improperly constructed, or incomplete. Therefore, a filtering stage is required to discard inappropriate paths and uphold the integrity and coherence of the statements. To achieve this, we establish rules that every sentence must satisfy; sentences that fail to meet all of the defined criteria are discarded.
These rules are defined as follows: -Every sentence must contain a verb.
-A sentence must be at least three words long.
Upon the removal of erroneous sentences, the replacement sentences can seamlessly substitute the original ones.
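The two rules can be applied directly to POS-tagged candidate paths; the Penn Treebank verb tags used below are an assumption about the tagset:

```python
VERB_TAGS = {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}

def keep_path(tagged_words):
    # tagged_words: list of (word, POS tag) pairs for one candidate sentence.
    # Rule 1: at least three words. Rule 2: at least one verb.
    if len(tagged_words) < 3:
        return False
    return any(tag in VERB_TAGS for _, tag in tagged_words)
```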

Re-Ranking candidate sentences using keyphrases
Despite the apparent effectiveness of Filippova's method, a notable drawback is the absence of substantial information in 48% to 60% of the generated sentences [30]. This limitation arises because node salience is determined solely by the frequency measure. In response, we propose a re-ranking approach that re-evaluates the N-best list of compressions by considering both the quantity and the significance of the keyphrases they contain. A truly informative and pertinent sentence is expected to incorporate the most relevant keyphrases [55].
Hence, we integrated a re-ranking stage that prioritizes compressions featuring the most pertinent keyphrases derived from the initial set of input sentences. This additional step re-evaluates the N-best multi-sentence compression candidates generated through the word-graph-based method, considering the quantity and importance of the keyphrases contained in each candidate compression.
We opted for the shortest path approach followed by a re-ranking step for three main reasons:
1. Retaining salient terms: the shortest path method compresses sentences while retaining important terms from the original input. It also facilitates grouping words that frequently appear together in many sentences.
2. Inclusion of content: by fusing multiple sentences, we can incorporate more content into the summary, enhancing its comprehensiveness.
3. Improved informativeness: the re-ranking stage further enhances the summary by maximizing the diversity of covered topics and producing informative and grammatically accurate sentences. The utilization of keyphrases aids in crafting sentences that effectively capture the core ideas across a set of interconnected statements.
The unsupervised technique of Wan and Xiao [56] extracts significant words from interconnected sentence groups. It is built on the concept that a word's importance can suggest the presence of other words that often occur together, and the strength of this suggestion is recursively determined based on the significance of the suggesting word.
To initiate keyphrase extraction, a weighted graph is constructed from the connected sentences. In this graph, nodes represent words, identified as word/POS tuples. When two words co-occur in a sentence, the corresponding nodes are connected by an edge whose weight denotes the frequency of their co-occurrence. The TextRank algorithm [57], a graph-based ranking method that incorporates edge weights, is employed to compute the salience score of each node. The score of a node V_i is initialized with a default value and is iteratively updated until it converges:

S(V_i) = (1 − d) + d × Σ_{V_j ∈ adj(V_i)} [ w_{ji} / Σ_{V_k ∈ adj(V_j)} w_{jk} ] × S(V_j)

where adj(V_i) represents the neighbors of V_i and d is the damping factor, set to 0.85.
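The weighted TextRank update can be sketched with a plain adjacency-dictionary representation; a fixed iteration count stands in for an explicit convergence check:

```python
def textrank(adjacency, d=0.85, iterations=50):
    # adjacency: {node: {neighbor: edge_weight}} for an undirected
    # co-occurrence graph (store each edge in both directions).
    scores = {v: 1.0 for v in adjacency}
    for _ in range(iterations):
        updated = {}
        for v in adjacency:
            rank = 0.0
            for u, w in adjacency[v].items():
                # neighbor u distributes its score in proportion to w
                rank += w / sum(adjacency[u].values()) * scores[u]
            updated[v] = (1.0 - d) + d * rank
        scores = updated
    return scores
```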
The second phase involves generating and evaluating candidate keyphrases. We merge sequences of adjacent words that adhere to a given syntactic pattern to create multi-word phrases. In our case, we defined noun phrases based on our POS tag definitions, satisfying a regular expression rule. Unlike other definitions, our noun phrase structure includes adverbial nouns (tag RB) like "double experience" (RB NN) and present participle verbs (tag VBG) such as "virtual desktop conferencing" (JJ NN VBG), with the VBG tag appearing at various positions within the noun phrase. Adverbial nouns, also known as adverbial objectives, occupy the position that a verb's direct object typically occupies and modify the verb by providing information about time, distance, weight, age, or monetary value. Adverbs can interact with noun phrases, impacting the context and meaning of a candidate keyphrase. This interaction is particularly notable in scientific contexts, where authors are precise in explaining specific situations. The score of a candidate keyphrase k is calculated by summing the salience scores of the words it contains, normalized by its length + 1 to favor longer n-grams (as shown in Eq. 21).
The generated keyphrases are grouped into clusters based on word overlap, and within each cluster the keyphrase with the highest score is selected. This filtering produces a smaller subset of keyphrases that better represent the content of the cluster. However, the limited scope of the N-best list can hinder the effectiveness of re-ranking techniques, as many potentially suitable candidates may be discarded; to address this, additional paths are considered. These paths are re-ranked by normalizing the overall weight of the path (as defined in Eq. 18) over its length and then multiplying it by the sum of the keyphrase scores it contains, which yields the score of a sentence compression c.
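The keyphrase scoring of Eq. (21) reduces to a one-liner. Dividing by length + 1 rather than by length means a longer phrase of equally salient words outscores a shorter one, which plain per-word averaging would not do:

```python
def keyphrase_score(member_word_scores):
    # Eq. (21): sum of the TextRank salience of the phrase's words,
    # normalized by phrase length + 1 to favor longer n-grams.
    return sum(member_word_scores) / (len(member_word_scores) + 1)
```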

Abstractive summary generation
The objective of this concluding stage is to create an abstractive summary of the input document. Once the preceding processes have been carried out, the remaining sentences are employed to generate abstractive summaries. Through these stages, an abstractive summary is produced, composed of properly structured and complete sentences extracted via the shortest paths. Among these, the top N relevant sentences are selected, considering their high number of keyphrases. Consequently, the resulting summaries contain abstractive content; they are categorized as abstracts because they do not replicate the exact sentences found in the source document.

Experimental setup
A comprehensive evaluation of EXABSUM's performance has been conducted using diverse corpora spanning a wide array of topics. In this section, we outline: (i) the datasets employed and the methodologies used to assess the experiments, and (ii) the experimentation process involving parameter refinement. Finally, we compare our results with those of analogous works.

Datasets
It is common practice to assess an algorithm by conducting experiments on a specific corpus of text summarization tasks, which encompasses both the source texts and manually generated summaries. In our case, we employed several datasets from diverse domains as our corpora. By encompassing domains such as newswire, tourism, Web 2.0, science, business, health, justice, lifestyle, opinion, politics, entertainment, sports, technology, and travel, EXABSUM's evaluation is carried out from a comprehensive perspective.
EXABSUM's evaluation focuses on the following datasets:
- The CNN Corpus offers high-quality abstractive summaries for each document, known as "highlights," which are authored by the original writers. In addition to these abstractive summaries, extractive summaries (gold standards) are also provided, each containing around 90 to 100 words. These summaries serve as essential references for both qualitative and quantitative assessments of automated summarization methods. The gold standard summaries encompass approximately 10,754 sentences, constituting around 10% of the total number of sentences in the 3,000 texts of the CNN Corpus. Numerous research projects employ the CNN Corpus, ranging from addressing dangling co-references to enhancing extractive summarization techniques and even generating abstractive summaries from extractive ones. Notably, the CNN Corpus was used in the DocEng'19 Extractive Text Summarization Competition [58, 59]. This rich dataset plays a crucial role in advancing the field of automatic summarization.
Table 1 offers an overview of the datasets utilized in this study, providing basic information about each corpus: the number of clusters, document domains, total document count in each dataset, total sentence count, available test documents, average summary length in words, and the intended task for each corpus.

Evaluation method
We conducted two types of evaluation: quantitative and qualitative. In the quantitative evaluation, we employed state-of-the-art assessment methods to compare our outcomes with the gold-standard models of the articles. The qualitative evaluation aimed to determine the extent to which our generated summaries cover the key topics of the articles; thus, we evaluated the summaries in terms of user satisfaction.

Quantitative evaluation
In the quantitative evaluation, we measure the similarity between a set of candidate summaries and a collection of reference models (gold standard summaries). This evaluation assesses the informativeness of the summaries in terms of their content. To achieve this, we utilize the ROUGE-N metric, which captures various levels of N-gram co-occurrence between candidate summaries and reference models. Notably, ROUGE-1 and ROUGE-2 are well-known ROUGE metrics that compute the overlaps of unigrams and bigrams. Among these metrics, ROUGE-1 recall exhibits the strongest ability to identify the better summary within a pair [60, 61]. ROUGE-N quantifies the n-gram recall between a candidate summary and a set of reference summaries using the following formula:

ROUGE-N = ( Σ_{S ∈ ReferenceSummaries} Σ_{gram_n ∈ S} Count_match(gram_n) ) / ( Σ_{S ∈ ReferenceSummaries} Σ_{gram_n ∈ S} Count(gram_n) )

where n stands for the length of the n-gram, gram_n, and Count_match(gram_n) is the maximum number of n-grams co-occurring in a candidate summary and a set of reference summaries [60, 61]. Lin [60, 61] also demonstrated a strong correlation between ROUGE-1 recall and human judgments. Additionally, we employ ROUGE-SU4, which counts overlapping skip-bigrams between a candidate summary and a reference model, allowing a maximum gap of four words. Lastly, we use ROUGE-L, which measures the longest common subsequence between two summaries [60, 61].
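A simplified, single-reference version of ROUGE-N recall can be computed as follows (the official toolkit additionally handles multiple references, stemming, and stop word options):

```python
from collections import Counter

def rouge_n_recall(candidate, reference, n=1):
    # Clipped n-gram matches between candidate and reference, divided by the
    # total number of n-grams in the reference (recall orientation).
    def ngrams(text):
        tokens = text.split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    cand, ref = ngrams(candidate), ngrams(reference)
    matches = sum(min(count, ref[gram]) for gram, count in cand.items())
    total = sum(ref.values())
    return matches / total if total else 0.0
```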

Qualitative evaluation
In this evaluation, our objective is to measure user satisfaction with the generated summaries. For this purpose, we carried out a qualitative evaluation by inviting ten English-speaking individuals to rate our summaries, adopting the qualitative evaluation method outlined in [38]. Whereas a 3-level scale might include only the categories "low," "medium," and "high," a 5-level Likert scale provides finer degrees of agreement, ranging from "strongly agree," "agree," "neither agree nor disagree," "disagree," to "strongly disagree." Specifically, the questions asked are:
• Q1: The summary reflects the most important issues of the document.
• Q2: The summary allows the reader to know what the article is about.
• Q3: After reading the original summary provided with the document, the alternative summary is also valid.
Given the diverse lengths of the documents, our evaluation approach focused on utilizing 10 randomly selected documents from each of the tested datasets.

Experiments results and discussion
In this section, we conduct experiments to evaluate the different types of EXABSUM summaries. We employed two variations of EXABSUM: (i) EXABSUM Extractive, which generates extractive summaries, and (ii) EXABSUM Abstractive, which generates abstractive summaries. By evaluating both types of summaries, we aimed to assess EXABSUM's ability to extract relevant information and its performance on the abstractive text summarization challenge. Additionally, we aimed to determine whether the strategies employed in EXABSUM are effective in generating summaries across various domains, such as Business, Opinion, Politics, Showbiz, Health, Justice, Living, Sports, Technology, Travel, newswire, and more. We compared the results with those of existing automatic text summarization systems to strengthen our findings.

Parameter value selection and analysis of scoring techniques' suitability
The aim of this evaluation is to appraise the effectiveness of the proposed features for sentence relevance detection. This assessment examines the features both individually and in combination, as outlined in the relevant subsections.
The weighting parameter α, as specified in Eq. (13), plays a crucial role in determining the relevance of a specific term within a sentence. This parameter governs the weight assigned to the statistical feature (CHIR) and the semantic feature (SIM) within the hybrid weighting model. To evaluate the impact of each feature on the keyword relevancy measure TR(w), we conducted multiple trials using different α values (α ∈ {0, 0.2, 0.5, 0.6, 0.8, 1}). Our findings show that the most favorable outcomes are achieved with α = 0.6, closely followed by α = 0.5. This observation underscores the value of combining statistical and semantic relationships to improve the relevancy determination.
In our experiments, we therefore consistently set the TR-ISF parameter α to 0.6. To comprehensively assess the effectiveness of the various sentence relevancy scoring techniques, we conducted an ablation study using a backward, total-exclusion procedure: the score of each approach was individually excluded from, or added to, the weighted averaged model. This evaluation served three objectives: (1) determining whether the scoring techniques are suitable for enhancing ROUGE scores; (2) identifying their contribution to topic coverage within the document; and (3) gauging sentence importance.
The ablation technique allowed us to compute ROUGE-1 and ROUGE-2 scores for the DUC01 dataset (Table 2), as well as ROUGE-1, ROUGE-2, ROUGE-SU4, and ROUGE-L scores for the DUC02 dataset (Table 3). Additionally, we present the ROUGE-1 and ROUGE-2 scores for the CNN dataset in Table 4. Visual representations of the performance of our proposed approach under varied scoring technique settings are depicted in Figs. 4, 5, and 6 for the DUC01, DUC02, and CNN datasets, respectively.
In the first experiment, as depicted in Tables 2 and 3, we focused on selecting summary sentences that include relevant keywords and topics determined by the TR-ISF scoring technique, corresponding to the sentence relevance identification stage of EXABSUM Extractive. The TR component pinpoints significant keywords that signify essential topics, while the ISF component gauges a word's descriptiveness. We then compared the resulting combinations to the output of EXABSUM Abstractive. In generating the summary, we assigned scores to phrases using Eq. (13), and the highest-ranked phrases were used to construct the summary.
In the second experiment, we focus on selecting summary sentences based solely on sentence resemblance to the title, sentence length, sentence position, or a combination of these factors. As evident from Tables 2, 3, and 4, a clear trend emerges in most cases, with EXABSUM Extractive yielding the best results, particularly when all scoring methodologies are combined (Comb3). For instance, the ROUGE-1 results for EXABSUM Extractive with Combination 3 show an average improvement of 13.44% over the EXABSUM Extractive variant that employs only TR-ISF for scoring phrases.
In the case of Combination 2, the same approach yielded an improvement of 11.85% over the results obtained for EXABSUM Extractive using only TR-ISF.
In terms of individual feature analysis, it is noteworthy that summaries generated using only the TR-ISF scoring technique generally perform well. This can be attributed to the use of a robust scoring technique (incorporating statistical and semantic features) to identify the most relevant terms or topics in a document. However, the results improve when the three recommended features are combined in the same approach (Comb3). Consequently, the incorporated features are well-suited for the extractive text summarization task, especially in the case of EXABSUM Extractive.
The superior ROUGE scores achieved by our system stem not only from the incorporation of TR-ISF and the other scoring methodologies but also from the redundancy elimination phase using the Textual Entailment (TE) tool [62]. This phase plays a crucial role in generating semantically and syntactically non-redundant summaries: it identifies and removes sentences that are semantically redundant within documents. As a result, sentences whose content overlaps with other sentences can be omitted, leading to improved precision scores and overall system performance.
Regarding EXABSUM Abstractive, the preliminary findings make it evident that relying solely on graphs and keyphrase-based re-ranking does not yield high ROUGE scores, although the results are promising for future research. The moderate performance of this abstractive technique can be attributed to the constrained summary length of 100 words. Consequently, the selection of the most significant sentences before or after generating new ones might omit certain concepts, impacting the overall performance of the summaries and resulting in lower ROUGE scores. Contrary to common assumptions, longer sentences do not consistently equate to better summaries, nor do shorter sentences guarantee more informative summaries. To address these limitations, a potential approach to enhance the
selection of the newly generated summary sentences would involve devising an optimization function to identify the best-performing sentences.One avenue for improving EXABSUM Abstractive could involve leveraging the optimal EXABSUM Extractive combination (Comb3) to achieve this objective.

Qualitative evaluation
Table 5 presents our qualitative evaluation, designed to assess user satisfaction with the produced summaries. When examining the percentages of assessed summaries within each category, we observe that a moderate number of abstractive summaries received agreement compared to extractive summaries evaluated under the same criteria.
The information presented in the summaries generated using EXABSUM Abstractive was assessed positively in contrast to the extractive technique, and in terms of human perception, the abstractive summaries surpass the extractive ones in quality. Additionally, it is noteworthy that EXABSUM Abstractive reduces the proportion of summaries receiving lower scores (strongly disagree and disagree). Table 6 illustrates an example of two summaries produced by EXABSUM Extractive and EXABSUM Abstractive, respectively. As evident, certain sentences are shared by both summaries, while others have been truncated in the latter.

Comparison to baselines
In this subsection, we compare the best results achieved by EXABSUM Extractive and EXABSUM Abstractive in generating single-document summaries against various state-of-the-art summarization techniques. Specifically, we compare our summarization outcomes with:
I. The best-performing participants in the DUC 2001 and 2002 shared tasks.
II. The three most successful summarizers identified in a prior evaluation, as documented in [63].
III. Other approaches, both recent and earlier, that utilized the DUC01 and DUC02 datasets and evaluated their results using the ROUGE-1 and ROUGE-2 metrics.
The following provides a brief overview of these approaches:
- Parveen and Strube [19] introduced an unsupervised graph-based technique for single-document summarization, which considers three essential summary features: significance, non-redundancy, and local coherence.
- Autosummarizer [64] is a web service that generates summaries by segmenting and ranking the most crucial sentences. Its single-document summarization method selects the most pertinent sentences from the source document and has demonstrated superior performance compared to other summarizers in previous evaluations [65]. Unfortunately, details regarding the functioning of this system are not available.
- Classifier4J [66] is a text summarization and classification toolbox. It performs extractive single-document summarization based on word frequency and constructs the summary from the initial sentences that include any of the top-100 most frequent words in the document.
- UnifiedRank [67] introduces an innovative unified method for single-document and multi-document summarization simultaneously, utilizing a graph-based representation along with a unified ranking technique.

- DE [9] is a summarization technique based on sentence clustering. It optimizes its objective function using a discrete Differential Evolution method and a similarity measure, thereby selecting representative sentences from each cluster.
- The Fuzzy Evolutionary Optimization Model (FEOM) [68] categorizes sentences based on document content and selects the most significant sentence from each cluster to represent the overall meaning of a document.
- NetSum [69] utilizes the RankNet learning algorithm to train a pair-based sentence ranker. It scores each phrase in a document to determine the most relevant sentences.
- Compendium [38] is a text summarization system used to generate two types of generic summaries, extractive and abstractive. It includes the variations COMPENDIUM-E and COMPENDIUM-E-A, where the former focuses on choosing and extracting the most relevant sentences using an extractive approach, while the latter combines extractive and abstractive strategies by integrating an information compression and fusion stage to generate abstractive-oriented summaries.
- HP-UFPE Functional Summarizing (HP-UFPE FS) [70] is a summarization system that draws on seventeen extractive summarization methodologies that have garnered substantial attention in the literature, extensively explored in research papers, blogs, and news articles. In this evaluation, the HP-UFPE FS system employs the optimal sentence scoring combination for news articles, as detailed in [70].
- Get To The Point [71] is an abstractive summarization approach featuring coverage and a hybrid pointer-generator architecture. This technique addresses the challenge faced by conventional abstractive summarization systems on long documents, mitigating the generation of repeated and redundant words and phrases.
- Fast Abstractive Summarization [36] introduces a precise and efficient summarization model that first selects important sentences and then rewrites them abstractively, compressing and paraphrasing, to generate a concise final summary. The method employs a novel sentence-level policy gradient technique to connect the non-differentiable computation between these two neural networks while maintaining linguistic fluency.
Tables 7, 8, and 9 compare the top-performing configurations determined in our experiments against the summarizers mentioned above, in terms of ROUGE-1 and ROUGE-2, on the DUC 2001, DUC 2002, and CNN collections, respectively. Beginning with the DUC 2001 dataset, our systems (EXABSUM Extractive and EXABSUM Abstractive) surpass the DE and FEOM systems in both ROUGE-1 and ROUGE-2 scores (see Fig. 7).
Upon analyzing the feature weights derived from DE and FEOM, it becomes evident that both methods employ semantic features to ascertain the significance of sentences, suggesting that semantic techniques play a substantial role in the text summarization process. System T, the top-performing participant in the DUC 2001 competition, achieved superior ROUGE-2 results; however, its performance is statistically similar to the outcomes produced by DE, FEOM, and Classifier4J (a supervised approach). On the DUC 2002 dataset, the top performing systems are EXABSUM Extractive and Parveen and Strube [19] (see Fig. 8). It is worth noting that the Parveen and Strube [19] approach treats summarization as an optimization task, with an optimization step used to verify non-redundancy and local coherence in the resulting summaries. As expected, incorporating coherence and redundancy elimination improves ATS performance. Although the DUC 2001 and DUC 2002 contests ran a decade ago, System T and System 28 still produce competitive results compared with certain current summarizers. In contrast, while using deep learning methodologies, the 'Get To The Point' and 'Fast Abstractive Summarization' systems […]. Regarding the CNN dataset, once more, EXABSUM Extractive emerges as the top performer in terms of ROUGE-1 and ROUGE-2 scores (see Fig. 9). Statistically, it surpasses all other systems, showcasing a remarkable 34.12% improvement over the next-best system.
Overall, our two automatic text summarization (ATS) techniques, EXABSUMExtractive and EXABSUMAbstractive, demonstrate effectiveness in extractive and abstractive document summarization, respectively, and stand on par with other state-of-the-art summarization tools of both kinds. However, EXABSUMAbstractive falls short of EXABSUMExtractive: the extractive variant achieves higher ROUGE-1 scores than the other techniques, while the abstractive variant lags behind across most metrics because it relies solely on the input content and lacks the enhancement provided by the most relevant extractive sentences produced by EXABSUMExtractive.
As mentioned earlier, the ROUGE evaluation relies on exact matches of text fragments when comparing system-generated summaries to human-produced extracts. Consequently, if abstractive information were integrated with the extractive output summary in a hybrid model, the F-measure results could improve significantly over the initial extract. This insight suggests that further research into such hybrid summaries could enhance their quality beyond the mere selection of sentences.
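ROUGE's reliance on exact surface overlap, which is why abstractive output is penalized relative to verbatim extracts, can be made concrete with a small sketch. The following is a simplified ROUGE-N F-measure over lowercased whitespace tokens, not the official toolkit (which adds stemming, stopword options, and multi-reference handling):

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, with counts."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_f1(candidate, reference, n=1):
    """ROUGE-N F-measure based on exact n-gram overlap (clipped counts)."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # Counter intersection clips repeats
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

reference = "the president visited venezuela last thursday"
extract = "the president visited venezuela last thursday"  # verbatim sentence
abstract = "the president visited venezuela on thursday"   # light paraphrase
print(rouge_n_f1(extract, reference, n=2))   # 1.0: every bigram matches
print(rouge_n_f1(abstract, reference, n=2))  # 0.6: the paraphrase breaks bigrams
```

Even a one-word substitution breaks two of the five reference bigrams, which illustrates why a hybrid summary that keeps more extractive fragments tends to score higher under ROUGE.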
In creating and testing EXABSUMAbstractive, we employed two methods. (1) We first crafted abstractive summaries from the sentences identified as relevant during the sentence relevance stage; these summaries underwent compression or merging of information, followed by reranking. The resulting summaries were shorter than the extracts produced by EXABSUMExtractive, and since no additional information was introduced, the recall value was consistently lower than that of EXABSUMExtractive. (2) In the second method, we used EXABSUMAbstractive to generate new sentences from the source document, starting with the first word of each sentence. We ultimately opted for this latter technique, as it proved more suitable for producing accurate summaries. Nevertheless, further research is required to enhance EXABSUMAbstractive, including exploring techniques such as rephrasing and embedding.
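The keyphrase-based re-ranking step applied to the generated candidates can be illustrated with a minimal sketch. The scoring function below (keyphrase hits normalized by the square root of sentence length) is an illustrative assumption, not the exact formula used by EXABSUMAbstractive:

```python
import math

def keyphrase_rerank(candidates, keyphrases):
    """Order candidate sentences by how many document keyphrases they cover,
    normalized by sqrt(length) so long candidates are not unduly favored."""
    def score(sentence):
        lowered = sentence.lower()
        hits = sum(1 for kp in keyphrases if kp.lower() in lowered)
        return hits / math.sqrt(len(lowered.split()) or 1)
    return sorted(candidates, key=score, reverse=True)

candidates = [
    "a visit took place last week",
    "donald trump visited venezuela on thursday",
]
keyphrases = ["trump", "venezuela", "thursday"]
print(keyphrase_rerank(candidates, keyphrases)[0])
# → "donald trump visited venezuela on thursday"
```

The length normalization matters because word-graph generation can emit candidates of very different lengths; without it, the longest path would almost always win.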
Based on the ROUGE evaluation results presented in Tables 7, 8, and 9, the proposed summarization approach, EXABSUM, demonstrates strong performance across both of its variants, EXABSUMExtractive and EXABSUMAbstractive. These findings underscore the significance of both types of summaries, which collectively contribute to the creation of informative summaries and ultimately enhance the text summarization task. Additionally, our approach effectively selects sentences that are not only informative but also grammatically correct and semantically relevant to the text.

Fig. 1 EXABSUMExtractive stages for extractive summarization: a preprocessing (surface linguistic analysis); b redundancy elimination; c sentence relevance; and d summary generation
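Stage b of the pipeline above (redundancy elimination) can be sketched as a greedy cosine-similarity filter over bag-of-words vectors: a sentence is kept only if it is not too similar to any sentence already kept. The 0.8 threshold and the bag-of-words representation are illustrative assumptions, not EXABSUM's exact configuration:

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    num = sum(a[w] * b[w] for w in set(a) & set(b))
    den = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def drop_redundant(sentences, threshold=0.8):
    """Greedy redundancy elimination: keep a sentence only if its similarity
    to every previously kept sentence is below the threshold."""
    kept, vectors = [], []
    for s in sentences:
        v = Counter(s.lower().split())
        if all(cosine(v, kv) < threshold for kv in vectors):
            kept.append(s)
            vectors.append(v)
    return kept

sentences = [
    "Donald Trump visited Venezuela last Thursday",
    "Donald Trump visited Venezuela last Thursday .",  # near-duplicate
    "The White House announced a new policy",
]
print(drop_redundant(sentences))  # the near-duplicate is filtered out
```

A production variant would typically use TF-IDF or embedding vectors instead of raw counts, but the greedy keep-or-drop structure is the same.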

1. The president of U.S. Donald Trump visited Venezuela last Thursday.
2. Donald Trump did a visit to the People Republic of Venezuela on Thursday.
3. Last week the President of State M. Trump visited Venezuela officials.
4. Donald Trump wanted to visit Venezuela last month but suspended his arrangements till Thursday last week.
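Sentences like those above can be fused with a word graph: identical words are merged into shared nodes, and a path through the graph yields a new sentence. The sketch below is deliberately minimal, using a BFS shortest path between start and end markers over two simplified sentences; it omits the compression, edge weighting, and keyphrase re-ranking stages that EXABSUMAbstractive applies on top of the graph:

```python
from collections import defaultdict, deque

def fuse(sentences):
    """Minimal word-graph fusion: nodes are lowercased words (merged across
    sentences) plus <s>/</s> markers; the fusion is the shortest <s>-to-</s>
    path found by breadth-first search."""
    graph = defaultdict(set)
    for s in sentences:
        words = ["<s>"] + s.lower().split() + ["</s>"]
        for a, b in zip(words, words[1:]):
            graph[a].add(b)  # merge repeated words into one node
    queue, seen = deque([["<s>"]]), {"<s>"}
    while queue:
        path = queue.popleft()
        if path[-1] == "</s>":
            return " ".join(path[1:-1])  # strip the markers
        for nxt in graph[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return ""

print(fuse([
    "Donald Trump visited Venezuela last Thursday",
    "Donald Trump did a visit to Venezuela on Thursday",
]))
```

Because "venezuela" and "thursday" are shared nodes, BFS finds a six-word fusion that borrows from both inputs rather than reproducing either one; real word-graph systems score paths by edge frequency and informativeness instead of raw length.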

Fig. 6 ROUGE-1 and ROUGE-2 results of our proposed approach for the large CNN dataset while varying the scoring techniques and summary types

Fig. 8 ROUGE-1 and ROUGE-2 results of EXABSUMExtractive and EXABSUMAbstractive compared to various baseline systems on the DUC 2002 dataset

Fig. 9 ROUGE-1 and ROUGE-2 results of EXABSUMExtractive and EXABSUMAbstractive compared to various baseline systems on the CNN dataset

- DUC 2001 and DUC 2002: these datasets are widely used in ATS tasks and were provided by the Document Understanding Conference (DUC) and Text Analysis Conference (TAC). DUC 2001 consists of 309 English news articles, each accompanied by two separate gold-standard summaries prepared by different individuals. DUC 2002 contains 567 English news articles covering various topics and lengths, also with two gold-standard summaries each. For both datasets, the accompanying summaries are approximately 100 words long. Notably, the DUC collections are sentence-segmented to identify the most informative sentences. The DUC documents are organized into categories including biography, politics, law, society, culture, business, health, natural disasters, science, sports, and international topics; categories such as 'Natural Disaster' and 'Politics and History' constitute a significant portion of DUC 2002, making up about 60% of its documents. All DUC publications and clusters include human-generated summaries of approximately 100 words.
- CNN Corpus [58]: a substantial collection of news documents used for single-document summarization tasks, sourced from CNN's website (http://www.cnn.com). This corpus is the largest available dataset for single-document extractive summarization. It comprises 3,000 English articles grouped into twelve subject categories, as originally categorized by CNN: Business, Opinion, Politics, Showbiz, Health, Justice, Living, Sports, World News, Technology, United States, and Travel.

Table 1
Statistics of the CNN and DUC datasets

Table 2
ROUGE results for EXABSUMExtractive and EXABSUMAbstractive in the feature analysis on the DUC 2001 dataset; Comb represents the combination of the selected scoring approaches

Table 3
ROUGE results for EXABSUMExtractive on the DUC 2002 dataset: feature analysis, with Comb as the combined scoring techniques

Table 4
ROUGE results for EXABSUMExtractive and EXABSUMAbstractive on the CNN dataset while analyzing their features; Comb denotes the combination of the selected scoring techniques

Table 5
Results of user satisfaction for various text summarization approaches

Table 6
Example summaries generated by EXABSUMExtractive and EXABSUMAbstractive for document WSJ891019-0021 (DUC 2002 corpus, cluster d062j) with a 50% ratio of the original text

EXABSUM Extractive: The White House is making sure nobody will accuse it of taking this crisis lightly. President Bush and his aides flew into a whirlwind of earthquake-related activity yesterday morning. Mr. Bush and his aides were accused of responding too slowly after the Exxon Valdez oil tanker split open in Alaskan waters and Hurricane Hugo struck the Carolina coast. Mr. Bush got his first earthquake briefing of the day at 6:30 a.m.

Table 7
F-measure comparison: our proposed techniques vs. baselines for single-document summarization on the DUC 2001 collection. The bold values emphasize the superior significance of our approach compared to others

Table 8
Comparison of F-measure results between our proposed techniques and the baseline methods on the DUC 2002 collection for the single-document summarization task

Table 9
F-measure results of our proposed approaches compared to baselines on the CNN dataset for single-document summarization task