A sample decreasing threshold greedy-based algorithm for big data summarisation

As the scale of datasets used for big data applications expands rapidly, there have been increased efforts to develop faster algorithms. This paper addresses big data summarisation problems using the submodular maximisation approach and proposes an efficient algorithm for maximising general non-negative submodular objective functions subject to k-extendible system constraints. Leveraging a random sampling process and a decreasing threshold strategy, this work proposes an algorithm named Sample Decreasing Threshold Greedy (SDTG). The proposed algorithm obtains an expected approximation guarantee of $\frac{1}{1+k}-\epsilon$ for maximising monotone submodular functions and of $\frac{k}{(1+k)^2}-\epsilon$ in non-monotone cases, with an expected computational complexity of $O\left(\frac{n}{(1+k)\epsilon}\ln \frac{r}{\epsilon}\right)$.
Here, n is the size of the ground set, r is the largest size of feasible solutions, and $\epsilon \in \left(0, \frac{1}{1+k}\right)$ is an adjustable design parameter for the trade-off between the approximation ratio and the computational complexity. The performance of the proposed algorithm is validated and compared with that of benchmark algorithms through experiments with a movie recommendation system based on a real database.

selected, namely diminishing returns [18]. It is well known that greedy-based algorithms are efficient and can provide approximation guarantees for maximising submodular functions [19]. Hence, the big data summarisation problem can be handled as maximising a submodular function over a large-scale dataset while satisfying a certain constraint or a combination of several constraints [2].
This paper addresses big data summarisation problems using the submodular maximisation approach, especially subject to k-extendible system constraints. Note that the k-extendible system constraint is a general type of constraint that has been widely studied. The concept of k-extendible systems was first introduced by Mestre in 2006 [20]. The intersection of k matroids defined on the same ground set is always k-extendible [20]. Many types of constraints handled in submodular maximisation problems fall into the class of k-extendible system constraints, such as the cardinality constraint, the partition matroid constraint, and the k-matroid constraint.
The issue is that finding the optimal solution of submodular maximisation is NP-hard, while the sizes of datasets keep increasing. NP-hard problems are known to suffer severely from the "curse of dimensionality": the cost of solving them exactly explodes as the problem size increases. Therefore, the trend of increasing dataset sizes, combined with the NP-hardness of the problem, urges the development of more computationally efficient optimisation algorithms. The Sample Greedy algorithm (Sample, for short) proposed in [21] is one of the state-of-the-art algorithms for constrained submodular maximisation problems. Specifically, before this work, Sample [21] was the fastest algorithm for maximising non-monotone submodular functions subject to a k-extendible system constraint.
Inspired by the sampling strategy from [21] and a decreasing threshold idea from [22], this work proposes an algorithm that is even faster than Sample [21]. The proposed algorithm, named Sample Decreasing Threshold Greedy (SDTG), provides an expected approximation guarantee of $p-\epsilon$ for maximising monotone submodular functions and of $p(1-p)-\epsilon$ for non-monotone cases, with an expected time complexity of only $O\left(\frac{pn}{\epsilon}\ln\frac{r}{\epsilon}\right)$, where $p \in \left(0, \frac{1}{1+k}\right]$ is the sampling probability and $\epsilon \in (0, p)$ is the threshold decreasing parameter. If the sampling probability is set to $p = \frac{1}{1+k}$, then SDTG provides its best approximation ratios for both monotone and non-monotone submodular functions, namely $\frac{1}{1+k}-\epsilon$ and $\frac{k}{(1+k)^2}-\epsilon$, respectively. Here, ε acts as a design parameter for the trade-off between the approximation ratio and the computational complexity. The proposed algorithm is validated through experiments with a movie recommendation system based on MovieLens [23], a widely used real movie information database. Experimental results demonstrate that the proposed algorithm outperforms benchmark algorithms in terms of both solution quality and computational efficiency. The main contributions of this work are summarised as follows:
• This work proposes SDTG, currently the fastest algorithm for maximising non-monotone submodular functions subject to k-extendible system constraints;
• Rigorous mathematical proofs are provided for the theoretical guarantees of the proposed algorithm;
• Experiments with a movie recommendation system based on a real database are carried out to reveal the practical performance of SDTG in solving the big data summarisation problem.
The rest of this paper is organised as follows. The "Related works" section reviews related articles on constrained submodular maximisation problems. In the "Preliminaries" section, some basic knowledge related to the proposed algorithm is presented. The "Algorithm and analysis" section presents the proposed algorithm and analyses its theoretical performance in detail. The performance of the algorithm and the validity of the theoretical results are then tested through experiments with a movie recommendation system in the "Experiments" section. The "Conclusions" section offers the conclusions of this paper and possible future research directions.

Related works
Numerous recent works have aimed to develop more efficient constrained submodular maximisation algorithms, and many of them endeavour to increase computational efficiency even at the cost of some approximation ratio. These works are classified below by constraint type, and their developments are summarised in the following.

Cardinality constraint
The Sieve-Streaming algorithm proposed by Badanidiyuru et al. [12] is the first single-pass streaming algorithm for maximising monotone submodular functions, achieving an approximation guarantee of $1/2-\epsilon$ with computational complexity of $O\left(\frac{n}{\epsilon}\log r\right)$. Here, n is the size of the ground set, and r is the size of the largest feasible solution. Norouzi-Fard et al. [9] proposed another single-pass algorithm, Salsa, that improved the approximation guarantee to a value better than 1/2. They also extended their work to a multi-pass algorithm, P-Pass, that provides a trade-off between the approximation ratio and the number of passes. The Decreasing Threshold Greedy proposed in [22] obtained an approximation ratio of $1-1/e-\epsilon$ with time complexity of $O\left(\frac{n}{\epsilon}\log\frac{n}{\epsilon}\right)$ for monotone submodular functions. This is the first streaming algorithm whose computational complexity is independent of r. Later, the sampling-based Stochastic Greedy proposed by Mirzasoleiman et al. [24] achieved the same approximation ratio in expectation with a lower time complexity of $O\left(n\log\frac{1}{\epsilon}\right)$ than the Decreasing Threshold Greedy [22]. The Stochastic Greedy is orders of magnitude faster while losing only a little approximation ratio compared with other benchmark algorithms. Then Buchbinder et al. [25] extended the Stochastic Greedy to general non-monotone cases and achieved an approximation guarantee of $1/e-\epsilon$ with computational complexity of $O\left(\frac{n}{\epsilon^2}\log\frac{1}{\epsilon}\right)$. Recently, Breuer et al. [26] proposed an efficient algorithm, Fast, for the monotone case using the adaptive sequencing technique. Fast achieves an approximation ratio of $1-1/e-\epsilon$ with $O(n\log\log r)$ queries.

Matroid constraint
The original greedy algorithm (Greedy) [19] provides an approximation ratio of 1/2 with time complexity of O(nr) for monotone submodular maximisation subject to a matroid constraint. Nemhauser and Wolsey [27] proved that no algorithm can achieve an approximation ratio better than 1 − 1/e using a polynomial number of function evaluations. The continuous greedy algorithm based on the multilinear extension was utilised to achieve an approximation ratio of 1 − 1/e [28]. The measured continuous greedy algorithm developed by Feldman et al. [29] achieved a (1 − 1/e)-approximation for the monotone case and a 1/e-approximation for the non-monotone case. This is the first algorithm to provide a constant-factor approximation for maximising non-monotone submodular functions subject to a partition matroid constraint. However, such sophisticated continuous algorithms are inherently too time-consuming to be applied directly in the real world [30]. To remedy this, the idea of a decreasing threshold [22] was adapted to reduce the computational complexity [31]. Badanidiyuru and Vondrák [22] proposed a new variant of the continuous greedy algorithm and achieved an approximation ratio of $1-1/e-\epsilon$ with complexity of $O\left(\frac{nr}{\epsilon^4}\log^2\frac{r}{\epsilon}\right)$ for monotone submodular functions. Then, a close variant of the Decreasing Threshold Greedy described in [25] provided an approximation ratio of $1/2-\epsilon$ with computational complexity of $O\left(\frac{n}{\epsilon}\log\frac{r}{\epsilon}\right)$ for the monotone case.

k-extendible system constraint
It is known that Greedy [19] achieves a $\frac{1}{1+k}$-approximation for maximising monotone submodular functions subject to a k-extendible system constraint. The Decreasing Threshold Greedy [22] provides a slightly worse approximation guarantee of $\frac{1}{1+k+\epsilon}$ but requires a lower computational complexity of $O\left(\frac{n}{\epsilon^2}\log^2\frac{n}{\epsilon}\right)$ than Greedy [19] does for maximising monotone submodular functions. For the non-monotone case, Gupta et al. [32] proposed an algorithm achieving an approximation ratio of $\frac{k}{(k+1)(3k+3)}$ with time complexity of O(nrk). Then, the approximation ratio was improved to $\frac{k}{(k+1)(2k+1)}$ by an algorithm called Fantom, proposed by Mirzasoleiman et al. [5], with the same complexity. After this, Feldman et al. [21] made a significant breakthrough in terms of both approximation ratio and time complexity. The Sample algorithm proposed in [21] achieved an approximation ratio of $\frac{k}{(k+1)^2}$ with complexity of $O(n + nr/k)$. Experiments based on a movie recommendation system in [21] confirmed that Sample outperformed Fantom in terms of computational efficiency.
In summary, gradual improvements have recently been made in solving constrained submodular maximisation problems. However, the rapid expansion in the scale of modern datasets calls for the continued development of faster algorithms. An immediate research question is whether one can develop an algorithm that further improves the efficiency of maximising general non-negative submodular functions, especially subject to k-extendible system constraints.

Preliminaries
This section presents some necessary definitions and basic concepts related to the proposed algorithm. The definitions and concepts can also be found in our previous works [33][34][35].
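Definition 1 (Submodularity) A set function $f: 2^{\mathcal{N}} \to \mathbb{R}$ is submodular if
$$f(A) + f(B) \ge f(A \cup B) + f(A \cap B), \quad \forall A, B \subseteq \mathcal{N},$$
where $\mathcal{N}$ is called the "ground set", a finite set containing all elements. Equivalently, $\forall A \subseteq B \subseteq \mathcal{N}$ and $u \in \mathcal{N} - B$,
$$f(A \cup \{u\}) - f(A) \ge f(B \cup \{u\}) - f(B). \qquad (1)$$
Definition 2 (Marginal gain value [36] (mgv)) For a set function $f: 2^{\mathcal{N}} \to \mathbb{R}$, a set $S \subseteq \mathcal{N}$, and an element $u \in \mathcal{N}$, the marginal gain value of f at S with respect to u is defined as
$$\Delta f(u \mid S) \doteq f(S \cup \{u\}) - f(S),$$
where $\doteq$ means equal by definition. This work abbreviates the marginal gain value as "mgv" for brevity.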
The inequality (1) is known as the diminishing returns property, a crucial property of submodular functions: the mgv of a given element never increases as more elements are selected. One intuitive example of submodularity is the sensor placement problem: the increase in space coverage obtained by adding an extra fire detector at a particular position in a room never increases as more detectors are already placed in the room.
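To make Definitions 1 and 2 concrete, the following Python sketch computes mgvs for a small coverage function, in the spirit of the fire-detector example, and brute-forces the diminishing returns inequality (1). The detector names and covered cells are hypothetical toy data, and exhaustive checking is only feasible for tiny ground sets.

```python
from itertools import chain, combinations

# Hypothetical toy instance: each detector position covers a set of cells.
DETECTORS = {"a": {1, 2}, "b": {2, 3}, "c": {3, 4}}


def coverage(S):
    """f(S) = number of cells covered by the detectors in S."""
    covered = set()
    for s in S:
        covered |= DETECTORS[s]
    return len(covered)


def mgv(f, S, u):
    """Marginal gain value: f(S ∪ {u}) - f(S)."""
    return f(frozenset(S) | {u}) - f(frozenset(S))


def all_subsets(items):
    items = list(items)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))]


def has_diminishing_returns(f, ground):
    """Check inequality (1) for every A ⊆ B ⊆ N and u ∈ N - B."""
    ground = frozenset(ground)
    for B in all_subsets(ground):
        for A in all_subsets(B):
            for u in ground - B:
                if mgv(f, A, u) < mgv(f, B, u):
                    return False
    return True


print(mgv(coverage, set(), "b"))        # 2: cells 2 and 3 are new
print(mgv(coverage, {"a"}, "b"))        # 1: only cell 3 is new
print(mgv(coverage, {"a", "c"}, "b"))   # 0: nothing new
print(has_diminishing_returns(coverage, DETECTORS))  # True
```

The gain of detector "b" shrinks from 2 to 1 to 0 as more detectors are already placed, which is exactly the behaviour described by inequality (1).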

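Definition 3 (Monotonicity) A set function $f: 2^{\mathcal{N}} \to \mathbb{R}$ is monotone if $f(A) \le f(B)$ for all $A \subseteq B \subseteq \mathcal{N}$; otherwise, it is non-monotone.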
The submodular objective functions considered in this paper are normalised (i.e. $f(\emptyset) = 0$), non-negative (i.e. $f(S) \ge 0, \forall S \subseteq \mathcal{N}$), and can be either monotone or non-monotone.
Definition 4 (Matroid [22]) A matroid is a pair $\mathcal{M} = (\mathcal{N}, \mathcal{I})$, where $\mathcal{N}$ is the ground set and $\mathcal{I} \subseteq 2^{\mathcal{N}}$ is a collection of independent sets, satisfying:
i. $\emptyset \in \mathcal{I}$;
ii. if $B \in \mathcal{I}$ and $A \subseteq B$, then $A \in \mathcal{I}$;
iii. if $A, B \in \mathcal{I}$ and $|A| < |B|$, then there exists $u \in B - A$ such that $A \cup \{u\} \in \mathcal{I}$.
Specifically, matroid constraints include uniform matroid constraints and partition matroid constraints. The uniform matroid constraint, also called the cardinality constraint, is a special case of matroid constraints where any subset $S \subseteq \mathcal{N}$ satisfying $|S| \le r$ is independent, i.e. $S \in \mathcal{I}$. The partition matroid constraint means that an independent subset S can contain at most a certain number of elements from each of the disjoint partitions of $\mathcal{N}$.
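The matroid conditions i-iii can be checked by brute force on tiny instances. The Python sketch below does so for a hypothetical partition matroid (two disjoint parts, at most one element from each) and, for contrast, for the one-movie-per-genre rule of the movie example given with Definition 6, where genres overlap. The latter fails the exchange condition, which is one way to see why such genre limits are modelled as k-extendible systems rather than matroids. The helper names are our own.

```python
from collections import Counter
from itertools import chain, combinations


def all_subsets(items):
    items = list(items)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))]


def is_matroid(ground, indep):
    """Brute-force check of conditions i-iii of Definition 4."""
    family = [S for S in all_subsets(ground) if indep(S)]
    if frozenset() not in family:                      # i. empty set is independent
        return False
    for B in family:                                   # ii. hereditary property
        if not all(indep(A) for A in all_subsets(B)):
            return False
    for A in family:                                   # iii. exchange property
        for B in family:
            if len(A) < len(B) and not any(indep(A | {u}) for u in B - A):
                return False
    return True


# Hypothetical partition matroid: at most one element from each disjoint part.
PARTS = {1: "x", 2: "x", 3: "y", 4: "y"}


def partition_indep(S):
    return all(c <= 1 for c in Counter(PARTS[e] for e in S).values())


# Overlapping genres with a limit of one movie per genre.
GENRES = {"mv1": {"Action"}, "mv2": {"Adventure"}, "mv3": {"Sci-Fi"},
          "mv4": {"Action", "Adventure", "Sci-Fi"}}


def genre_indep(S):
    return all(c <= 1 for c in Counter(g for m in S for g in GENRES[m]).values())


print(is_matroid(PARTS, partition_indep))  # True
print(is_matroid(GENRES, genre_indep))     # False: {mv4} cannot grow using {mv1, mv2}
```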
A typical example of the partition matroid constraint is a security camera system: each camera of the system can point to only one of its admissible directions at a given moment. The partition matroid constraint is a special case of k-extendible system constraints with k = 1. A formal definition of the k-extendible system constraint is given after an auxiliary concept.
Definition 5 (Extension [21]) If an independent set B strictly contains an independent set A, then B is called an extension of A.
Definition 6 (k-extendible system [20]) A k-extendible system is an independence system (N, I) such that, for every independent set A ∈ I, every extension B of A, and every element u ∉ A with A ∪ {u} ∈ I, there exists a subset Y ⊆ B − A with |Y| ≤ k such that (B − Y) ∪ {u} ∈ I.
Intuitively, if an element u is added to an independent set of a k-extendible system, at most k other elements need to be removed from it to keep the set independent [21]. For example, suppose that a user of a movie recommendation system likes three genres of movies, Action, Adventure, and Sci-Fi, and wants at most one movie from each of these three genres. Note that a movie can belong to multiple genres. Consider four movies with genre information: mv1 (Action), mv2 (Adventure), mv3 (Sci-Fi), and mv4 (Action, Adventure, Sci-Fi). According to the user's requirement, the recommendation list S = {mv1, mv2, mv3} is independent, i.e. S ∈ I, but adding mv4 to S makes it dependent. If mv4 is to remain in S, movies mv1, mv2, and mv3 must all be removed to keep S independent. Therefore, the constraint in this example is a 3-extendible system constraint.
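Definition 6 can likewise be checked exhaustively on small examples. The Python sketch below computes, for a given independence oracle, the smallest k for which the system is k-extendible: it tries every independent set A, every extension B, and every admissible u, and searches for the smallest removal set Y ⊆ B − A. Applied to the four-movie example, it confirms that three removals may be needed. The function name `smallest_k` is our own, and the search is exponential, so this is only an illustration.

```python
from collections import Counter
from itertools import chain, combinations


def all_subsets(items):
    items = list(items)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))]


def smallest_k(ground, indep):
    """Smallest k such that (ground, indep) satisfies Definition 6 (brute force)."""
    family = [S for S in all_subsets(ground) if indep(S)]
    k = 0
    for A in family:
        for B in family:
            if not A < B:              # B must strictly contain A (Definition 5)
                continue
            for u in frozenset(ground) - A:
                if not indep(A | {u}):
                    continue
                # Y = B - A always works, since (B - (B - A)) | {u} = A | {u}.
                need = min(len(Y) for Y in all_subsets(B - A)
                           if indep((B - Y) | {u}))
                k = max(k, need)
    return k


GENRES = {"mv1": {"Action"}, "mv2": {"Adventure"}, "mv3": {"Sci-Fi"},
          "mv4": {"Action", "Adventure", "Sci-Fi"}}


def at_most_one_per_genre(S):
    return all(c <= 1 for c in Counter(g for m in S for g in GENRES[m]).values())


print(smallest_k(GENRES, at_most_one_per_genre))  # 3
```

A cardinality (uniform matroid) constraint such as `lambda S: len(S) <= 2` gives 1, consistent with matroids being 1-extendible.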
The following is an important claim that provides the mathematical foundation for Sample [21] to work well in non-monotone submodular maximisation.Readers are referred to [37] for the proof of Claim 1.
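Claim 1 (Due to [37]) Let $h: 2^{\mathcal{N}} \to \mathbb{R}_{\ge 0}$ be a submodular function, and let S be a random subset of $\mathcal{N}$ in which every element appears with probability at most p (not necessarily independently). Then $\mathbb{E}[h(S)] \ge (1-p)\,h(\emptyset)$.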
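As a sanity check of this kind of bound, assuming its standard form E[h(S)] ≥ (1 − p)h(∅) whenever every element appears in S with probability at most p, the Python sketch below computes E[h(S)] exactly when every element is included independently with probability p (a special case of the "at most p" condition). It uses a hypothetical three-node path graph whose cut function is submodular and non-monotone, and takes h(X) = cut(X ∪ {0}) so that h(∅) > 0.

```python
from itertools import product

EDGES = [(0, 1), (1, 2)]  # hypothetical path graph 0 - 1 - 2 with unit weights
NODES = [0, 1, 2]


def cut(S):
    """Graph cut function: number of edges with exactly one endpoint in S."""
    return sum((a in S) != (b in S) for a, b in EDGES)


def h(X):
    """h(X) = cut(X ∪ {0}); submodular and non-negative, with h(∅) = 1."""
    return cut(set(X) | {0})


def exact_expectation(func, nodes, p):
    """E[func(S)] when each node joins S independently with probability p."""
    total = 0.0
    for mask in product([False, True], repeat=len(nodes)):
        S = {v for v, keep in zip(nodes, mask) if keep}
        prob = 1.0
        for keep in mask:
            prob *= p if keep else 1 - p
        total += prob * func(S)
    return total


p = 0.25
lhs = exact_expectation(h, NODES, p)
rhs = (1 - p) * h(set())
print(lhs, rhs, lhs >= rhs)  # 1.125 0.75 True
```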

Algorithm and analysis
This section describes SDTG in Algorithm 1 and analyses its theoretical performance in detail. Note that the proposed algorithm is based on submodular optimisation, as in our previous studies [33][34][35]; hence, the analysis shares some of the reasoning of those works. An equivalent version of Algorithm 1, Algorithm 2, is introduced to facilitate the analysis of SDTG.

Algorithm
This work proposes to leverage the sampling strategy [21] and develop a variant of the decreasing threshold idea to design a summarisation algorithm. On the one hand, the random sampling at the beginning of SDTG helps the algorithm avoid getting trapped in local optima. It also accelerates the algorithm because only a small portion of the elements in the ground set is considered. On the other hand, the decreasing threshold further accelerates the algorithm. Note that Greedy [19] needs to re-evaluate all the remaining elements to find the best one in each iteration. In contrast, SDTG searches for a relatively good element whose mgv is no less than the current threshold instead of looking for the best one. Therefore, SDTG does not have to re-evaluate all remaining elements before selecting each extra element.
Some notation from Algorithm 1 is stated in the following: N is the ground set containing all elements; I is the collection of all feasible (independent) sets; r is the maximum cardinality of feasible sets in I; p is the sampling probability (every element is sampled with the same probability p); ε is the threshold decreasing parameter, which determines how fast the threshold decreases; S is the solution set containing the selected elements; R is the set of remaining sampled elements; θ is the decreasing threshold.
Algorithm 1 consists of two phases. The first phase (lines 1-4) is sampling, where elements are randomly selected from the ground set N with probability p to form a sample set R; every element has the same sampling probability. The second phase (lines 5-22) is selection, where an independent solution set S is selected from R using a decreasing threshold greedy procedure. The initial threshold is set as the largest mgv with respect to the empty set, denoted as d (line 5). The terminal threshold is set as $\frac{\epsilon}{r}d$ (line 6). The reason for choosing this value as the termination condition is given later in the proof (see Remark 1).

Algorithm 1 SDTG
1: S ← ∅, R ← ∅
2: for u ∈ N do
3:   R ← R ∪ {u} with probability p // Random sampling.
4: end for
5: d ← max_{u∈R} Δf(u|S)
6: for (θ ← d; θ ≥ (ε/r)d; θ ← (1 − ε)θ) do
7:   for u ∈ R do
8:     if S ∪ {u} ∉ I then
9:       R ← R − {u}
10:    else
11:      if Δf(u|S) ≥ θ then
12:        S ← S ∪ {u}
13:        R ← R − {u}
14:      else
15:        if Δf(u|S) < (ε/r)d then
16:          R ← R − {u}
17:        end if
18:      end if
19:    end if
20:  end for
21: end for
22: return S
More details of the second phase are given in the following. Processing one element u in the inner "for" loop is called one iteration. At the beginning of each iteration, SDTG checks whether S ∪ {u} is independent. If it is not, element u is removed from R (lines 8-9). Otherwise, the mgv of u is calculated and compared with the current threshold θ. If the mgv of u is greater than or equal to θ, u is added to S and removed from R (lines 11-13). An element u is called a qualified element if its mgv given S is no less than the current threshold θ. If the mgv of an element is already less than (ε/r)d, it will never become greater than or equal to (ε/r)d in subsequent iterations due to submodularity. Therefore, such an element can be removed from R immediately, as stated in lines 15-17. Note that each element in R is evaluated only once under each threshold. If the mgv of an element lies between (ε/r)d and θ, the element remains in R for the next outer loop, where the threshold decreases. The remaining elements in R are then re-evaluated, and their updated mgvs are compared with the decreased threshold. The threshold keeps decreasing after all remaining elements in R have been evaluated, until the termination condition is reached.
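For readers who prefer running code, here is a minimal Python sketch of Algorithm 1 as described above, assuming the multiplicative threshold update θ ← (1 − ε)θ. The objective f and the independence oracle `is_independent` are caller-supplied callables on Python sets (our assumption, not an interface from the paper). The evaluation counter, the early exit once R is empty, and the guard for d ≤ 0 (which prevents an endless loop when d = 0) are our additions; the first two do not change which elements are selected.

```python
import random


def sdtg(ground, f, is_independent, r, p, eps, rng=None):
    """Sketch of Algorithm 1 (SDTG). Returns (S, number of f-evaluations)."""
    rng = rng or random.Random()
    evals = 0

    def value(X):
        nonlocal evals
        evals += 1
        return f(frozenset(X))

    # Phase 1 (lines 1-4): keep each element independently with probability p.
    R = [u for u in ground if rng.random() < p]
    S = set()
    if not R:
        return S, evals
    f_S = value(S)

    # Line 5: d = largest mgv with respect to the empty set.
    d = max(value({u}) - f_S for u in R)
    if d <= 0:                  # guard (our addition)
        return S, evals
    stop = eps / r * d          # terminal threshold (eps/r) * d

    theta = d
    while theta >= stop and R:  # line 6; "and R" is an early exit (our addition)
        kept = []
        for u in R:             # lines 7-20: one pass over the remaining samples
            if not is_independent(S | {u}):
                continue        # lines 8-9: drop u for good
            gain = value(S | {u}) - f_S
            if gain >= theta:   # lines 11-13: u is qualified, select it
                S.add(u)
                f_S += gain
            elif gain >= stop:  # keep u for a lower threshold
                kept.append(u)
            # else (lines 15-17): gain < (eps/r) * d, drop u for good
        R = kept
        theta *= 1 - eps        # theta <- (1 - eps) * theta
    return S, evals


COVER = {0: {1, 2, 3}, 1: {3, 4}, 2: {5}, 3: {1, 2, 3, 4, 5}}  # hypothetical toy data


def coverage(S):
    return len(set().union(*(COVER[u] for u in S)))


S, n_evals = sdtg(COVER, coverage, lambda X: len(X) <= 1, r=1, p=1.0, eps=0.1)
print(S, n_evals)  # {3} 9
```

The usage example sets p = 1.0 only to make the toy run deterministic; the analysis in this section assumes p ≤ 1/(1+k). Counting calls to f mirrors the way running time is measured in the experiments.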

Analysis
To better analyse the theoretical approximation performance of Algorithm 1, this work leverages some analysis techniques used in [21]. A few auxiliary variables are introduced to transform SDTG into an equivalent version, i.e. Algorithm 2.

Algorithm 2 Equivalent SDTG
…
4: N_s ← N_s ∪ {u} with probability p
5: end for
6: d ← max_{u∈N_s} Δf(u|S)
…
for u ∈ R do
…
if Δf(u|S) ≥ θ then
…
end for
36: end for
37: return S
In Algorithm 2, the variables C, S_c, Q, and K_c are introduced only for the convenience of analysis and have no effect on the final output S. Therefore, Algorithm 2 and Algorithm 1 are equivalent in terms of solution quality. The roles of these variables are as follows.
C is the set of all considered elements whose mgvs are greater than or equal to the threshold θ in some iteration of Algorithm 2, regardless of whether they are added to S.
S_c is the set of elements already selected at the beginning of the current iteration. At the end of this iteration, S = S_c ∪ {c} if c is added to S and Q; otherwise, S equals S_c.
Q is a set that bridges the solution S and the optimal solution OPT. Q starts as OPT at the beginning of the algorithm and changes over time. Note that Q is introduced only for analysis, and there is no need to know the exact value of Q or OPT. In each iteration, the element added to S is also added to Q. At the same time, if an element c is added to Q, a set K_c is removed from Q to keep Q independent. Note that if an element c is already in Q and is considered but not added to S in the current iteration, then c is removed from Q.
K_c is a set introduced to keep Q independent and to let Q drop an element c that is not added to S. According to the property of k-extendible systems, when an element is added to the currently independent set Q, Algorithm 2 can remove a set K_c ⊆ Q − S containing at most k elements from Q. In addition, if c is not added to S and c ∈ Q at the beginning of some iteration, then K_c = {c}.
The theoretical performance of the proposed algorithm SDTG is summarised in Theorem 1.

Theorem 1 SDTG achieves an expected approximation guarantee of at least $\frac{1}{1+k}-\epsilon$ for maximising monotone submodular functions subject to k-extendible system constraints and of $\frac{k}{(1+k)^2}-\epsilon$ for non-monotone cases, with computational complexity of $O\left(\frac{n}{(1+k)\epsilon}\ln\frac{r}{\epsilon}\right)$, where n is the size of the ground set, r is the largest size of a feasible solution, and $\epsilon \in \left(0, \frac{1}{1+k}\right)$ is the threshold decreasing parameter.
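The computational complexity is straightforward to prove. Assume that the outer "for" loop of Algorithm 1 runs x times in total. Since the threshold starts at d, is multiplied by (1 − ε) after each outer loop, and the loop stops once it falls below $\frac{\epsilon}{r}d$, we have
$$(1-\epsilon)^{x} d = \frac{\epsilon}{r} d.$$
Solving the above equation yields
$$x = \frac{\ln(r/\epsilon)}{-\ln(1-\epsilon)} \le \frac{1}{\epsilon}\ln\frac{r}{\epsilon},$$
where the inequality uses $-\ln(1-\epsilon) \ge \epsilon$ for $\epsilon \in (0, 1)$. In expectation, there are at most $p \cdot n$ function evaluations in each outer loop, because each element of R is evaluated at most once per threshold and $\mathbb{E}[|R|] = pn$. Therefore, the time complexity of Algorithm 1 is $O\left(\frac{pn}{\epsilon}\ln\frac{r}{\epsilon}\right)$, which equals $O\left(\frac{n}{(1+k)\epsilon}\ln\frac{r}{\epsilon}\right)$ for $p = \frac{1}{1+k}$. The remainder of this section analyses the approximation ratios of SDTG in both monotone and non-monotone cases through Algorithm 2.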
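The bound on the number of outer loops of Algorithm 1 can be checked numerically. The Python sketch below counts the thresholds d, (1 − ε)d, (1 − ε)²d, … that stay at or above (ε/r)d, assuming the multiplicative update θ ← (1 − ε)θ, and compares the count with (1/ε)ln(r/ε). With the illustrative values r = 15 and ε = 0.2 (the ε used in the experiments), there are 20 threshold levels, within the bound of about 21.6.

```python
import math


def num_thresholds(r, eps):
    """Count thresholds d, (1-eps)d, (1-eps)^2 d, ... that are >= (eps/r) d."""
    count, theta = 0, 1.0          # normalise d = 1; the count does not depend on d
    while theta >= eps / r:
        count += 1
        theta *= 1 - eps
    return count


def loop_bound(r, eps):
    """Upper bound (1/eps) * ln(r/eps) on the number of outer loops."""
    return math.log(r / eps) / eps


print(num_thresholds(15, 0.2), round(loop_bound(15, 0.2), 2))  # 20 21.59
```

Multiplying the count by the expected sample size pn gives the expected number of function evaluations used in the complexity bound.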
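Proof According to Algorithm 2, at the end of each iteration the set Q is independent, i.e. Q ∈ I. S is a subset of Q, i.e. S ⊆ Q, as every element c that is added to S is also added to Q. Therefore, S ∪ {q} ∈ I for all q ∈ Q − S by the hereditary property of independence systems, and |Q − S| ≤ r. At the termination of Algorithm 2, $\Delta f(q \mid S) < \frac{\epsilon}{r}d$ for all q ∈ Q − S, and f(S) ≥ d. Thus, by submodularity and S ⊆ Q,
$$f(Q) \le f(S) + \sum_{q \in Q-S}\Delta f(q \mid S) \le f(S) + |Q-S|\,\frac{\epsilon}{r}d \le f(S) + \epsilon d \le (1+\epsilon)\, f(S).$$
The result follows by rearranging the above inequality.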
Remark 1 Lemma 1 indicates that, at the termination of Algorithm 2, f(S) is close to f(Q) if ε is small enough. In other words, if the mgv of an element is less than $\frac{\epsilon}{r}d$, the element can be considered negligible because it contributes very little to f(S). This is why the terminal threshold is set as $\frac{\epsilon}{r}d$.
Proof There are three cases to analyse, depending on whether the current element u is considered with an mgv no less than the threshold in some iteration, i.e. u ∈ C, and whether u is already in Q at the beginning of that iteration of Algorithm 2. Note that the size of K_u is kept as small as possible.
i. If u ∉ C throughout the algorithm, then K_u = ∅, and thus the expectation is obtained as:
$$\mathbb{E}[|K_u|] = 0.$$
ii. If u ∈ C and u ∈ Q at the beginning of the iteration, then K_u = ∅ for u ∈ N_s and K_u = {u} for u ∉ N_s. Since u is sampled in N_s with probability p, the expectation is obtained as:
$$\mathbb{E}[|K_u|] = p \cdot 0 + (1-p) \cdot 1 = 1-p.$$
iii. If u ∈ C and u ∉ Q at the beginning of the iteration, then K_u contains at most k elements for u ∈ N_s, and K_u = ∅ for u ∉ N_s. According to the property of k-extendible systems, if Q becomes dependent after adding u, then at most k elements need to be removed from Q for it to remain independent. If Q is still independent after adding u, then K_u = ∅. Therefore,
$$\mathbb{E}[|K_u|] \le p \cdot k + (1-p) \cdot 0 = pk.$$
Proof Let us define a random variable G_u whose value equals the increase in f(S) when u ∈ N is considered, i.e. $G_u = \Delta f(u \mid S_u)$ if u is added to S and $G_u = 0$ otherwise, where S_u denotes the solution at the moment u is considered.
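Note that since f is assumed to be normalised, f(∅) = 0, and therefore $f(S) = \sum_{u \in \mathcal{N}} G_u$. Given the event E_u specifying all the decisions made before u is considered, the conditional expectation of G_u is obtained as
$$\mathbb{E}[G_u \mid E_u] = \Pr[u \in N_s]\,\Delta f(u \mid S'_u),$$
where S'_u is defined as S_u given the event E_u. Note that if u is sampled but not in C, $\Delta f(u \mid S'_u)$ is defined as 0 by convention, while if u is not sampled, G_u is zero. Hence, the conditional expectation of G_u is:
$$\mathbb{E}[G_u \mid E_u] = p\,\Delta f(u \mid S'_u).$$
By the law of total expectation, the expectation of G_u is obtained as:
$$\mathbb{E}[G_u] = p\,\mathbb{E}[\Delta f(u \mid S_u)].$$
Hence, the expectation of f(S) is obtained as:
$$\mathbb{E}[f(S)] = \sum_{u \in \mathcal{N}} \mathbb{E}[G_u] = p \sum_{u \in \mathcal{N}} \mathbb{E}[\Delta f(u \mid S_u)].$$
Proof In a certain iteration with the current threshold θ, u ∈ C implies that
$$\Delta f(u \mid S_u) \ge \theta. \qquad (2)$$
Meanwhile, if an element q ∈ K_u − S was not selected before this iteration, then it did not qualify under the previous threshold θ/(1 − ε), and by submodularity its mgv cannot have increased since, so
$$\Delta f(q \mid S_u) < \frac{\theta}{1-\epsilon}. \qquad (3)$$
Combining Eqs. (2) and (3) yields
$$\Delta f(q \mid S_u) < \frac{\Delta f(u \mid S_u)}{1-\epsilon}.$$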

Let us define a new submodular, non-monotone function $h(X) \doteq f(X \cup OPT)$ for $X \subseteq \mathcal{N}$.
Eqs. (7) and (8) show that, for $p \in \left(\frac{1}{1+k}, 1\right]$, the expected approximation ratio stagnates in the monotone case and decreases in the non-monotone case. Moreover, the computational complexity increases as the sampling probability grows. On the other hand, for $p \in \left(0, \frac{1}{1+k}\right]$, the sampling probability can be used to adjust the trade-off between the approximation ratio and the computational complexity: as p increases within this interval, the expected approximation ratios improve for both monotone and non-monotone cases, but the computational complexity also increases.

Recall that the theoretical time complexity is $O\left(\frac{pn}{\epsilon}\ln\frac{r}{\epsilon}\right)$. Tuning ε gives a more desirable trade-off between solution quality and time complexity than tuning p. Therefore, this work fixes the sampling probability as $p = \frac{1}{1+k}$ and leaves ε as an adjustable design parameter for the trade-off between solution quality and time complexity. According to Eqs. (7) and (8), the best expected approximation ratios are readily obtained when $p = \frac{1}{1+k}$ as:
$$\frac{1}{1+k}-\epsilon \ \text{(monotone)}, \qquad \frac{1}{1+k}\cdot\frac{k}{1+k}-\epsilon = \frac{k}{(1+k)^2}-\epsilon \ \text{(non-monotone)}.$$
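The choice p = 1/(1+k) can be illustrated with the expressions p − ε and p(1 − p) − ε given in the introduction for p ∈ (0, 1/(1+k)]. Both increase with p on this interval (p(1 − p) increases for p < 1/2, and 1/(1+k) ≤ 1/2 for k ≥ 1), so the right endpoint is best. The Python sketch below evaluates them on a grid with the illustrative values k = 3 and ε = 0.05; the function name is our own.

```python
def expected_ratios(p, k, eps):
    """(monotone, non-monotone) guarantees p - eps and p(1 - p) - eps."""
    if not 0 < p <= 1 / (1 + k):
        raise ValueError("p must lie in (0, 1/(1+k)]")
    return p - eps, p * (1 - p) - eps


k, eps = 3, 0.05
grid = [0.05, 0.10, 0.15, 0.20, 0.25]          # 0.25 = 1/(1+k) for k = 3
ratios = [expected_ratios(p, k, eps) for p in grid]
for p, (mono, non_mono) in zip(grid, ratios):
    print(f"p={p:.2f}  monotone={mono:.4f}  non-monotone={non_mono:.4f}")
# At p = 1/(1+k): 1/(1+k) - eps = 0.2000 and k/(1+k)^2 - eps = 0.1375
```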

Experiments
This section tests the proposed algorithm SDTG through experiments using a real database and compares its performance with that of Greedy [19] and Sample [21]. For a fair comparison, this section uses the basic versions of these algorithms without the Lazy strategy [38]. Note that the performance of Sample and Fantom [5] has already been compared in [21].

Experimental setup
The database used in the experiments is MovieLens 20M [23]. This database contains 20 million ratings and 465,000 tag applications applied to 27,000 movies by 138,000 users. Movies in the database are classified into 19 genres, such as Action, Comedy, and Drama. In addition, each movie is scored according to its relevance to 1128 genome tags, forming 12 million relevance scores in total.
The objective of the movie recommendation system in the experiments is to select a shortlist of movies that is representative yet diverse for a user, based on the user's favourite movie genres. The objective function is taken from [5, 21]. Let N be the set of all movies and G be the set of all movie genres. Denote N(g) as the set of all movies that belong to the genre g ∈ G, and G(i) as the set of genres that movie i belongs to. Note that one movie can belong to several genres, hence |G(i)| ≥ 1. Let s_ij represent the similarity between movie i and movie j. Denote G_µ ⊆ G as the set of all movie genres that user µ likes. The movies that can be considered for user µ are contained in the set $N_\mu = \bigcup_{g \in G_\mu} N(g)$. The objective function of movie recommendation for user µ is given by
$$f(S) = \sum_{i \in N_\mu}\sum_{j \in S} s_{ij} - \lambda \sum_{i \in S}\sum_{j \in S} s_{ij}, \qquad (9)$$
where λ ∈ [0, 1] is the penalty parameter for the similarity between movies within the recommendation list S. The objective function Eq. (9) is non-negative, non-monotone, and submodular. The first term of Eq. (9) reflects the representativeness of the selected movies, and the second term helps to increase diversity. The goal is to achieve a high objective function value with low computational complexity. The similarity value between movie i and movie j is calculated based on the Euclidean distance between their relevance scores, $\sqrt{\sum_{t=1}^{N_t}\left(\gamma_i^t - \gamma_j^t\right)^2}$, where N_t = 1128 is the number of genome tags, and $\gamma_i^t$ and $\gamma_j^t$ are the relevance scores of movie i and movie j for tag t, respectively. The calculation of the similarity map took around 35 days on Cranfield HPC-Delta¹ using 128 CPUs with parallel computing.
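A small Python sketch of the objective, assuming Eq. (9) has the form f(S) = Σ_{i∈N_µ} Σ_{j∈S} s_ij − λ Σ_{i∈S} Σ_{j∈S} s_ij used for this task in [5, 21]. The 3 × 3 similarity matrix is hypothetical toy data (the paper derives similarities from genome-tag relevance scores). On this toy instance, the brute-force checks confirm that the function is non-negative for λ ≤ 1 and that adding a movie can decrease it, i.e. it is non-monotone.

```python
from itertools import chain, combinations

# Hypothetical symmetric similarity matrix for three candidate movies 0, 1, 2.
SIM = [[1.0, 0.5, 0.2],
       [0.5, 1.0, 0.1],
       [0.2, 0.1, 1.0]]
CANDIDATES = range(3)  # plays the role of N_mu


def movie_objective(S, lam=0.8):
    """Representativeness minus lam times the within-list similarity."""
    representativeness = sum(SIM[i][j] for i in CANDIDATES for j in S)
    redundancy = sum(SIM[i][j] for i in S for j in S)
    return representativeness - lam * redundancy


subsets = [set(c) for c in chain.from_iterable(
    combinations(CANDIDATES, k) for k in range(4))]
print(min(movie_objective(S) for S in subsets) >= 0)         # True: non-negative
print(movie_objective({0, 2}), movie_objective({0, 1, 2}))   # about 1.08 and 0.92
```

Adding movie 1 to {0, 2} lowers the value because movie 1 is similar to movie 0, which is how the penalty term promotes diversity.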
The constraints of the movie recommendation system come from upper limits on the number of movies, both in total and in each genre. The first constraint is an upper limit m on the total number of movies in the recommendation list. The second is an upper limit m_g (called the genre limit) on the number of movies that belong to genre g. According to [21], the movie recommendation system is then subject to a |G_µ|-extendible system constraint.
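The two constraints can be expressed as an independence oracle. The Python sketch below is one hypothetical way to encode them (the helper name `make_feasibility_oracle` and the movie data are our own): a list is feasible if it has at most m movies and, for every liked genre g, at most m_g of its movies belong to g. With m_g = 2 and three liked genres, at most two movies tagged with all three genres fit, whereas six single-genre movies do.

```python
from collections import Counter


def make_feasibility_oracle(genres_of, liked, m, m_g):
    """Return is_feasible(S) for the total limit m and the per-genre limit m_g."""
    def is_feasible(S):
        if len(S) > m:
            return False
        counts = Counter(g for movie in S for g in genres_of[movie] if g in liked)
        return all(c <= m_g for c in counts.values())
    return is_feasible


LIKED = {"Action", "Adventure", "Sci-Fi"}
GENRES_OF = {  # hypothetical movies
    "x1": LIKED, "x2": LIKED, "x3": LIKED,      # tagged with all three genres
    "a1": {"Action"}, "a2": {"Action"},
    "v1": {"Adventure"}, "v2": {"Adventure"},
    "s1": {"Sci-Fi"}, "s2": {"Sci-Fi"},
}
feasible = make_feasibility_oracle(GENRES_OF, LIKED, m=15, m_g=2)
print(feasible({"x1", "x2"}), feasible({"x1", "x2", "x3"}))  # True False
print(feasible({"a1", "a2", "v1", "v2", "s1", "s2"}))       # True
```

Such an oracle can be passed as the independence test of a threshold-based selection routine, since it only answers whether a candidate list is feasible.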
In the experiments, suppose that the user's favourite movie genres are Action, Adventure, and Sci-Fi. Then the constraint of the movie recommendation system is a 3-extendible system constraint. Only movies with ids less than 30,000 are considered, since not all movies have genome scores in the database. The upper limit on the total number of movies is set as m = 15, and the genre limit is varied from 1 to 6. The sampling probability for Sample and SDTG is set as p = 0.25 (i.e. 1/(1+k) with k = 3), and the threshold decreasing parameter for SDTG as ε = 0.2. The penalty parameter is set as λ = 0.8. Denote Max Sample (4) and Max SDTG (4) as the best selections from 4 rounds of Sample and SDTG, respectively. The results of Sample and SDTG are based on 100 rounds of each algorithm. The running time of these algorithms is measured as the number of objective function evaluations, which is independent of the computer hardware. Note that the experimental results for Sample and SDTG vary from run to run, as both algorithms rely on random sampling.
¹ Please refer to https://www.cranfield.ac.uk/study/it-services for details about Delta. Accessed 15 Dec 2020.

Results
The performance of SDTG is compared with that of benchmark algorithms in terms of both function values and running time in Fig. 1. … Greedy is a deterministic algorithm. Therefore, these three items do not appear in Fig. 1c, d, which show the distributions resulting from random sampling. Overall, the function value distribution of SDTG has a similar spread to that of Sample, but SDTG achieves higher median values than Sample does. In terms of running time, SDTG has significantly smaller spreads and lower median values than Sample does. The comparison between Sample and SDTG indicates that SDTG not only achieves better function values but is also faster and more consistent in running time.
Figure 1e, f compare the ratios of solution quality and running time of the different algorithms. The performance of Max Sample (4) is used as the baseline. When m_g = 2, Max SDTG (4) achieves a significantly better function value while consuming fewer function evaluations than Max Sample (4) does. When m_g = 5, Max SDTG (4) achieves a much better function value (38.4% higher) and consumes dramatically fewer function evaluations (76.1% fewer). On average, SDTG finds better solutions while consuming only 6.1% of the function evaluations of Max Sample (4). In both cases, Greedy is the least competitive of all algorithms, because it achieves the worst function values and requires the second largest number of function evaluations. SDTG provides high-quality solutions while consuming the fewest function evaluations, which is a great advantage when handling large-scale datasets.

Discussion
Greedy performs poorly in terms of solution quality because it greedily selects the best element in each iteration, which leads it to bad local optima. In contrast, the sampling process helps Sample- and SDTG-related algorithms avoid the elements that can trap them in bad local optima, and the threshold in SDTG helps further in avoiding such local optima. This explains why SDTG outperforms Sample in terms of solution quality in practice. Table 1 illustrates this in detail. By the definition of the genre limit constraint, at most two movies can be selected from each of the genres Adventure, Action, and Sci-Fi when $m_g = 2$, so at most six movies can be recommended without violating the constraint. Greedy recommends only three movies before reaching the genre limit. In contrast, Max Sample (4) and Max SDTG (4) recommend five and six movies, respectively, which better serves the objective of the movie recommendation system.

Greedy performs poorly in terms of running time because, in each iteration, it has to calculate the mgvs of all remaining elements given the current selection in order to find the best one. Sample is faster than Greedy because it only considers a small portion of the ground set, although it still evaluates all remaining elements in the sample set. Unlike Sample, SDTG stops evaluating as soon as it finds a qualified element and adds that element to the selection set immediately, so it does not have to evaluate all remaining elements in the sample set to select one more element. Therefore, SDTG consumes fewer function evaluations than Sample on average. In addition, because Sample evaluates every element in the sample set, its running time depends strongly on the size of that set. SDTG, by contrast, can usually find a qualified element near the front of the sample set and stop evaluating, so its running time depends less on the size of the sample set. This is why the running time distribution of SDTG has a smaller spread than that of Sample.
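The difference in function evaluations between a Sample-style selection step and an SDTG-style selection step can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the marginal gains are hypothetical fixed numbers (in SDTG they would be mgvs computed with respect to the current selection), the helper names `argmax_step` and `threshold_step` are our own, and the feasibility check against the k-extendible system constraint is omitted. Each call to `gain` stands for one function evaluation.

```python
from typing import Callable, Hashable, Optional, Sequence, Tuple

Element = Hashable


def argmax_step(
    candidates: Sequence[Element], gain: Callable[[Element], float]
) -> Tuple[Optional[Element], int]:
    """Sample-style step: evaluate every remaining candidate and keep the best."""
    evals = 0
    best: Optional[Element] = None
    best_gain = float("-inf")
    for e in candidates:
        g = gain(e)
        evals += 1
        if g > best_gain:
            best, best_gain = e, g
    return best, evals


def threshold_step(
    candidates: Sequence[Element], gain: Callable[[Element], float], tau: float
) -> Tuple[Optional[Element], int]:
    """SDTG-style step: stop at the first candidate whose gain reaches tau."""
    evals = 0
    for e in candidates:
        g = gain(e)
        evals += 1
        if g >= tau:
            return e, evals  # qualified element found; the rest are not evaluated
    return None, evals  # no candidate qualifies at this threshold


# Hypothetical marginal gains of the remaining elements in a sample set.
gains = {"a": 3.0, "b": 7.5, "c": 9.0, "d": 2.0, "e": 8.8}
order = ["a", "b", "c", "d", "e"]

print(argmax_step(order, gains.__getitem__))          # ('c', 5)
print(threshold_step(order, gains.__getitem__, 7.0))  # ('b', 2)
```

With a threshold of 7.0 the scan stops after two evaluations, whereas the argmax step always evaluates all five candidates. The element accepted by the threshold step ('b') has a smaller gain than the best one ('c'); this is the quality versus evaluation-count trade-off that the threshold controls.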

Trade-off of solution quality vs. running time
This section also examines the impact of the threshold parameter $\epsilon$ on solution quality and running time, which helps in choosing a suitable value of $\epsilon$ and gives a deeper understanding of SDTG. The value of $\epsilon$ varies from 0.04 to 0.24 in steps of 0.04. Two cases are examined, with $m_g = 2$ and $m_g = 5$; all other settings are the same as before. We run 100 rounds of SDTG and record the function value and the number of function evaluations in each round. Figure 2 shows the experimental results for varying values of the threshold decreasing parameter; the distributions of function value and running time are shown in Fig. 2a and b, respectively. Figure 2a shows that changing $\epsilon$ has no significant impact on function values, which fluctuate only slightly when $\epsilon \geq 0.08$. However, for both $m_g = 2$ and $m_g = 5$, the solution quality is noticeably worse when $\epsilon = 0.04$ than with larger values of $\epsilon$. This is because the threshold decreases very slowly when $\epsilon$ is extremely small, so the mgv of the element selected by SDTG in each iteration is very close to the largest one. As mentioned before, the decreasing threshold also helps SDTG avoid local optima; an extremely small $\epsilon$ makes SDTG behave much like Sample, which weakens this advantage. Figure 2b shows that the median running time decreases markedly as $\epsilon$ increases, and the spread of running times also shrinks. The reason is that the threshold decreases faster with a larger $\epsilon$: when evaluating the mgvs of the remaining elements one by one, SDTG can find a qualified element more quickly with a smaller threshold. The running time of SDTG also becomes less dependent on the size of the sample set.

Fig. 2 The effect of $\epsilon$ on function value and running time. (a) Function value distribution. (b) Running time distribution.
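The effect of $\epsilon$ on running time can be made concrete by counting threshold levels. The sketch below assumes a geometric schedule in the style of decreasing-threshold greedy methods: the threshold starts at a hypothetical largest singleton gain `d`, is multiplied by $(1-\epsilon)$ after each level, and stops once it falls below $\epsilon d / r$. These details are assumptions for illustration, not the exact rule used by SDTG. Under this schedule the number of levels is at most $\frac{1}{\epsilon}\ln\frac{r}{\epsilon} + 1$, which mirrors the $\frac{1}{\epsilon}\ln\frac{r}{\epsilon}$ factor in SDTG's complexity bound.

```python
import math
from typing import List


def threshold_schedule(d: float, r: int, eps: float) -> List[float]:
    """Geometric thresholds d, (1-eps)d, (1-eps)^2 d, ... kept while >= eps*d/r.

    Assumed schedule for illustration only; d is the largest singleton gain
    and r the largest size of a feasible solution.
    """
    thresholds = []
    tau = d
    while tau >= eps * d / r:
        thresholds.append(tau)
        tau *= 1 - eps  # larger eps -> the threshold drops faster
    return thresholds


r = 10  # hypothetical largest feasible solution size
for eps in (0.04, 0.08, 0.12, 0.16, 0.20, 0.24):  # the values varied in Fig. 2
    levels = len(threshold_schedule(d=1.0, r=r, eps=eps))
    bound = math.log(r / eps) / eps  # (1/eps) * ln(r/eps)
    print(f"eps={eps:.2f}  levels={levels:3d}  ln(r/eps)/eps={bound:6.1f}")
```

With the assumed $r = 10$, the number of levels falls from 136 at $\epsilon = 0.04$ to 14 at $\epsilon = 0.24$, which is consistent with the shorter and less variable running times observed for larger $\epsilon$.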

Conclusions
This paper has presented an efficient algorithm, Sample Decreasing Threshold Greedy (SDTG), for big data summarisation problems. The proposed algorithm achieves an expected approximation ratio of $\frac{k}{(1+k)^2}-\epsilon$ for maximising general non-monotone submodular objective functions subject to k-extendible system constraints with only $O\left(\frac{n}{(1+k)\epsilon}\ln\frac{r}{\epsilon}\right)$ value oracle calls. The performance of SDTG is validated and compared with that of benchmark algorithms through experiments with a movie recommendation system based on a widely used movie information database. The experimental results indicate that the proposed algorithm has considerable potential for large-scale discrete optimisation problems with enormous datasets, such as those arising in machine learning and big data science. We believe that our results can also benefit personalised recommendation systems on internet platforms such as Netflix, YouTube, and Amazon. SDTG can be further accelerated by adapting the Lazy Greedy strategy [38]. Another direction for future research is to accelerate the proposed algorithm using distributed computing.
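For concreteness, the guarantees stated for SDTG can be evaluated for given parameters. The helper `sdtg_guarantees` below is a hypothetical convenience function, not part of SDTG: it returns $\frac{1}{1+k}-\epsilon$ (monotone case), $\frac{k}{(1+k)^2}-\epsilon$ (non-monotone case), and the expression inside the $O(\cdot)$ bound, which indicates growth only because constant factors are hidden. It also enforces the admissible range $\epsilon \in \left(0, \frac{1}{1+k}\right)$.

```python
import math


def sdtg_guarantees(k: int, eps: float, n: int, r: int) -> dict:
    """Evaluate the stated SDTG bounds for a k-extendible system constraint.

    n: size of the ground set, r: largest size of a feasible solution.
    'oracle_calls_order' is the expression inside O(.), not an exact count.
    """
    if not 0 < eps < 1 / (1 + k):
        raise ValueError("eps must lie in (0, 1/(1+k))")
    return {
        "monotone_ratio": 1 / (1 + k) - eps,
        "non_monotone_ratio": k / (1 + k) ** 2 - eps,
        "oracle_calls_order": n / ((1 + k) * eps) * math.log(r / eps),
    }


# Hypothetical instance: k = 2, n = 100,000 elements, r = 50.
print(sdtg_guarantees(k=2, eps=0.05, n=100_000, r=50))
```

Since $\frac{k}{(1+k)^2} = \frac{1}{1+k}\cdot\frac{k}{1+k}$, the non-monotone guarantee is always below the monotone one for the same $k$ and $\epsilon$.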

Fig. 1 Performance comparison of different algorithms

It is clear from Fig. 1a that, on average, Sample- and SDTG-related algorithms outperform Greedy in terms of solution quality. SDTG provides better solutions than Sample, even though its theoretical approximation guarantee is slightly worse than that of Sample. Overall, Max SDTG (4) achieves the highest function value. Figure 1b shows the number of function evaluations consumed by the different algorithms. Four rounds of Sample require the largest number of function evaluations when $m_g \geq 2$. Greedy requires slightly fewer function evaluations than Max Sample (4), whereas four rounds of SDTG require significantly fewer. Overall, Greedy and the Sample-related algorithms consume increasing numbers of function evaluations as $m_g$ grows, whereas the number of function evaluations of the SDTG-related algorithms stays almost constant when $m_g \geq 2$. When $m_g = 6$, four rounds of SDTG are even faster than one round of Sample.
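The parameter $m_g$ varied in Fig. 1 is the per-genre limit of the genre limit constraint, which can be written as a simple feasibility check. The sketch below is illustrative only: the movie identifiers and genre tags are hypothetical (they are not the entries of Table 1), the function name `is_feasible` is our own, and each movie is assumed to count once towards every genre it is tagged with. It reproduces the pattern described in the Discussion: with $m_g = 2$, three movies that each carry two of the genres Adventure, Action, and Sci-Fi already exhaust every genre's limit, whereas six single-genre movies fit.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def is_feasible(selection: Iterable[str], genres: Dict[str, Set[str]], m_g: int) -> bool:
    """Genre limit: no genre may be shared by more than m_g selected movies."""
    counts = Counter(g for movie in selection for g in genres[movie])
    return all(c <= m_g for c in counts.values())


# Hypothetical catalogue: three two-genre movies and six single-genre movies.
genres = {
    "M1": {"Adventure", "Action"},
    "M2": {"Action", "Sci-Fi"},
    "M3": {"Sci-Fi", "Adventure"},
    "S1": {"Adventure"}, "S2": {"Adventure"},
    "S3": {"Action"}, "S4": {"Action"},
    "S5": {"Sci-Fi"}, "S6": {"Sci-Fi"},
}
multi = ["M1", "M2", "M3"]
single = ["S1", "S2", "S3", "S4", "S5", "S6"]

print(is_feasible(multi, genres, m_g=2))                             # True
print(any(is_feasible(multi + [s], genres, m_g=2) for s in single))  # False
print(is_feasible(single, genres, m_g=2))                            # True
```

Once the three two-genre movies are selected, every genre is at its limit, so no further movie can be added; this is how a selection rule that favours multi-genre movies can end with fewer recommendations than the maximum of six.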