
Table 13 Automatic video summarization

From: An analytical study of information extraction from unstructured and multidimensional big data

| Ref. | Approach | Technique | Purpose | Dataset | Results/limitations |
| --- | --- | --- | --- | --- | --- |
| [97] | Supervised with prior segmentation | SVM-based kernel video segmentation | Category-specific video summarization | MED-Summaries: 12,249 training videos and 60 test videos | Known categories yield higher-quality video summaries than an unsupervised approach |
| [95] | Unsupervised with web-based prior information | Four baseline algorithms (random sampling, uniform sampling, k-means, and spectral clustering), followed by crowdsourcing | To deal with content sparsity and large-scale evaluation | 180 videos: 25 for training and 155 for evaluation | Content sparsity and poor quality of user-generated videos are major challenges; expert evaluation is not feasible for large-scale data, so crowdsourcing was used; adding web images of a category to incorporate knowledge is time-consuming, especially for unknown categories |
| [98] | Supervised | Linear combination of submodular maximization for each objective, using structured learning | To implement interestingness, representativeness, and uniformity | Egocentric dataset and SumMe dataset | Shortage of large datasets for summarization |
| [94] | Supervised | vsLSTM to model variable-range temporal dependency | To address the need for a large amount of annotated data | SumMe and TVSum | Domain adaptation can improve learning and reduce discrepancies |
| [93] | Supervised | Sequential determinantal point process: supervised DPP coupled with NN representation | To incorporate human-created summaries for selection of informative and diverse subsets | Open Video Project (50 videos), YouTube (39 videos), Kodak consumer video (18 videos) | Supervised approach with linear representation performed better |
| [99] | NA | MSR (Minimum Sparse Reconstruction) based summarization | To use a minimum number of keyframes; to provide flexibility for practical applications | Open Video Project (50 videos), several-genres dataset (50 videos) | Two variants were proposed, for off-line and on-line applications; focused on selection of keyframes |
| [96] | Unsupervised | Generative adversarial framework: summarizer (autoencoder LSTM) + discriminator (LSTM) | To regularize the summary length, diversity, and keyframes for | SumMe, TVSum, Open Video Project, YouTube | Different performance on different datasets; deep features perform better than shallow features; frames with very slow motion and no scene change gave poor results |
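
Two of the baseline algorithms named in the row for [95], uniform sampling and k-means, are simple enough to sketch. The code below is only an illustration under stated assumptions, not the implementation used in any cited work: each frame is assumed to be already described by a numeric feature vector, and the function names `uniform_keyframes` and `kmeans_keyframes` are hypothetical.

```python
import random


def uniform_keyframes(n_frames, k):
    """Pick k frame indices evenly spaced over a video of n_frames frames."""
    if k <= 0 or n_frames <= 0:
        return []
    k = min(k, n_frames)
    # Take the centre of each of k equal-length segments.
    return [int((i + 0.5) * n_frames / k) for i in range(k)]


def _sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _assign(features, centroids):
    """Group frame indices by their nearest centroid."""
    clusters = [[] for _ in centroids]
    for idx, feat in enumerate(features):
        nearest = min(range(len(centroids)),
                      key=lambda j: _sq_dist(feat, centroids[j]))
        clusters[nearest].append(idx)
    return clusters


def kmeans_keyframes(features, k, iters=20, seed=0):
    """Cluster per-frame features with k-means and return, for each
    non-empty cluster, the index of the frame closest to its centroid."""
    if k <= 0 or not features:
        return []
    k = min(k, len(features))
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    centroids = [list(features[i])
                 for i in rng.sample(range(len(features)), k)]
    dim = len(features[0])
    for _ in range(iters):
        clusters = _assign(features, centroids)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [
                    sum(features[i][d] for i in members) / len(members)
                    for d in range(dim)
                ]
    clusters = _assign(features, centroids)
    keyframes = [
        min(members, key=lambda i: _sq_dist(features[i], centroids[j]))
        for j, members in enumerate(clusters) if members
    ]
    return sorted(keyframes)
```

Uniform sampling ignores content entirely, while the k-means variant groups visually similar frames and keeps one representative per group, which is why both are commonly used as reference points when evaluating summarizers.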