Table 1 A summary of the reviewed keyframe extraction methods

From: Bilingual video captioning model for enhanced video retrieval

| Ref. | Year | Approach/method | Weaknesses |
|---|---|---|---|
| [13] | 2022 | Content-based: coarse extraction and partial-fine re-extraction of spatiotemporal slices | Not suitable for all videos (especially videos containing fast scenes) |
| [15] | 2021 | Content-based: SSIM | Applied to specific regions rather than the complete frame |
| [17] | 2015 | Time-based | Inaccurate because frames are sampled by time alone (one frame per second) |
| [18] | 2015 | Frame-based | Inaccurate because a fixed number of frames is sampled (240 frames per video) |
| [19] | 2021 | Content-based: RL-based filtration network | Requires efficient training |
| [20] | 2022 | Content-based: multiview fusion-based method | Complicated method; requires specific frame sizes |
| [22] | 2022 | Content-based: local consistent deformable convolution | Long processing time |
| [23] | 2020 | Content-based: Sobel gradient images and variance coefficient measure | Long processing time compared to other similar approaches [26] |
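The two baseline strategies in the table, [17] and [18], choose frames without inspecting their content. The following is a minimal sketch of both, assuming frames are addressed by index and the frame rate is known. The function names and parameters are illustrative and are not taken from the cited works.

```python
def time_based_indices(num_frames: int, fps: float, interval_s: float = 1.0) -> list[int]:
    """Pick one frame every `interval_s` seconds (e.g. one frame per second, as in [17])."""
    step = max(1, round(fps * interval_s))  # frames between consecutive picks
    return list(range(0, num_frames, step))


def count_based_indices(num_frames: int, k: int = 240) -> list[int]:
    """Pick k frames spread uniformly over the video (e.g. 240 frames per video, as in [18])."""
    if num_frames <= k:
        return list(range(num_frames))  # short video: keep every frame
    # Evenly spaced positions; integer arithmetic avoids floating-point drift.
    return [i * num_frames // k for i in range(k)]
```

For a 100-frame clip at 25 fps, `time_based_indices(100, 25.0)` returns `[0, 25, 50, 75]` regardless of whether the clip is a static shot or a fast action scene. This content-blindness is the weakness the table attributes to both approaches.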
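The SSIM-based approach of [15] is content-based: a frame counts as new when it is structurally dissimilar to what was already kept. The following is a minimal sketch of that idea, under several assumptions. Frames are flattened grayscale pixel lists. SSIM is computed over one global window with population (1/N) statistics instead of the usual sliding local windows. The `threshold` of 0.8 is a hypothetical value. To mimic [15]'s region-restricted use, a caller could pass only the pixels of the region of interest.

```python
from statistics import fmean


def global_ssim(x: list[float], y: list[float], data_range: float = 255.0) -> float:
    """SSIM over a single window covering the whole (flattened, grayscale) frame.

    Standard SSIM averages this statistic over local sliding windows; one
    global window with population (1/N) statistics keeps the sketch dependency-free.
    """
    c1 = (0.01 * data_range) ** 2  # stabilizing constants from the SSIM definition
    c2 = (0.03 * data_range) ** 2
    mx, my = fmean(x), fmean(y)
    vx = fmean([(a - mx) * (a - mx) for a in x])
    vy = fmean([(b - my) * (b - my) for b in y])
    cov = fmean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx * mx + my * my + c1) * (vx + vy + c2)
    )


def ssim_keyframes(frames: list[list[float]], threshold: float = 0.8) -> list[int]:
    """Start a new keyframe whenever a frame is dissimilar to the last keyframe."""
    if not frames:
        return []
    keys = [0]  # the first frame always opens a segment
    for i in range(1, len(frames)):
        if global_ssim(frames[keys[-1]], frames[i]) < threshold:
            keys.append(i)
    return keys
```

Comparing each frame against the last keyframe, not against the previous frame, prevents a slow gradual change from slipping through as a chain of small steps.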
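The approach of [23] combines Sobel gradient images with a variance-coefficient measure. The table does not specify the exact decision rule, so the following is a hypothetical sketch rather than the cited method. It computes Sobel gradient magnitudes for each frame and summarizes each frame by the coefficient of variation (standard deviation divided by mean) of those magnitudes. It then starts a new keyframe when that summary shifts by more than an assumed `min_change` threshold. All function names and the threshold value are illustrative.

```python
from statistics import fmean, pstdev


def sobel_magnitude(img: list[list[float]]) -> list[float]:
    """Flattened gradient magnitudes from 3x3 Sobel kernels (border pixels skipped).

    Assumes a grayscale frame of at least 3x3 pixels.
    """
    h, w = len(img), len(img[0])
    mags = []
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            # Horizontal and vertical Sobel responses at (r, c).
            gx = (img[r - 1][c + 1] + 2 * img[r][c + 1] + img[r + 1][c + 1]
                  - img[r - 1][c - 1] - 2 * img[r][c - 1] - img[r + 1][c - 1])
            gy = (img[r + 1][c - 1] + 2 * img[r + 1][c] + img[r + 1][c + 1]
                  - img[r - 1][c - 1] - 2 * img[r - 1][c] - img[r - 1][c + 1])
            mags.append((gx * gx + gy * gy) ** 0.5)
    return mags


def coefficient_of_variation(values: list[float]) -> float:
    """Standard deviation divided by mean; 0.0 for an all-zero (edgeless) input."""
    m = fmean(values)
    return pstdev(values) / m if m else 0.0


def gradient_cv_keyframes(frames: list[list[list[float]]], min_change: float = 0.2) -> list[int]:
    """Hypothetical rule: new keyframe when the gradient CV shifts by more than min_change."""
    keys: list[int] = []
    key_cv = None
    for i, frame in enumerate(frames):
        cv = coefficient_of_variation(sobel_magnitude(frame))
        if key_cv is None or abs(cv - key_cv) > min_change:
            keys.append(i)
            key_cv = cv
    return keys
```

The nested per-pixel loops are slow in pure Python. A practical version would compute the gradients with a vectorized routine such as OpenCV's `cv2.Sobel`. This also hints at why gradient-based pipelines can carry the processing cost noted in the table.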