Table 1 Sample input and output of proposed framework

From: Semantic context driven language descriptions of videos using deep neural network

| Figure No. | Ground-truth | Model_1 | Model_2 | Model_3 |
| --- | --- | --- | --- | --- |
| Figure 5a | {A girl is riding a horse and jumping barriers}, {A girl is riding a horse}, {A woman is riding a horse in an outdoor arena and she makes a jump}, {A woman is riding a horse}, {A person riding a horse is jumping hurdles} | {a man is riding a horse} | {a man is riding a horse} | {a girl is riding a horse} |
| Figure 5b | {A baby is playing}, {A baby is playing with a pacifier}, {A baby plays on a bed}, {a baby is sucking on a soother and watching the camera}, {The toddler put the pacifier in and out of his mouth} | {a man is slicing a potato} | {a baby is playing with toys} | {a baby is playing with a camera} |
| Figure 5c | {A cat is playing with a bunny}, {A cat is playing with a rabbit}, {A kitten is playing with a rabbit}, {a white cat playing with a white bunny}, {A kitten and a rabbit are playing} | {a kitten is trying to climb a tree} | {a cat is playing} | {a white kitten is playing} |
| Figure 5d | {A woman is cutting something}, {A woman cuts up some worms}, {someone show how to prepare the japanese food}, {a person coking}, {a women is making dish} | {A woman is preparing a dish} | {The woman is mixing ingredients in a bowl} | {A woman is mixing some eggs} |
| Figure 5e | {A person is driving a car}, {the man is drive the car on the road and seeing the place}, {Someone is driving a car}, {A car is driving down a road}, {A car is moving} | {a man is running in the water} | {a man is driving a car} | {a man is driving a car} |