
Fig. 5. Visual example of a long, style-consistent video translation (1680 frames).

Left: input semantic labels. Right: UVIT-translated video in the sunset scenario. All frames in the video share the same style code to maintain style consistency.

UVIT
Unsupervised Multimodal Video-to-Video Translation via Self-Supervised Learning