Talking Face Generation by Conditional Recurrent Adversarial Network

Abstract: Given an arbitrary face image and an arbitrary speech clip, the proposed work attempts to generate a talking face video with accurate lip synchronization. Existing works either do not consider temporal dependency across video frames, thus yielding abrupt facial and lip movements, or are limited to generating talking face videos for a specific person, thus lacking generalization capacity. We propose a novel conditional recurrent generation network that incorporates both image and audio features in the recurrent unit to model temporal dependency. To achieve both image- and video-realism, a pair of spatial-temporal discriminators is included in the network for better image/video quality. Since accurate lip synchronization is essential to the success of talking face video generation, we also construct a lip-reading discriminator to boost the accuracy of lip synchronization. We further extend the network to model the natural pose and expression of the talking face on the Obama Dataset. Extensive experimental results demonstrate the superiority of our framework over the state of the art in terms of visual quality, lip-sync accuracy, and smooth transitions in both lip and facial movement.
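To make the described architecture concrete, below is a minimal PyTorch sketch of a conditional recurrent generator in the spirit of the abstract: an identity embedding of the input face and per-frame audio embeddings are fused inside a recurrent unit, so consecutive output frames share hidden state (the temporal dependency the paper emphasizes). All module names, feature dimensions, the flattened-image encoders, and the choice of a GRU are illustrative assumptions, not the paper's actual design; the spatial-temporal and lip-reading discriminators are omitted.

```python
# Sketch of a conditional recurrent generator for talking-face synthesis.
# Assumptions: 64x64 RGB frames, 80-dim audio features per frame, GRU fusion.
import torch
import torch.nn as nn

class ConditionalRecurrentGenerator(nn.Module):
    def __init__(self, img_dim=256, audio_dim=128, hidden_dim=512):
        super().__init__()
        # Identity encoder: embeds the single input face image once.
        self.image_encoder = nn.Sequential(
            nn.Linear(3 * 64 * 64, img_dim), nn.ReLU())
        # Audio encoder: embeds one window of audio features per frame.
        self.audio_encoder = nn.Sequential(
            nn.Linear(80, audio_dim), nn.ReLU())
        # Recurrent unit: fuses image and audio features over time so that
        # each frame depends on the previous ones (temporal dependency).
        self.rnn = nn.GRU(img_dim + audio_dim, hidden_dim, batch_first=True)
        # Decoder: maps each hidden state to one output frame.
        self.decoder = nn.Sequential(
            nn.Linear(hidden_dim, 3 * 64 * 64), nn.Tanh())

    def forward(self, face, audio_seq):
        # face: (B, 3*64*64) flattened identity image
        # audio_seq: (B, T, 80) per-frame audio features
        B, T, _ = audio_seq.shape
        img_feat = self.image_encoder(face)                # (B, img_dim)
        img_feat = img_feat.unsqueeze(1).expand(B, T, -1)  # repeat per frame
        aud_feat = self.audio_encoder(audio_seq)           # (B, T, audio_dim)
        h, _ = self.rnn(torch.cat([img_feat, aud_feat], dim=-1))
        frames = self.decoder(h)                           # (B, T, 3*64*64)
        return frames.view(B, T, 3, 64, 64)

# Usage: one identity image plus 16 audio windows -> 16 video frames.
gen = ConditionalRecurrentGenerator()
video = gen(torch.randn(2, 3 * 64 * 64), torch.randn(2, 16, 80))
print(video.shape)  # torch.Size([2, 16, 3, 64, 64])
```

In the full adversarial setup, a spatial discriminator would score individual frames, a temporal discriminator would score frame sequences, and a lip-reading discriminator would check that the generated mouth region matches the spoken audio, as the abstract outlines.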

