Abstract. We present FusedGAN, a deep network for conditional image synthesis with controllable sampling of diverse images. Fidelity, diversity and controllable sampling are the main quality measures of a good image generation model. Most existing models are insufficient in all three aspects. The FusedGAN can perform controllable sampling of diverse images with very high fidelity. We argue that controllability can be achieved by disentangling the generation process into various stages. In contrast to stacked GANs, where multiple stages of GANs are trained separately with full supervision of labeled intermediate images, the FusedGAN has a single-stage pipeline with a built-in stacking of GANs. Unlike existing methods, which require full supervision with paired conditions and images, the FusedGAN can effectively leverage more abundant images without corresponding conditions in training, to produce more diverse samples with high fidelity. We achieve this by fusing two generators: one for unconditional image generation, and the other for conditional image generation, where the two partly share a common latent space, thereby disentangling the generation. We demonstrate the efficacy of the FusedGAN in fine-grained image generation tasks such as text-to-image and attribute-to-face generation.
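
As an illustration of the fused design, the following minimal sketch shows one way two generators could partly share a common latent stage. It is not the FusedGAN implementation: the layer sizes, the random weights standing in for trained layers, and the names `shared_stage`, `generate_unconditional` and `generate_conditional` are all assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
Z_DIM, H_DIM, C_DIM, IMG_DIM = 8, 16, 4, 32


def linear(in_dim, out_dim):
    """Random weight matrix standing in for a trained layer (assumption)."""
    return rng.standard_normal((in_dim, out_dim)) / np.sqrt(in_dim)


W_shared = linear(Z_DIM, H_DIM)          # stage shared by both generators
W_uncond = linear(H_DIM, IMG_DIM)        # unconditional head
W_cond = linear(H_DIM + C_DIM, IMG_DIM)  # conditional head


def shared_stage(z):
    """Map noise z to the latent representation both generators consume."""
    return np.tanh(z @ W_shared)


def generate_unconditional(z):
    """Unconditional path: needs no condition, so unpaired images can train it."""
    return np.tanh(shared_stage(z) @ W_uncond)


def generate_conditional(z, y):
    """Conditional path: shared latent concatenated with the condition y."""
    h = shared_stage(z)
    return np.tanh(np.concatenate([h, y], axis=-1) @ W_cond)


# Controllable sampling in this toy setup: fix z, vary only the condition.
z = rng.standard_normal((1, Z_DIM))
y_a = np.eye(C_DIM)[[0]]  # one-hot condition A
y_b = np.eye(C_DIM)[[1]]  # one-hot condition B
x_u = generate_unconditional(z)
x_a = generate_conditional(z, y_a)
x_b = generate_conditional(z, y_b)
```

In this toy setup both conditional samples start from the same shared latent, so any difference between `x_a` and `x_b` comes from the condition alone. The unconditional path never sees a condition, which is what would let images without paired conditions contribute to training the shared stage.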