August 05, 2022
Poster, Brown University Summer Research Symposium, Providence, RI
Image translation, that is, altering the style and content of a given image to match predefined objectives, is a novel technique for artists to achieve their artistic vision. Recent work in image-to-image translation introduces methods to generate photorealistic imagery from non-realistic domains (e.g., drawings, paintings). However, these models do not allow the user to select specific regions for transformation or to control the style of generation through text while maintaining a photorealistic result. In this project, we present a language-based image-to-image translation model that allows the user to perform object-level edits via semantic query texts. The model takes as input a sketched image, an instance segmentation mask of the objects in the sketch, and their corresponding text descriptors, and translates the sketch into the photorealistic domain through texture generation. We adapt an existing image-to-image translation architecture together with a pre-trained text-image embedding model to encode text embeddings within an instance segmentation mask, enabling controlled, region-level editing of material appearance. Our method allows users to edit object appearance, generating diverse outputs from the same input image. Our work automates architectural and product visualization by allowing users to control how their sketches and designs are rendered in the photorealistic domain.
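As a rough illustration of the conditioning step described above, the sketch below shows one way per-instance text embeddings could be broadcast into a spatial map using the instance segmentation mask, so that each region carries the embedding of its text descriptor before being fed to the generator. The function name, tensor shapes, and embedding source are illustrative assumptions, not the poster's actual implementation.

```python
# Hypothetical sketch: building a spatial text-conditioning map from an
# instance segmentation mask and per-instance text embeddings.
# Shapes and names are assumptions for illustration only.
import torch

def build_text_condition_map(instance_mask: torch.Tensor,
                             text_embeddings: torch.Tensor) -> torch.Tensor:
    """
    instance_mask:   (H, W) long tensor of instance IDs in [0, N-1]
                     (background can be assigned its own ID).
    text_embeddings: (N, D) float tensor, one embedding per instance,
                     e.g. from a pre-trained text-image encoder.
    Returns a (D, H, W) map in which every pixel carries the embedding of
    the instance it belongs to, ready to be concatenated with image
    features inside an image-to-image translation generator.
    """
    H, W = instance_mask.shape
    D = text_embeddings.shape[1]
    # Look up each pixel's instance embedding, then move channels first.
    cond = text_embeddings[instance_mask.view(-1)]       # (H*W, D)
    return cond.view(H, W, D).permute(2, 0, 1).contiguous()

# Minimal usage example: two instances with 4-dimensional embeddings.
mask = torch.zeros(8, 8, dtype=torch.long)
mask[2:6, 2:6] = 1                                       # one square "object"
emb = torch.randn(2, 4)                                  # [background, object]
cond_map = build_text_condition_map(mask, emb)           # shape (4, 8, 8)
print(cond_map.shape)
```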