Impact Statement:
This article proposes a new approach to image matting, termed text-guided image matting. Departing from conventional guidance-based methods, text-guided matting relies solely on concise textual descriptions of the foreground object for guidance. It provides semantic insights and facilitates efficient batch processing of multiple frames with identical objects. The deep neural network (NN) developed for this purpose shows competitive performance in portrait matting, outperforming traditional trimap-based or background-based methods. This work marks a significant step toward more intelligent image matting solutions, enhancing user-friendliness through the integration of semantically driven artificial intelligence.
Abstract:
Image matting separates the foreground of an image from the background by estimating an alpha matte that indicates the pixel-wise degree of transparency. To precisely extract target objects and address the ambiguity of solutions in image matting, many existing approaches take a trimap or background image provided by the user as additional input to guide the matting process. This article introduces a novel matting paradigm, termed text-guided image matting, which uses a textual description of the foreground object as guidance. In contrast to trimap-based or background-based methods, text-guided matting offers a user-friendly interface and provides semantic clues about the objects of interest. Moreover, it facilitates batch processing across multiple frames featuring the same objects of interest. The proposed text-guided matting approach is implemented through a deep neural network (NN) comprising three-stage cross-modal feature fusion and two-step alpha matte prediction. Experimental results on portrait matting demonstrate the competitive performance of our text-guided approach compared to existing trimap-based and background-based methods.
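As a rough illustration of the alpha matte and of the solution ambiguity mentioned above, the sketch below uses the standard compositing model I = alpha * F + (1 - alpha) * B, which is common background in matting and is not taken from this article. The function name `composite` and all pixel values are hypothetical, chosen only for illustration; the sketch does not represent the proposed network.

```python
import numpy as np

def composite(alpha, foreground, background):
    """Blend per pixel with the standard model I = alpha * F + (1 - alpha) * B.

    alpha:      (H, W) matte with values in [0, 1]
    foreground: (H, W, 3) RGB foreground colors
    background: (H, W, 3) RGB background colors
    """
    alpha = alpha[..., np.newaxis]  # broadcast the matte over the RGB channels
    return alpha * foreground + (1.0 - alpha) * background

# Two different explanations of the same one-pixel image (made-up values).
# Explanation A: a half-transparent pure-red foreground over a black background.
image_a = composite(
    np.array([[0.5]]),
    np.array([[[1.0, 0.0, 0.0]]]),
    np.array([[[0.0, 0.0, 0.0]]]),
)
# Explanation B: a fully opaque dark-red foreground; the background is irrelevant.
image_b = composite(
    np.array([[1.0]]),
    np.array([[[0.5, 0.0, 0.0]]]),
    np.array([[[0.0, 1.0, 0.0]]]),
)

# Both explanations yield the same observed pixel, so alpha cannot be recovered
# from the image alone without extra guidance.
same_pixel = np.allclose(image_a, image_b)
```

Because both decompositions reproduce the same observed color, some additional input (a trimap, a background image, or, in this article, a textual description) is needed to single out the intended matte.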
Published in: IEEE Transactions on Artificial Intelligence (Volume: 5, Issue: 8, August 2024)