We introduce ChatGarment, a novel approach that leverages large vision-language models (VLMs) to automate the estimation, generation, and editing of 3D garment sewing patterns from images or text descriptions.
Unlike previous methods, which often lack robustness and interactive editing capabilities, ChatGarment finetunes a VLM to produce GarmentCode, a JSON-based, language-friendly format for 2D sewing patterns, enabling both pattern estimation and editing from images and text instructions. To optimize performance, we refine GarmentCode by expanding its support for more diverse garment types and simplifying its structure, making it more efficient for VLM finetuning. Additionally, we develop an automated data construction pipeline to generate a large-scale dataset of image-to-sewing-pattern and text-to-sewing-pattern pairs, empowering ChatGarment with strong generalization across various garment types.
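To make the format concrete, here is a minimal sketch of what a simplified, GarmentCode-style sewing-pattern specification could look like when expressed as a Python dictionary. The field names (meta, design, sleeve, skirt, neckline) are illustrative assumptions, not the actual GarmentCode schema.

```python
import json

# A minimal, illustrative sketch of a GarmentCode-style garment spec.
# All field names and values here are hypothetical, chosen only to
# convey the JSON-based, language-friendly structure described above;
# the real GarmentCode schema differs.
garment_spec = {
    "meta": {"garment_type": "dress"},                # hypothetical top-level type
    "design": {
        "sleeve": {"enabled": True, "length": 0.7},   # normalized sleeve length
        "skirt": {"style": "a-line", "length": 0.9},  # skirt style and length
        "neckline": {"shape": "v-neck"},
    },
}

# Because the format is plain JSON, a finetuned VLM can emit it as text,
# and standard tooling can parse and validate it.
print(json.dumps(garment_spec, indent=2))
```

A structured, human-readable representation like this is what makes the format amenable to both VLM finetuning and interactive editing.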
Extensive evaluations demonstrate ChatGarment's ability to accurately reconstruct, generate, and edit garments from multimodal inputs, highlighting its potential to revolutionize workflows in fashion and gaming applications.
We leverage a large vision-language model for sewing pattern understanding. ChatGarment features three dialog modes: users can provide images as visual input for garment creation or guidance, while text instructions define the specific task. These inputs are transformed into GarmentCode, which is then converted into 3D garments.
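The following is a minimal sketch of this pipeline, assuming hypothetical helper functions (vlm_generate_garmentcode, garmentcode_to_3d) that stand in for the finetuned VLM and the sewing-pattern-to-3D conversion step; none of these names come from the ChatGarment codebase.

```python
from typing import Any, Dict, Optional

# Hypothetical stand-ins for the finetuned VLM and the pattern-to-mesh
# converter; neither name is taken from the ChatGarment codebase.
def vlm_generate_garmentcode(text: str, image: Optional[bytes] = None) -> Dict[str, Any]:
    """Stub: a finetuned VLM would emit a GarmentCode JSON spec here."""
    return {"meta": {"garment_type": "dress"}, "design": {}}

def garmentcode_to_3d(spec: Dict[str, Any]) -> str:
    """Stub: the sewing pattern would be stitched and draped into a 3D mesh here."""
    return f"3D garment for a {spec['meta']['garment_type']}"

def chatgarment(text: str, image: Optional[bytes] = None) -> str:
    # An optional image and a text instruction (reconstruction, generation,
    # or editing) go to the VLM, which emits GarmentCode; the JSON pattern
    # is then converted into a 3D garment.
    spec = vlm_generate_garmentcode(text=text, image=image)
    return garmentcode_to_3d(spec)

print(chatgarment("Create a knee-length dress with short sleeves."))
```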
Results of image-based reconstruction. Unlike SewFormer and DressCode, which often misreconstruct or miss garments entirely, ChatGarment faithfully captures the shape, style, and composition of the target garments.
Results of garment generation from text. ChatGarment follows the instructions more accurately, generating more precise details (garment type, sleeves, length, etc.) than DressCode.
Results of garment editing. Each model must edit the source garment according to the editing instructions shown in the prompt boxes. ChatGarment accurately modifies the targeted garment part as prompted, whereas methods based on SewFormer or DressCode often fail to follow the prompt precisely and unintentionally alter other parts of the garment.
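As a hedged illustration of why a JSON-based pattern format makes such localized edits tractable, the snippet below modifies a single field of a hypothetical spec while leaving the rest untouched; the field names follow the same illustrative schema assumed above, not the real GarmentCode format.

```python
import copy

# Hypothetical source spec (same illustrative schema as above).
source = {
    "design": {
        "sleeve": {"enabled": True, "length": 0.7},
        "skirt": {"style": "a-line", "length": 0.9},
    },
}

# Editing instruction: "make the sleeves long". A structured edit touches
# only the targeted field, leaving the skirt parameters unchanged -- the
# localized behavior the caption attributes to ChatGarment.
edited = copy.deepcopy(source)
edited["design"]["sleeve"]["length"] = 1.0

assert edited["design"]["skirt"] == source["design"]["skirt"]  # untouched
```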