Shining Yourself: High-Fidelity Ornaments Virtual Try-on with Diffusion Model

¹Xi'an Jiaotong University, ²SenseTime Research
CVPR 2025

# Completed during internship at SenseTime Research. * Corresponding author.
Teaser Image

We propose, for the first time, the task of virtual try-on for ornaments, including bracelets, rings, earrings, and necklaces. Our method achieves realistic try-on results and high-fidelity preservation of ornament identity using pose-aware mask prediction and mask-guided attention.

Abstract

While virtual try-on for clothes and shoes with diffusion models has gained traction, virtual try-on for ornaments, such as bracelets, rings, earrings, and necklaces, remains largely unexplored. Because most ornaments contain intricate, tiny patterns and repeated geometric sub-structures, it is much harder to guarantee identity and appearance consistency under large pose and scale variations between ornaments and models. This paper proposes the task of virtual try-on for ornaments and presents a method that improves geometric and appearance preservation in ornament try-on. Specifically, we estimate an accurate wearing mask in an iterative scheme that runs alongside the denoising process, which improves the alignment between ornaments and models. To preserve structural details, we further regularize the attention layers so that they implicitly map the reference ornament mask to the wearing mask. Experimental results demonstrate that our method successfully transfers ornaments from reference images onto target models, handling substantial differences in scale and pose while preserving identity and achieving realistic visual effects.


(a) During training, given reference ornament and model images together with their masks, our method concatenates the ornament image and the masked model image as input to the ReferenceNet branch, which extracts features used to predict the wearing mask iteratively. The extracted features are also injected into the denoising U-Net to improve detail generation. (b) To preserve structural details, we formulate the attention layers so that they implicitly map the reference ornament mask to the ground-truth wearing mask, rather than imposing the mask directly on the attention maps.
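The idea in (b) can be illustrated with a toy example. The sketch below is one plausible reading of "mapping the reference mask to the wearing mask through attention" and is not the loss used in the paper: each target position accumulates the attention mass it places on reference positions inside the ornament mask, and the resulting soft mask is compared with the wearing mask. The function names, the mean-squared-error penalty, and the toy features are all assumptions made for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(queries, keys):
    """Scaled dot-product attention weights.

    One row per target query, one column per reference key; rows sum to 1.
    """
    scale = 1.0 / math.sqrt(len(keys[0]))
    return [
        softmax([scale * sum(a * b for a, b in zip(q, k)) for k in keys])
        for q in queries
    ]


def transported_mask(attn, ref_mask):
    """Soft mask on target positions: the attention mass each query places
    on reference positions that lie inside the ornament mask."""
    return [sum(w * m for w, m in zip(row, ref_mask)) for row in attn]


def mask_mapping_loss(attn, ref_mask, wear_mask):
    """Hypothetical regularizer: mean squared error between the transported
    reference mask and the wearing mask."""
    pred = transported_mask(attn, ref_mask)
    return sum((p - w) ** 2 for p, w in zip(pred, wear_mask)) / len(wear_mask)


# Toy setup (assumed values): three reference tokens, the first two on the ornament.
ref_keys = [[4.0, 0.0], [4.0, 0.0], [0.0, 4.0]]
ref_mask = [1.0, 1.0, 0.0]

# Four target tokens; tokens 0 and 3 look like ornament, tokens 1 and 2 do not.
tgt_queries = [[4.0, 0.0], [0.0, 4.0], [0.0, 4.0], [4.0, 0.0]]

attn = attention_weights(tgt_queries, ref_keys)

# A wearing mask that agrees with where attention sends ornament content ...
aligned = mask_mapping_loss(attn, ref_mask, [1.0, 0.0, 0.0, 1.0])
# ... and one that disagrees with it.
misaligned = mask_mapping_loss(attn, ref_mask, [0.0, 1.0, 1.0, 0.0])

print(f"aligned loss: {aligned:.6f}, misaligned loss: {misaligned:.6f}")
```

In this toy setting the penalty is small when target positions inside the wearing mask attend to ornament positions and large otherwise. The attention map itself is never overwritten by a mask; the masks only enter through the loss, which is the sense in which such a constraint is implicit.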


Virtual try-on results on other categories including bracelets, rings, necklaces, and earrings. Please zoom in to see the details.

BibTeX

@inproceedings{yingmao2025shining,
        title={Shining Yourself: High-Fidelity Ornaments Virtual Try-on with Diffusion Model},
        author={Yingmao Miao and Zhanpeng Huang and Rui Han and Zibin Wang and Chenhao Lin and Chao Shen},
        booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition},
        year={2025}
      }