Toward Accurate, Realistic Virtual Try-on Through Shape Matching: Conclusions & References

8 Jun 2024


(1) Kedan Li, University of Illinois at Urbana-Champaign;

(2) Min Jin Chong, University of Illinois at Urbana-Champaign;

(3) Jingen Liu, JD AI Research;

(4) David Forsyth, University of Illinois at Urbana-Champaign.

5. Conclusions

In this paper, we propose two general modifications to the virtual try-on framework: (a) carefully choosing the product-model pair for transfer using a shape embedding, and (b) combining multiple coordinated warps using inpainting. Our results show that both modifications lead to significant improvements in generation quality. Qualitative examples demonstrate that our method accurately preserves garment details. As a result, shoppers have difficulty distinguishing real model images from synthesized ones, as our user study results show.


References

  1. Güler, R.A., Neverova, N., Kokkinos, I.: Densepose: Dense human pose estimation in the wild. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (June 2018)

  2. Ayush, K., Jandial, S., Chopra, A., Krishnamurthy, B.: Powering virtual try-on via auxiliary human segmentation learning. In: The IEEE International Conference on Computer Vision (ICCV) Workshops (Oct 2019)

  3. Belongie, S., Malik, J., Puzicha, J.: Shape matching and object recognition using shape contexts. PAMI (2002)

  4. Bogo, F., Kanazawa, A., Lassner, C., Gehler, P., Romero, J., Black, M.J.: Keep it SMPL: Automatic estimation of 3D human pose and shape from a single image. In: ECCV (2016)

  5. Brock, A., Donahue, J., Simonyan, K.: Large scale gan training for high fidelity natural image synthesis. arXiv preprint arXiv:1809.11096 (2018)

  6. Chen, L.C., Zhu, Y., Papandreou, G., Schroff, F., Adam, H.: Encoder-decoder with atrous separable convolution for semantic image segmentation. In: ECCV (2018)

  7. Chen, M., Qin, Y., Qi, L., Sun, Y.: Improving fashion landmark detection by dual attention feature enhancement. In: ICCV Workshops (2019)

  8. Chen, W., Wang, H., Li, Y., Su, H., Wang, Z., Tu, C., Lischinski, D., Cohen-Or, D., Chen, B.: Synthesizing training images for boosting human 3d pose estimation (2015)

  9. Chong, M.J., Forsyth, D.: Effectively unbiased fid and inception score and where to find them. arXiv preprint arXiv:1911.07023 (2019)

  10. Danerek, R., Dibra, E., Oztireli, A.C., Ziegler, R., Gross, M.H.: Deepgarment: 3d garment shape estimation from a single image. Comput. Graph. Forum (2017)

  11. Dong, H., Liang, X., Gong, K., Lai, H., Zhu, J., Yin, J.: Soft-gated warping-gan for pose-guided person image synthesis. In: NeurIPS (2018)

  12. Dong, H., Liang, X., Wang, B., Lai, H., Zhu, J., Yin, J.: Towards multi-pose guided virtual try-on network. In: ICCV (2019)

  13. Grigor’ev, A.K., Sevastopolsky, A., Vakhitov, A., Lempitsky, V.S.: Coordinate-based texture inpainting for pose-guided human image generation. In: CVPR (2019)

  14. Guan, P., Reiss, L., Hirshberg, D., Weiss, A., Black, M.: Drape: Dressing any person. ACM Transactions on Graphics - TOG (2012)

  15. Han, X., Hu, X., Huang, W., Scott, M.R.: Clothflow: A flow-based model for clothed person generation. In: ICCV (2019)

  16. Han, X., Wu, Z., Huang, W., Scott, M.R., Davis, L.S.: Compatible and diverse fashion image inpainting (2019)

  17. Han, X., Wu, Z., Wu, Z., Yu, R., Davis, L.S.: Viton: An image-based virtual try-on network. In: CVPR (2018)

  18. Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., Hochreiter, S.: Gans trained by a two time-scale update rule converge to a local nash equilibrium. In: Advances in neural information processing systems. pp. 6626–6637 (2017)

  19. Hsiao, W.L., Grauman, K.: Dressing for diverse body shapes. ArXiv (2019)

  20. Hsiao, W.L., Katsman, I., Wu, C.Y., Parikh, D., Grauman, K.: Fashion++: Minimal edits for outfit improvement. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV) (2019)

  21. Hsieh, C.W., Chen, C.Y., Chou, C.L., Shuai, H.H., Liu, J., Cheng, W.H.: Fashionon: Semantic-guided image-based virtual try-on with detailed human and clothing information. In: MM ’19 (2019)

  22. Lee, H.J., Lee, R., Kang, M., Cho, M., Park, G.: La-viton: A network for looking-attractive virtual try-on. In: ICCV Workshops (2019)

  23. Jaderberg, M., Simonyan, K., Zisserman, A., Kavukcuoglu, K.: Spatial transformer networks. In: NeurIPS (2015)

  24. Jandial, S., Chopra, A., Ayush, K., Hemani, M., Kumar, A., Krishnamurthy, B.: Sievenet: A unified framework for robust image-based virtual try-on. In: WACV (2020)

  25. Jeong, M.H., Han, D.H., Ko, H.S.: Garment capture from a photograph. Journal of Visualization and Computer Animation (2015)

  26. Ji, D., Kwon, J., McFarland, M., Savarese, S.: Deep view morphing. In: CVPR (2017)

  27. Kanazawa, A., Black, M.J., Jacobs, D.W., Malik, J.: End-to-end recovery of human shape and pose. In: CVPR (2018)

  28. Kanazawa, A., Jacobs, D., Chandraker, M.: Warpnet: Weakly supervised matching for single-view reconstruction. In: CVPR (2016)

  29. Karras, T., Laine, S., Aila, T.: A style-based generator architecture for generative adversarial networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 4401–4410 (2019)

  30. Lin, C.H., Yumer, E., Wang, O., Shechtman, E., Lucey, S.: St-gan: Spatial transformer generative adversarial networks for image compositing. In: CVPR (2018)

  31. Liu, G., Reda, F.A., Shih, K.J., Wang, T.C., Tao, A., Catanzaro, B.: Image inpainting for irregular holes using partial convolutions. In: ECCV (2018)

  32. Liu, K.H., Chen, T.Y., Chen, C.S.: Mvc: A dataset for view-invariant clothing retrieval and attribute prediction. In: ICMR (2016)

  33. Liu, Z., Luo, P., Qiu, S., Wang, X., Tang, X.: Deepfashion: Powering robust clothes recognition and retrieval with rich annotations. In: CVPR (2016)

  34. McKinsey: State of the fashion industry 2019 (2019)

  35. Natsume, R., Saito, S., Huang, Z., Chen, W., Ma, C., Li, H., Morishima, S.: Siclope: Silhouette-based clothed people. In: CVPR (2019)

  36. Neverova, N., Güler, R.A., Kokkinos, I.: Dense pose transfer. In: ECCV (2018)

  37. Raffiee, A.H., Sollami, M.: Garmentgan: Photo-realistic adversarial fashion transfer (2020)

  38. Raj, A., Sangkloy, P., Chang, H., Hays, J., Ceylan, D., Lu, J.: Swapnet: Image based garment transfer. In: ECCV (2018)

  39. Rocco, I., Arandjelović, R., Sivic, J.: Convolutional neural network architecture for geometric matching. In: CVPR (2017)

  40. Saito, S., Huang, Z., Natsume, R., Morishima, S., Kanazawa, A., Li, H.: Pifu: Pixel-aligned implicit function for high-resolution clothed human digitization. In: ICCV (2019)

  41. Schroff, F., Kalenichenko, D., Philbin, J.: Facenet: A unified embedding for face recognition and clustering. In: CVPR (2015)

  42. Song, D., Li, T., Mao, Z., Liu, A.: Sp-viton: shape-preserving image-based virtual try-on network. Multimedia Tools and Applications (2019)

  43. Suzuki, S., Abe, K.: Topological structural analysis of digitized binary images by border following. Computer Vision, Graphics, and Image Processing (1985)

  44. Vaccaro, K., Agarwalla, T., Shivakumar, S., Kumar, R.: Designing the future of personal fashion. In: Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (2018)

  45. Wang, B., Zheng, H., Liang, X., Chen, Y., Lin, L.: Toward characteristic-preserving image-based virtual try-on network. In: Proceedings of the European Conference on Computer Vision (ECCV) (2018)

  46. Wang, J., Zhang, W., Liu, W.H., Mei, T.: Down to the last detail: Virtual try-on with detail carving. ArXiv (2019)

  47. Wu, Z., Lin, G., Tao, Q., Cai, J.: M2e-try on net: Fashion from model to everyone. In: MM ’19 (2019)

  48. Yang, C., Lu, X., Lin, Z., Shechtman, E., Wang, O., Li, H.: High-resolution image inpainting using multi-scale neural patch synthesis. In: CVPR (2017)

  49. Yu, J., Lin, Z., Yang, J., Shen, X., Lu, X., Huang, T.S.: Free-form image inpainting with gated convolution. In: ICCV (2019)

  50. Yu, J., Lin, Z.L., Yang, J., Shen, X., Lu, X., Huang, T.S.: Generative image inpainting with contextual attention. In: CVPR (2018)

  51. Yu, L., Zhong, Y., Wang, X.: Inpainting-based virtual try-on network for selective garment transfer. IEEE Access (2019)

  52. Yu, L., Zhong, Y., Wang, X.: Inpainting-based virtual try-on network for selective garment transfer. IEEE Access (2019)

  53. Yu, R., Wang, X., Xie, X.: Vtnfp: An image-based virtual try-on network with body and clothing feature preservation. In: ICCV (2019)

  54. Zhang, H., Goodfellow, I., Metaxas, D., Odena, A.: Self-attention generative adversarial networks. arXiv preprint arXiv:1805.08318 (2018)

  55. Zheng, N., Song, X., Chen, Z., Hu, L., Cao, D., Nie, L.: Virtually trying on new clothing with arbitrary poses. In: MM ’19 (2019)

  56. Zheng, S., Yang, F., Kiapour, M.H., Piramuthu, R.: Modanet: A large-scale street fashion dataset with polygon annotations. In: ACM Multimedia (2018)

  57. Zhu, S., Fidler, S., Urtasun, R., Lin, D., Chen, C.L.: Be your own prada: Fashion synthesis with structural coherence. In: CVPR (2017)

This paper is available on arXiv under the CC BY-NC-SA 4.0 DEED license.