Computer vision pipeline implementation: Acne lesion segmentation with Swin-UNet and multi-class classification with Swin Transformer
Abstract
Acne is a common dermatological condition, and its diagnosis and treatment often depend on accurate identification of lesions. Advances in deep learning and computer vision make it possible to build automated systems that detect and classify acne lesions more objectively and efficiently. This study develops a two-stage computer vision pipeline for automatic acne lesion detection and classification that integrates Swin-UNet for semantic segmentation with the Swin Transformer for multi-class classification. The architecture is a Transformer-based cascaded pipeline: the segmentation output guides the selection of Regions of Interest (ROIs) for the classification stage, so that the classifier focuses on the relevant lesion areas. To address class imbalance and improve generalization, training combines Weighted Random Sampling, Mixup data augmentation, and the Sharpness-Aware Minimization (SAM) optimization procedure. Evaluation uses a dataset strictly split into training, validation, and test subsets. In the experiments, the segmentation model achieved a Dice coefficient of 0.9885 and an Intersection over Union (IoU) of 0.9788, while the classification model achieved an accuracy of 96.24% and an F1-score of 0.9629. These results indicate that the proposed system identifies and classifies acne lesions accurately, and the approach could serve as a basis for more accurate and reliable deep learning-based diagnostic support systems in dermatology.
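The abstract names several standard building blocks without giving implementation details. Below is a minimal pure-Python sketch of four of them, written under our own assumptions and not taken from the authors' code: Dice and IoU on binary masks (the reported segmentation metrics), turning a predicted mask into a bounding-box ROI and cropping it for the classifier (a bounding box with a hypothetical `margin` padding is our assumption about how the segmentation "guides" the ROI), inverse-class-frequency sample weights of the kind commonly passed to PyTorch's `torch.utils.data.WeightedRandomSampler`, and Mixup with a Beta(alpha, alpha) mixing coefficient. All function names and the `alpha` default are illustrative. SAM is omitted because it needs an autodiff framework.

```python
import random
from collections import Counter


def dice_and_iou(pred, target):
    """Dice coefficient and IoU for two equally sized binary masks (nested 0/1 lists)."""
    inter = pred_sum = target_sum = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            pred_sum += 1 if p else 0
            target_sum += 1 if t else 0
            inter += 1 if (p and t) else 0
    union = pred_sum + target_sum - inter
    if union == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0, 1.0
    dice = 2 * inter / (pred_sum + target_sum)  # 2|A n B| / (|A| + |B|)
    iou = inter / union                          # |A n B| / |A u B|
    return dice, iou


def mask_to_roi(mask, margin=0):
    """Bounding box (top, left, bottom, right) of foreground pixels, bottom/right
    exclusive, padded by a hypothetical `margin` and clipped to the image.
    Returns None when the mask is empty (no lesion found)."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    height, width = len(mask), len(mask[0])
    cols = [c for c in range(width) if any(row[c] for row in mask)]
    top = max(rows[0] - margin, 0)
    left = max(cols[0] - margin, 0)
    bottom = min(rows[-1] + 1 + margin, height)
    right = min(cols[-1] + 1 + margin, width)
    return top, left, bottom, right


def crop_roi(image, roi):
    """Crop a nested-list image to the ROI so the classifier sees only the lesion area."""
    top, left, bottom, right = roi
    return [row[left:right] for row in image[top:bottom]]


def inverse_frequency_weights(labels):
    """Per-sample weights 1 / count(class): each class gets equal total weight,
    so weighted sampling draws minority classes as often as majority ones."""
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]


def mixup(x1, y1, x2, y2, alpha=0.2, rng=random):
    """Mixup: convex combination of two inputs and their one-hot labels,
    with lam ~ Beta(alpha, alpha). Inputs are flat lists of floats here."""
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam
```

As a usage example, `random.choices(range(len(labels)), weights=inverse_frequency_weights(labels), k=len(labels))` draws a class-balanced list of sample indices with replacement; in a PyTorch training loop the same weights would instead be handed to `WeightedRandomSampler`. Cropping to a padded bounding box (rather than masking out background pixels) keeps some surrounding skin context for the classifier, which is one reasonable design choice among several.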
Copyright (c) 2026 Syarif Romadloni, Asri Kurnia Ramadhani, Fafian Ihsan Saputra, Humam Nasywa Fawazi, Muhammad 'Ainun Naja, Budi Sunarko

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.







