The COVID-19 pandemic underscored the critical importance of proper mask usage and highlighted the need for automated systems that monitor face mask-wearing conditions. In this paper, we introduce PFMNet, a novel architecture for recognizing the wearing status of face masks. PFMNet is inspired by the InternImage architecture and employs Deformable Convolutional Networks (DCNs) to capture the long-range dependencies crucial for accurate mask-status determination. The significant challenge of class imbalance, particularly the scarcity of improperly worn mask samples, is addressed by integrating the Category Attention Block (CAB). CAB attends to discriminative regions, diversifies feature representations, and uses efficient global pooling to identify crucial areas, such as the human face, while reducing computational cost. The performance of PFMNet was assessed on the publicly available PWMFD dataset, which we refined to remove duplicate images and correct erroneous annotations. PFMNet was compared against three state-of-the-art models: InternImage, ConvNeXt, and EfficientNet. It outperformed all of them, achieving an accuracy of 99.39%, a margin of 0.45% over the second-best model. The confusion matrices show that PFMNet outperforms the other models across all classes, excelling in particular in the “with mask” and “without mask” categories, resulting in the best overall performance.