Enhancing Small Object Detection through Hierarchical Knowledge Distillation

Dr. Helena V. Petrovic; Marco Fiorelli

Authors

Dr. Helena V. Petrovic Faculty of Electrical Engineering and Computing, University of Zagreb, Croatia
Marco Fiorelli Faculty of Electrical Engineering and Computing, University of Zagreb, Croatia

Keywords:

Small object detection, knowledge distillation, hierarchical learning, feature refinement

Abstract

Detecting small objects accurately remains a significant challenge in computer vision due to limited visual cues and scale variance. This paper proposes a novel hierarchical knowledge distillation framework that enhances small object detection by effectively transferring multi-scale semantic and spatial knowledge from a high-capacity teacher network to a compact student model. Our approach incorporates layer-wise distillation, attention-based feature refinement, and adaptive supervision to preserve fine-grained features crucial for small object identification. Experiments on benchmark datasets such as COCO and Pascal VOC demonstrate notable improvements in detection accuracy and efficiency, especially for small-sized objects, highlighting the effectiveness of our method in practical real-time applications.

References

Cao C, Wang B, Zhang W, Zeng X, Yan X, Feng Z, Liu Y, Wu Z. An improved faster R-CNN for small object detection. IEEE Access, 2019, 7: 106838–106846. DOI: https://doi.org/10.1109/ACCESS.2019.2932731.

Yang C, Huang Z, Wang N. QueryDet: Cascaded sparse query for accelerating high-resolution small object detection. In Proc. the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2022, pp.13658–13667. DOI: https://doi.org/10.1109/CVPR52688.2022.01330.

Hubara I, Courbariaux M, Soudry D, El-Yaniv R, Bengio Y. Binarized neural networks. In Proc. the 30th International Conference on Neural Information Processing Systems, Dec. 2016, pp.4114–4122.

Rastegari M, Ordonez V, Redmon J, Farhadi A. XNOR-Net: ImageNet classification using binary convolutional neural networks. In Proc. the 14th European Conference on Computer Vision, Oct. 2016, pp.525–542. DOI: https://doi.org/10.1007/978-3-319-46493-0_32.

Han S, Pool J, Tran J, Dally W J. Learning both weights and connections for efficient neural network. In Proc. the 28th International Conference on Neural Information Processing Systems, Dec. 2015, pp.1135–1143.

He Y, Zhang X, Sun J. Channel pruning for accelerating very deep neural networks. In Proc. the 2017 IEEE International Conference on Computer Vision, Oct. 2017, pp.1398–1406. DOI: https://doi.org/10.1109/ICCV.2017.155.

Hinton G, Vinyals O, Dean J. Distilling the knowledge in a neural network. arXiv: 1503.02531, 2015. https://arxiv.org/abs/1503.02531, Jul. 2024.

Ji M, Heo B, Park S. Show, attend and distill: Knowledge distillation via attention-based feature matching. In Proc. the 35th AAAI Conference on Artificial Intelligence, Feb. 2021, pp.7945–7952. DOI: https://doi.org/10.1609/aaai.v35i9.16969.

Wang T, Yuan L, Zhang X, Feng J. Distilling object detectors with fine-grained feature imitation. In Proc. the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2019, pp.4928–4937. DOI: https://doi.org/10.1109/CVPR.2019.00507.

Zhang L, Ma K. Improve object detection with feature-based knowledge distillation: Towards accurate and efficient detectors. In Proc. the 9th International Conference on Learning Representations, May 2021.

Heo B, Kim J, Yun S, Park H, Kwak N, Choi J Y. A comprehensive overhaul of feature distillation. In Proc. the 2019 IEEE/CVF International Conference on Computer Vision, Oct. 27-Nov. 2, 2019, pp.1921–1930. DOI: https://doi.org/10.1109/ICCV.2019.00201.

Ren S, He K, Girshick R, Sun J. Faster R-CNN: Towards real-time object detection with region proposal networks. In Proc. the 28th International Conference on Neural Information Processing Systems, Dec. 2015, pp.91–99.

Lin T Y, Dollar P, Girshick R, He K, Hariharan B, Belongie S. Feature pyramid networks for object detection. In Proc. the 2017 IEEE conference on Computer Vision and Pattern Recognition, Jul. 2017, pp.936–944. DOI: https://doi.org/10.1109/CVPR.2017.106.

Kang Z, Zhang P, Zhang X, Sun J, Zheng N. Instance-conditional knowledge distillation for object detection. In Proc. the 35th International Conference on Neural Information Processing Systems, Dec. 2021, Article No. 1259.

Selvaraju R R, Cogswell M, Das A, Vedantam R, Parikh D, Batra D. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proc. the 2017 IEEE International Conference on Computer Vision, Oct. 2017, pp.618–626. DOI: https://doi.org/10.1109/ICCV.2017.74.

Cao Y, Xu J, Lin S, Wei F, Hu H. GCNet: Non-local networks meet squeeze-excitation networks and beyond. In Proc. the 2019 IEEE/CVF International Conference on Computer Vision workshop, Oct. 2019, pp.1971–1980. DOI: https://doi.org/10.1109/ICCVW.2019.00246.

Yang Z, Li Z, Jiang X, Gong Y, Yuan Z, Zhao D, Yuan C. Focal and global knowledge distillation for detectors. In Proc. the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2022, pp.4633–4642. DOI: https://doi.org/10.1109/CVPR52688.2022.00460.

Chen G, Choi W, Yu X, Han T, Chandraker M. Learning efficient object detection models with knowledge distillation. In Proc. the 31st International Conference on Neural Information Processing Systems, Dec. 2017, pp.742–751.

Guo J, Han K, Wang Y, Wu H, Chen X, Xu C, Xu C. Distilling object detectors via decoupled features. In Proc. the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2021, pp.2154–2164. DOI: https://doi.org/10.1109/CVPR46437.2021.00219.

Chang J, Wang S, Xu H M, Chen Z, Yang C, Zhao F. DETRDistill: A universal knowledge distillation framework for DETR-families. In Proc. the 2023 IEEE/CVF International Conference on Computer Vision, Oct. 2023, pp.6875–6885. DOI: https://doi.org/10.1109/ICCV51070.2023.00635.

Lin T Y, Goyal P, Girshick R, He K, Dollar P. Focal loss for dense object detection. IEEE Trans. Pattern Analysis and Machine Intelligence, 2020, 42(2): 318–327. DOI: https://doi.org/10.1109/TPAMI.2018.2858826.

Tian Z, Shen C, Chen H, He T. FCOS: Fully convolutional one-stage object detection. In Proc. the 2019 IEEE/CVF International Conference on Computer Vision, Oct. 27-Nov. 2, 2019, pp.9627–9636. DOI: https://doi.org/10.1109/ICCV.2019.00972.

Ge Z, Liu S, Wang F, Li Z, Sun J. YOLOX: Exceeding YOLO series in 2021. arXiv: 2107.08430, 2021. https://arxiv.org/abs/2107.08430, Jul. 2024.

Huang H, Zhou X, Cao J, He R, Tan T. Vision transformer with super token sampling. arXiv: 2211.11167, 2024. https://arxiv.org/abs/2211.11167, Jul. 2024.

Zhu L, Wang X, Ke Z, Zhang W, Lau R. BiFormer: Vision transformer with bi-level routing attention. In Proc. the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2023, pp.10323–10333. DOI: https://doi.org/10.1109/CVPR52729.2023.00995.

Tian R, Wu Z, Dai Q, Hu H, Qiao Y, Jiang Y G. Res-Former: Scaling ViTs with multi-resolution training. In Proc. the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2023, pp.22721–22731. DOI: https://doi.org/10.1109/CVPR52729.2023.02176.

Kisantal M, Wojna Z, Murawski J, Naruniec J, Cho K. Augmentation for small object detection. arXiv: 1902. 07296, 2019. https://arxiv.org/abs/1902.07296, Jul. 2024.

Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C Y, Berg A C. SSD: Single shot MultiBox detector. In Proc. the 14th European Conference on Computer Vision, Oct. 2016, pp.21–37. DOI: https://doi.org/10.1007/978-3-319-46448-0_2.

Cai Z, Fan Q, Feris R S, Vasconcelos N. A unified multi-scale deep convolutional neural network for fast object detection. In Proc. the 14th European Conference on Computer Vision, Oct. 2016, pp.354–370. DOI: https://doi.org/10.1007/978-3-319-46493-0_22.

Kong T, Yao A, Chen Y, Sun F. HyperNet: Towards accurate region proposal generation and joint object detection. In Proc. the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Jun. 2016, pp.845–853. DOI: https://doi.org/10.1109/CVPR.2016.98.

Li Y, Chen Y, Wang N, Zhang Z X. Scale-aware trident networks for object detection. In Proc. the 2019 IEEE/CVF International Conference on Computer Vision, Oct. 27-Nov. 2, 2019, pp.6054–6063. DOI: https://doi.org/10.1109/ICCV.2019.00615.

Singh B, Davis L S. An analysis of scale invariance in object detection—SNIP. In Proc. the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun. 2018, pp.3578–3587. DOI: https://doi.org/10.1109/CVPR.2018.00377.

Singh B, Najibi M, Davis L S. SNIPER: Efficient multi-scale training. In Proc. the 32nd International Conference on Neural Information Processing Systems, Dec. 2018, pp.9333–9343.

Chen Y, Zhang P, Li Z, Li Y, Zhang X, Qi L, Sun J, Jia J. Dynamic scale training for object detection. arXiv: 2004.12432, 2021. https://arxiv.org/abs/2004.12432, Jul. 2024.

Carion N, Massa F, Synnaeve G, Usunier N, Kirillov A, Zagoruyko S. End-to-end object detection with transformers. In Proc. the 16th European Conference on Computer Vision, Aug. 2020, pp.213–229. DOI: https://doi.org/10.1007/978-3-030-58452-8_13.

Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A N, Kaiser L, Polosukhin I. Attention is all you need. In Proc. the 31st International Conference on Neural Information Processing Systems, Dec. 2017, pp.6000–6010.

Romero A, Ballas N, Kahou S E, Chassang A, Gatta C, Bengio Y. FitNets: Hints for thin deep nets. In Proc. the 3rd International Conference on Learning Representations, May 2015. DOI: https://doi.org/10.48550/arXiv.1412.6550.

Loshchilov I, Hutter F. Decoupled weight decay regular-ization. In Proc. the 7th International Conference on Learning Representations, May 2017.

Liu H, Liu Q, Liu Y, Liang Y, Zhao G. Exploring effective knowledge distillation for tiny object detection. In Proc. the 2023 IEEE International Conference on Image Processing, Oct. 2023, pp.770–774. DOI: https://doi.org/10.1109/ICIP49359.2023.10222589.

Ni Z L, Yang F, Wen S, Zhang G. Dual relation knowledge distillation for object detection. In Proc. the 32nd International Joint Conference on Artificial Intelligence, Aug. 2023, pp.1276–1284. DOI: https://doi.org/10.24963/ijcai.2023/142.

He K, Gkioxari G, Dollar P, Girshick R. Mask R-CNN. In Proc. the 2017 IEEE International Conference on Computer Vision, Oct. 2017, pp.2980–2988. DOI: https://doi.org/10.1109/ICCV.2017.322.

Lee Y, Hwang J W, Lee S, Bae Y, Park J. An energy and GPU-computation efficient backbone network for realtime object detection. In Proc. the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Jun. 2019, pp.752–760. DOI: https://doi.org/10.1109/CVPRW.2019.00103.

Sandler M, Howard A, Zhu M, Zhmoginov A, Chen L C. MobileNetV2: Inverted residuals and linear bottlenecks. In Proc. the 2018 IEEE/CVF conference on Computer Vision and Pattern Recognition, Jun. 2018, pp.4510–4520. DOI: https://doi.org/10.1109/CVPR.2018.00474.

Enhancing Small Object Detection through Hierarchical Knowledge Distillation

Authors

Keywords:

Abstract

References

Downloads

Published

How to Cite

Issue

Section

License

Current Issue

Make a Submission

Information

Journal Links

Make a Submission