Enhanced Visual Localization using Binocular Vision: A Framework for Optimized Keypoint Distribution and Robust Multi-View Geometric Constraints
Keywords:
Visual localization, binocular vision, keypoint distribution, multi-view geometry

Abstract
Visual localization, a cornerstone of numerous autonomous systems, including robotics, augmented reality, and self-driving vehicles, demands high precision and robustness. This paper presents an advanced binocular camera-based visual localization framework that significantly enhances performance through an optimized keypoint selection strategy and the judicious application of multi-epipolar constraints. By carefully distributing keypoints to cover the scene comprehensively and leveraging the geometric relationships across multiple stereo image pairs, the proposed method achieves superior accuracy and resilience to noise and outliers. The system employs state-of-the-art feature detection and matching, followed by a robust pose estimation pipeline incorporating advanced RANSAC variants and multi-view consistency checks. Experimental validation demonstrates that our approach outperforms existing methods in challenging indoor and outdoor environments, offering a reliable and computationally efficient solution for real-world localization applications.
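The abstract describes two core ingredients: spreading keypoints uniformly over the image rather than letting them cluster on texture-rich regions, and estimating pose with a RANSAC fit that enforces epipolar consistency across views. The sketch below illustrates both ideas in a minimal form; it assumes an ORB front end and OpenCV's essential-matrix RANSAC, not the authors' actual pipeline, and the grid size, per-cell budget, and thresholds are illustrative guesses only.

```python
# Minimal sketch (not the paper's implementation): spatially uniform keypoint
# selection via grid bucketing, followed by RANSAC-based relative pose
# estimation with an epipolar consistency check. Requires OpenCV and NumPy.
import cv2
import numpy as np

def bucket_keypoints(keypoints, image_shape, grid=(8, 8), per_cell=8):
    """Keep at most `per_cell` strongest keypoints in each grid cell so that
    detections cover the whole image instead of clustering locally."""
    h, w = image_shape[:2]
    cells = {}
    for kp in keypoints:
        cx = min(int(kp.pt[0] * grid[1] / w), grid[1] - 1)
        cy = min(int(kp.pt[1] * grid[0] / h), grid[0] - 1)
        cells.setdefault((cy, cx), []).append(kp)
    kept = []
    for cell_kps in cells.values():
        cell_kps.sort(key=lambda k: k.response, reverse=True)
        kept.extend(cell_kps[:per_cell])
    return kept

def relative_pose(img0, img1, K):
    """Estimate the relative pose between two views from ORB matches,
    filtered by grid bucketing and a RANSAC essential-matrix fit."""
    orb = cv2.ORB_create(nfeatures=4000)
    kps0 = bucket_keypoints(orb.detect(img0, None), img0.shape)
    kps1 = bucket_keypoints(orb.detect(img1, None), img1.shape)
    kps0, des0 = orb.compute(img0, kps0)
    kps1, des1 = orb.compute(img1, kps1)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des0, des1)
    pts0 = np.float32([kps0[m.queryIdx].pt for m in matches])
    pts1 = np.float32([kps1[m.trainIdx].pt for m in matches])
    # RANSAC discards matches that violate the epipolar constraint x1' E x0 = 0.
    E, inliers = cv2.findEssentialMat(pts0, pts1, K, method=cv2.RANSAC,
                                      prob=0.999, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts0, pts1, K, mask=inliers)
    return R, t, int((inliers > 0).sum())
```

In a stereo or multi-view setting, the same inlier test would be applied against each available image pair and only keypoints consistent across all pairs retained, which is the intuition behind the multi-view consistency checks mentioned above.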
License
Copyright (c) 2024 Dr. Yuki Nakamoto, Dr. Felix Bauer

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors retain full copyright of their published work. By submitting to an ERDAST journal and upon acceptance of the article, the author(s) agree to grant the journal a non-exclusive license to publish, reproduce, distribute, and archive the article. All articles are published under the terms of the Creative Commons Attribution 4.0 International License (CC BY 4.0).
Under this license, others may copy, distribute, display, remix, adapt, and build upon the work, even commercially, as long as proper credit is given to the original author(s) and source, a link to the license is provided, and any changes are clearly indicated.
License link: https://creativecommons.org/licenses/by/4.0/