Enhanced Malware Detection through Function Parameter Encoding and API Dependency Modeling

Sneha R. Patil; Dr. Liam O. Hughes

Authors

Sneha R. Patil, Department of Computer Science and Engineering, Indian Institute of Technology Madras, Chennai, India
Dr. Liam O. Hughes, School of Computing and Digital Security, University of Birmingham, Birmingham, United Kingdom

Keywords:

Malware detection, Dynamic analysis, API calls, Function parameters

Abstract

Malware remains a significant cybersecurity threat, and both its complexity and its evasion techniques are evolving rapidly. Traditional detection methods often struggle against sophisticated attacks because they rely on static signatures or capture only a limited view of program behavior. This article introduces a novel dynamic malware detection approach that combines function parameter encoding with function dependency modeling, both derived from Application Programming Interface (API) call sequences. By capturing the contextual information carried in API call parameters and modeling the relationships between function invocations, the method aims to classify malicious software more robustly and accurately. We detail the methodology, from dynamic analysis and data collection through feature engineering and model training, and present results showing that it outperforms existing techniques that rely primarily on API call sequences alone. The findings underscore the importance of deeper behavioral analysis for effective malware detection in the contemporary threat landscape.
