Botnets can use Domain Generation Algorithms (DGAs) to dynamically generate large numbers of unpredictable domain names, evading traditional static detection and enhancing the stealth and persistence of malicious activities. As DGA techniques advance, traditional detection methods face growing challenges, and efficiently identifying and defending against these domains has become a crucial task in cybersecurity. This paper comprehensively analyzes mainstream DGA detection techniques, including those based on statistical features, machine learning, and deep learning. It examines their principles, application scenarios, and performance, and identifies their limitations with respect to false positive rates, computational complexity, dataset size, and adaptability to new DGAs. Finally, the paper proposes innovative directions for deep learning-based detection and cross-domain collaborative detection. Combining these with traffic behavior analysis and generation-pattern blocking mechanisms, it builds a multi-layered, integrated DGA defense system, offering new ideas for improving detection effectiveness, accuracy, and adaptability.
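A DGA is a technique for automatically generating large numbers of domain names. These domains are usually highly random and dynamic, and they are hard to distinguish from genuine, legitimate domains. Unlike conventional static domains, DGA-generated domains change frequently over their lifetime, which makes them harder to detect. DGAs use a variety of generation methods, including dictionary-based concatenation, character randomization, and entropy-based generation. These strategies allow a DGA to produce large numbers of domains quickly, and DGAs are therefore widely used for command-and-control (C&C) communication in botnets [6-7]. The DGA plays a key role in a botnet: it not only lets the attacker bypass traditional detection mechanisms based on static blacklists and IP addresses, but also helps, to a certain extent, to keep C&C communication stable and covert. Because the bot and the attacker run the same algorithm (typically with a shared seed such as the current date), the attacker needs to register only a few of the generated domains, whereas a defender relying on a static blacklist would have to anticipate all of them. Through the DGA, the attacker dynamically controls infected devices and directs them to carry out malicious activities. Since DGA domains change frequently within short periods, traditional detection methods, such as static domain-name matching and Domain Name System (DNS) traffic analysis, often cannot cope effectively. As generation algorithms continue to evolve, attackers can adjust their generation rules in real time in response to changes in the network environment and in detection mechanisms, thereby evading detection. This makes DGAs a major challenge for current network security defense [8].
As DGAs have become widely used, detection techniques have continued to develop as well. Current DGA detection methods can be broadly divided into methods based on statistical features, on machine learning, and on deep learning [9-11]. However, because DGAs evolve rapidly and are highly unpredictable, existing detection methods face many challenges, such as high computational resource consumption, high false positive rates, and insufficient real-time performance. How to effectively detect and defend against DGA attacks therefore remains an open research problem in network security.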
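To make the generation strategies above concrete, the following Python sketch shows a toy DGA in two of the styles mentioned: character randomization and dictionary-based concatenation. It is not the algorithm of any real malware family; the seed string, the word list `WORDS`, the use of SHA-256 over seed and date, and the function names are all illustrative assumptions. What it does show is determinism: anyone running the same code with the same seed and date obtains the same candidate list.

```python
import hashlib
from datetime import date

ALPHABET = "abcdefghijklmnopqrstuvwxyz"
# Hypothetical word list for the dictionary-based variant (illustrative only).
WORDS = ["secure", "cloud", "market", "river", "update", "silver", "portal", "green"]


def _digest(seed: str, day: date, index: int) -> bytes:
    """Deterministic pseudo-random bytes derived from a shared seed, the date and a counter."""
    material = f"{seed}|{day.isoformat()}|{index}".encode()
    return hashlib.sha256(material).digest()  # 32 bytes


def random_char_domains(seed: str, day: date, count: int = 5,
                        length: int = 12, tld: str = ".com") -> list[str]:
    """Character-randomization style: each digest byte selects one letter."""
    domains = []
    for i in range(count):
        d = _digest(seed, day, i)
        label = "".join(ALPHABET[b % len(ALPHABET)] for b in d[:length])
        domains.append(label + tld)
    return domains


def dictionary_domains(seed: str, day: date, count: int = 5,
                       words_per_domain: int = 2, tld: str = ".net") -> list[str]:
    """Dictionary style: digest bytes pick words that are concatenated into a label."""
    domains = []
    for i in range(count):
        d = _digest(seed, day, i)
        label = "".join(WORDS[d[k] % len(WORDS)] for k in range(words_per_domain))
        domains.append(label + tld)
    return domains


if __name__ == "__main__":
    today = date(2025, 4, 10)
    # Bot and operator run the same code with the same seed, so both derive the same list;
    # the operator only needs to register one of these names for the rendezvous to succeed.
    print(random_char_domains("example-seed", today))
    print(dictionary_domains("example-seed", today))
```

Because the list depends on the date, a blacklist built from yesterday's names does not cover today's, which is the property that undermines static blacklisting. Dictionary-style labels also look far less random than character-randomized ones, which is why they are harder for purely character-statistics-based detectors.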
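The information exploited by statistical-analysis-based methods falls into two categories. (1) Information from the domain name alone, such as domain length, character frequency, n-gram distribution, Kullback-Leibler (KL) divergence, and the Jaccard coefficient. (2) Auxiliary information associated with the domain, including DNS information (malicious domains often have low time-to-live (TTL) values; DGA domains typically produce a high proportion of non-existent domain (NXDOMAIN) responses, because a bot queries many generated names while the attacker registers only a few; the distribution of resolved IP addresses is also informative), WHOIS (domain registration lookup protocol) information (for example, domains registered only for a short period or whose registrant details change frequently may be malicious, and several domains held by the same registrant may belong to the same attacker), and historical domain information.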
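The domain-only features in category (1), together with the NXDOMAIN ratio from category (2), can be computed with a few lines of standard-library Python. This is a minimal sketch under stated assumptions: labels are lowercase and drawn from a-z, 0-9 and "-", the benign reference distribution is built from a caller-supplied list of known-good labels with add-one smoothing, and all function names are hypothetical. It produces feature values only; no thresholds or classifier are implied.

```python
import math
from collections import Counter

# Assumed character set of domain labels (lowercase letters, digits, hyphen).
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-"


def char_distribution(label: str) -> dict[str, float]:
    """Relative frequency of each character in the label."""
    counts = Counter(label)
    return {c: n / len(label) for c, n in counts.items()}


def smoothed_distribution(corpus: list[str]) -> dict[str, float]:
    """Reference distribution over ALPHABET from known-good labels, with add-one smoothing."""
    counts = Counter("".join(corpus))
    total = sum(counts[c] for c in ALPHABET) + len(ALPHABET)
    return {c: (counts[c] + 1) / total for c in ALPHABET}


def shannon_entropy(label: str) -> float:
    """Character entropy in bits; random-looking labels tend to score higher."""
    return -sum(p * math.log2(p) for p in char_distribution(label).values())


def kl_divergence(p: dict[str, float], q: dict[str, float]) -> float:
    """KL(P || Q) in bits; q must be non-zero for every character that appears in p."""
    return sum(pv * math.log2(pv / q[c]) for c, pv in p.items())


def bigrams(label: str) -> set[str]:
    return {label[i:i + 2] for i in range(len(label) - 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of intersection over size of union; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def nxdomain_ratio(rcodes: list[str]) -> float:
    """Share of one host's DNS responses whose response code is NXDOMAIN."""
    return sum(r == "NXDOMAIN" for r in rcodes) / len(rcodes) if rcodes else 0.0


def domain_features(label: str, reference: dict[str, float],
                    benign_bigrams: set[str]) -> dict[str, float]:
    label = label.lower()
    return {
        "length": len(label),
        "entropy": shannon_entropy(label),
        "kl_to_benign": kl_divergence(char_distribution(label), reference),
        "bigram_jaccard": jaccard(bigrams(label), benign_bigrams),
    }


if __name__ == "__main__":
    benign = ["google", "facebook", "wikipedia", "amazon", "youtube"]  # toy reference set
    reference = smoothed_distribution(benign)
    benign_bigrams = set().union(*(bigrams(w) for w in benign))
    for label in ["wikipedia", "xkqzjvwpqtrb"]:
        print(label, domain_features(label, reference, benign_bigrams))
```

Smoothing the reference distribution keeps the KL divergence finite for characters never seen in the benign set; a realistic reference would be estimated from a much larger list of legitimate domains.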
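Zhou et al. [47] start from NXDomain traffic: they group domains by their second-level domain (2LD) or by resolved IP address, and then cluster them by the similarity of their access patterns and active-time distributions. Schiavoni et al. [48] developed the Phoenix system, which combines domain-string features with IP-address features and makes its decisions based on the output of the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm. Antonakakis et al. [49] proposed Pleiades, which consists of two modules: a DGA discovery module and a DGA classification and C&C detection module (see Fig. 2). The DGA discovery module analyzes failed DNS query responses and clusters them using string features and the overlap among the hosts that issue the queries, which allows it to discover new DGAs accurately and build models for them. The DGA classification and C&C detection module then applies multi-class classifiers and hidden Markov models to classify and identify newly observed domains, improving the accuracy of C&C domain identification while reducing false positives. Clustering-based DGA detection operates in a "retrospective detection" mode: it can only be performed offline, so detection is delayed. Compared with the "real-time detection" mode of supervised methods, however, clustering has the advantages of requiring no labeled data and being able to process large batches of data.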
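Phoenix's exact feature set is not reproduced here; the following sketch only illustrates how DBSCAN groups domains by density in a feature space. It is a minimal pure-Python DBSCAN (Euclidean distance, with `min_samples` counting the point itself), fed with three hypothetical per-domain features: second-level label length scaled by 1/10, character entropy, and digit ratio. The `eps` and `min_samples` values in the usage are assumptions to be tuned, not values from the cited work. Points labeled -1 are noise.

```python
import math
from collections import Counter

NOISE = -1  # label for points not assigned to any cluster


def label_features(domain: str) -> tuple[float, float, float]:
    """Hypothetical features of the second-level label: (length / 10, entropy in bits, digit ratio)."""
    parts = domain.lower().rstrip(".").split(".")
    label = parts[-2] if len(parts) >= 2 else parts[0]
    n = len(label)
    entropy = -sum(c / n * math.log2(c / n) for c in Counter(label).values())
    digit_ratio = sum(ch.isdigit() for ch in label) / n
    return (n / 10.0, entropy, digit_ratio)


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    """Return one cluster id per point (0, 1, ...) or NOISE."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster_id  # i is a core point: start a new cluster
        seeds = [j for j in neighbors if j != i]
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins the cluster but is not expanded
                continue
            if labels[j] is not None:
                continue  # already assigned
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:  # j is also a core point: keep expanding
                seeds.extend(m for m in j_neighbors if labels[m] is None or labels[m] == NOISE)
        cluster_id += 1
    return labels


if __name__ == "__main__":
    nx_domains = ["xkqzjvwpqtrb.com", "qwpzkjvbnmxr.com", "zxvkqjpwmrtl.com",
                  "mail.example.com", "shop.example.org"]
    vectors = [label_features(d) for d in nx_domains]
    print(list(zip(nx_domains, dbscan(vectors, eps=0.6, min_samples=2))))
```

In a retrospective setting, such a pass would run over a batch of NXDOMAIN names collected during a time window, which is why this style of detection suits offline analysis rather than per-query decisions.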
XU X L, ZHOU Y L, LI Q S. Domain Algorithmically Generated Botnet Detection and Analysis[C]//International Conference on Security and Privacy in Communication Networks. Cham: Springer International Publishing, 2015: 530-534. DOI:10.1007/978-3-319-23829-6_38 .
WANG Y, WANG Z C, PAN R. Survey of DGA Domain Name Detection Based on Character Feature[J]. Comput Sci, 2023, 50(8): 251-259. DOI:10.11896/jsjkx.220700277 .
WANG X X, HUANG J H, ZHAI Y, et al. Survey of Detection Techniques for Domain Generation Algorithm[J]. Comput Sci, 2024, 51(8): 371-378. DOI:10.11896/jsjkx.230700189 .
[8] SAEED A M H, WANG D H, ALNEDHARI H A M, et al. A Survey of Machine Learning and Deep Learning Based DGA Detection Techniques[M]//Smart Computing and Communication. Cham: Springer International Publishing, 2022: 133-143. DOI:10.1007/978-3-030-97774-0_12 .
NIE L H, SHAN X Y, ZHAO L P, et al. PKDGA: A Partial Knowledge-based Domain Generation Algorithm for Botnets[J]. IEEE Trans Inf Forensics Secur, 2023, 18: 4854-4869. DOI:10.1109/TIFS.2023.3298229 .
[11] ALQAHTANI H, KUMAR G. Advances in Artificial Intelligence for Detecting Algorithmically Generated Domains: Current Trends and Future Prospects[J]. Eng Appl Artif Intell, 2024, 138: 109410. DOI:10.1016/j.engappai.2024.109410 .
[12] ALAEIYAN M, PARSA S, VINOD P, et al. Detection of Algorithmically-generated Domains: An Adversarial Machine Learning Approach[J]. Comput Commun, 2020, 160: 661-673. DOI:10.1016/j.comcom.2020.04.033 .
[13] XIONG W, JIANG H Y, GUAN H T, et al. DSQNet: Domain SeQuence Based Deep Neural Network for AGDs Detection[C]//2021 IEEE Symposium on Computers and Communications (ISCC). New York: IEEE, 2021: 1-7. DOI:10.1109/ISCC53001.2021.9631503 .
[14] DING L, LI L J, HAN J H, et al. Detecting Domain Generation Algorithms with Bi-LSTM[J]. Comput Mater Continua, 2019, 61(3): 1285-1304. DOI:10.32604/cmc.2019.06160 .
[15] AMINI P, ARAGHIZADEH M A, AZMI R. A Survey on Botnet: Classification, Detection and Defense[C]//2015 International Electronics Symposium (IES). New York: IEEE, 2015: 233-238. DOI:10.1109/ELECSYM.2015.7380847 .
[16] MAHMOUD M, NIR M, MATRAWY A. A Survey on Botnet Architectures, Detection and Defences[J]. Int J Netw Secur, 2014, 17(3): 272-289.
HAMMOODI HASAN KABLA A, ANBAR M, MANICKAM S, et al. Monitoring Peer-to-peer Botnets: Requirements, Challenges, and Future Works[J]. Comput Mater Continua, 2023, 75(2): 3375-3398. DOI:10.32604/cmc.2023.036587 .
[19] GAO H Y, LI L X, LEI H, et al. One IOTA of Countless Legions: A Next-generation Botnet Premises Design Substrated on Blockchain and Internet of Things[J]. IEEE Internet Things J, 2024, 11(5): 9107-9126. DOI:10.1109/JIOT.2023.3322716 .
[20] BARABOSCH T, WICHMANN A, LEDER F, et al. Automatic Extraction of Domain Name Generation Algorithms From Current Malware[C]//Proc. NATO Symposium IST-111 on Information Assurance and Cyber Defense. Koblenz, Germany: NATO STO, 2012.
[21] SCHWARZ D. Bedep's DGA: Trading Foreign Exchange for Malware Domains[EB/OL]. (2015).
[22] STONE-GROSS B, COVA M, CAVALLARO L, et al. Your Botnet Is My Botnet: Analysis of a Botnet Takeover[C]//Proceedings of the 16th ACM Conference on Computer and Communications Security. New York: ACM, 2009: 635-647. DOI:10.1145/1653662.1653738 .
[23] SECURITY RESPONSE. Butterfly: Corporate Spies out for Financial Gain[R]. Tempe, AZ, USA: Symantec, 2015.
[24] PLOHMANN D, YAKDAN K, KLATT M, et al. A Comprehensive Measurement Study of Domain Generating Malware[C]//25th USENIX Security Symposium (USENIX Security 16). Austin, TX, USA: USENIX Association, 2016: 263-278. DOI:10.5555/3241094 .
[25] LEDER F, WERNER T. Know Your Enemy: Containing Conficker[R]. Germany: University of Bonn, 2009.
Chasing Cybercrime: Network Insights of Dyre and Dridex Trojan Bankers[R]. Barcelona, Spain: Blueliv, 2015.
[28] HIGHNAM K, PUZIO D, LUO S, et al. Real-time Detection of Dictionary DGA Network Traffic Using Deep Learning[J]. SN Computer Science, 2021, 2(2): 110. DOI:10.1007/s42979-021-00507-w .
[29] ALEXA. Top Sites on the Web[EB/OL]. (2015)[2025-03-29].
[30] BAUMGARTNER K, RAIU C. Sinkholing Volatile Cedar DGA Infrastructure[EB/OL]. (2015)[2025-04-10]. https:urelist.com/blog/research/69421/sinkholingvolatile-cedar-dga-infrastructure/.
[31] VISHVAKARMA D K, BHATIA A, RIHA Z. Detection of Algorithmically Generated Domain Names in Botnets[M]//Advanced Information Networking and Applications. Cham: Springer International Publishing, 2019: 1279-1290. DOI:10.1007/978-3-030-15032-7_107 .
[32] KUMAR V, KUMAR S, GUPTA A K. Real-time Detection of Botnet Behavior in Cloud Using Domain Generation Algorithm[C]//Proceedings of the International Conference on Advances in Information Communication Technology & Computing - AICTC '16. New York: ACM, 2016: 1-3. DOI:10.1145/2979779.2979848 .
[33] SAROJINI S, ASHA S. Detection for Domain Generation Algorithm (DGA) Domain Botnet Based on Neural Network with Multi-head Self-attention Mechanisms[J]. Int J Syst Assur Eng Manag, 2022: 1-16. DOI:10.1007/s13198-022-01713-2 .
[34] ZANG X D, CAO J B, ZHANG X C, et al. BotDetector: A System for Identifying DGA-based Botnet with CNN-LSTM[J]. Telecommun Syst, 2024, 85(2): 207-223. DOI:10.1007/s11235-023-01073-7 .
[35] YADAV S, REDDY A K K, NARASIMHA REDDY A L, et al. Detecting Algorithmically Generated Malicious Domain Names[C]//Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement. New York: ACM, 2010. DOI:10.1145/1879141.1879148 .
[36] YADAV S, REDDY A K K, NARASIMHA REDDY A L, et al. Detecting Algorithmically Generated Domain-flux Attacks with DNS Traffic Analysis[J]. IEEE/ACM Trans Netw, 2012, 20(5): 1663-1677. DOI:10.1109/TNET.2012.2184552 .
[37] ANTONAKAKIS M, PERDISCI R, DAGON D, et al. Building a Dynamic Reputation System for DNS[C]//Proc 19th USENIX Secur Symp. 2010: 273-289.
[38] ZHANG Y, ZHANG Y Z, XIAO J. Detecting the DGA-based Malicious Domain Names[M]//Trustworthy Computing and Services. Berlin, Heidelberg: Springer Berlin Heidelberg, 2014: 130-137. DOI:10.1007/978-3-662-43908-1_17 .
[39] WANG W, SHIRLEY K. Breaking Bad: Detecting Malicious Domains Using Word Segmentation[EB/OL]. (2015-06-12)[2025-04-10].
[40] GRILL M, NIKOLAEV I, VALEROS V, et al. Detecting DGA Malware Using NetFlow[C]//2015 IFIP/IEEE International Symposium on Integrated Network Management (IM). New York: IEEE, 2015: 1304-1309. DOI:10.1109/INM.2015.7140486 .
[41] ZANG X, GONG J, HU X. Detecting Malicious Domain Name Based on AGD[J]. Journal on Communications, 2018, 39(7): 15-25. DOI:10.11959/j.issn.1000-436x.2018116 .
SCHÖLKOPF B, SMOLA A J. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond[M]. Cambridge, Mass.: MIT Press, 2002.
[44] DAHAL B, KIM Y. AutoEncoded Domains with Mean Activation for DGA Botnet Detection[C]//2019 IEEE 12th International Conference on Global Security, Safety and Sustainability (ICGS3). New York: IEEE, 2019: 208-212. DOI:10.1109/icgs3.2019.8688037 .
[45] DAVUTH N, KIM S R. Classification of Malicious Domain Names Using Support Vector Machine and Bi-gram Method[J]. International Journal of Security and Its Applications, 2013, 7(1): 51-58.
[46] BILGE L, KIRDA E, KRUEGEL C, et al. EXPOSURE: Finding Malicious Domains Using Passive DNS Analysis[C]//NDSS. San Diego, USA: The Internet Society, 2011: 1-17.
SIVAGURU R, CHOUDHARY C, YU B, et al. An Evaluation of DGA Classifiers[C]//2018 IEEE International Conference on Big Data (Big Data). New York: IEEE, 2018: 5058-5067. DOI:10.1109/BigData.2018.8621875 .
[49] NIE L H, ZHAO L P, LI K Q, et al. A Game-based Adversarial DGA Detection Scheme Using Multi-level Incremental Random Forest[J]. IEEE Trans Netw Sci Eng, 2023, 11(1): 779-792. DOI:10.1109/TNSE.2023.3308126 .
[50] ZHOU Y, LI Q, MIAO Q, et al. DGA-Based Botnet Detection Using DNS Traffic[J]. Journal of Internet Services and Information Security, 2013, 3(3/4): 116-123. DOI:10.22667/JISIS.2013.11.31.116 .
[51] SCHIAVONI S, MAGGI F, CAVALLARO L, et al. Phoenix: DGA-based Botnet Tracking and Intelligence[M]//Detection of Intrusions and Malware, and Vulnerability Assessment. Cham: Springer International Publishing, 2014: 192-211. DOI:10.1007/978-3-319-08509-8_11 .
[52] ANTONAKAKIS M, PERDISCI R, NADJI Y, et al. From Throw-away Traffic to Bots: Detecting the Rise of DGA-based Malware[C]//21st USENIX Security Symposium (USENIX Security 12). Bellevue, WA, USA: USENIX Association, 2012: 491-506.
YU B, GRAY D L, PAN J, et al. Inline DGA Detection with Deep Networks[C]//2017 IEEE International Conference on Data Mining Workshops (ICDMW). New York: IEEE, 2017: 683-692. DOI:10.1109/ICDMW.2017.96 .
[55] YU B, PAN J, HU J M, et al. Character Level Based Detection of DGA Domain Names[C]//2018 International Joint Conference on Neural Networks (IJCNN). New York: IEEE, 2018: 1-8. DOI:10.1109/IJCNN.2018.8489147 .
[56] CHEN G B, YE D H, XING Z C, et al. Ensemble Application of Convolutional and Recurrent Neural Networks for Multi-label Text Categorization[C]//2017 International Joint Conference on Neural Networks (IJCNN). New York: IEEE, 2017: 2377-2383. DOI:10.1109/IJCNN.2017.7966144 .
[57] KIM Y, JERNITE Y, SONTAG D, et al. Character-aware Neural Language Models[J]. Proc AAAI Conf Artif Intell, 2016, 30(1): 2741-2749. DOI:10.1609/aaai.v30i1.10362 .
[58] MOHAN V S, R V, KP S, et al. S.P.O.O.F Net: Syntactic Patterns for Identification of Ominous Online Factors[C]//2018 IEEE Security and Privacy Workshops (SPW). New York: IEEE, 2018: 258-263. DOI:10.1109/SPW.2018.00041 .
[59] ARAVENA L T, CASAS P, BUSTOS-JIMÉNEZ J, et al. DeepD2V - Deep Learning and Domain Word Embeddings for DGA Based Malware Detection[C]//2024 IEEE International Conference on Machine Learning for Communication and Networking (ICMLCN). New York: IEEE, 2024: 164-170. DOI:10.1109/ICMLCN59089.2024.10624693 .
[60] MIKOLOV T, CHEN K, CORRADO G, et al. Efficient Estimation of Word Representations in Vector Space[EB/OL]. (2013-09-07)[2025-04-10].
[61] SUN G Y, CHENG Y N, ZHANG Z X, et al. Text Classification with Improved Word Embedding and Adaptive Segmentation[J]. Expert Syst Appl, 2024, 238: 121852. DOI:10.1016/j.eswa.2023.121852 .
[62] KHINE A H, WETTAYAPRASIT W, DUANGSUWAN J. A New Word Embedding Model Integrated with Medical Knowledge for Deep Learning-based Sentiment Classification[J]. Artif Intell Med, 2024, 148: 102758. DOI:10.1016/j.artmed.2023.102758 .
[63] KOH J J, RHODES B. Inline Detection of Domain Generation Algorithms with Context-sensitive Word Embeddings[C]//2018 IEEE International Conference on Big Data (Big Data). New York: IEEE, 2018: 2966-2971. DOI:10.1109/BigData.2018.8622066 .
WOODBRIDGE J, ANDERSON H S, AHUJA A, et al. Predicting Domain Generation Algorithms with Long Short-term Memory Networks[EB/OL]. (2016-11-02)[2025-01-10].
[66] TRAN D, MAC H, TONG V, et al. A LSTM Based Framework for Handling Multiclass Imbalance in DGA Botnet Detection[J]. Neurocomputing, 2018, 275: 2401-2413. DOI:10.1016/j.neucom.2017.11.018 .
[67] SHAHZAD H, SATTAR A R, SKANDARANIYAM J. DGA Domain Detection Using Deep Learning[C]//2021 IEEE 5th International Conference on Cryptography, Security and Privacy (CSP). New York: IEEE, 2021: 139-143. DOI:10.1109/CSP51677.2021.9357591 .
[68] NAMGUNG J, SON S, MOON Y S. Efficient Deep Learning Models for DGA Domain Detection[J]. Secur Commun Netw, 2021, 2021(1): 8887881. DOI:10.1155/2021/8887881 .
[69] CHEN Y, ZHANG S, LIU J, et al. Towards a Deep Learning Approach for Detecting Malicious Domains[C]//2018 IEEE International Conference on Smart Cloud (SmartCloud). New York: IEEE, 2018: 190-195. DOI:10.1109/SmartCloud.2018.00039 .
LANG B, XIE C, CHEN S J, et al. Fast-flux Malicious Domain Name Detection Method Based on Multimodal Feature Fusion[J]. Netinfo Secur, 2022(4): 20-29. DOI:10.3969/j.issn.1671-1122.2022.04.003 .
[72] GAO N, GAO L, GAO Q L, et al. An Intrusion Detection Model Based on Deep Belief Networks[C]//2014 Second International Conference on Advanced Cloud and Big Data. New York: IEEE, 2014: 247-252. DOI:10.1109/CBD.2014.41 .
[73] ANDERSON H S, WOODBRIDGE J, FILAR B. DeepDGA: Adversarially-tuned Domain Generation and Detection[C]//Proceedings of the 2016 ACM Workshop on Artificial Intelligence and Security. New York: ACM, 2016: 13-21. DOI:10.1145/2996758.2996767 .
[74] ZHAI Y, YANG J, WANG Z X, et al. CDGA: A GAN-based Controllable Domain Generation Algorithm[C]//2022 IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom). New York: IEEE, 2022: 352-360. DOI:10.1109/TrustCom56396.2022.00056 .
[75] TUAN T A, LONG H V, TANIAR D. On Detecting and Classifying DGA Botnets and Their Families[J]. Comput Secur, 2022, 113: 102549. DOI:10.1016/j.cose.2021.102549 .
[76] HU X Y, LI M, CHENG G, et al. Towards Accurate DGA Detection Based on Siamese Network with Insufficient Training Samples[C]//ICC 2022-IEEE International Conference on Communications. New York: IEEE, 2022: 2670-2675. DOI:10.1109/ICC45855.2022.9838409 .
[77] GAO F, HAN W B, OU W. BotHunter: Distributed Malicious Domain Name Detection Model Based on Deep Learning and Blockchain[M]//Intelligent Computing Technology and Automation. Amsterdam: IOS Press, 2024: 903-913. DOI:10.3233/atde231270 .
[78] GRAVES A, SCHMIDHUBER J. Framewise Phoneme Classification with Bidirectional LSTM and Other Neural Network Architectures[J]. Neural Netw, 2005, 18(5/6): 602-610. DOI:10.1016/j.neunet.2005.06.042 .
[79] HU X Y, CHEN H, LI M, et al. ReplaceDGA: BiLSTM-based Adversarial DGA with High Anti-detection Ability[J]. IEEE Trans Inf Forensics Secur, 2023, 18: 4406-4421. DOI:10.1109/TIFS.2023.3293956 .
[80] LEE H J, KIM H K. Mitigating False Positives in DGA Detection for Non-English Domain Names[C]//2024 54th Annual IEEE/IFIP International Conference on Dependable Systems and Networks-Supplemental Volume (DSN-S). New York: IEEE, 2024: 150-151. DOI:10.1109/DSN-S60304.2024.00042 .
YU Z C, LING J. A DGA Domain Name Detection Method Based on Transformer and Multi-feature Fusion[J]. Comput Eng Sci, 2023, 45(8): 1416-1423. DOI:10.3969/j.issn.1007-130X.2023.08.010 .
[83] REYNIER L A O, CATANIA C A, PARLANTI T. LLMs for Domain Generation Algorithm Detection[EB/OL]. (2024-11-06)[2025-04-10].