A curated list of tools and resources related to the use of machine learning for cyber security.
Applying machine learning to cyber security is difficult: advances in the field offer so many opportunities that it is challenging to find the use cases that are genuinely beneficial for implementation and decision making. Moreover, intruders can use the same technologies to attack computer systems. The goal of this list is to give you tools and resources related to the use of machine learning for cyber security.
Machine Learning Cyber security resources
Datasets
- Samples of Security Related Data
- Samples of various types of security-related data, covering:
- Network
- Malware
- System
- Password
- Threat Feeds
- Stratosphere IPS Data Sets
- Stratosphere Research Laboratory
- Open Data Sets
- Comprehensive, Multi-Source Cyber-Security Events
- Unified Host and Network Data Set
- User-Computer Authentication Associations in Time
- Data Capture from the National Security Agency
- Datasets permitted by The National Security Agency
- Snort Intrusion Detection Log
- Domain Name Service Logs
- Web Server Logs
- Log Server Aggregate Log
- The ADFA Intrusion Detection Data Sets
- The datasets cover both Linux and Windows; they are designed for evaluating system-call-based host intrusion detection systems (HIDS)
- NSL-KDD Data Sets
- Malicious URLs Data Sets
- Detecting Malicious URLs
- Multi-Source Cyber-Security Events
- This data set represents 58 consecutive days of de-identified event data collected from five sources within Los Alamos National Laboratory’s corporate, internal computer network.
Papers
- Fast, Lean, and Accurate: Modeling Password Guessability Using Neural Networks
- Awarded Best Paper
- Outside the Closed World: On Using Machine Learning for Network Intrusion Detection
- Using machine learning tools to monitor network activity
- Anomalous Payload-Based Network Intrusion Detection
- Fully automatic payload-based anomaly detector
- Malicious PDF detection using metadata and structural features
- Adversarial support vector machine learning
- Exploiting machine learning to subvert your spam filter
- CAMP – Content Agnostic Malware Protection
- Notos – Building a Dynamic Reputation System for DNS
- Kopis – Detecting Malware Domains at the Upper DNS Hierarchy
- Pleiades – From Throw-away Traffic To Bots – Detecting The Rise Of DGA-based Malware
- EXPOSURE – Finding Malicious Domains Using Passive DNS Analysis
- Polonium – Tera-Scale Graph Mining for Malware Detection
- Nazca – Detecting Malware Distribution in Large-Scale Networks
- PAYL – Anomalous Payload-based Network Intrusion Detection
- Anagram – A Content Anomaly Detector Resistant to Mimicry Attacks
- Applications of Machine Learning in Cyber Security
- This study covers phishing detection, network intrusion detection, testing security properties of protocols, authentication with keystroke dynamics, cryptography, human interaction proofs, spam detection in social networks, smart meter energy consumption profiling, and security issues in machine learning techniques themselves.
- An Investigation of Byte N-Gram Features for Malware Classification
Books
- Data Mining and Machine Learning in Cybersecurity
- A decent, well-organized book that appears to draw on extensive experience and research.
- Machine Learning and Data Mining for Computer Security
- This book provides an overview of the current state of research in machine learning and data mining as it applies to problems in computer security.
- Network Anomaly Detection: A Machine Learning Perspective
- This book presents machine learning techniques in depth to help you detect and counter network intrusion more effectively.
- Machine Learning for Hackers: Case Studies and Algorithms to Get You Started
- More Machine learning books
Videos
- Using Machine Learning to Support Information Security
- Defending Networks with Incomplete Information
- Applying Machine Learning to Network Security Monitoring
- Measuring the IQ of your Threat Intelligence Feeds
- Data-Driven Threat Intelligence: Metrics On Indicator Dissemination And Sharing
- Applied Machine Learning for Data Exfil and Other Fun Topics
- Secure Because Math: A Deep-Dive on ML-Based Monitoring
- Machine Duping 101: Pwning Deep Learning Systems
- Delta Zero, KingPhish3r – Weaponizing Data Science for Social Engineering
- Defeating Machine Learning: What Your Security Vendor Is Not Telling You
- CrowdSource: Crowd Trained Machine Learning Model for Malware Capability Det
- Defeating Machine Learning: Systemic Deficiencies for Detecting Malware
- Packet Capture Village – Theodora Titonis – How Machine Learning Finds Malware
- Build an Antivirus in 5 Min – Fresh Machine Learning #7. A fun video to watch
- Hunting for Malware with Machine Learning
- Machine Learning for Threat Detection
- Machine Learning and the Cloud: Disrupting Threat Detection and Prevention
- Fraud detection using machine learning & deep learning
- The Applications Of Deep Learning On Traffic Identification
- Defending Networks With Incomplete Information: A Machine Learning Approach
- Machine Learning & Data Science
Tutorials
- Big Data and Data Science for Security and Fraud Detection
- A review of big data analytics tools and technologies that combine text mining, machine learning and network analysis for security threat prediction, detection and prevention at an early stage
- Using deep learning to break a Captcha system
- Data mining for network security and intrusion detection
Courses
Miscellaneous
- System predicts 85 percent of cyber-attacks using input from human experts
- A list of open source projects in cyber security using machine learning
Machine Learning Will Not Replace Other Cybersecurity Methods
5 Reasons Why Machine Learning Will Not Replace Other Cybersecurity Methods and Real-Life Examples of Effective ML for Data Protection
Where and How Machine Learning Is Used in Cybersecurity: 5 Practical Cases
Banks are widely considered to be the largest users and main drivers of Big Data and machine learning technologies in cybersecurity today. For example, we have written about how machine learning helps Home Credit Bank’s IT specialists monitor banking systems and promptly identify abnormal activity by individual components or users. Machine learning (ML) methods are also actively used by other high-tech companies that develop specialized software.
One interesting example is Sqrrl, a secure graph-oriented NoSQL DBMS based on Apache Accumulo. This cyberthreat hunting platform uses machine learning to visualize the vulnerabilities of computer networks. In January 2018, Amazon acquired Sqrrl for its Amazon Web Services cloud business.
Demisto, a company promoting a Security Orchestration, Automation and Response (SOAR) approach to cybersecurity, uses ML algorithms in its platform’s visual dashboard to prioritize threat messages.
It is also worth noting the experience of the Russian IT company Kaspersky Lab, which actively integrates machine learning models into its anti-virus products. To reduce the number of false positives, improve the interpretability of results, and make the software more resistant to the actions of a potential attacker, Kaspersky Lab uses decision trees, locality-sensitive hashing, behavioral models, and ML clustering algorithms.
Likewise, Microsoft has created its own cybersecurity system, Windows Defender Advanced Threat Protection, for proactive protection, breach detection, automatic investigation, and threat response. This product is integrated into all Windows 10-based devices and is actively used together with the company’s cloud services. In addition, the ML system built into Windows Defender performs behavioral analysis on large volumes of data every day to prevent possible attacks. For example, when a malicious cryptominer is installed into a browser at the level of an individual Windows user, the system recognizes and blocks the threat in just a few milliseconds. A similar threat at the enterprise level is repelled within a couple of seconds thanks to the effective use of machine learning methods.
Will Machine Learning Launch a Revolution in Cybersecurity and Why
Despite optimistic forecasts that machine learning will soon replace all human information security specialists with automatic algorithms, it is still too early to talk about this. The following reasons prevent the complete abandonment of earlier cybersecurity methods in favor of machine learning:
- Neural network models behave like a “black box”, a “thing in itself”, which does not explain why a particular result was obtained from given input data. This lack of direct cognitive feedback makes it impossible to completely abandon human control in important areas such as information security, much as an airplane keeps manual controls even with a very smart autopilot.
- There are not enough datasets to train ML models correctly in all areas of cyber threats, from computer viruses to social engineering techniques.
- ML algorithms and the datasets they use can themselves be attacked, which can lead to wrong decisions, missed attacks, or false positives.
- Attackers also use machine learning algorithms to create malware, analyze user behavior, develop bots that collect personal data, search for vulnerabilities, guess passwords, spoof identity, bypass security systems, etc.
It is also worth noting a certain conflict between the requirements of the General Data Protection Regulation (GDPR), which protects the personal data of citizens and residents of the European Union, and the use of this information in ML models for cybersecurity. In particular, the GDPR gives users the right to “be forgotten” if they do not consent to the collection of their personal data or decide to withdraw that consent. This requirement may be violated if an ML model automatically analyzes user behavior (cookies, data about the device, browser, etc.) to prevent threats without explicitly informing the user. We have covered what the GDPR is and how it relates to personal data in a separate article.
Thus, although machine learning cannot replace existing cybersecurity methods, it significantly complements and extends them. In particular, ML models improve the accuracy of signature analysis, which processes queries quickly and does not require a long training period. You can therefore use signature analysis to identify queries with clear signs of an attack, and machine learning to analyze the rest of the queries. Combining the methods in this way gives anti-virus software high speed with a minimum number of false positives and missed attacks.
Machine learning does not supplant old cybersecurity methods, but complements them
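
The combination of signature analysis and machine learning described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not how any particular product works: the regular expressions in `SIGNATURES`, the toy training queries in `TRAIN`, and the `TrigramNaiveBayes` class (a small Naive Bayes classifier over character trigrams, standing in for a real ML model) are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical signatures: patterns that are clear signs of an attack.
SIGNATURES = [
    re.compile(r"union\s+select", re.IGNORECASE),  # SQL injection
    re.compile(r"<script\b", re.IGNORECASE),       # cross-site scripting
    re.compile(r"\.\./\.\./"),                     # path traversal
]


def trigrams(text):
    """Split a query into overlapping, lower-cased character trigrams."""
    text = text.lower()
    return [text[i:i + 3] for i in range(len(text) - 2)]


class TrigramNaiveBayes:
    """Tiny multinomial Naive Bayes over character trigrams (illustrative only)."""

    def fit(self, queries, labels):
        self.priors = Counter(labels)
        self.counts = {label: Counter() for label in self.priors}
        for query, label in zip(queries, labels):
            self.counts[label].update(trigrams(query))
        self.vocab = set().union(*self.counts.values())
        return self

    def predict(self, query):
        total_docs = sum(self.priors.values())
        best_label, best_score = None, -math.inf
        for label, counter in self.counts.items():
            total = sum(counter.values())
            score = math.log(self.priors[label] / total_docs)
            for gram in trigrams(query):
                # Laplace smoothing, with one extra slot for unseen trigrams.
                score += math.log((counter[gram] + 1) / (total + len(self.vocab) + 1))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def classify(query, model):
    """Check the fast signatures first; fall back to the ML model otherwise."""
    for signature in SIGNATURES:
        if signature.search(query):
            return "attack", "signature"
    return model.predict(query), "model"


# Toy training data, invented for this sketch.
TRAIN = [
    ("id=1 or 1=1 --", "attack"),
    ("name=admin' or 'a'='a", "attack"),
    ("q=1; drop table users", "attack"),
    ("q=weather in paris", "benign"),
    ("page=2&sort=price", "benign"),
    ("user=alice&lang=en", "benign"),
]

model = TrigramNaiveBayes().fit(*zip(*TRAIN))

for query in [
    "q=1 UNION SELECT password FROM users",
    "id=5 or 5=5 --",
    "page=3&sort=name",
]:
    print(query, "->", classify(query, model))
```

The ordering is the point of the design: the cheap signature check handles unambiguous cases without invoking the model, and only the remaining queries pay the cost of ML scoring. A real deployment would need far more training data and a properly evaluated model.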