VDiscover is a tool designed to train a vulnerability detection predictor.
Given a vulnerability discovery procedure and a large enough number of training testcases, it extracts lightweight
features to predict which testcases are potentially vulnerable. This
repository contains an improved version of a proof-of-concept used to
show experimental results in our technical report (available here).
Use cases
VDiscover aims to be used when there is a large amount of testcases to analyze using a costly
vulnerability detection procedure. It can be trained to provide a quick
prioritization of testcases. The extraction of features to perform a
prediction is designed to be scalable. Nevertheless, this implementation
is not particularly optimized so it should easy to improve the
performance of it.
matplotlib for visualization (very experimental, optional)
keras for convolutional clustering (very experimental, optional)
Trace extraction is working only in x86 (x86_64 support should be simple to extend and it is planned)
Quickstart
Before starting, it is recommended to manually install binutils,
scikit-learn and setuptools (to perform a local installation). For
instance, in Ubuntu/Debian:
git clone https://github.com/CIFASIS/VDiscover.git
cd VDiscover
python setup.py install --user
By default, the local installation of the command line utilities of
VDiscover is performed inside ~/.local/bin, so it is recommended to add
this directory into the PATH variable. Our tool is composed by two main
components:
fextractor: to extract dynamic and static features from test cases.
vpredictor: to train a new vulnerability prediction
model or predict using a previously trained one. It can be used to
cluster and visualize a set of test cases.
Some examples of testcases of very popular programs (grep, gzip, bc, ..) can be found in examples/testcases. For example, to extract raw dynamic features from an execution of bc:
This raw data can be used to train a new vulnerability prediction
model or predict using a previously trained one. Additionally, more
detailed (but outdated) documentation is available here.