We release the largest StarCraft: Brood War replay dataset yet, with
65646 games. After compression, the full dataset is 365 GB and contains
1535 million frames and 496 million player actions. The entire frame data
was dumped out at 8 frames per second. We made a big effort to ensure this
dataset is clean and consists mostly of high-quality replays. You can
access it with TorchCraft in C++, Python, and Lua. The replays are in an
AWS S3 bucket at s3://stardata. Read below for more details, or see our whitepaper on arXiv.
Installing TorchCraft
Note: The current set of replays is only compatible with the 1.3.0 version of TorchCraft included here.
Simply do
git submodule update --init
cd TorchCraft
pip install .
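Since the replays are tied to version 1.3.0, it can help to confirm which version actually got installed. The following is a minimal sketch, not part of this repository: it assumes the package is registered with pip under the distribution name `torchcraft` (an assumption; check `pip list` if the lookup fails), and the helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

REQUIRED_VERSION = "1.3.0"  # version stated in the note above
DIST_NAME = "torchcraft"    # ASSUMPTION: distribution name as registered with pip


def installed_version(dist_name=DIST_NAME):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def is_compatible(found, required=REQUIRED_VERSION):
    """True only when the installed version matches the required one exactly."""
    return found == required


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print(f"{DIST_NAME} is not installed (or uses a different distribution name)")
    elif is_compatible(found):
        print(f"{DIST_NAME} {found} matches the required version")
    else:
        print(f"{DIST_NAME} {found} found, but the replays expect {REQUIRED_VERSION}")
```

The comparison is an exact string match on purpose, since the note only vouches for 1.3.0.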
More documentation can be found at https://github.com/TorchCraft/TorchCraft.
Realistically, you will only need the replayer modules, which means you
can ignore most of the parts that deal with connecting to StarCraft. See
the code for documentation of its use.
Standardized train, valid, and test sets are also available. Here is a list of all the files.
Reproducing Results
Some of the reproduction scripts are included; other scripts will be added as
soon as we clean up the code and make it easy to install and run. Simply run make
and you're good to go. All cpp files can be run like script /path/to/replays/**/*.rep
extract_stats tells you some stats about the replays
extract_units preprocesses for battle clustering
get_corrupt_replays tells you what replays are considered corrupt
cluster.py can be run on the output of extract_units to do battle clustering.
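The `/path/to/replays/**/*.rep` pattern relies on the shell expanding `**` recursively, which not every shell does by default (bash, for instance, needs `globstar` enabled). As a hedged alternative, the sketch below collects the replay paths in Python and builds the equivalent argument list. The helper names `find_replays` and `build_command` are hypothetical and not part of this repository.

```python
import glob
import os


def find_replays(root):
    """Return sorted paths of all .rep files under root, at any depth."""
    pattern = os.path.join(root, "**", "*.rep")
    # recursive=True makes "**" match zero or more directory levels
    return sorted(glob.glob(pattern, recursive=True))


def build_command(binary, root):
    """Build the argv list equivalent to `binary root/**/*.rep`."""
    return [binary] + find_replays(root)


# Example usage (paths are placeholders):
#   import subprocess
#   subprocess.run(build_command("./extract_stats", "/path/to/replays"), check=True)
```

With the full dataset the argument list can get very long, so passing the replays in smaller batches may be necessary on systems with a limit on command-line length.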
Attributions
The white paper for the dataset is at:
Lin, Z., Gehring, J., Khalidov, V., Synnaeve, G. STARDATA: A StarCraft AI Research Dataset. AIIDE 2017. (arxiv)