Credit: Paruzzo F M, Hofstetter A, Musil F, et al. Chemical shifts in molecular solids by machine learning. Nature Communications, 2018, 9(1): 4501.
The data are then converted to .xyz files and saved into separate folders.
python get_data.py
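For readers unfamiliar with the format, a minimal sketch of reading a plain single-frame XYZ file follows: an atom count, a free-form comment line, then one `symbol x y z` record per atom. It assumes the files written by get_data.py follow this layout; any extra columns per atom (for example, stored shielding values) are ignored here. `read_xyz` is a hypothetical helper, not part of the repository.

```python
def read_xyz(text):
    """Parse a single-frame XYZ string into (symbols, coords)."""
    lines = text.strip().splitlines()
    n_atoms = int(lines[0])
    # Line 2 is a free-form comment; atom records start on line 3.
    symbols, coords = [], []
    for line in lines[2:2 + n_atoms]:
        fields = line.split()
        symbols.append(fields[0])
        # Keep only x, y, z; any additional columns are ignored.
        coords.append(tuple(float(v) for v in fields[1:4]))
    return symbols, coords


example = """3
water
O 0.000 0.000 0.000
H 0.757 0.586 0.000
H -0.757 0.586 0.000
"""
symbols, coords = read_xyz(example)
```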
Step 2: Transform the xyz files into json files
Convert the .xyz files into .json files that store the k nearest atoms, and write them to the destination folder.
python xyz_to_json.py
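The k-nearest-atom selection can be sketched as below: shift all coordinates so the atom of interest sits at the origin, sort by distance, and keep the first k. The JSON schema (`center`, `symbols`, `xyz`) is a hypothetical choice, not necessarily what xyz_to_json.py writes, and this sketch ignores periodic images, which a crystal (molecular solid) would normally require.

```python
import json
import numpy as np


def k_nearest_environment(symbols, coords, center, k):
    """Return the k atoms nearest to atom `center` (including itself),
    with coordinates shifted so the center atom sits at the origin.
    Assumes a non-periodic structure (no periodic images)."""
    xyz = np.asarray(coords, dtype=float)
    rel = xyz - xyz[center]                       # shift to the center atom
    dist = np.linalg.norm(rel, axis=1)
    order = np.argsort(dist, kind="stable")[:k]   # the center (distance 0) sorts first
    return {
        "center": int(center),
        "symbols": [symbols[i] for i in order],
        "xyz": rel[order].tolist(),
    }


SYMBOLS = ["O", "H", "H"]
COORDS = [[0.0, 0.0, 0.0], [0.757, 0.586, 0.0], [-0.757, 0.586, 0.0]]
env = k_nearest_environment(SYMBOLS, COORDS, center=0, k=2)
text = json.dumps(env)  # what would be written to the destination folder
```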
Step 3: Transform json files into numpy files
The xyz coordinates, atom types and chemical shielding values are converted into:
prefix_xyz.npy, prefix_points.npy and prefix_y.npy
python json_to_numpy.py
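A minimal sketch of this conversion, assuming each JSON record holds `symbols`, `xyz` and `shielding` fields (a hypothetical schema) and that prefix_points.npy stores one-hot atom types in the H, C, O, N order given in the Tips section. All records must contain the same number k of atoms so they stack into fixed-shape arrays.

```python
import os
import tempfile
import numpy as np

ATOM_TYPES = ["H", "C", "O", "N"]  # atom-type order from the Tips section


def records_to_arrays(records):
    """Stack per-atom records (hypothetical schema) into fixed-shape arrays.
    Every record must hold the same number k of neighbouring atoms."""
    xyz = np.array([r["xyz"] for r in records], dtype=np.float32)       # (n, k, 3)
    type_idx = np.array([[ATOM_TYPES.index(s) for s in r["symbols"]] for r in records])
    points = np.eye(len(ATOM_TYPES), dtype=np.float32)[type_idx]      # (n, k, 4) one-hot
    y = np.array([r["shielding"] for r in records], dtype=np.float32)  # (n,)
    return xyz, points, y


def save_arrays(out_dir, prefix, xyz, points, y):
    """Write prefix_xyz.npy, prefix_points.npy and prefix_y.npy into out_dir."""
    for name, arr in (("xyz", xyz), ("points", points), ("y", y)):
        np.save(os.path.join(out_dir, f"{prefix}_{name}.npy"), arr)


# Toy records for illustration only, not real data.
records = [
    {"symbols": ["O", "H"], "xyz": [[0, 0, 0], [0.757, 0.586, 0]], "shielding": 1.0},
    {"symbols": ["H", "O"], "xyz": [[0, 0, 0], [-0.757, -0.586, 0]], "shielding": 2.0},
]
out_dir = tempfile.mkdtemp()
save_arrays(out_dir, "train", *records_to_arrays(records))
```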
Step 4: Data augmentation
Augment the xyz files 8-fold.
python data_aug.py
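The README does not say which transformations produce the 8 copies. One plausible choice, assumed here, is 8 random rigid rotations about the center atom, which leave every interatomic distance (and hence the isotropic shielding) unchanged. The sketch draws uniform rotations from the QR decomposition of a Gaussian matrix; `random_rotation` and `augment` are hypothetical names, not functions from data_aug.py.

```python
import numpy as np


def random_rotation(rng):
    """Draw a uniformly random 3x3 rotation matrix via QR decomposition."""
    q, r = np.linalg.qr(rng.standard_normal((3, 3)))
    q = q * np.sign(np.diag(r))   # fix column signs so q is Haar-distributed on O(3)
    if np.linalg.det(q) < 0:      # map a reflection to a proper rotation
        q[:, 0] = -q[:, 0]
    return q


def augment(xyz, n_fold=8, seed=0):
    """Return n_fold randomly rotated copies of (k, 3) coordinates centred at the origin."""
    rng = np.random.default_rng(seed)
    return np.stack([xyz @ random_rotation(rng).T for _ in range(n_fold)])
```

Because the coordinates are already centred on the atom of interest, rotating about the origin keeps that atom fixed at the grid centre.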
Step 5: Density generation
Generate the densities from the xyz and one-hot atom-type numpy files.
python density_gen.py
Note: only the Gaussian density has been tested. For the other density types, we copied our original scripts into the current script without further testing.
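As an illustration of a Gaussian density, the sketch below places an isotropic Gaussian on every atom and sums them on a cubic grid, one channel per atom type. The grid is parameterised by its center-to-face distance, as in the Tips section; the grid resolution `n_grid` and width `sigma` are hypothetical values, not the settings used in density_gen.py.

```python
import numpy as np


def gaussian_density(xyz, onehot, half_width, n_grid=16, sigma=0.5):
    """Voxelize atoms into a (channels, n, n, n) Gaussian density.

    xyz:        (k, 3) coordinates in Å, centred on the atom of interest
    onehot:     (k, c) one-hot atom types
    half_width: center-to-face distance of the cubic grid in Å
    """
    axis = np.linspace(-half_width, half_width, n_grid)
    gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
    grid = np.stack([gx, gy, gz], axis=-1)                          # (n, n, n, 3)
    d2 = ((grid[None] - xyz[:, None, None, None, :]) ** 2).sum(-1)  # (k, n, n, n)
    g = np.exp(-d2 / (2 * sigma ** 2))                              # one Gaussian per atom
    return np.einsum("kxyz,kc->cxyz", g, onehot)                    # sum atoms per channel


xyz = np.array([[0.0, 0.0, 0.0], [0.757, 0.586, 0.0], [-0.757, 0.586, 0.0]])
onehot = np.eye(4)[[2, 0, 0]]  # O, H, H in the H, C, O, N order
dens = gaussian_density(xyz, onehot, half_width=2.0, n_grid=17)
```

An odd `n_grid` puts a grid point exactly on the center atom, which is convenient when checking the output.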
Models
We provide the following models in the model directory:
1) MR-3D-DenseNet
2) Two baseline DenseNets
3) A regular CNN and a ResNet with the same number of 3x3x3 filters
Note: as with the densities, only 1) has been tested; 2) and 3) have not been extensively tested in the current script.
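"Same number of 3x3x3 filters" does not imply the same number of weights, because a DenseNet layer sees the concatenated outputs of all earlier layers in its block. The arithmetic below makes this concrete under hypothetical settings (4 input channels for the 4 atom types, 4 layers, 16 filters per layer, no bottleneck or transition layers); it is an illustration, not the configuration of the provided models.

```python
def conv3d_weights(c_in, c_out, kernel=3):
    """Number of weights (excluding biases) of one kernel x kernel x kernel 3D convolution."""
    return kernel ** 3 * c_in * c_out


def dense_block_inputs(c0, growth, n_layers):
    """Input channels of each layer in a DenseNet block: layer i receives
    the block input plus the outputs of all i earlier layers."""
    return [c0 + i * growth for i in range(n_layers)]


def plain_cnn_inputs(c0, width, n_layers):
    """Input channels of each layer in a plain CNN where every layer outputs `width` channels."""
    return [c0] + [width] * (n_layers - 1)


# Both stacks use 4 layers x 16 filters = 64 filters of size 3x3x3.
dense = sum(conv3d_weights(c, 16) for c in dense_block_inputs(4, 16, 4))
plain = sum(conv3d_weights(c, 16) for c in plain_cnn_inputs(4, 16, 4))
print(dense, plain)  # 48384 22464
```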
Training and testing scripts
We also provide example training and testing scripts.
With the default settings, run the following commands in order:
python train.py
python test.py
IMPORTANT: these two scripts are examples for oxygen only. For other atom types, the file names and the scale must be changed. In addition, for hydrogen, chemical shieldings < 0 or > 40 are filtered out of the training set. No other dataset is filtered.
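The hydrogen filtering rule can be expressed as a boolean mask applied consistently to all three arrays. This is a minimal sketch assuming the arrays produced in Step 3 share their first (sample) axis; `filter_hydrogen_training` is a hypothetical name. Values of exactly 0 and 40 are kept, since only shieldings strictly below 0 or above 40 are removed.

```python
import numpy as np


def filter_hydrogen_training(xyz, points, y, low=0.0, high=40.0):
    """Drop training samples whose shielding is < low or > high (hydrogen training set only)."""
    keep = (y >= low) & (y <= high)
    return xyz[keep], points[keep], y[keep]


y = np.array([-1.0, 0.0, 25.0, 40.0, 41.0])
xyz = np.zeros((5, 2, 3))
points = np.zeros((5, 2, 4))
fx, fp, fy = filter_hydrogen_training(xyz, points, y)
```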
Tips: in the current setting, the atom-type order is H, C, O, N, and the grid-size order (center-to-face distance) is 2 Å, 4 Å, 3 Å, 5 Å, 7 Å. This ordering arose because we used 3 Å, 5 Å and 7 Å as a subset for an ablation study, and placing them last makes that subset easy to index. The orders can be changed in preprocessing/json_to_numpy.py and train/test.py, respectively.
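The benefit of this ordering can be seen in a small indexing sketch. It assumes, hypothetically, that densities for the different grid sizes are stacked along axis 1 of a `(samples, grid, ...)` array; with the stored order, the 3/5/7 Å ablation subset is simply the slice `[:, 2:]`.

```python
import numpy as np

ATOM_TYPES = ["H", "C", "O", "N"]   # atom-type order from the Tips
GRID_SIZES_A = [2, 4, 3, 5, 7]      # center-to-face distances in Å, in stored order


def grid_subset(densities, sizes):
    """Select grids by size from an array laid out as (samples, grid, ...)."""
    idx = [GRID_SIZES_A.index(s) for s in sizes]
    return densities[:, idx]


densities = np.arange(2 * 5).reshape(2, 5)    # toy stand-in: 2 samples x 5 grid sizes
ablation = grid_subset(densities, [3, 5, 7])  # same as densities[:, 2:]
```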