DPA3 (experimental)¶
This is an interface to the DPA3 (Deep Potential Attention 3) architecture [1] implemented in deepmd-kit.
DPA3 extends the DPA series with a Line Graph representation and the RepFlow framework, enabling richer many-body interactions through joint edge-angle message passing. See the paper and the deepmd-kit documentation for further details.
Note

The type_map required by deepmd-kit is derived automatically from the atomic numbers present in the dataset; it is not a user-facing hyperparameter.
Installation¶
To install this architecture along with the metatrain package, run:
```bash
pip install metatrain[dpa3]
```

where the square brackets indicate that you want to install the optional dependencies required for dpa3.
Default Hyperparameters¶
All the hyperparameters used by dpa3 are described further down this page. As a convenient starting point for your own hyperparameter files, here is a YAML file containing all the defaults:
```yaml
architecture:
  name: experimental.dpa3
  model:
    dpa3_model: null
    descriptor:
      type: dpa3
      repflow:
        n_dim: 128
        e_dim: 64
        a_dim: 32
        nlayers: 6
        e_rcut: 6.0
        e_rcut_smth: 5.3
        e_sel: 1200
        a_rcut: 4.0
        a_rcut_smth: 3.5
        a_sel: 300
        axis_neuron: 4
        skip_stat: true
        a_compress_rate: 1
        a_compress_e_rate: 2
        a_compress_use_split: true
        update_angle: true
        update_style: res_residual
        update_residual: 0.1
        update_residual_init: const
        smooth_edge_update: true
        use_dynamic_sel: true
        sel_reduce_factor: 10.0
      activation_function: custom_silu:10.0
      use_tebd_bias: false
      precision: ${base_precision}
      concat_output_tebd: false
    fitting_net:
      neuron:
        - 240
        - 240
        - 240
      resnet_dt: true
      seed: 1
      precision: ${base_precision}
      activation_function: custom_silu:10.0
      type: ener
      numb_fparam: 0
      numb_aparam: 0
      dim_case_embd: 0
      trainable: true
      rcond: null
      atom_ener: []
      use_aparam_as_mask: false
  training:
    distributed: false
    distributed_port: 39591
    batch_size: 8
    num_epochs: 100
    learning_rate: 0.001
    scheduler_patience: 100
    scheduler_factor: 0.8
    log_interval: 1
    checkpoint_interval: 100
    scale_targets: true
    fixed_composition_weights: {}
    per_structure_targets: []
    log_mae: false
    log_separate_blocks: false
    best_model_metric: rmse_prod
    loss: mse
```
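In practice, an options file only needs to name the architecture and override the defaults you care about; everything omitted falls back to the values above. A minimal sketch is shown below. Note that the dataset section (`training_set`, `validation_set`, `test_set` and the paths and keys inside it) is illustrative and must be adapted to your own data:

```yaml
# Minimal options file: unspecified hyperparameters fall back to the defaults above.
architecture:
  name: experimental.dpa3
  training:
    batch_size: 16   # override a single default
    num_epochs: 200

# Dataset section is illustrative; adapt to your own setup.
training_set:
  systems:
    read_from: dataset.xyz
  targets:
    energy:
      key: energy
validation_set: 0.1
test_set: 0.1
```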
Tuning hyperparameters¶
The most impactful hyperparameters, roughly in decreasing order of importance:
- ModelHypers.descriptor: the descriptor configuration (RepFlow block and related settings). The full defaults are listed under Model hyperparameters below.
- TrainerHypers.learning_rate (default 0.001): the learning rate.
- TrainerHypers.batch_size (default 8): the number of samples to use in each batch of training. This controls the tradeoff between training speed and memory usage: larger batch sizes generally train faster but require more memory.

Increasing descriptor.repflow.nlayers typically improves accuracy at the cost of training time. descriptor.repflow.e_rcut controls the interaction range and should be chosen based on the physical system. Reduce e_sel and a_sel for faster iteration on small systems.
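As an example, these knobs could be adjusted as follows for fast iteration on a small system. The specific values are illustrative, not recommendations; only the overridden keys need to appear, the rest keep their defaults:

```yaml
architecture:
  name: experimental.dpa3
  model:
    descriptor:
      repflow:
        nlayers: 4    # fewer layers: faster training, usually lower accuracy
        e_rcut: 5.0   # shorter interaction range for a small system
        e_sel: 600    # smaller edge neighbor selection
        a_sel: 150    # smaller angle neighbor selection
  training:
    learning_rate: 0.0005
    batch_size: 16
```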
Using a pretrained model¶
Set dpa3_model to a deepmd-kit model file to fine-tune from pretrained
weights instead of training from scratch:
```yaml
model:
  dpa3_model: path/to/deepmd-model.pt
```
Energy biases and standard deviations are extracted from the loaded model and handed to metatrain’s composition model and scaler automatically.
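Putting this together, a sketch of an options file for fine-tuning might look as follows (the checkpoint path is a placeholder, and the lower learning rate is only a common fine-tuning choice, not a requirement):

```yaml
architecture:
  name: experimental.dpa3
  model:
    dpa3_model: path/to/deepmd-model.pt   # pretrained deepmd-kit checkpoint
  training:
    learning_rate: 0.0001   # a lower rate than the default is common when fine-tuning
```

Training then proceeds exactly as for a model initialised from scratch.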
References¶
Model hyperparameters¶
The parameters that go under the architecture.model section of the config file
are the following:
- ModelHypers.dpa3_model: str | None = None¶
Path to a pretrained DPA3 model file (deepmd-kit checkpoint or saved Module). When provided, the model weights are loaded from this file instead of being initialised from scratch. Energy biases and standard deviations stored in the deepmd-kit model are extracted and handed to metatrain's `CompositionModel` and `Scaler` so that fine-tuning starts from the pretrained values.
- ModelHypers.descriptor: DescriptorHypers = {'activation_function': 'custom_silu:10.0', 'concat_output_tebd': False, 'precision': '${base_precision}', 'repflow': {'a_compress_e_rate': 2, 'a_compress_rate': 1, 'a_compress_use_split': True, 'a_dim': 32, 'a_rcut': 4.0, 'a_rcut_smth': 3.5, 'a_sel': 300, 'axis_neuron': 4, 'e_dim': 64, 'e_rcut': 6.0, 'e_rcut_smth': 5.3, 'e_sel': 1200, 'n_dim': 128, 'nlayers': 6, 'sel_reduce_factor': 10.0, 'skip_stat': True, 'smooth_edge_update': True, 'update_angle': True, 'update_residual': 0.1, 'update_residual_init': 'const', 'update_style': 'res_residual', 'use_dynamic_sel': True}, 'type': 'dpa3', 'use_tebd_bias': False}¶
Descriptor configuration (RepFlow block and related settings).
- ModelHypers.fitting_net: FittingNetHypers = {'activation_function': 'custom_silu:10.0', 'atom_ener': [], 'dim_case_embd': 0, 'neuron': [240, 240, 240], 'numb_aparam': 0, 'numb_fparam': 0, 'precision': '${base_precision}', 'rcond': None, 'resnet_dt': True, 'seed': 1, 'trainable': True, 'type': 'ener', 'use_aparam_as_mask': False}¶
Fitting network configuration.
Trainer hyperparameters¶
The parameters that go under the architecture.trainer section of the config file
are the following:
- TrainerHypers.batch_size: int = 8¶
The number of samples to use in each batch of training. This hyperparameter controls the tradeoff between training speed and memory usage. In general, larger batch sizes will lead to faster training, but might require more memory.
- TrainerHypers.scheduler_patience: int = 100¶
Number of epochs with no improvement before reducing the learning rate.
- TrainerHypers.scheduler_factor: float = 0.8¶
Factor by which the learning rate is reduced on plateau.
- TrainerHypers.fixed_composition_weights: dict[str, float | dict[int, float]] = {}¶
Weights for atomic contributions.
This is passed to the `fixed_weights` argument of `CompositionModel.train_model`; see its documentation to understand exactly what to pass here.
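Based on the declared type (a target name mapped either to a single weight or to per-atomic-number weights), a hedged sketch of what this could look like, with purely illustrative values:

```yaml
training:
  fixed_composition_weights:
    energy:
      1: -13.6      # fixed per-atomic-number baseline for H (illustrative)
      8: -2041.0    # fixed per-atomic-number baseline for O (illustrative)
```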
- TrainerHypers.best_model_metric: Literal['rmse_prod', 'mae_prod', 'loss'] = 'rmse_prod'¶
Metric used to select the best checkpoint (e.g., `rmse_prod`).
- TrainerHypers.loss: str | dict[str, LossSpecification] = 'mse'¶
This section describes the loss function to be used. See the Loss functions section for more details.