DPA3 (experimental)

This is an interface to the DPA3 architecture described in https://arxiv.org/abs/2506.01686 and implemented in deepmd-kit (https://github.com/deepmodeling/deepmd-kit).

Installation

To install this architecture along with the metatrain package, run:

pip install metatrain[dpa3]

where the square brackets indicate that the optional dependencies required by dpa3 should also be installed.

Default Hyperparameters

All the hyperparameters used by dpa3 are described further down this page. However, here we provide a YAML file containing all the default hyperparameters, which can be a convenient starting point for creating your own hyperparameter files:

architecture:
  name: experimental.dpa3
  model:
    type_map:
    - H
    - C
    - 'N'
    - O
    descriptor:
      type: dpa3
      repflow:
        n_dim: 128
        e_dim: 64
        a_dim: 32
        nlayers: 6
        e_rcut: 6.0
        e_rcut_smth: 5.3
        e_sel: 1200
        a_rcut: 4.0
        a_rcut_smth: 3.5
        a_sel: 300
        axis_neuron: 4
        skip_stat: true
        a_compress_rate: 1
        a_compress_e_rate: 2
        a_compress_use_split: true
        update_angle: true
        update_style: res_residual
        update_residual: 0.1
        update_residual_init: const
        smooth_edge_update: true
        use_dynamic_sel: true
        sel_reduce_factor: 10.0
      activation_function: custom_silu:10.0
      use_tebd_bias: false
      precision: float32
      concat_output_tebd: false
    fitting_net:
      neuron:
      - 240
      - 240
      - 240
      resnet_dt: true
      seed: 1
      precision: float32
      activation_function: custom_silu:10.0
      type: ener
      numb_fparam: 0
      numb_aparam: 0
      dim_case_embd: 0
      trainable: true
      rcond: null
      atom_ener: []
      use_aparam_as_mask: false
  training:
    distributed: false
    distributed_port: 39591
    batch_size: 8
    num_epochs: 100
    learning_rate: 0.001
    scheduler_patience: 100
    scheduler_factor: 0.8
    log_interval: 1
    checkpoint_interval: 100
    scale_targets: true
    fixed_composition_weights: {}
    per_structure_targets: []
    log_mae: false
    log_separate_blocks: false
    best_model_metric: rmse_prod
    loss: mse
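
In your own options file, you do not need to repeat this whole block. As a sketch, assuming that any hyperparameter you leave out falls back to the defaults above, an architecture section that only changes a few training settings could look like this (the values are illustrative, and the dataset-related sections of the options file are omitted):

architecture:
  name: experimental.dpa3
  model:
    type_map:
    - H
    - O
  training:
    batch_size: 16
    num_epochs: 500
    learning_rate: 0.0003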

Model hyperparameters

The parameters that go under the architecture.model section of the config file are the following:

ModelHypers.type_map: list[str] = ['H', 'C', 'N', 'O']
ModelHypers.descriptor: DescriptorHypers = {'activation_function': 'custom_silu:10.0', 'concat_output_tebd': False, 'precision': 'float32', 'repflow': {'a_compress_e_rate': 2, 'a_compress_rate': 1, 'a_compress_use_split': True, 'a_dim': 32, 'a_rcut': 4.0, 'a_rcut_smth': 3.5, 'a_sel': 300, 'axis_neuron': 4, 'e_dim': 64, 'e_rcut': 6.0, 'e_rcut_smth': 5.3, 'e_sel': 1200, 'n_dim': 128, 'nlayers': 6, 'sel_reduce_factor': 10.0, 'skip_stat': True, 'smooth_edge_update': True, 'update_angle': True, 'update_residual': 0.1, 'update_residual_init': 'const', 'update_style': 'res_residual', 'use_dynamic_sel': True}, 'type': 'dpa3', 'use_tebd_bias': False}
ModelHypers.fitting_net: FittingNetHypers = {'activation_function': 'custom_silu:10.0', 'atom_ener': [], 'dim_case_embd': 0, 'neuron': [240, 240, 240], 'numb_aparam': 0, 'numb_fparam': 0, 'precision': 'float32', 'rcond': None, 'resnet_dt': True, 'seed': 1, 'trainable': True, 'type': 'ener', 'use_aparam_as_mask': False}
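
As an illustration of how these nest in the options file, here is a sketch that shortens the radial cutoff of the repflow descriptor while keeping every other default (the cutoff values are purely illustrative, and we assume unspecified keys keep their default values):

architecture:
  name: experimental.dpa3
  model:
    descriptor:
      type: dpa3
      repflow:
        e_rcut: 5.0
        e_rcut_smth: 4.3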

Trainer hyperparameters

The parameters that go under the architecture.training section of the config file are the following:

TrainerHypers.distributed: bool = False

Whether to use distributed training.

TrainerHypers.distributed_port: int = 39591

Port for DDP communication.

TrainerHypers.batch_size: int = 8

The number of samples to use in each batch of training. This hyperparameter controls the tradeoff between training speed and memory usage. In general, larger batch sizes will lead to faster training, but might require more memory.

TrainerHypers.num_epochs: int = 100

Number of epochs.

TrainerHypers.learning_rate: float = 0.001

Learning rate.

TrainerHypers.scheduler_patience: int = 100

Patience of the learning rate scheduler.

TrainerHypers.scheduler_factor: float = 0.8

Factor by which the learning rate scheduler reduces the learning rate.

TrainerHypers.log_interval: int = 1

Interval to log metrics.

TrainerHypers.checkpoint_interval: int = 100

Interval to save checkpoints.

TrainerHypers.scale_targets: bool = True

Normalize targets to unit std during training.

TrainerHypers.fixed_composition_weights: dict[str, float | dict[int, float]] = {}

Weights for atomic contributions.

This is passed to the fixed_weights argument of CompositionModel.train_model; see its documentation to understand exactly what to pass here.
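
For illustration only, based on the type signature above, the outer keys appear to be target names and the inner keys atomic numbers, with the values giving the fixed per-atom contributions. The numbers below are placeholders; CompositionModel.train_model remains the authoritative reference for the exact format.

fixed_composition_weights:
  energy:
    1: -13.6
    8: -2041.3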

TrainerHypers.per_structure_targets: list[str] = []

Targets for which to calculate per-structure losses.

TrainerHypers.log_mae: bool = False

Log MAE alongside RMSE.

TrainerHypers.log_separate_blocks: bool = False

Log per-block error.

TrainerHypers.best_model_metric: Literal['rmse_prod', 'mae_prod', 'loss'] = 'rmse_prod'

Metric used to select the best checkpoint (e.g., rmse_prod).

TrainerHypers.loss: str | dict[str, LossSpecification] = 'mse'

This section describes the loss function to be used. See the Loss functions section for more details.
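
As a sketch only, the type above suggests that, instead of a single string such as mse, you can pass a dictionary keyed by target name whose values are loss specifications. The field names used below (type, weight) are assumptions for illustration; the Loss functions page documents the actual LossSpecification schema.

loss:
  energy:
    type: mse
    weight: 1.0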

References