.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "generated_examples/1-advanced/04-flashmd.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note :ref:`Go to the end ` to download the full example code. .. rst-class:: sphx-glr-example-title .. _sphx_glr_generated_examples_1-advanced_04-flashmd.py: Training or fine-tuning a FlashMD model ======================================= This tutorial demonstrates how to train or fine-tune a FlashMD model for the direct prediction of molecular dynamics. This type of model affords faster MD simulations compared to MLIPs by one or two orders of magnitude (https://arxiv.org/abs/2505.19350). .. GENERATED FROM PYTHON SOURCE LINES 12-27 .. code-block:: Python import copy import subprocess import ase import ase.build import ase.io import ase.units from ase.calculators.emt import EMT from ase.md import VelocityVerlet from ase.md.langevin import Langevin from ase.md.velocitydistribution import MaxwellBoltzmannDistribution from huggingface_hub import hf_hub_download .. GENERATED FROM PYTHON SOURCE LINES 28-36 Data generation --------------- FlashMD models train on molecular dynamics trajectories in the NVE ensemble (i.e., most often with the velocity Verlet integrator). These trajectories can be generated with almost any MD code (i-PI, LAMMPS, etc.). Here, for simplicity, we will use ASE and its built-in EMT potential. In reality, you might want to use a more accurate baseline such as ab initio MD or a machine-learned interatomic potential (MLIP). .. GENERATED FROM PYTHON SOURCE LINES 37-62 .. code-block:: Python # We start by creating a simple system (a small box of aluminum). atoms = ase.build.bulk("Al", "fcc", cubic=True) * (2, 2, 2) # We first equilibrate the system at 300K using a Langevin thermostat. MaxwellBoltzmannDistribution(atoms, temperature_K=300) atoms.calc = EMT() dyn = Langevin( atoms, 2 * ase.units.fs, temperature_K=300, friction=1 / (100 * ase.units.fs) ) dyn.run(1000) # 2 ps equilibration (around 10 ps is better in practice) # Then, we run a production simulation in the NVE ensemble. trajectory = [] def store_trajectory(): trajectory.append(copy.deepcopy(atoms)) dyn = VelocityVerlet(atoms, 1 * ase.units.fs) dyn.attach(store_trajectory, interval=1) dyn.run(2000) # 2 ps NVE run .. GENERATED FROM PYTHON SOURCE LINES 63-69 Data preparation ---------------- Now, we need to generate the training data from the trajectory. FlashMD models require future positions and momenta as targets. We will save them in an `.xyz` file under the `future_positions` and `future_momenta` keys. .. GENERATED FROM PYTHON SOURCE LINES 70-110 .. code-block:: Python # The FlashMD model will be trained to predict 32 steps into the future, i.e., 32 fs # since we ran the reference simulation with a time step of 1 fs. For this type # of system, FlashMD is expected to perform well up to around 60-80 fs. time_lag = 32 # We pick starting structures that are 200 steps apart. To avoid wasting training # structures, this should be set to be around the expected velocity-velocity # autocorrelation time for the system. This is the time scale that quantifies how long # it takes for the system to forget its original velocities. spacing = 200 def get_structure_for_dataset(frame_now, frame_ahead): s = copy.deepcopy(frame_now) s.arrays["future_positions"] = frame_ahead.get_positions() s.arrays["future_momenta"] = frame_ahead.get_momenta() return s structures_for_dataset = [] for i in range(0, len(trajectory) - time_lag, spacing): frame_now = trajectory[i] frame_ahead = trajectory[i + time_lag] s = get_structure_for_dataset(frame_now, frame_ahead) structures_for_dataset.append(s) # Here, we also add the time-reversed pair (optional). This is # generally a good idea because it is data we get for "free", as # the underlying dynamics are time-reversible. frame_now_trev = copy.deepcopy(frame_now) frame_ahead_trev = copy.deepcopy(frame_ahead) frame_now_trev.set_momenta(-frame_now_trev.get_momenta()) frame_ahead_trev.set_momenta(-frame_ahead_trev.get_momenta()) s = get_structure_for_dataset(frame_ahead_trev, frame_now_trev) structures_for_dataset.append(s) # Write the structures to an xyz file ase.io.write("flashmd.xyz", structures_for_dataset) .. GENERATED FROM PYTHON SOURCE LINES 111-121 Training the model ------------------ The dataset is now ready for training. You can now provide it to ``metatrain`` and train your FlashMD model! For example, you can use the following options file: .. literalinclude:: options-flashmd.yaml :language: yaml .. GENERATED FROM PYTHON SOURCE LINES 122-127 .. code-block:: Python # Here, we run training as a subprocess, in reality you would run this from the command # line as ``mtt train options-flashmd.yaml``. subprocess.run(["mtt", "train", "options-flashmd.yaml"], check=True) .. rst-class:: sphx-glr-script-out .. code-block:: none CompletedProcess(args=['mtt', 'train', 'options-flashmd.yaml'], returncode=0) .. GENERATED FROM PYTHON SOURCE LINES 128-139 Fine-tuning a universal pre-trained FlashMD model ------------------------------------------------- Fine-tuning is generally recommended over training from scratch, as it drastically reduces the amount of training data and training time required to obtain good models. Here, we demonstrate how to fine-tune a pre-trained FlashMD model on our aluminum dataset. You just need to add the ``finetune`` section to the training options file, specifying the checkpoint file of the pre-trained model to start from. .. literalinclude:: options-flashmd-finetune.yaml :language: yaml .. GENERATED FROM PYTHON SOURCE LINES 140-160 .. code-block:: Python # If you have the flashmd package installed, you can simply get the pre-trained # checkpoint file in your current directory by running: # # from flashmd import save_checkpoint # save_checkpoint("pet-omatpes-v2", time_lag) # Here, we instead download the pre-trained model checkpoint using the HuggingFace # library. In any case, make sure the time lag of the pre-trained model matches the one # you used to generate your dataset! file_path = hf_hub_download( repo_id="lab-cosmo/flashmd", filename=f"flashmd_pet-omatpes-v2_{time_lag}fs.ckpt", local_dir=".", local_dir_use_symlinks=False, ) # In this script, we run training as a subprocess; in reality you would run this from # the command line as ``mtt train options-flashmd-finetune.yaml``. subprocess.run(["mtt", "train", "options-flashmd-finetune.yaml"], check=True) .. rst-class:: sphx-glr-script-out .. code-block:: none /home/docs/checkouts/readthedocs.org/user_builds/metatrain/envs/725/lib/python3.13/site-packages/huggingface_hub/utils/_validators.py:202: UserWarning: The `local_dir_use_symlinks` argument is deprecated and ignored in `hf_hub_download`. Downloading to a local directory does not use symlinks anymore. warnings.warn( CompletedProcess(args=['mtt', 'train', 'options-flashmd-finetune.yaml'], returncode=0) .. rst-class:: sphx-glr-timing **Total running time of the script:** (2 minutes 29.822 seconds) .. _sphx_glr_download_generated_examples_1-advanced_04-flashmd.py: .. only:: html .. container:: sphx-glr-footer sphx-glr-footer-example .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: 04-flashmd.ipynb <04-flashmd.ipynb>` .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: 04-flashmd.py <04-flashmd.py>` .. container:: sphx-glr-download sphx-glr-download-zip :download:`Download zipped: 04-flashmd.zip <04-flashmd.zip>` .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_