LoQI: Scalable Low-Energy Molecular Conformer Generation with Quantum Mechanical Accuracy

Filipp Nikitin¹,² · Dylan M. Anstine²,³ · Roman Zubatyuk²,⁵ · Saee Gopal Paliwal⁵ · Olexandr Isayev¹,²,⁴*
¹Ray and Stephanie Lane Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, USA
²Department of Chemistry, Carnegie Mellon University, Pittsburgh, PA, USA
³Department of Chemical Engineering and Materials Science, Michigan State University, East Lansing, MI, USA
⁴Department of Materials Science and Engineering, Carnegie Mellon University, Pittsburgh, PA, USA
⁵NVIDIA, Santa Clara, CA, USA

📄 Paper · 📖 Citation · ⚙️ Setup · 🔗 GitHub

*Corresponding author: olexandr@olexandrisayev.com

Overview

Abstract

Molecular geometry is crucial for biological activity and chemical reactivity; however, computational methods for generating 3D structures are limited by the vast scale of conformational space and the complexities of stereochemistry. Here we present an approach that combines an expansive dataset of molecular conformers with generative diffusion models to address this problem. We introduce ChEMBL3D, which contains over 250 million molecular geometries for 1.8 million drug-like compounds, optimized with AIMNet2 neural network potentials to near-quantum-mechanical accuracy, with implicit solvent effects included. This dataset captures complex organic molecules in various protonation states and stereochemical configurations.

We then developed LoQI (Low-energy QM Informed conformer generative model), a stereochemistry-aware diffusion model that learns molecular geometry distributions directly from this data. Through graph augmentation, LoQI accurately generates molecular structures with targeted stereochemistry, representing a significant advance in modeling capabilities over previous generative methods. The model outperforms traditional approaches, achieving an up to tenfold improvement in energy accuracy and effective recovery of optimal conformations. Benchmark tests on complex systems, including macrocycles and flexible molecules, as well as validation with crystal structures, show that LoQI can perform low-energy conformer search efficiently.

Note on Implementation: LoQI builds on the Megalodon architecture, adapting it for stereochemistry-aware conformer generation with the ChEMBL3D dataset.

Key Features

- ChEMBL3D Dataset: 250+ million AIMNet2-optimized conformers for 1.8M drug-like molecules
- Stereochemistry-Aware: First all-atom diffusion model with explicit stereochemical encoding
- Quantum Mechanical Accuracy: Near-DFT accuracy with implicit solvent effects
- Superior Performance: Up to 10x improvement in energy accuracy over traditional methods
- Complex Molecule Support: Handles macrocycles, flexible molecules, and challenging stereochemistry

Setup

Installation typically takes up to 20 minutes.

System and Hardware Requirements

OS tested by authors:
- Ubuntu 24.04 LTS (latest stable Ubuntu LTS at time of writing)
Other platforms:
- Expected to work, but if installation is not out-of-the-box, use the PyTorch Geometric installation guide for your exact Python/PyTorch/CUDA combination: https://pytorch-geometric.readthedocs.io/en/latest/install/installation.html
Tested inference hardware:
- GPU: NVIDIA RTX 3090 (24 GB VRAM)
- CPU: AMD Ryzen 9 5950X
Recommended GPU memory:
- 16-24 GB VRAM for comfortable inference/evaluation with larger molecules and higher batch sizes
Minimum practical GPU memory:
- 8 GB VRAM can run inference, but requires reduced batch sizes
CPU-only:
- Possible, but not recommended and not systematically studied by the authors

OOM mitigation for larger molecules:
- Reduce the inference batch size (--batch_size in sampling, or data.inference_batch_size in config).
- If using evaluation/optimization, also reduce the optimization batch size (evaluation.energy_metrics_args.batchsize).
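The batch-size advice above can also be applied programmatically by retrying with a smaller batch whenever the GPU runs out of memory. The following is a minimal sketch and not part of the LoQI code base: `run_with_backoff` and `generate_fn` are hypothetical names, and the sketch assumes the out-of-memory condition surfaces as a `RuntimeError` whose message contains "out of memory", as CUDA OOM errors raised by PyTorch do.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_backoff(
    generate_fn: Callable[[int], T],
    batch_size: int,
    min_batch_size: int = 1,
) -> tuple[T, int]:
    """Call generate_fn(batch_size), halving batch_size after each OOM error.

    generate_fn is a hypothetical callable wrapping one sampling run.
    Returns the result together with the batch size that succeeded.
    """
    while True:
        try:
            return generate_fn(batch_size), batch_size
        except RuntimeError as err:
            # Re-raise anything that is not an out-of-memory error.
            if "out of memory" not in str(err).lower():
                raise
            # Give up once the batch cannot shrink any further.
            if batch_size <= min_batch_size:
                raise
            batch_size = max(min_batch_size, batch_size // 2)


if __name__ == "__main__":
    def fake_generate(bs: int) -> str:
        # Stand-in for a real sampling call: pretend batches above 16 do not fit.
        if bs > 16:
            raise RuntimeError("CUDA out of memory (simulated)")
        return f"generated with batch size {bs}"

    print(run_with_backoff(fake_generate, batch_size=100))
```

In a real PyTorch run it would likely also help to call `torch.cuda.empty_cache()` before each retry so that memory held by the failed attempt is released.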

Prerequisites

- Python 3.10+
- CUDA-compatible GPU (recommended for both training and inference)
- Conda or Mamba (recommended)

Environment Setup

# Clone the repository
git clone https://github.com/isayevlab/LoQI.git
cd LoQI

# Create and activate conda environment
conda create -n loqi python=3.10 -y
conda activate loqi

# Install core dependencies
pip install -r requirements.txt

# Install this package in editable mode (adds src to PYTHONPATH)
pip install -e .

If you prefer a fully conda-based setup (recommended for RDKit), you can install RDKit via conda-forge before running pip install -r requirements.txt.

Data Setup

Training and evaluation use the ChEMBL3D data releases below.

Release 1: Full ChEMBL3D Quantum-Accurate conformer dataset

Release 2: Processed dataset + LoQI checkpoints (diffusion + flow matching)

URL: https://kilthub.cmu.edu/articles/dataset/LoQI_Scalable_Low-Energy_Molecular_Conformer_Generation_with_Quantum_Mechanical_Accuracy/31441570
DOI: https://doi.org/10.1184/R1/31441570
Includes:
- loqi.ckpt
- loqi_flow.ckpt
- chembl3d_stereo/ processed dataset

For this repository, place downloaded assets with this layout:

LoQI/
  data/
    loqi.ckpt
    loqi_flow.ckpt
    chembl3d_stereo/
      processed/
        ...

AIMNet2 model path expected by configs:

src/megalodon/metrics/aimnet2/cpcm_model/wb97m_cpcms_v2_0.jpt
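Before training or sampling, it can be useful to confirm that the downloaded assets are where the configs expect them. The sketch below checks only the paths listed above; `missing_assets` is a hypothetical helper, not a function shipped with the repository.

```python
from pathlib import Path

# Paths taken from the layout above, relative to the repository root.
EXPECTED_PATHS = [
    "data/loqi.ckpt",
    "data/loqi_flow.ckpt",
    "data/chembl3d_stereo/processed",
    "src/megalodon/metrics/aimnet2/cpcm_model/wb97m_cpcms_v2_0.jpt",
]


def missing_assets(repo_root: str = ".") -> list[str]:
    """Return the expected paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_assets(".")
    if missing:
        print("Missing assets:")
        for path in missing:
            print(f"  {path}")
    else:
        print("All expected assets found.")
```

Run it from the repository root; an empty result means every listed file and directory is present.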

Web App

The repository includes a Streamlit interface for interactive conformer generation, postprocessing, and visualization.

Use the app-specific installation and usage instructions from app/README.md (recommended, as app dependencies are separated from core training/inference dependencies).
Quick start from repo root:

pip install -r app/requirements.txt
streamlit run app/app.py

Usage

If LoQI is not installed in editable mode (pip install -e .), make sure the contents of src are on your PYTHONPATH, e.g., export PYTHONPATH="./src:$PYTHONPATH".

Model Training

# LoQI conformer generation model
python scripts/train.py --config-name=loqi outdir=./outputs train.gpus=1 data.dataset_root="data/chembl3d_stereo"

# LoQI flow-matching conformer generation model
python scripts/train.py --config-name=loqi_flow outdir=./outputs train.gpus=1 data.dataset_root="data/chembl3d_stereo"

# Customize training parameters
python scripts/train.py --config-name=loqi \
    outdir=./outputs \
    train.gpus=2 \
    train.n_epochs=800 \
    train.seed=42 \
    data.batch_size=150 \
    optimizer.lr=0.0001

Model Inference and Sampling

Conformer Generation

# Generate conformers for a single molecule
python scripts/sample_conformers.py \
    --config scripts/conf/loqi/loqi.yaml \
    --ckpt data/loqi.ckpt \
    --input "c1ccccc1" \
    --output outputs/benzene_conformers.sdf \
    --n_confs 10 \
    --batch_size 1

# Generate conformers with evaluation (requires 3D input, e.g., SDF with low energy conformer)
python scripts/sample_conformers.py \
    --config scripts/conf/loqi/loqi.yaml \
    --ckpt data/loqi.ckpt \
    --input data/ethanol_low_energy.sdf \
    --output outputs/ethanol_conformers.sdf \
    --n_confs 100 \
    --batch_size 10 \
    --eval

# Optional postprocessing: AIMNet2 optimization + iRMSD unique-set pruning
python scripts/sample_conformers.py \
    --config scripts/conf/loqi/loqi_flow.yaml \
    --ckpt data/loqi_flow.ckpt \
    --input "CC(=O)Oc1ccccc1C(=O)O" \
    --output outputs/aspirin_opt_unique.sdf \
    --n_confs 50 \
    --batch_size 50 \
    --postprocess optimization+irmsd \
    --optimization_batch_size 64 \
    --opt_fmax 0.05 \
    --opt_max_nstep 250 \
    --irmsd_rthr 0.125
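To generate conformers for several molecules in one go, the single-molecule command can be wrapped in a small driver script. The sketch below uses only flags shown in the examples above; `build_sample_command` and `sample_all` are hypothetical helpers, and the output file naming is an assumption made for illustration.

```python
import shlex
import subprocess


def build_sample_command(
    smiles: str,
    output: str,
    n_confs: int = 10,
    batch_size: int = 1,
    config: str = "scripts/conf/loqi/loqi.yaml",
    ckpt: str = "data/loqi.ckpt",
) -> list[str]:
    """Assemble the argument list for one scripts/sample_conformers.py call."""
    return [
        "python", "scripts/sample_conformers.py",
        "--config", config,
        "--ckpt", ckpt,
        "--input", smiles,
        "--output", output,
        "--n_confs", str(n_confs),
        "--batch_size", str(batch_size),
    ]


def sample_all(molecules: dict[str, str], dry_run: bool = True) -> list[list[str]]:
    """Build (and optionally run) one sampling command per named SMILES."""
    commands = []
    for name, smiles in molecules.items():
        # Output naming is illustrative; adjust to your own convention.
        cmd = build_sample_command(smiles, f"outputs/{name}_conformers.sdf")
        commands.append(cmd)
        if dry_run:
            # Print a copy-pasteable shell command instead of executing it.
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    sample_all({"benzene": "c1ccccc1", "aspirin": "CC(=O)Oc1ccccc1C(=O)O"})
```

With `dry_run=True` the script only prints the commands. Passing `dry_run=False` from the repository root executes them one after another and stops at the first failure.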

Recent sampling updates in scripts/sample_conformers.py:
- input validation + SMILES revalidation (canonical roundtrip), with unsupported-element/radical checks
- atom-aware dynamic batching for inference (--atom-aware-batching, --target-molecule-size, --shuffle)
- optional hydrogen addition for SMILES inputs (--add-hs / --no-add-hs)
- no RDKit conformer initialization for SMILES; zero-initialized coordinates are used
- if input is SDF with conformers, existing 3D coordinates are used
- optional postprocessing (--postprocess none|optimization|optimization+irmsd)
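The idea behind atom-aware batching is that a batch's memory footprint grows with the number of atoms it contains, so batches are sized by total atom count rather than by molecule count. The sketch below shows one plausible greedy strategy under that assumption. It is not the implementation behind --atom-aware-batching, and `atom_aware_batches` and `max_atoms_per_batch` are hypothetical names.

```python
def atom_aware_batches(atom_counts: list[int], max_atoms_per_batch: int) -> list[list[int]]:
    """Group molecule indices so each batch holds at most max_atoms_per_batch atoms.

    A molecule larger than the budget is placed in a batch of its own.
    """
    # Sort indices by size so molecules of similar size share a batch.
    order = sorted(range(len(atom_counts)), key=lambda i: atom_counts[i])
    batches: list[list[int]] = []
    current: list[int] = []
    current_atoms = 0
    for idx in order:
        n = atom_counts[idx]
        # Close the current batch if adding this molecule would exceed the budget.
        if current and current_atoms + n > max_atoms_per_batch:
            batches.append(current)
            current, current_atoms = [], 0
        current.append(idx)
        current_atoms += n
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    counts = [12, 95, 30, 8, 60, 25]
    for batch in atom_aware_batches(counts, max_atoms_per_batch=100):
        # Print the molecule indices in the batch and their total atom count.
        print(batch, sum(counts[i] for i in batch))
```

With this scheme, many small molecules share a batch while large ones are processed nearly alone, which keeps peak memory roughly constant across batches.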

On the tested setup (RTX 3090 + Ryzen 9 5950X), inference for a typical ChEMBL molecule takes approximately 0.1 seconds per conformer when processed within a batch. See System and Hardware Requirements above for VRAM guidance and OOM mitigation.
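The per-conformer figure above gives a quick way to budget a sampling run. The helper below is a back-of-the-envelope sketch: `estimate_sampling_seconds` is a hypothetical name, and the 0.1 s default applies only to the tested hardware and to typical ChEMBL-sized molecules processed in batches.

```python
def estimate_sampling_seconds(
    n_molecules: int,
    n_confs: int,
    seconds_per_conformer: float = 0.1,
) -> float:
    """Rough wall-clock estimate: molecules x conformers x time per conformer."""
    return n_molecules * n_confs * seconds_per_conformer


if __name__ == "__main__":
    seconds = estimate_sampling_seconds(n_molecules=1000, n_confs=100)
    print(f"~{seconds:.0f} s (~{seconds / 3600:.1f} h)")
```

For example, 1000 molecules with 100 conformers each come to about 10,000 seconds, or roughly 2.8 hours, before any optional optimization or pruning.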

Note: Make sure the dataset and AIMNet2 model paths in loqi.yaml are correct. The AIMNet2 model's path relative to the repository root is src/megalodon/metrics/aimnet2/cpcm_model/wb97m_cpcms_v2_0.jpt.

Sampling steps: --n_steps defaults to 25. Diffusion models were trained with 25 steps and are not expected to work well for other values. Flow-matching models can be run with different step counts.

Performance Test (Fixed Molecule Sizes)

Use scripts/performance_test.py to:
- select a fixed number of molecules (--n_per_size) for each of the atom counts 10, 25, 50, and 100 from data/chembl3d_stereo/processed/train_h.pt
- choose them deterministically (first N per size in dataset order)
- export per-molecule SDF inputs
- measure per-molecule generation and optimization times

conda run -n loqi env PYTHONPATH=./src TORCH_COMPILE_DISABLE=1 \
python scripts/performance_test.py \
  --dataset_pt data/chembl3d_stereo/processed/train_h.pt \
  --sizes 10,25,50,100 \
  --n_per_size 100 \
  --outdir outputs/performance_test \
  --config scripts/conf/loqi/loqi.yaml \
  --ckpt data/loqi.ckpt \
  --n_confs 100 \
  --generation_batch_size 1

By default, optimization settings are taken from the selected config (evaluation.energy_metrics_args.batchsize and evaluation.energy_metrics_args.opt_params).

Outputs:
- outputs/performance_test/selected_manifest.csv (selected molecules + per-molecule SDF path)
- outputs/performance_test/size_<N>/mol_*.sdf (one input SDF per selected molecule)
- outputs/performance_test/size_<N>_selected.sdf (combined SDF per size)
- outputs/performance_test/timings_per_molecule.csv (generation/optimization timing per molecule)
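The per-molecule timings can be summarized per molecule size with a few lines of standard-library Python. The column names of timings_per_molecule.csv are not documented here, so the sketch below takes them as arguments; `n_atoms` and `generation_time_s` in the demo are placeholders, the sample rows are made up for illustration, and `mean_time_by_size` is a hypothetical helper.

```python
import csv
from collections import defaultdict
from statistics import mean
from typing import Iterable


def mean_time_by_size(lines: Iterable[str], size_col: str, time_col: str) -> dict[str, float]:
    """Average the values in time_col for each distinct value of size_col.

    lines is any iterable of CSV text lines, e.g. an open file handle.
    """
    grouped: dict[str, list[float]] = defaultdict(list)
    for row in csv.DictReader(lines):
        grouped[row[size_col]].append(float(row[time_col]))
    return {size: mean(times) for size, times in grouped.items()}


if __name__ == "__main__":
    # Made-up rows with placeholder column names, for illustration only.
    sample = [
        "n_atoms,generation_time_s",
        "10,1.0",
        "10,3.0",
        "25,4.0",
    ]
    print(mean_time_by_size(sample, size_col="n_atoms", time_col="generation_time_s"))
```

To use it on real output, open outputs/performance_test/timings_per_molecule.csv with `open(path, newline="")`, pass the file handle as `lines`, and substitute the column names found in the file's header row.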

Available Configurations

LoQI Models:
- loqi.yaml - LoQI stereochemistry-aware conformer generation model
- loqi_flow.yaml - LoQI flow-matching conformer generation model
- nextmol.yaml - Alternative configuration for NextMol-style generation

Citation

If you use LoQI in your research, please cite our paper:

@article{nikitin2025scalable,
  title={Scalable Low-Energy Molecular Conformer Generation with Quantum Mechanical Accuracy},
  author={Nikitin, Filipp and Anstine, Dylan M and Zubatyuk, Roman and Paliwal, Saee Gopal and Isayev, Olexandr},
  year={2025}
}

This work builds upon the Megalodon architecture. If you use the underlying architecture, please also cite:

@article{reidenbach2025applications,
  title={Applications of Modular Co-Design for De Novo 3D Molecule Generation},
  author={Reidenbach, Danny and Nikitin, Filipp and Isayev, Olexandr and Paliwal, Saee},
  journal={arXiv preprint arXiv:2505.18392},
  year={2025}
}
