Trajectory Export: Multiple Steps by dwhswenson · Pull Request #1176 · openpathsampling/openpathsampling

dwhswenson · 2025-12-14T21:06:44Z

This exports trajectories from a set of MC steps, using symlinks to prevent duplication. There will be snapshot duplication when two trajectories share snapshots (one-way-shooting) and coordinate duplication when two trajectories are reversed (path reversal), but full trajectories, which may not change across MC steps (e.g., shooting in only one ensemble leaves the others unchanged), are not duplicated.

The default file structure is, e.g.:

100/
├── active
│   ├── condensed MIS minus.pdb -> ../../raw_data/17290287612105036191526417646194015568.pdb
│   ├── condensed->extended 0.pdb -> ../../raw_data/17290287612105036191526417646194017830.pdb
│   ├── condensed->extended 1.pdb -> ../../raw_data/17290287612105036191526417646194017878.pdb
│   └── condensed->extended 2.pdb -> ../../raw_data/17290287612105036191526417646194017276.pdb
└── trials
    └── condensed->extended 1.pdb -> ../../raw_data/17290287612105036191526417646194018084.pdb
raw_data/
├── 17290287612105036191526417646194015568.pdb
├── 17290287612105036191526417646194017830.pdb
├── 17290287612105036191526417646194017878.pdb
├── 17290287612105036191526417646194017276.pdb
├── 17290287612105036191526417646194018084.pdb

where, in that example, 100 is the step number. The files in active and trials are named by the friendly name of the ensemble, and are symlinks to data in raw_data, named by the UUID.

I've also changed it so that tracking the extension is the job of the writer. This does mean that the writer is more complicated than just a bare function, but it feels to me like it makes more sense.

Dump steps as files, organized by raw (real files) and trial and active (symlinks). Works in principle as of this commit, but need to try it out in reality.

… into dump-trajectories

Copilot

Pull request overview

This PR implements trajectory export functionality for multiple Monte Carlo steps using symlinks to avoid data duplication. The key changes include adding file extension tracking to trajectory writers and creating a new SymLinkStepExporter class that exports raw trajectory data once and creates symlinks for trial and active samples.

Added ext property to trajectory writer classes to track file extensions
Created SymLinkStepExporter class for exporting MC steps with symlink-based deduplication
Updated tests to pass the required ext parameter to MDTrajTrajectoryWriter

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
openpathsampling/exports/trajectories/core.py	Added abstract `ext` property to `TrajectoryWriter` base class and concrete implementation in `SimStoreTrajectoryWriter`
openpathsampling/exports/trajectories/mdtrajtrajectorywriter.py	Made `ext` a required parameter and added `ext` property to return it
openpathsampling/exports/trajectories/trrtrajectorywriter.py	Added `ext` property returning "trr"
openpathsampling/exports/steps/symlink_step_exporter.py	New class implementing step export with symlink deduplication and helper function `export_steps`
openpathsampling/tests/exports/steps/test_symlink_step_exporter.py	Comprehensive test suite for `SymLinkStepExporter` covering various MC move types
openpathsampling/tests/exports/steps/conftest.py	Import fixture from analysis tests for use in step export tests
openpathsampling/tests/exports/trajectories/test_mdtrajtrajectorywriter.py	Updated test calls to pass required `ext="xtc"` parameter

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2025-12-14T21:10:48Z

openpathsampling/exports/steps/symlink_step_exporter.py

+        ----------
+        step : Step
+           The step containing the sample.
+       sample : Sample


The docstring parameter has inconsistent indentation. Line 115 "sample : Sample" has 7 leading spaces instead of 8 spaces to match line 113 "step : Step". The indentation should be consistent for all parameters in the docstring.

Suggested change

sample : Sample

sample : Sample

Copilot · 2025-12-14T21:10:48Z

openpathsampling/exports/steps/symlink_step_exporter.py

+            engines = collections.Counter([s.engine for s in
+                                          sample.trajectory])


When self.writer is None and sample.trajectory is empty, this code will raise an IndexError on line 76 when trying to access engines.most_common(1)[0][0]. Consider adding validation to handle empty trajectories or documenting that trajectories must not be empty.

Suggested change

engines = collections.Counter([s.engine for s in

sample.trajectory])

if not sample.trajectory:

raise ValueError(

"Cannot determine writer: sample.trajectory is empty. "

"Please provide a writer explicitly or ensure the trajectory is not empty."

)

engines = collections.Counter([s.engine for s in sample.trajectory])

Copilot · 2025-12-14T21:10:49Z

openpathsampling/exports/trajectories/mdtrajtrajectorywriter.py

    for saving data that could be used in a restart.
    """
-    def __init__(self, mdtraj_selection=None):
+    def __init__(self, ext, mdtraj_selection=None):


The ext parameter in MDTrajTrajectoryWriter is now a required positional argument. Consider making it a keyword-only argument or providing a default value to maintain backward compatibility and improve API clarity. For example: def init(self, *, ext, mdtraj_selection=None) would make it clear that ext must be explicitly specified.

Suggested change

def __init__(self, ext, mdtraj_selection=None):

def __init__(self, *, ext, mdtraj_selection=None):

In the same vein; would this change break loading files with old writers? and should OPS deal with that?

Copilot · 2025-12-14T21:10:49Z

openpathsampling/exports/steps/symlink_step_exporter.py

+        base_dir='.',
+        # TODO: add in a base using storage handlers, default to cwd
+        raw_data_pattern=_DEFAULT_RAW_DATA_PATTERN,
+        trial_pattern=_DEFAULT_TRIAL_PATTERN,
+        active_pattern=_DEFAULT_ACTIVE_PATTERN,
+    ):
+        self.writer = writer
+        self.base_dir = pathlib.Path(base_dir)


The base_dir parameter is stored in init but never used anywhere in the class. Either this parameter should be used to construct absolute paths (e.g., combining it with the patterns), or it should be removed if it's not needed. This could be confusing for users who expect base_dir to affect where files are written.

Copilot · 2025-12-14T21:10:49Z

openpathsampling/tests/exports/steps/test_symlink_step_exporter.py

+
+from openpathsampling.exports.steps.symlink_step_exporter import (
+    SymLinkStepExporter,
+    _DEFAULT_TRIAL_PATTERN, _DEFAULT_ACTIVE_PATTERN,


Import of '_DEFAULT_TRIAL_PATTERN' is not used.
Import of '_DEFAULT_ACTIVE_PATTERN' is not used.

Suggested change

_DEFAULT_TRIAL_PATTERN, _DEFAULT_ACTIVE_PATTERN,

codecov · 2025-12-14T21:11:28Z

Codecov Report

❌ Patch coverage is 96.25000% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.37%. Comparing base (72c3dd7) to head (4ec34e3).

Files with missing lines	Patch %	Lines
openpathsampling/exports/trajectories/core.py	80.00%	1 Missing ⚠️
...ing/exports/trajectories/mdtrajtrajectorywriter.py	80.00%	1 Missing ⚠️
...mpling/exports/trajectories/trrtrajectorywriter.py	66.66%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1176      +/-   ##
==========================================
+ Coverage   81.28%   81.37%   +0.08%     
==========================================
  Files         147      148       +1     
  Lines       15807    15886      +79     
==========================================
+ Hits        12849    12927      +78     
- Misses       2958     2959       +1

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

sroet

I am not into the code enough to do a proper review, but I did note some things that might behave wonky from what I could remember from OPS 😅

Other than the review comments; from the test coverage it seems like the ext property isn't actually used at the moment by the writers)

Feel free to of course ignore any/all of these comments, especially as I now see you didn't actually asked for review on this PR yet, but it felt like a waste to just remove the review comments...

sroet · 2025-12-15T09:09:14Z

openpathsampling/exports/steps/symlink_step_exporter.py

+        if sample.ensemble.is_named:
+            ensemble_id = sample.ensemble.name


Do we guarantee that ensemble names are unique and would that break the file storage? (sorry, it's been a while since I've looked at/used OPS)

sroet · 2025-12-15T09:11:05Z

openpathsampling/exports/trajectories/mdtrajtrajectorywriter.py

    for saving data that could be used in a restart.
    """
-    def __init__(self, mdtraj_selection=None):
+    def __init__(self, ext, mdtraj_selection=None):


In the same vein; would this change break loading files with old writers? and should OPS deal with that?

dwhswenson added 5 commits June 4, 2025 21:48

SymLinkStepExporter

89de616

Dump steps as files, organized by raw (real files) and trial and active (symlinks). Works in principle as of this commit, but need to try it out in reality.

fix incorrect approach to unnamed ensembles

9fc9d50

style

bc5d7ea

Merge branch 'master' of github.com:openpathsampling/openpathsampling…

cb4017d

… into dump-trajectories

Move the ext into the responsibility of the writer

84c984d

dwhswenson requested a review from Copilot December 14, 2025 21:06

dwhswenson added the feature label Dec 14, 2025

Copilot started reviewing on behalf of dwhswenson December 14, 2025 21:07 View session

Copilot AI reviewed Dec 14, 2025

View reviewed changes

Merge remote-tracking branch 'upstream/master' into dump-trajectories

4ec34e3

sroet reviewed Dec 15, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Trajectory Export: Multiple Steps#1176

Trajectory Export: Multiple Steps#1176
dwhswenson wants to merge 6 commits intoopenpathsampling:masterfrom
dwhswenson:dump-trajectories

dwhswenson commented Dec 14, 2025

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Dec 14, 2025

Uh oh!

Copilot AI Dec 14, 2025

Uh oh!

Copilot AI Dec 14, 2025

Uh oh!

sroet Dec 15, 2025

Uh oh!

Copilot AI Dec 14, 2025

Uh oh!

Copilot AI Dec 14, 2025

Uh oh!

codecov bot commented Dec 14, 2025 •

edited

Loading

Uh oh!

sroet left a comment

Uh oh!

sroet Dec 15, 2025

Uh oh!

sroet Dec 15, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

		engines = collections.Counter([s.engine for s in
		sample.trajectory])

- engines = collections.Counter([s.engine for s in
- sample.trajectory])
+ if not sample.trajectory:
+ raise ValueError(
+ "Cannot determine writer: sample.trajectory is empty. "
+ "Please provide a writer explicitly or ensure the trajectory is not empty."
+                )
+ engines = collections.Counter([s.engine for s in sample.trajectory])

	def __init__(self, ext, mdtraj_selection=None):
	def __init__(self, *, ext, mdtraj_selection=None):

		if sample.ensemble.is_named:
		ensemble_id = sample.ensemble.name

Conversation

dwhswenson commented Dec 14, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Dec 14, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 14, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 14, 2025

Choose a reason for hiding this comment

Uh oh!

sroet Dec 15, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 14, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 14, 2025

Choose a reason for hiding this comment

Uh oh!

codecov bot commented Dec 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

sroet left a comment

Choose a reason for hiding this comment

Uh oh!

sroet Dec 15, 2025

Choose a reason for hiding this comment

Uh oh!

sroet Dec 15, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

codecov bot commented Dec 14, 2025 •

edited

Loading