Skip to content

Trajectory Export: Multiple Steps#1176

Open
dwhswenson wants to merge 6 commits intoopenpathsampling:masterfrom
dwhswenson:dump-trajectories
Open

Trajectory Export: Multiple Steps#1176
dwhswenson wants to merge 6 commits intoopenpathsampling:masterfrom
dwhswenson:dump-trajectories

Conversation

@dwhswenson
Copy link
Member

This exports trajectories from a set of MC steps, using symlinks to prevent duplication. There will be snapshot duplication when two trajectories share snapshots (one-way-shooting) and coordinate duplication when two trajectories are reversed (path reversal), but full trajectories, which may not change across MC steps (e.g., shooting in only one ensemble leaves the others unchanged), are not duplicated.

The default file structure is, e.g.:

100/
├── active
│   ├── condensed MIS minus.pdb -> ../../raw_data/17290287612105036191526417646194015568.pdb
│   ├── condensed->extended 0.pdb -> ../../raw_data/17290287612105036191526417646194017830.pdb
│   ├── condensed->extended 1.pdb -> ../../raw_data/17290287612105036191526417646194017878.pdb
│   └── condensed->extended 2.pdb -> ../../raw_data/17290287612105036191526417646194017276.pdb
└── trials
    └── condensed->extended 1.pdb -> ../../raw_data/17290287612105036191526417646194018084.pdb
raw_data/
├── 17290287612105036191526417646194015568.pdb
├── 17290287612105036191526417646194017830.pdb
├── 17290287612105036191526417646194017878.pdb
├── 17290287612105036191526417646194017276.pdb
├── 17290287612105036191526417646194018084.pdb

where, in that example, 100 is the step number. The files in active and trials are named by the friendly name of the ensemble, and are symlinks to data in raw_data, named by the UUID.

I've also changed it so that tracking the extension is the job of the writer. This does mean that the writer is more complicated than just a bare function, but it feels to me like it makes more sense.

Dump steps as files, organized by raw (real files) and trial and active
(symlinks). Works in principle as of this commit, but need to try it out
in reality.
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements trajectory export functionality for multiple Monte Carlo steps using symlinks to avoid data duplication. The key changes include adding file extension tracking to trajectory writers and creating a new SymLinkStepExporter class that exports raw trajectory data once and creates symlinks for trial and active samples.

  • Added ext property to trajectory writer classes to track file extensions
  • Created SymLinkStepExporter class for exporting MC steps with symlink-based deduplication
  • Updated tests to pass the required ext parameter to MDTrajTrajectoryWriter

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
openpathsampling/exports/trajectories/core.py Added abstract ext property to TrajectoryWriter base class and concrete implementation in SimStoreTrajectoryWriter
openpathsampling/exports/trajectories/mdtrajtrajectorywriter.py Made ext a required parameter and added ext property to return it
openpathsampling/exports/trajectories/trrtrajectorywriter.py Added ext property returning "trr"
openpathsampling/exports/steps/symlink_step_exporter.py New class implementing step export with symlink deduplication and helper function export_steps
openpathsampling/tests/exports/steps/test_symlink_step_exporter.py Comprehensive test suite for SymLinkStepExporter covering various MC move types
openpathsampling/tests/exports/steps/conftest.py Import fixture from analysis tests for use in step export tests
openpathsampling/tests/exports/trajectories/test_mdtrajtrajectorywriter.py Updated test calls to pass required ext="xtc" parameter

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

----------
step : Step
The step containing the sample.
sample : Sample
Copy link

Copilot AI Dec 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring parameter has inconsistent indentation. Line 115 "sample : Sample" has 7 leading spaces instead of 8 spaces to match line 113 "step : Step". The indentation should be consistent for all parameters in the docstring.

Suggested change
sample : Sample
sample : Sample

Copilot uses AI. Check for mistakes.
Comment on lines +74 to +75
engines = collections.Counter([s.engine for s in
sample.trajectory])
Copy link

Copilot AI Dec 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When self.writer is None and sample.trajectory is empty, this code will raise an IndexError on line 76 when trying to access engines.most_common(1)[0][0]. Consider adding validation to handle empty trajectories or documenting that trajectories must not be empty.

Suggested change
engines = collections.Counter([s.engine for s in
sample.trajectory])
if not sample.trajectory:
raise ValueError(
"Cannot determine writer: sample.trajectory is empty. "
"Please provide a writer explicitly or ensure the trajectory is not empty."
)
engines = collections.Counter([s.engine for s in sample.trajectory])

Copilot uses AI. Check for mistakes.
for saving data that could be used in a restart.
"""
def __init__(self, mdtraj_selection=None):
def __init__(self, ext, mdtraj_selection=None):
Copy link

Copilot AI Dec 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ext parameter in MDTrajTrajectoryWriter is now a required positional argument. Consider making it a keyword-only argument or providing a default value to maintain backward compatibility and improve API clarity. For example: def init(self, *, ext, mdtraj_selection=None) would make it clear that ext must be explicitly specified.

Suggested change
def __init__(self, ext, mdtraj_selection=None):
def __init__(self, *, ext, mdtraj_selection=None):

Copilot uses AI. Check for mistakes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the same vein; would this change break loading files with old writers? and should OPS deal with that?

Comment on lines +43 to +50
base_dir='.',
# TODO: add in a base using storage handlers, default to cwd
raw_data_pattern=_DEFAULT_RAW_DATA_PATTERN,
trial_pattern=_DEFAULT_TRIAL_PATTERN,
active_pattern=_DEFAULT_ACTIVE_PATTERN,
):
self.writer = writer
self.base_dir = pathlib.Path(base_dir)
Copy link

Copilot AI Dec 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The base_dir parameter is stored in init but never used anywhere in the class. Either this parameter should be used to construct absolute paths (e.g., combining it with the patterns), or it should be removed if it's not needed. This could be confusing for users who expect base_dir to affect where files are written.

Copilot uses AI. Check for mistakes.

from openpathsampling.exports.steps.symlink_step_exporter import (
SymLinkStepExporter,
_DEFAULT_TRIAL_PATTERN, _DEFAULT_ACTIVE_PATTERN,
Copy link

Copilot AI Dec 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of '_DEFAULT_TRIAL_PATTERN' is not used.
Import of '_DEFAULT_ACTIVE_PATTERN' is not used.

Suggested change
_DEFAULT_TRIAL_PATTERN, _DEFAULT_ACTIVE_PATTERN,

Copilot uses AI. Check for mistakes.
@codecov
Copy link

codecov bot commented Dec 14, 2025

Codecov Report

❌ Patch coverage is 96.25000% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.37%. Comparing base (72c3dd7) to head (4ec34e3).

Files with missing lines Patch % Lines
openpathsampling/exports/trajectories/core.py 80.00% 1 Missing ⚠️
...ing/exports/trajectories/mdtrajtrajectorywriter.py 80.00% 1 Missing ⚠️
...mpling/exports/trajectories/trrtrajectorywriter.py 66.66% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1176      +/-   ##
==========================================
+ Coverage   81.28%   81.37%   +0.08%     
==========================================
  Files         147      148       +1     
  Lines       15807    15886      +79     
==========================================
+ Hits        12849    12927      +78     
- Misses       2958     2959       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Member

@sroet sroet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not into the code enough to do a proper review, but I did note some things that might behave wonky from what I could remember from OPS 😅

Other than the review comments; from the test coverage it seems like the ext property isn't actually used at the moment by the writers)

Feel free to of course ignore any/all of these comments, especially as I now see you didn't actually asked for review on this PR yet, but it felt like a waste to just remove the review comments...

Comment on lines +58 to +59
if sample.ensemble.is_named:
ensemble_id = sample.ensemble.name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we guarantee that ensemble names are unique and would that break the file storage? (sorry, it's been a while since I've looked at/used OPS)

for saving data that could be used in a restart.
"""
def __init__(self, mdtraj_selection=None):
def __init__(self, ext, mdtraj_selection=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the same vein; would this change break loading files with old writers? and should OPS deal with that?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants