Use a deque to keep track of more recent time estimates #989

Closed
sroet wants to merge 3 commits into openpathsampling:master from sroet:better_time_estimates_master
Conversation

@sroet (Member) commented Mar 2, 2021

Today I got annoyed by the fact that OPS's estimated time remaining becomes worse the longer you run (see, for example, the table in this comment).

This fixes that (slightly) by using a deque with a default length of 10_000 MC steps, so that only the average over the last 10_000 steps is used to estimate the average step time and the time remaining.

The changes in this PR are:

  • use a deque to get a slightly better estimate of the time left (47f76d4 and babd276)
  • add a time_per_step argument to progress_string that overrides the time_per_step calculation, and add a test for this behavior (47f76d4)
  • remove nose from test_tools.py, partial progress towards Switch tests to pytest #756 (6a8485c)
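For illustration, a minimal sketch of the deque-based approach described above. This is not the actual OPS implementation; the class and method names here (`StepTimer`, `mark`, `time_per_step`) are hypothetical, but the core idea is the same: a bounded `deque` keeps only the most recent step durations, so the average no longer drifts as early, unrepresentative steps accumulate.

```python
from collections import deque
import time


class StepTimer:
    """Rolling time-per-step estimate over the last `maxlen` steps.

    Hypothetical sketch, not the OPS code: a deque with a fixed
    maxlen silently discards the oldest duration when full.
    """

    def __init__(self, maxlen=10_000):
        # Only the most recent `maxlen` durations are retained.
        self.durations = deque(maxlen=maxlen)
        self._last = None

    def mark(self):
        """Call once per MC step; records the elapsed time since the last call."""
        now = time.perf_counter()
        if self._last is not None:
            self.durations.append(now - self._last)
        self._last = now

    def time_per_step(self):
        """Average duration over the retained (recent) steps."""
        if not self.durations:
            return 0.0
        return sum(self.durations) / len(self.durations)

    def time_remaining(self, steps_left):
        """Estimated seconds remaining for `steps_left` more steps."""
        return self.time_per_step() * steps_left
```

With `maxlen=10_000`, a long run that gradually slows down (e.g. due to growing storage costs) produces an estimate that tracks the recent rate instead of the lifetime average.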

@dwhswenson (Member) commented Mar 2, 2021

The idea is fine, but can you redo this PR to build off of #911? I'm pretty sure there will be significant conflicts with that. (I really need to finish reviewing hooks-related stuff; I'm insisting on that stuff being in the 1.5 release, and that's the main thing remaining for 1.5. If you can give an eye to #755 and #911 with a mental focus on "will this work if the hooks use Dask?" that would be a big help -- the mental energy of needing to ensure that, and being the only person to check it, has been a big part of why I've been slow on that.) Plus, this kind of approach is exactly what hooks should improve: let you easily customize the behavior of something like this, with code that is more isolated from the rest of the logic.

Also, probably the single biggest (and potentially easiest) approach to solve the underlying problem would be to add a SQL index on the UUID for more tables. I believe we now do that for the UUIDs table; it would probably make sense to do that for all data object tables (or even all tables; the extra cost for simulation objects isn't much). I suspect that most of the slow-down you see now is because snapshot tables don't have a UUID index, meaning that the cost of adding to them grows linearly with the number of snapshots stored. PR (separate from this) most definitely welcome on that.
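To illustrate why the missing index causes linear slow-down: without an index, every UUID lookup scans the whole table, so the cost of each insert-with-lookup grows with the number of snapshots stored. The sketch below uses SQLite directly with an assumed table and column name (`snapshots`, `uuid`); the actual OPS storage schema may differ, and this is not the proposed patch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE snapshots (uuid TEXT, data BLOB)")

# Without the index below, "WHERE uuid = ?" is a full table scan: O(n)
# per lookup, so total cost grows linearly with the number of snapshots.
# With the index, each lookup is a B-tree search: O(log n).
conn.execute("CREATE INDEX ix_snapshots_uuid ON snapshots (uuid)")

# EXPLAIN QUERY PLAN shows whether SQLite will use the index.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM snapshots WHERE uuid = ?", ("abc",)
).fetchone()
print(plan[-1])  # plan detail string mentions the index
```

The same `CREATE INDEX` idea applies to any data-object table that is queried by UUID; the trade-off is a small extra cost per insert to maintain the index.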

@sroet (Member, Author) commented Mar 4, 2021

Closing this for now; will rebuild off of #911 in the future.

@sroet sroet closed this Mar 4, 2021
