Skip to content

extending shutdown to handle single agent setup in relayed mode#19

Merged
armin-acn merged 3 commits intoeclipse-score:mainfrom
Valeo-S-CORE-Organization:extend-shutdown-single-agent
Nov 24, 2025
Merged

extending shutdown to handle single agent setup in relayed mode#19
armin-acn merged 3 commits intoeclipse-score:mainfrom
Valeo-S-CORE-Organization:extend-shutdown-single-agent

Conversation

@ShoroukRamzy
Copy link
Contributor

Hi @artemsheinacn,

This PR fixes a deadlock during shutdown when using a relayed signaling mode in a single-agent setup as per our discussion here
#17

The PrimaryReceiveRelay thread would block indefinitely waiting for remote agents that don't exist. The fix is to prevent the relay threads from being created at all in this scenario, as all workers are local and communicate via MPSC.

The SchedulerConnector for the relayed mode now conditionally creates the PrimaryReceiveRelay and PrimarySendRelay only if there are remote agents configured. This makes the shutdown clean and avoids spawning an unnecessary thread.

Thanks!

@github-actions
Copy link

github-actions bot commented Nov 19, 2025

License Check Results

🚀 The license check job ran with the Bazel command:

bazel run //:license-check

Status: ⚠️ Needs Review

Click to expand output
[License Check Output]
Starting local Bazel server (8.3.0) and connecting to it...
INFO: Invocation ID: b5a3876c-9461-426a-89f5-303ebd772205
Computing main repo mapping: 
Computing main repo mapping: 
DEBUG: Rule 'rules_boost+' indicated that a canonical reproducible form can be obtained by modifying arguments integrity = "sha256-iKHWUIBrUN/fFOCltkTmHfDcKw0h4ZVmP7NVSoc8zBs="
DEBUG: Repository rules_boost+ instantiated at:
  <builtin>: in <toplevel>
Repository rule http_archive defined at:
  /home/runner/.bazel/external/bazel_tools/tools/build_defs/repo/http.bzl:394:31: in <toplevel>
Computing main repo mapping: 
Computing main repo mapping: 
Computing main repo mapping: 
Computing main repo mapping: 
Computing main repo mapping: 
Loading: 
Loading: 4 packages loaded
Loading: 4 packages loaded
    currently loading: 
Analyzing: target //:license-check (5 packages loaded, 0 targets configured)
Analyzing: target //:license-check (5 packages loaded, 0 targets configured)

Analyzing: target //:license-check (73 packages loaded, 9 targets configured)

Analyzing: target //:license-check (150 packages loaded, 2294 targets configured)

Analyzing: target //:license-check (156 packages loaded, 6864 targets configured)

Analyzing: target //:license-check (156 packages loaded, 6864 targets configured)

INFO: Analyzed target //:license-check (159 packages loaded, 8880 targets configured).
[9 / 13] Creating runfiles tree bazel-out/k8-opt-exec-ST-d57f47055a04/bin/external/score_tooling+/dash/tool/formatters/dash_format_converter.runfiles [for tool]; 0s local ... (2 actions, 1 running)
INFO: From Generating Dash formatted dependency file ...:
INFO: Successfully converted 2 packages from Cargo.lock to bazel-out/k8-fastbuild/bin/formatted.txt
INFO: Found 1 target...
Target //:license.check.license_check up-to-date:
  bazel-bin/license.check.license_check
  bazel-bin/license.check.license_check.jar
INFO: Elapsed time: 15.920s, Critical Path: 0.46s
INFO: 13 processes: 4 disk cache hit, 9 internal.
INFO: Build completed successfully, 13 total actions
INFO: Running command line: bazel-bin/license.check.license_check ./formatted.txt <args omitted>
usage: org.eclipse.dash.licenses.cli.Main [-batch <int>] [-cd <url>]
       [-confidence <int>] [-ef <url>] [-excludeSources <sources>] [-help] [-lic
       <url>] [-project <shortname>] [-repo <url>] [-review] [-summary <file>]
       [-timeout <seconds>] [-token <token>]

@github-actions
Copy link

The created documentation from the pull request is available at: docu-html

Signed-off-by: ShoroukRamzy <82869891+ShoroukRamzy@users.noreply.github.com>
// (100.into(), vec![w40, w41, w42, w43, w44]),
// ]
// .into_iter()
// .collect();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be removed

recorder_ids: vec![],
worker_assignments: agent_assignments().remove(&AGENT_ID).unwrap(),
timeout: Duration::from_secs(10),
timeout: Duration::from_secs(60),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need a longer timeout?

Copy link
Contributor Author

@ShoroukRamzy ShoroukRamzy Nov 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while testing the single agent in relayed mode I encountered sporadic issue of missing all worker connections within the timeout, that's why I needed to increase the timeout.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is a separate issue, we better make a separate ticket for it. I don't think it's a good idea to mask the issue with an unreasonably long timeout, especially as part of an unrelated PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @artemsheinacn, I fully agree with you. I will revert it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@artemsheinacn
Copy link
Contributor

@armin-acn please review when you have time

@ShoroukRamzy ShoroukRamzy force-pushed the extend-shutdown-single-agent branch from 4c11f36 to 8220056 Compare November 19, 2025 18:01
Copy link
Contributor

@artemsheinacn artemsheinacn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @ShoroukRamzy!

@ShoroukRamzy
Copy link
Contributor Author

Thanks for the PR @ShoroukRamzy!

Thank you for your feedbacks @artemsheinacn

@ShoroukRamzy
Copy link
Contributor Author

Thanks for the PR @ShoroukRamzy!

Thank you for your feedbacks @artemsheinacn

Hi @armin-acn, I don't have permissions to merge this PR. Could you please help me with this when you have time?

Thanks!

@armin-acn armin-acn merged commit 3f8f700 into eclipse-score:main Nov 24, 2025
8 checks passed
dcleoliu pushed a commit to dcleoliu/feo that referenced this pull request Mar 1, 2026
…pse-score#19)

* extending shutdown to handle single agent setup in relayed mode

* extending shutdown to handle single agent setup in relayed mode

---------

Signed-off-by: ShoroukRamzy <82869891+ShoroukRamzy@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants