Skip to content

Fix race condition in Command.IsRunning() check#310

Open
circleci-app[bot] wants to merge 3 commits into
mainfrom
chunk/changes-1779284456686
Open

Fix race condition in Command.IsRunning() check#310
circleci-app[bot] wants to merge 3 commits into
mainfrom
chunk/changes-1779284456686

Conversation

@circleci-app
Copy link
Copy Markdown
Contributor

@circleci-app circleci-app Bot commented May 20, 2026

Summary

Fixes a race condition in the task agent command execution that was causing test failures in TestOrchestrator/error: task agent encountered fatal error.

Problem

The test was failing with an unexpected error message:

error on shutdown: task agent process is still running, which could interrupt the task...

This was caused by a race condition where IsRunning() was called in the shutdown phase before the isCompleted flag was set, even though Wait() had already returned.

Root Cause

The isCompleted flag was being set in the defer of the wait() function:

defer func() {
    _ = cmd.Cancel()
    c.isCompleted.Store(cmd.ProcessState != nil)
}()

err := cmd.Wait()

The defer runs after wait() returns, but there's no guarantee about timing relative to the calling goroutine. This created a race where:

  1. Process exits and Wait() returns an error
  2. executeAgent() returns the error to the main goroutine
  3. shutdown() checks IsRunning()
  4. The defer hasn't completed yet, so isCompleted is still false
  5. IsRunning() incorrectly returns true
  6. An "still running" error is generated

Solution

Set isCompleted = true immediately after Wait() returns, before any error handling:

err := cmd.Wait()
c.isCompleted.Store(true)  // Set immediately, before error handling

This ensures IsRunning() returns false as soon as Wait() completes, eliminating the race condition.

Testing

  • All existing tests pass on Linux
  • The specific failing test now passes: TestOrchestrator/error: task agent encountered fatal error
  • No behavior changes for successful command execution paths

Original failing job

View this task in the CircleCI web app →

renovate Bot and others added 2 commits May 19, 2026 19:21
The test "TestOrchestrator/error: task agent encountered fatal error" was
failing because of a race condition where IsRunning() was being called
before the isCompleted flag was set, even though Wait() had returned.

The issue was that isCompleted was set in the defer of wait(), which runs
after the function returns but with no guarantee of when relative to the
calling goroutine receiving the return value. This caused a race where:

1. executeAgent() calls taskAgent.Wait() which blocks
2. The process exits and wait() returns an error
3. The defer sets isCompleted based on ProcessState
4. executeAgent returns the error to the main goroutine
5. shutdown() checks IsRunning() before the defer completes
6. IsRunning() returns true because isCompleted is still false
7. This generates an incorrect "task agent process is still running" error

The fix is to set isCompleted=true immediately after Wait() returns,
before any error handling or defer cleanup. This ensures IsRunning()
returns false as soon as Wait() completes, regardless of whether there
was an error or what the ProcessState is.

This eliminates the race condition and ensures correct behavior in the
shutdown phase.
@circleci-app circleci-app Bot requested a review from a team as a code owner May 20, 2026 13:51
@renovate renovate Bot force-pushed the renovate/go-github.com-modelcontextprotocol-registry-vulnerability branch from f6b4612 to 69256f8 Compare May 21, 2026 11:08
Base automatically changed from renovate/go-github.com-modelcontextprotocol-registry-vulnerability to main May 22, 2026 06:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant