Rollbacks: Adding ArgoCD deployer backend #499
06e7315 to 40cf853
    GetDeploymentStatus(ctx context.Context, deploymentID string) (*DeploymentResult, error)

    // PostDeploymentHooks is called after a deployment is successful.
    // E.g. adding a sync window on an ArgoCD application

Suggested change:

    - // E.g. adding a sync window on an ArgoCD application
    + // E.g. adding a deny sync window on an ArgoCD application
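The diff only shows the doc comment for `PostDeploymentHooks`, not its signature, so the sketch below covers just the core of a hypothetical deny-window hook. ArgoCD stores sync windows on the AppProject (`spec.syncWindows`) and scopes them to applications through each window's `applications` list; the `SyncWindow` struct mirrors a subset of those fields, while `addDenyWindow` and everything around it are assumptions for illustration. The fetch/update round trip against the ArgoCD API is left out.

```go
package main

import "fmt"

// SyncWindow mirrors a subset of the fields of an entry in an ArgoCD
// AppProject's spec.syncWindows (simplified for illustration).
type SyncWindow struct {
	Kind         string   `json:"kind"`
	Schedule     string   `json:"schedule"`
	Duration     string   `json:"duration"`
	Applications []string `json:"applications,omitempty"`
}

// equalStrings reports whether two slices hold the same strings in the same order.
func equalStrings(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// addDenyWindow is a hypothetical helper: it appends a deny window covering
// apps unless an identical window is already present. The boolean result
// lets the caller skip the project update call when nothing changed.
func addDenyWindow(windows []SyncWindow, apps []string, schedule, duration string) ([]SyncWindow, bool) {
	want := SyncWindow{Kind: "deny", Schedule: schedule, Duration: duration, Applications: apps}
	for _, w := range windows {
		if w.Kind == want.Kind && w.Schedule == want.Schedule &&
			w.Duration == want.Duration && equalStrings(w.Applications, want.Applications) {
			return windows, false
		}
	}
	return append(windows, want), true
}

func main() {
	apps := []string{"compute-lab-team-a-web"}
	windows, changed := addDenyWindow(nil, apps, "* * * * *", "1h")
	fmt.Println(len(windows), changed) // 1 true
	windows, changed = addDenyWindow(windows, apps, "* * * * *", "1h")
	fmt.Println(len(windows), changed) // 1 false
}
```

Checking for an existing identical window keeps the hook idempotent if it is retried after a partial failure; whether `apps` holds a single application or `"*"` is a policy choice left to the caller.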
| Key               | Description |
| ----------------- | ----------- |
| `target_revision` | The infrastructure revision to deploy. Can be a literal revision string or a JSONPath expression into the Release (e.g. `{.config.revisions[?(@.name=="infrastructure")].id}`). |
Do we want to accept a literal revision string here (or for `app_revision`)? I'm thinking that if someone creates a Rollback to release_A and then specifies literal revision strings that aren't actually associated with that release, the target state would have nothing to do with the target release, which would be misleading.
Good point! I will update the documentation to not state that you can provide a literal revision.
> I'm thinking if someone creates a Rollback to release_A and then specifies literal string revisions that aren't actually associated with that release, then the target state would have nothing to do with the target release, which would be misleading.
With the interfaces we provide within GC, I don't think this will be an issue, but it's still a valid point. You could also argue that people can use a JSONPath that doesn't point at the original `target_revision`, which would result in an inaccurate rollback.
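One way to guard against the concern above is to resolve `target_revision`/`app_revision` first and then check that the result is actually one of the revisions recorded on the target Release. The sketch below assumes a simplified, hypothetical `Release` shape with a `Revisions` list of name/ID pairs (loosely modelled on the `{.config.revisions[?(@.name=="infrastructure")].id}` path in the README); it does not evaluate JSONPath and is not the PR's implementation.

```go
package main

import "fmt"

// Revision and Release are hypothetical, simplified stand-ins for the
// Release fields that the README's JSONPath examples point at.
type Revision struct {
	Name string
	ID   string
}

type Release struct {
	Name      string
	Revisions []Revision
}

// validateRevision returns an error unless rev is one of the revision IDs
// recorded on the release, so a rollback cannot silently deploy a state
// unrelated to its target release.
func validateRevision(release Release, rev string) error {
	if rev == "" {
		return fmt.Errorf("release %q: resolved revision is empty", release.Name)
	}
	for _, r := range release.Revisions {
		if r.ID == rev {
			return nil
		}
	}
	return fmt.Errorf("release %q: revision %q is not associated with this release", release.Name, rev)
}

func main() {
	rel := Release{Name: "release-a", Revisions: []Revision{
		{Name: "infrastructure", ID: "abc123"},
		{Name: "application", ID: "def456"},
	}}
	fmt.Println(validateRevision(rel, "abc123")) // <nil>
	fmt.Println(validateRevision(rel, "zzz999"))
}
```

A membership check rejects unrelated literal revisions, but it would not catch a JSONPath that picks the wrong entry of the same Release (e.g. the application revision used as `target_revision`); that would also need the expected revision name.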
    func (d *Deployer) mapSyncStatus(app applicationResponse) (cicd.DeploymentStatus, string) {
        // Check operation state first: it reflects the active sync operation
        if op := app.Status.OperationState; op != nil && op.Phase != "" {
            switch op.Phase {
            case OperationPhaseRunning:
                return cicd.DeploymentStatusInProgress, op.Message
            case OperationPhaseError, OperationPhaseFailed:
                return cicd.DeploymentStatusFailed, op.Message
            case OperationPhaseSucceeded:
                return cicd.DeploymentStatusSucceeded, op.Message
            default:
                return cicd.DeploymentStatusPending, fmt.Sprintf("Operation phase: %s", op.Phase)
            }
        }

        return cicd.DeploymentStatusPending, "No active operation"
    }
I'm still unsure about mapping the status of whatever operation happens to be running to the deployment status. I'm not sure if there is an easy way to verify that the operation is the same rollback sync the controller initiated, but not doing so could produce some weird behaviour. E.g. if the controller fails to trigger a sync but some other actor (CI/CD, a human, etc.) triggers one, the rollback could be marked as in progress or completed.
Also, I'm not sure if we need to verify whether the operation is even a sync operation (though I'm unsure if other types of operations exist as I write this).
That's a valid concern!
After some back and forth with Claude today, I've updated the logic so that the application name and the set of target revisions are encoded in the deployment URL. Then, when `GetDeploymentStatus` is called, the logic checks that the latest sync operation has the target revisions we are rolling back to. Even if an external actor (a person or CD) syncs to the same revisions, the rollback will be marked as successful.
The rollback will be marked as failed only if an external actor syncs to a different set of revisions before the controller reconciles the rollback to a terminal state, or if the sync operation fails.
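The approach described above (record the application name and target revisions when the sync is triggered, then only trust a sync operation whose revisions match) can be sketched as follows. The query-string encoding of the deployment ID, the `syncOperation` type and the status strings are assumptions for illustration; the phase names are ArgoCD's operation phases. URL-encoding the ID also avoids delimiter clashes if an application name or revision contains a separator character.

```go
package main

import (
	"fmt"
	"net/url"
)

// encodeDeploymentID packs the application name and target revisions into a
// single opaque string (hypothetical format). url.Values.Encode escapes every
// value, so names or revisions containing '&', '/' or spaces round-trip safely.
func encodeDeploymentID(app string, revisions []string) string {
	v := url.Values{}
	v.Set("app", app)
	for _, r := range revisions {
		v.Add("rev", r)
	}
	return v.Encode()
}

// decodeDeploymentID reverses encodeDeploymentID.
func decodeDeploymentID(id string) (string, []string, error) {
	v, err := url.ParseQuery(id)
	if err != nil {
		return "", nil, fmt.Errorf("invalid deployment ID %q: %w", id, err)
	}
	app := v.Get("app")
	if app == "" {
		return "", nil, fmt.Errorf("deployment ID %q has no application name", id)
	}
	return app, v["rev"], nil
}

// syncOperation is a hypothetical, simplified view of the latest operation
// ArgoCD reports for an application.
type syncOperation struct {
	Phase     string   // ArgoCD phase: Running, Terminating, Succeeded, Failed or Error
	Revisions []string // revisions the operation synced to, in source order
}

func equalRevisions(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// rollbackStatus only trusts an operation whose revisions match the
// rollback's targets; a sync to anything else counts as a failed rollback.
func rollbackStatus(op *syncOperation, targets []string) string {
	if op == nil {
		return "Pending"
	}
	if !equalRevisions(op.Revisions, targets) {
		return "Failed"
	}
	switch op.Phase {
	case "Running":
		return "InProgress"
	case "Succeeded":
		return "Succeeded"
	case "Failed", "Error":
		return "Failed"
	default:
		return "Pending"
	}
}

func main() {
	id := encodeDeploymentID("compute-lab-team-a-web", []string{"abc123", "def456"})
	fmt.Println(id) // app=compute-lab-team-a-web&rev=abc123&rev=def456

	app, revs, err := decodeDeploymentID(id)
	if err != nil {
		panic(err)
	}
	op := &syncOperation{Phase: "Succeeded", Revisions: []string{"abc123", "def456"}}
	fmt.Println(app, rollbackStatus(op, revs)) // compute-lab-team-a-web Succeeded
}
```

A mismatch observed before ArgoCD has registered the controller's own sync would need extra care in a real controller; this sketch treats any mismatch as a failure.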
fix: add webhooks back
fix…onstants

Move ArgoCD API struct definitions (response, patch, request types) from deployer.go into types.go and introduce named constants for sync status, health status, and operation phase values to avoid raw string comparisons.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
58834d1 to 9651dfb
@0x0013 I've minimised your comment because I changed the logic of the ArgoCD deployer; I will run the Claude code review separately on the new logic.
@claude review this PR!
Claude finished @goelozev's task in 3m 6s

Code Review: Rollbacks: Adding ArgoCD Deployer Backend

🔴 Critical Issues

    type applicationPatchSource struct {
        TargetRevision string                 `json:"targetRevision"`
        Plugin         applicationPatchPlugin `json:"plugin,omitempty"` // ← BUG
    }

Go's … Fix: change …

🟡 Major Concerns

    return argocddeployer.NewDeployer(http.DefaultClient, ...)

Both …

    proj.Spec.SyncWindows = append(proj.Spec.SyncWindows, SyncWindow{
        Kind:         "deny",
        Schedule:     "* * * * *",
        Duration:     "1h",
        Applications: []string{"*"}, // ← all apps in the project
        ...
    })

This blocks syncs for every application in the ArgoCD project, not just the application that was rolled back. If a project houses multiple services, triggering a rollback on one will prevent all others from being synced for up to an hour. The …

There is no guard against adding a duplicate deny window. If …

    err := r.triggerPostDeploymentHooks(ctx, rollback, toRelease)
    if err != nil {
        logger.Error(err, "failed to trigger post deployment hooks")
        message = fmt.Sprintf("%s (post-deployment hooks failed: %v)", message, err)
    }
    return r.markRollbackSucceeded(ctx, logger, rollback, message)

A hooks failure only appends to the …

    if err := json.NewDecoder(getResp.Body).Decode(&proj); err != nil {
        return fmt.Errorf("failed to decode project response: %w", err) // plain error
    }
    ...
    body, err := json.Marshal(projectUpdate)
    if err != nil {
        return fmt.Errorf("failed to marshal project update: %w", err) // plain error
    }

These error paths return plain …

When …

🔵 Minor Improvements

The README states …

The …

If an ArgoCD application name or a target revision ever contains …

The test suite covers most branches well. One gap: the case where the last operation succeeded, the …

✅ Positive Observations

Summary & Next Steps

The PR introduces a solid foundation for ArgoCD-backed rollbacks with good test coverage and a well-reasoned status-detection strategy. The most important fix before merge is the …
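The first critical issue in the review excerpt hinges on how `encoding/json` handles `omitempty`: it omits `false`, `0`, nil pointers, nil interfaces, and empty arrays, slices, maps and strings, but never a struct value. The sketch below reproduces this with the `applicationPatchSource`/`applicationPatchPlugin` names from the excerpt; the plugin's `Env` field and the `pluginEnv` type are assumptions, since the real fields are not shown. With a value field, every patch carries `"plugin":{}`, which could attach an empty plugin block to a source that does not use one. A pointer field is one common fix (Go 1.24 also added an `omitzero` tag option that omits zero-valued structs).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// pluginEnv and the Env field below are assumptions for illustration.
type pluginEnv struct {
	Name  string `json:"name"`
	Value string `json:"value"`
}

type applicationPatchPlugin struct {
	Env []pluginEnv `json:"env,omitempty"`
}

// Value field: omitempty has no effect on a struct, so "plugin":{} is always sent.
type applicationPatchSource struct {
	TargetRevision string                 `json:"targetRevision"`
	Plugin         applicationPatchPlugin `json:"plugin,omitempty"`
}

// Pointer field: a nil Plugin is omitted from the JSON entirely.
type applicationPatchSourceFixed struct {
	TargetRevision string                  `json:"targetRevision"`
	Plugin         *applicationPatchPlugin `json:"plugin,omitempty"`
}

// mustMarshal encodes v as JSON and panics on error (fine for a demo).
func mustMarshal(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(mustMarshal(applicationPatchSource{TargetRevision: "abc123"}))
	// {"targetRevision":"abc123","plugin":{}}
	fmt.Println(mustMarshal(applicationPatchSourceFixed{TargetRevision: "abc123"}))
	// {"targetRevision":"abc123"}
}
```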
fix: add default http client timeout
fix: switch to the existing logger
fix: removed unused mapSyncStatus
refactor: move error handler of getApplication
docs: updated ArgoCD deployer docs
test: adding a new test case
fix: switch back to the default client
f8d8fd5 to 7355a1e
Addressed ✅
Addressed ✅
That's expected behaviour!
This is expected behaviour!
Addressed ✅
Addressed ✅
Addressed ✅
Addressed ✅
Addressed ✅
Not addressed.
Addressed ✅
Summary
Adds an ArgoCD CI/CD backend for the rollback controller. This allows faster rollbacks because it bypasses our current CI system and performs the rollback directly. To run the rollback controller with ArgoCD, provide the following command-line flags:

- `--cicd-backend=argocd`
- `--argocd-server-url=<your-argocd-url>`: the base URL of your ArgoCD server (e.g. `https://argocd.yourcompany.dev`)
- `--argocd-auth-token=<your-argocd-token>`: an authentication token for a robot account defined in `argocd-cm` (e.g. `accounts.theatre: apiKey`)
- `--argocd-app-name-template="<template>"`: the template used to resolve the application name. In GC, we use the `compute-lab-{{.Namespace}}-{{.Target}}` template for all ArgoCD application names. You can change the template, but currently the only available fields are `.Namespace` and `.Target`. You can also override the name for an individual Rollback with the `argocd_app_name` deployment option.

Requirements
Rollbacks using this CI/CD backend require the following deployment option:

- `target_revision`: a specific revision, or a JSONPath into the Release that resolves to the revision (e.g. `'{.config.revisions[?(@.name=="infrastructure")].id}'`).

Optional deployment options:

- `app_revision`: a specific revision, or a JSONPath into the Release that resolves to the revision (e.g. `'{.config.revisions[?(@.name=="application")].id}'`).
- `argocd_app_name`: an explicit application name, for applications that do not follow the `--argocd-app-name-template` template.
- `argocd_add_sync_window`: when set to `true`, a sync window is added to the application, pausing any further deployments.

Changes to the Deployer interface
The `Deployer` interface (which gives CI/CD backends a common structure) now has a `PostDeploymentHooks` method, which is called after the Rollback is deemed Healthy.

Tests
Tests conducted in our lab environment:

- `target_revision`, `app_revision` and `argocd_add_sync_window=true` set ✅
- `target_revision` and `argocd_add_sync_window=true` set ✅
- `target_revision` only ✅
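The application-name resolution described in the summary (`--argocd-app-name-template`, overridable per Rollback with `argocd_app_name`) can be sketched with Go's `text/template`. The `appNameInput` struct and `resolveAppName` function are hypothetical; the sketch only assumes what the summary states: the template sees `.Namespace` and `.Target`, and a non-empty `argocd_app_name` option wins.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// appNameInput exposes the only two fields the summary says templates can use.
type appNameInput struct {
	Namespace string
	Target    string
}

// resolveAppName returns the argocd_app_name override when set, otherwise the
// result of executing the --argocd-app-name-template value against in.
// Referencing any other field (e.g. {{.Cluster}}) makes Execute return an error.
func resolveAppName(tmpl string, in appNameInput, options map[string]string) (string, error) {
	if name := options["argocd_app_name"]; name != "" {
		return name, nil
	}
	t, err := template.New("app-name").Parse(tmpl)
	if err != nil {
		return "", fmt.Errorf("parsing app name template: %w", err)
	}
	var sb strings.Builder
	if err := t.Execute(&sb, in); err != nil {
		return "", fmt.Errorf("executing app name template: %w", err)
	}
	return sb.String(), nil
}

func main() {
	const tmpl = "compute-lab-{{.Namespace}}-{{.Target}}"
	in := appNameInput{Namespace: "team-a", Target: "web"}

	name, _ := resolveAppName(tmpl, in, nil)
	fmt.Println(name) // compute-lab-team-a-web

	name, _ = resolveAppName(tmpl, in, map[string]string{"argocd_app_name": "legacy-web"})
	fmt.Println(name) // legacy-web
}
```

In a long-running controller, parsing the template once at startup would surface a malformed flag value immediately instead of on the first Rollback.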