Drift detection #8

Open
gsavage wants to merge 3 commits into main from driftfile

gsavage (Contributor) commented May 11, 2026

Implements drift detection. When applying changes, we write a new JSON file to S3 alongside the Terraform state file.

This file stores the Git SHA of the commit that produced that Terraform state.
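
As a rough illustration of what that file might contain, here is a minimal Python sketch. The `drift` field and the `<repo>/drift.plan.json` key come from the commit messages in this PR; the `sha` key name, the helper names, and the `false` starting value are assumptions made for the example, not the workflow's actual code.

```python
import json


def baseline_record(sha: str) -> dict:
    """Build the record written after a successful apply.

    "sha" is an assumed key name. "drift" is the field named in the
    commit messages, assumed here to start out as false.
    """
    return {"sha": sha, "drift": False}


def baseline_key(repo: str) -> str:
    """Object key inside the bucket, following s3://<bucket>/<repo>/drift.plan.json."""
    return f"{repo}/drift.plan.json"


if __name__ == "__main__":
    # "0123abc" and "my-repo" are placeholder values.
    print(baseline_key("my-repo"))
    print(json.dumps(baseline_record("0123abc")))
```

Keeping the record this small means any monitor only needs to parse one JSON object and look at a single field.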

A new workflow reads this JSON file and runs terraform plan against the code as it was at the recorded SHA. An empty plan means there has been no drift; a non-empty plan means drift has happened, and the JSON file is then updated with the time at which drift was detected.
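
The decision described above can be sketched in Python as follows. This is only a sketch under assumptions: `mark_drift` is a hypothetical helper, the `sha` key name is assumed, and the real workflow does this in GitHub Actions steps rather than in Python.

```python
from datetime import datetime, timezone
from typing import Optional


def mark_drift(record: dict, has_changes: bool, now: Optional[datetime] = None) -> dict:
    """Return the record to store after a drift check.

    An empty plan leaves the record untouched. A non-empty plan keeps the
    same SHA and puts an ISO 8601 timestamp in the "drift" field.
    """
    if not has_changes:
        return record
    detected_at = now or datetime.now(timezone.utc)
    return {**record, "drift": detected_at.isoformat()}


if __name__ == "__main__":
    baseline = {"sha": "0123abc", "drift": False}  # placeholder values
    print(mark_drift(baseline, has_changes=False))
    print(mark_drift(baseline, has_changes=True))
```

Passing `now` explicitly is only there to make the function easy to test; a caller would normally omit it.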

Separate monitoring, for example Kosli environment snapshots, can watch the JSON files to detect when drift has happened. Running a new apply will reset the JSON file.
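
For illustration, a monitor's check could be as simple as the Python sketch below. It assumes the record is a JSON object whose `drift` field is `false` until drift is detected, as the commit messages suggest; `drift_detected` is a hypothetical name and this is not how Kosli itself evaluates the file.

```python
import json


def drift_detected(raw: str) -> bool:
    """Return True when the stored record reports drift.

    Any value other than JSON false in the "drift" field counts as drift,
    which matches the "drift != false" rule from the commit messages.
    """
    record = json.loads(raw)
    return record["drift"] is not False


if __name__ == "__main__":
    # Placeholder records for the two states the file can be in.
    print(drift_detected('{"sha": "0123abc", "drift": false}'))
    print(drift_detected('{"sha": "0123abc", "drift": "2026-05-11T15:00:00+00:00"}'))
```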

This is the first part of the work to handle drift detection and environment compliance.

gsavage and others added 3 commits May 11, 2026 14:59
Adds a follow-up job to the apply workflow that writes a small JSON
record (the triggering SHA plus a hardcoded drift flag) for use by a
later drift-detection step. The file is always published as a GitHub
artifact, and is also uploaded to s3://<bucket>/<repo>/drift.plan.json
when the new optional s3_bucket input is supplied — leaving it unset
skips both the AWS credential exchange and the S3 upload.

The work lives in apply.yml rather than base.yml so that the plan
workflow, which also consumes base.yml, does not produce the file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Introduces a reusable Detect Drift workflow that consumers can schedule
(via cron, workflow_dispatch, etc.) to compare deployed infrastructure
against the last applied state. It reads the drift baseline written by
the apply workflow from S3, runs a plan against the recorded SHA via
base.yml, and — if the plan contains changes — overwrites drift.plan.json
in S3 with the same SHA and an ISO 8601 timestamp in the drift field.
A third-party monitor watching that object then sees drift != false and
fires an alert.

To support this, base.yml gains two backward-compatible additions:
  * a `ref` input, threaded into the first actions/checkout so the
    drift workflow can plan against the historical SHA rather than the
    triggering ref;
  * a `has_changes` workflow output, derived from grepping the existing
    plan text for "No changes.", so the caller can decide whether to
    flag drift.
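
The `has_changes` derivation described in the second bullet amounts to a substring check on the plan text. A minimal Python sketch of the same logic, assuming the plan output has already been captured as a string (the function name mirrors the workflow output, but the code is illustrative, not taken from base.yml):

```python
def has_changes(plan_text: str) -> bool:
    """Report whether a captured terraform plan contains changes.

    Mirrors grepping for "No changes.": the plan is treated as changed
    whenever that marker is absent from the text.
    """
    return "No changes." not in plan_text


if __name__ == "__main__":
    clean = "No changes. Your infrastructure matches the configuration."
    dirty = "Plan: 1 to add, 0 to change, 0 to destroy."
    print(has_changes(clean))
    print(has_changes(dirty))
```

One caveat with any text match is that an empty or truncated plan output would also read as "changed". Terraform's `-detailed-exitcode` flag on `terraform plan` (exit code 2 when changes are present) is an alternative signal worth considering.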

The workflow fails loudly when no baseline is present in S3, on the
assumption that a missing baseline reflects a real configuration
problem (apply.yml has never run, or the object was deleted) that
should surface rather than be silently skipped. A top-level concurrency
group keyed on repository + environment prevents overlapping scheduled
runs from racing on the same JSON object.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

apply.yml, plan.yml, and detect-drift.yml previously pinned base.yml to
the literal @main ref. That made end-to-end testing of cross-file
changes painful: a branch that modified both, say, detect-drift.yml and
base.yml could not be exercised from a consumer repo without either
merging to main or temporarily rewriting the @main pin to a feature
branch (and remembering to revert it).

Switching the three callers to the same-repo relative form
"./.github/workflows/base.yml" makes them follow the calling reusable
workflow's own ref. A consumer that pins
"kosli-dev/tf/.github/workflows/detect-drift.yml@<ref>" now transitively
pulls base.yml at the same <ref>, so the one entry-point pin in the
consumer is the only ref knob in the whole chain.

Also folds in the related permissions bump on detect-drift.yml's plan
job (contents: read -> contents: write) so the job can grant base.yml
the contents: write it currently requests — a temporary workaround to
keep the wider-permissions test path working while we evaluate
tightening base.yml itself.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>