Skip to content

feat(cli): add keyColumn option for CSV buckets with duplicate key validation#2076

Open
cherkanovart wants to merge 4 commits intomainfrom
feat/csv-key-column
Open

feat(cli): add keyColumn option for CSV buckets with duplicate key validation#2076
cherkanovart wants to merge 4 commits intomainfrom
feat/csv-key-column

Conversation

@cherkanovart
Copy link
Copy Markdown
Contributor

@cherkanovart cherkanovart commented Apr 23, 2026

Summary

Adds a keyColumn config option for CSV buckets so users can specify which column to use as the unique row identifier, and introduces duplicate key validation to prevent silent data loss when the key column has repeated values.

Changes

  • Added keyColumn optional field to bucket config schema (packages/spec)
  • CSV loader now accepts keyColumn option; falls back to first column detection when omitted
  • Added assertUniqueKeys() validation that throws a descriptive error when the key column has duplicate values (previously rows would silently overwrite each other)
  • Added runtime warning when keyColumn is set on a non-csv bucket type (ignored with message)
  • Threaded keyColumn through all 12 command call sites (cleanup, i18n, lockfile, purge, status, show, run/plan, run/execute, run/frozen)

Testing

Business logic tests added:

  • Throws descriptive error when key column has duplicate values (mirrors Salesfloor bug)
  • No false positive when detected key column is unique
  • keyColumn option overrides first-column detection and uses specified column
  • Throws when keyColumn points to a column not present in the CSV
  • Throws on duplicates even when keyColumn explicitly points at a non-unique column
  • getBuckets() correctly passes keyColumn for csv buckets
  • getBuckets() warns and ignores keyColumn on non-csv buckets
  • All tests pass locally

Checklist

  • Changeset added (if version bump needed)
  • Tests cover business logic (not just happy path)
  • No breaking changes (or documented below)

Note: Users with CSV files where the first column has duplicate values will now get an error instead of silent data loss. The error message explains how to fix it (add a unique column or set keyColumn).

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Add keyColumn for CSV buckets so you can choose the column that uniquely identifies rows; loaders now map translations by that key to preserve associations.
    • Add upfront uniqueness validation to prevent silent overwrites when duplicate keys exist.
  • Bug Fixes

    • CLI warns and ignores keyColumn for non‑CSV bucket types to avoid misconfiguration.
  • Tests

    • Add tests for keyColumn behavior, validation, error cases, and round‑trip integrity.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 23, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 95d5636f-a3dc-4d63-ac74-15190b5ce13a

📥 Commits

Reviewing files that changed from the base of the PR and between c046a87 and 53599f3.

📒 Files selected for processing (1)
  • packages/cli/demo/csv/i18n.json
✅ Files skipped from review due to trivial changes (1)
  • packages/cli/demo/csv/i18n.json

📝 Walkthrough

Walkthrough

Adds an optional keyColumn bucket option for CSV buckets, validates header presence and uniqueness in the CSV loader (errors on duplicates/missing key), updates schema and demo config, threads keyColumn through bucket parsing and all CLI commands that construct loaders, and adds tests for behavior.

Changes

Cohort / File(s) Summary
Changeset & Schema
/.changeset/csv-key-column.md, packages/spec/src/config.ts
Adds a changeset and introduces keyColumn?: string to v1.3 bucket schema.
Bucket config parsing & warnings
packages/cli/src/cli/utils/buckets.ts, packages/cli/src/cli/utils/buckets.spec.ts
getBuckets accepts keyColumn for csv buckets, warns (deduplicated) when present on non-CSV types; tests cover retention and warning.
CSV loader implementation & tests
packages/cli/src/cli/loaders/csv.ts, packages/cli/src/cli/loaders/csv.spec.ts
createCsvLoader(options?: CsvLoaderOptions) added with keyColumn?: string; validates header existence and per-instance uniqueness; errors on duplicate keys; tests added for detection, config, and push/pull round-trip by key.
Loader composition & types
packages/cli/src/cli/loaders/index.ts, packages/cli/src/cli/cmd/run/_types.ts
BucketLoaderOptions gains keyColumn?: string; CSV loader constructed with { keyColumn: options.keyColumn }; CmdRunTask extended with optional keyColumn.
Task planning & execution
packages/cli/src/cli/cmd/run/plan.ts, packages/cli/src/cli/cmd/run/execute.ts
Tasks populated with keyColumn; several task title strings condensed to single-line templates; execution forwards keyColumn into createBucketLoader.
CLI command integrations
packages/cli/src/cli/cmd/i18n.ts, .../lockfile.ts, .../cleanup.ts, .../purge.ts, .../run/frozen.ts, .../show/_shared-key-command.ts, .../status.ts
All command entry points that build bucket loaders now forward bucket.keyColumn in the loader options for consistent CSV key handling.
Demo config
packages/cli/demo/csv/i18n.json
Adds buckets.csv.keyColumn: "KEY" to demo configuration.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant CLI as CLI Command
  participant Buckets as getBuckets
  participant LoaderFactory as createBucketLoader
  participant CSVLoader as createCsvLoader
  participant FS as CSV File

  CLI->>Buckets: read bucket configs (includes keyColumn)
  Buckets-->>CLI: bucket objects (keyColumn present for CSV)
  CLI->>LoaderFactory: request loader with { keyColumn }
  LoaderFactory->>CSVLoader: instantiate with { keyColumn }
  CSVLoader->>FS: read CSV (header + rows)
  CSVLoader->>CSVLoader: validate header contains keyColumn
  CSVLoader->>CSVLoader: ensure unique non-empty key values
  alt duplicates found
    CSVLoader-->>CLI: throw validation error (duplicate keys)
  else unique
    CSVLoader-->>CLI: return translations mapped by keyColumn
  end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • vrcprl

Poem

🐰 I nibble headers, sniff the column fine,
I tally keys like carrots on a vine,
No duplicate hides where my whiskers go,
I hop from pull to push with steady flow,
Each CSV now has a name I know! 🥕✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 15.38% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'feat(cli): add keyColumn option for CSV buckets with duplicate key validation' clearly and concisely describes the main change—adding a keyColumn configuration option with validation—matching the core changeset.
Description check ✅ Passed The PR description comprehensively covers all required sections: summary explains the feature, changes lists the implementation across spec/loader/commands, testing documents the business-logic tests with checkmarks, and the checklist confirms tests pass and no breaking changes (with context provided).
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/csv-key-column

Comment @coderabbitai help to get the list of available commands and usage tips.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/cli/src/cli/loaders/csv.ts`:
- Around line 47-49: The code currently calls detectKeyColumnName on
input.split("\n").find((l) => l.length)! which can be undefined for empty or
whitespace-only CSVs; change this to first locate a non-blank header line (e.g.,
find((l) => l.trim().length > 0)), validate that result before calling
detectKeyColumnName, and if no header line is found handle it explicitly (either
throw a clear error like "empty CSV input" or use a sensible default) while
keeping the options?.keyColumn fallback; update the assignment of keyColumnName
to use this validated header line when calling detectKeyColumnName.
- Around line 57-64: The validation currently builds availableColumns from
Object.keys(inputParsed[0]) which can miss header columns when the first data
row is sparse; instead parse the CSV header row separately (e.g., call the CSV
parser with to_line: 1 or otherwise extract parser.metadata.fields) and use that
header array to populate availableColumns for the options.keyColumn check;
update the logic around options?.keyColumn and the availableColumns variable in
the loader function (the code that throws the Error) to use the header-derived
columns rather than Object.keys(inputParsed[0]).

In `@packages/cli/src/cli/utils/buckets.ts`:
- Around line 67-79: Replace the truthy check "if (bucketEntry.keyColumn)" with
an explicit presence check (e.g., "if ('keyColumn' in bucketEntry)" or "if
(typeof bucketEntry.keyColumn !== 'undefined')") so an empty string is handled
deliberately; then keep the existing branches using bucketType,
warnedKeyColumnTypes, and config.keyColumn so that empty-string values are
propagated for CSV buckets (config.keyColumn = bucketEntry.keyColumn) and still
trigger the warning for non-CSV buckets instead of being silently ignored.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8fa0d635-dd64-419d-a95f-62f6a7a83113

📥 Commits

Reviewing files that changed from the base of the PR and between 873e773 and eab1c89.

📒 Files selected for processing (17)
  • .changeset/csv-key-column.md
  • packages/cli/src/cli/cmd/cleanup.ts
  • packages/cli/src/cli/cmd/i18n.ts
  • packages/cli/src/cli/cmd/lockfile.ts
  • packages/cli/src/cli/cmd/purge.ts
  • packages/cli/src/cli/cmd/run/_types.ts
  • packages/cli/src/cli/cmd/run/execute.ts
  • packages/cli/src/cli/cmd/run/frozen.ts
  • packages/cli/src/cli/cmd/run/plan.ts
  • packages/cli/src/cli/cmd/show/_shared-key-command.ts
  • packages/cli/src/cli/cmd/status.ts
  • packages/cli/src/cli/loaders/csv.spec.ts
  • packages/cli/src/cli/loaders/csv.ts
  • packages/cli/src/cli/loaders/index.ts
  • packages/cli/src/cli/utils/buckets.spec.ts
  • packages/cli/src/cli/utils/buckets.ts
  • packages/spec/src/config.ts

Comment thread packages/cli/src/cli/loaders/csv.ts Outdated
Comment thread packages/cli/src/cli/loaders/csv.ts Outdated
Comment on lines +67 to +79
if (bucketEntry.keyColumn) {
if (bucketType !== "csv") {
if (!warnedKeyColumnTypes.has(bucketType)) {
warnedKeyColumnTypes.add(bucketType);
console.warn(
`Warning: "keyColumn" is only supported on "csv" buckets, but was set on "${bucketType}". ` +
`The setting will be ignored. Remove it from this bucket's config to silence this warning.`,
);
}
} else {
config.keyColumn = bucketEntry.keyColumn;
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Handle empty-string keyColumn explicitly instead of relying on truthiness.

At Line 67, using a truthy check means keyColumn: "" is silently ignored (no warning for non-CSV and no propagation for CSV), which can hide bad config.

Suggested fix
-      if (bucketEntry.keyColumn) {
+      if (bucketEntry.keyColumn !== undefined) {
         if (bucketType !== "csv") {
           if (!warnedKeyColumnTypes.has(bucketType)) {
             warnedKeyColumnTypes.add(bucketType);
             console.warn(
               `Warning: "keyColumn" is only supported on "csv" buckets, but was set on "${bucketType}". ` +
                 `The setting will be ignored. Remove it from this bucket's config to silence this warning.`,
             );
           }
         } else {
           config.keyColumn = bucketEntry.keyColumn;
         }
       }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/cli/src/cli/utils/buckets.ts` around lines 67 - 79, Replace the
truthy check "if (bucketEntry.keyColumn)" with an explicit presence check (e.g.,
"if ('keyColumn' in bucketEntry)" or "if (typeof bucketEntry.keyColumn !==
'undefined')") so an empty string is handled deliberately; then keep the
existing branches using bucketType, warnedKeyColumnTypes, and config.keyColumn
so that empty-string values are propagated for CSV buckets (config.keyColumn =
bucketEntry.keyColumn) and still trigger the warning for non-CSV buckets instead
of being silently ignored.

cherkanovart and others added 2 commits April 23, 2026 18:36
…umn validation

- Use split(/\r?\n/) with trim check instead of non-null assertion for empty CSV safety
- Parse header row separately (to_line: 1) for column validation to avoid
  missing columns when first data row is sparse with relax_column_count_less

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants