feat(cli): add keyColumn option for CSV buckets with duplicate key validation by cherkanovart · Pull Request #2076 · lingodotdev/lingo.dev

cherkanovart · 2026-04-23T16:22:16Z

Summary

Adds a keyColumn config option for CSV buckets so users can specify which column to use as the unique row identifier, and introduces duplicate key validation to prevent silent data loss when the key column has repeated values.

Changes

Added keyColumn optional field to bucket config schema (packages/spec)
CSV loader now accepts keyColumn option; falls back to first column detection when omitted
Added assertUniqueKeys() validation that throws a descriptive error when the key column has duplicate values (previously rows would silently overwrite each other)
Added runtime warning when keyColumn is set on a non-csv bucket type (ignored with message)
Threaded keyColumn through all 12 command call sites (cleanup, i18n, lockfile, purge, status, show, run/plan, run/execute, run/frozen)

Testing

Business logic tests added:

Throws descriptive error when key column has duplicate values (mirrors Salesfloor bug)
No false positive when detected key column is unique
keyColumn option overrides first-column detection and uses specified column
Throws when keyColumn points to a column not present in the CSV
Throws on duplicates even when keyColumn explicitly points at a non-unique column
getBuckets() correctly passes keyColumn for csv buckets
getBuckets() warns and ignores keyColumn on non-csv buckets
All tests pass locally

Checklist

Changeset added (if version bump needed)
Tests cover business logic (not just happy path)
No breaking changes (or documented below)

Note: Users with CSV files where the first column has duplicate values will now get an error instead of silent data loss. The error message explains how to fix it (add a unique column or set keyColumn).

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Add keyColumn for CSV buckets so you can choose the column that uniquely identifies rows; loaders now map translations by that key to preserve associations.
- Add upfront uniqueness validation to prevent silent overwrites when duplicate keys exist.
Bug Fixes
- CLI warns and ignores keyColumn for non‑CSV bucket types to avoid misconfiguration.
Tests
- Add tests for keyColumn behavior, validation, error cases, and round‑trip integrity.

…mmands

coderabbitai · 2026-04-23T16:22:44Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 95d5636f-a3dc-4d63-ac74-15190b5ce13a

📥 Commits

Reviewing files that changed from the base of the PR and between c046a87 and 53599f3.

📒 Files selected for processing (1)

packages/cli/demo/csv/i18n.json

✅ Files skipped from review due to trivial changes (1)

packages/cli/demo/csv/i18n.json

📝 Walkthrough

Walkthrough

Adds an optional keyColumn bucket option for CSV buckets, validates header presence and uniqueness in the CSV loader (errors on duplicates/missing key), updates schema and demo config, threads keyColumn through bucket parsing and all CLI commands that construct loaders, and adds tests for behavior.

Changes

Cohort / File(s)	Summary
Changeset & Schema `/.changeset/csv-key-column.md`, `packages/spec/src/config.ts`	Adds a changeset and introduces `keyColumn?: string` to v1.3 bucket schema.
Bucket config parsing & warnings `packages/cli/src/cli/utils/buckets.ts`, `packages/cli/src/cli/utils/buckets.spec.ts`	`getBuckets` accepts `keyColumn` for `csv` buckets, warns (deduplicated) when present on non-CSV types; tests cover retention and warning.
CSV loader implementation & tests `packages/cli/src/cli/loaders/csv.ts`, `packages/cli/src/cli/loaders/csv.spec.ts`	`createCsvLoader(options?: CsvLoaderOptions)` added with `keyColumn?: string`; validates header existence and per-instance uniqueness; errors on duplicate keys; tests added for detection, config, and push/pull round-trip by key.
Loader composition & types `packages/cli/src/cli/loaders/index.ts`, `packages/cli/src/cli/cmd/run/_types.ts`	`BucketLoaderOptions` gains `keyColumn?: string`; CSV loader constructed with `{ keyColumn: options.keyColumn }`; `CmdRunTask` extended with optional `keyColumn`.
Task planning & execution `packages/cli/src/cli/cmd/run/plan.ts`, `packages/cli/src/cli/cmd/run/execute.ts`	Tasks populated with `keyColumn`; several task title strings condensed to single-line templates; execution forwards `keyColumn` into `createBucketLoader`.
CLI command integrations `packages/cli/src/cli/cmd/i18n.ts`, `.../lockfile.ts`, `.../cleanup.ts`, `.../purge.ts`, `.../run/frozen.ts`, `.../show/_shared-key-command.ts`, `.../status.ts`	All command entry points that build bucket loaders now forward `bucket.keyColumn` in the loader options for consistent CSV key handling.
Demo config `packages/cli/demo/csv/i18n.json`	Adds `buckets.csv.keyColumn: "KEY"` to demo configuration.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant CLI as CLI Command
  participant Buckets as getBuckets
  participant LoaderFactory as createBucketLoader
  participant CSVLoader as createCsvLoader
  participant FS as CSV File

  CLI->>Buckets: read bucket configs (includes keyColumn)
  Buckets-->>CLI: bucket objects (keyColumn present for CSV)
  CLI->>LoaderFactory: request loader with { keyColumn }
  LoaderFactory->>CSVLoader: instantiate with { keyColumn }
  CSVLoader->>FS: read CSV (header + rows)
  CSVLoader->>CSVLoader: validate header contains keyColumn
  CSVLoader->>CSVLoader: ensure unique non-empty key values
  alt duplicates found
    CSVLoader-->>CLI: throw validation error (duplicate keys)
  else unique
    CSVLoader-->>CLI: return translations mapped by keyColumn
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

feat(cli): add preservedKeys support to all commands #1962: Modifies the same CLI loader initialization sites to propagate a per-bucket option into createBucketLoader.

Suggested reviewers

vrcprl

Poem

🐰 I nibble headers, sniff the column fine,
I tally keys like carrots on a vine,
No duplicate hides where my whiskers go,
I hop from pull to push with steady flow,
Each CSV now has a name I know! 🥕✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 15.38% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'feat(cli): add keyColumn option for CSV buckets with duplicate key validation' clearly and concisely describes the main change—adding a keyColumn configuration option with validation—matching the core changeset.
Description check	✅ Passed	The PR description comprehensively covers all required sections: summary explains the feature, changes lists the implementation across spec/loader/commands, testing documents the business-logic tests with checkmarks, and the checklist confirms tests pass and no breaking changes (with context provided).
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/csv-key-column

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/cli/src/cli/loaders/csv.ts`:
- Around line 47-49: The code currently calls detectKeyColumnName on
input.split("\n").find((l) => l.length)! which can be undefined for empty or
whitespace-only CSVs; change this to first locate a non-blank header line (e.g.,
find((l) => l.trim().length > 0)), validate that result before calling
detectKeyColumnName, and if no header line is found handle it explicitly (either
throw a clear error like "empty CSV input" or use a sensible default) while
keeping the options?.keyColumn fallback; update the assignment of keyColumnName
to use this validated header line when calling detectKeyColumnName.
- Around line 57-64: The validation currently builds availableColumns from
Object.keys(inputParsed[0]) which can miss header columns when the first data
row is sparse; instead parse the CSV header row separately (e.g., call the CSV
parser with to_line: 1 or otherwise extract parser.metadata.fields) and use that
header array to populate availableColumns for the options.keyColumn check;
update the logic around options?.keyColumn and the availableColumns variable in
the loader function (the code that throws the Error) to use the header-derived
columns rather than Object.keys(inputParsed[0]).

In `@packages/cli/src/cli/utils/buckets.ts`:
- Around line 67-79: Replace the truthy check "if (bucketEntry.keyColumn)" with
an explicit presence check (e.g., "if ('keyColumn' in bucketEntry)" or "if
(typeof bucketEntry.keyColumn !== 'undefined')") so an empty string is handled
deliberately; then keep the existing branches using bucketType,
warnedKeyColumnTypes, and config.keyColumn so that empty-string values are
propagated for CSV buckets (config.keyColumn = bucketEntry.keyColumn) and still
trigger the warning for non-CSV buckets instead of being silently ignored.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8fa0d635-dd64-419d-a95f-62f6a7a83113

📥 Commits

Reviewing files that changed from the base of the PR and between 873e773 and eab1c89.

📒 Files selected for processing (17)

.changeset/csv-key-column.md
packages/cli/src/cli/cmd/cleanup.ts
packages/cli/src/cli/cmd/i18n.ts
packages/cli/src/cli/cmd/lockfile.ts
packages/cli/src/cli/cmd/purge.ts
packages/cli/src/cli/cmd/run/_types.ts
packages/cli/src/cli/cmd/run/execute.ts
packages/cli/src/cli/cmd/run/frozen.ts
packages/cli/src/cli/cmd/run/plan.ts
packages/cli/src/cli/cmd/show/_shared-key-command.ts
packages/cli/src/cli/cmd/status.ts
packages/cli/src/cli/loaders/csv.spec.ts
packages/cli/src/cli/loaders/csv.ts
packages/cli/src/cli/loaders/index.ts
packages/cli/src/cli/utils/buckets.spec.ts
packages/cli/src/cli/utils/buckets.ts
packages/spec/src/config.ts

coderabbitai · 2026-04-23T16:30:31Z

+      if (bucketEntry.keyColumn) {
+        if (bucketType !== "csv") {
+          if (!warnedKeyColumnTypes.has(bucketType)) {
+            warnedKeyColumnTypes.add(bucketType);
+            console.warn(
+              `Warning: "keyColumn" is only supported on "csv" buckets, but was set on "${bucketType}". ` +
+                `The setting will be ignored. Remove it from this bucket's config to silence this warning.`,
+            );
+          }
+        } else {
+          config.keyColumn = bucketEntry.keyColumn;
+        }
+      }


⚠️ Potential issue | 🟡 Minor

Handle empty-string keyColumn explicitly instead of relying on truthiness.

At Line 67, using a truthy check means keyColumn: "" is silently ignored (no warning for non-CSV and no propagation for CSV), which can hide bad config.

Suggested fix

- if (bucketEntry.keyColumn) { + if (bucketEntry.keyColumn !== undefined) { if (bucketType !== "csv") { if (!warnedKeyColumnTypes.has(bucketType)) { warnedKeyColumnTypes.add(bucketType); console.warn( `Warning: "keyColumn" is only supported on "csv" buckets, but was set on "${bucketType}". ` + `The setting will be ignored. Remove it from this bucket's config to silence this warning.`, ); } } else { config.keyColumn = bucketEntry.keyColumn; } }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@packages/cli/src/cli/utils/buckets.ts` around lines 67 - 79, Replace the truthy check "if (bucketEntry.keyColumn)" with an explicit presence check (e.g., "if ('keyColumn' in bucketEntry)" or "if (typeof bucketEntry.keyColumn !== 'undefined')") so an empty string is handled deliberately; then keep the existing branches using bucketType, warnedKeyColumnTypes, and config.keyColumn so that empty-string values are propagated for CSV buckets (config.keyColumn = bucketEntry.keyColumn) and still trigger the warning for non-CSV buckets instead of being silently ignored.

…umn validation - Use split(/\r?\n/) with trim check instead of non-null assertion for empty CSV safety - Parse header row separately (to_line: 1) for column validation to avoid missing columns when first data row is sparse with relax_column_count_less Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat(cli): add keyColumn option for CSV loaders and update related co…

f08b1e1

…mmands

chore: add changeset for csv keyColumn feature

eab1c89

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai Bot reviewed Apr 23, 2026

View reviewed changes

cherkanovart and others added 2 commits April 23, 2026 18:36

feat(cli): add keyColumn configuration to CSV i18n settings

53599f3

vrcprl approved these changes Apr 23, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(cli): add keyColumn option for CSV buckets with duplicate key validation#2076

feat(cli): add keyColumn option for CSV buckets with duplicate key validation#2076
cherkanovart wants to merge 4 commits intomainfrom
feat/csv-key-column

cherkanovart commented Apr 23, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Apr 23, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot Apr 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

cherkanovart commented Apr 23, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Testing

Checklist

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Apr 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot Apr 23, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

cherkanovart commented Apr 23, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Apr 23, 2026 •

edited

Loading