feat(cli): add keyColumn option for CSV buckets with duplicate key validation#2076
feat(cli): add keyColumn option for CSV buckets with duplicate key validation#2076cherkanovart wants to merge 4 commits intomainfrom
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
✅ Files skipped from review due to trivial changes (1)
📝 WalkthroughWalkthroughAdds an optional Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant CLI as CLI Command
participant Buckets as getBuckets
participant LoaderFactory as createBucketLoader
participant CSVLoader as createCsvLoader
participant FS as CSV File
CLI->>Buckets: read bucket configs (includes keyColumn)
Buckets-->>CLI: bucket objects (keyColumn present for CSV)
CLI->>LoaderFactory: request loader with { keyColumn }
LoaderFactory->>CSVLoader: instantiate with { keyColumn }
CSVLoader->>FS: read CSV (header + rows)
CSVLoader->>CSVLoader: validate header contains keyColumn
CSVLoader->>CSVLoader: ensure unique non-empty key values
alt duplicates found
CSVLoader-->>CLI: throw validation error (duplicate keys)
else unique
CSVLoader-->>CLI: return translations mapped by keyColumn
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/cli/src/cli/loaders/csv.ts`:
- Around line 47-49: The code currently calls detectKeyColumnName on
input.split("\n").find((l) => l.length)! which can be undefined for empty or
whitespace-only CSVs; change this to first locate a non-blank header line (e.g.,
find((l) => l.trim().length > 0)), validate that result before calling
detectKeyColumnName, and if no header line is found handle it explicitly (either
throw a clear error like "empty CSV input" or use a sensible default) while
keeping the options?.keyColumn fallback; update the assignment of keyColumnName
to use this validated header line when calling detectKeyColumnName.
- Around line 57-64: The validation currently builds availableColumns from
Object.keys(inputParsed[0]) which can miss header columns when the first data
row is sparse; instead parse the CSV header row separately (e.g., call the CSV
parser with to_line: 1 or otherwise extract parser.metadata.fields) and use that
header array to populate availableColumns for the options.keyColumn check;
update the logic around options?.keyColumn and the availableColumns variable in
the loader function (the code that throws the Error) to use the header-derived
columns rather than Object.keys(inputParsed[0]).
In `@packages/cli/src/cli/utils/buckets.ts`:
- Around line 67-79: Replace the truthy check "if (bucketEntry.keyColumn)" with
an explicit presence check (e.g., "if ('keyColumn' in bucketEntry)" or "if
(typeof bucketEntry.keyColumn !== 'undefined')") so an empty string is handled
deliberately; then keep the existing branches using bucketType,
warnedKeyColumnTypes, and config.keyColumn so that empty-string values are
propagated for CSV buckets (config.keyColumn = bucketEntry.keyColumn) and still
trigger the warning for non-CSV buckets instead of being silently ignored.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 8fa0d635-dd64-419d-a95f-62f6a7a83113
📒 Files selected for processing (17)
.changeset/csv-key-column.mdpackages/cli/src/cli/cmd/cleanup.tspackages/cli/src/cli/cmd/i18n.tspackages/cli/src/cli/cmd/lockfile.tspackages/cli/src/cli/cmd/purge.tspackages/cli/src/cli/cmd/run/_types.tspackages/cli/src/cli/cmd/run/execute.tspackages/cli/src/cli/cmd/run/frozen.tspackages/cli/src/cli/cmd/run/plan.tspackages/cli/src/cli/cmd/show/_shared-key-command.tspackages/cli/src/cli/cmd/status.tspackages/cli/src/cli/loaders/csv.spec.tspackages/cli/src/cli/loaders/csv.tspackages/cli/src/cli/loaders/index.tspackages/cli/src/cli/utils/buckets.spec.tspackages/cli/src/cli/utils/buckets.tspackages/spec/src/config.ts
| if (bucketEntry.keyColumn) { | ||
| if (bucketType !== "csv") { | ||
| if (!warnedKeyColumnTypes.has(bucketType)) { | ||
| warnedKeyColumnTypes.add(bucketType); | ||
| console.warn( | ||
| `Warning: "keyColumn" is only supported on "csv" buckets, but was set on "${bucketType}". ` + | ||
| `The setting will be ignored. Remove it from this bucket's config to silence this warning.`, | ||
| ); | ||
| } | ||
| } else { | ||
| config.keyColumn = bucketEntry.keyColumn; | ||
| } | ||
| } |
There was a problem hiding this comment.
Handle empty-string keyColumn explicitly instead of relying on truthiness.
At Line 67, using a truthy check means keyColumn: "" is silently ignored (no warning for non-CSV and no propagation for CSV), which can hide bad config.
Suggested fix
- if (bucketEntry.keyColumn) {
+ if (bucketEntry.keyColumn !== undefined) {
if (bucketType !== "csv") {
if (!warnedKeyColumnTypes.has(bucketType)) {
warnedKeyColumnTypes.add(bucketType);
console.warn(
`Warning: "keyColumn" is only supported on "csv" buckets, but was set on "${bucketType}". ` +
`The setting will be ignored. Remove it from this bucket's config to silence this warning.`,
);
}
} else {
config.keyColumn = bucketEntry.keyColumn;
}
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/cli/src/cli/utils/buckets.ts` around lines 67 - 79, Replace the
truthy check "if (bucketEntry.keyColumn)" with an explicit presence check (e.g.,
"if ('keyColumn' in bucketEntry)" or "if (typeof bucketEntry.keyColumn !==
'undefined')") so an empty string is handled deliberately; then keep the
existing branches using bucketType, warnedKeyColumnTypes, and config.keyColumn
so that empty-string values are propagated for CSV buckets (config.keyColumn =
bucketEntry.keyColumn) and still trigger the warning for non-CSV buckets instead
of being silently ignored.
…umn validation - Use split(/\r?\n/) with trim check instead of non-null assertion for empty CSV safety - Parse header row separately (to_line: 1) for column validation to avoid missing columns when first data row is sparse with relax_column_count_less Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Adds a
keyColumnconfig option for CSV buckets so users can specify which column to use as the unique row identifier, and introduces duplicate key validation to prevent silent data loss when the key column has repeated values.Changes
keyColumnoptional field to bucket config schema (packages/spec)keyColumnoption; falls back to first column detection when omittedassertUniqueKeys()validation that throws a descriptive error when the key column has duplicate values (previously rows would silently overwrite each other)keyColumnis set on a non-csv bucket type (ignored with message)keyColumnthrough all 12 command call sites (cleanup, i18n, lockfile, purge, status, show, run/plan, run/execute, run/frozen)Testing
Business logic tests added:
keyColumnoption overrides first-column detection and uses specified columnkeyColumnpoints to a column not present in the CSVkeyColumnexplicitly points at a non-unique columngetBuckets()correctly passeskeyColumnfor csv bucketsgetBuckets()warns and ignoreskeyColumnon non-csv bucketsChecklist
Note: Users with CSV files where the first column has duplicate values will now get an error instead of silent data loss. The error message explains how to fix it (add a unique column or set
keyColumn).🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Bug Fixes
Tests