fix: make --xml-root and --xml-row work for XML input parsing #140

Merged
vmvarela merged 2 commits into master from issue-139/fix-xml-root-row-input
May 8, 2026

@vmvarela (Owner) commented on May 8, 2026

Summary

  • --xml-root and --xml-row were silently ignored when parsing XML input — they only controlled output element names
  • The parser always used the document's actual root and accepted any child as a row, breaking nested structures like RSS feeds
  • This fix adds navigation (navigateToRoot) and row filtering (nextRow filter) to the XML parser, and threads the flags through all input code paths

Changes

src/xml.zig

  • XmlParser.skipElementBody() — new helper that skips a complete element tree (content + closing tag), handling nesting, CDATA, comments
  • XmlParser.navigateToRoot(xml_root) — if the actual document root matches xml_root, proceeds directly; otherwise scans direct children of the actual root until the target is found, skipping non-matching siblings
  • XmlParser.nextRow(..., row_tag_filter) — new optional param; wraps the read loop so non-matching elements are skipped via skipElementBody
  • loadXmlInput, getXmlColumnNames, summarizeXml — all accept xml_root: ?[]const u8 and xml_row: ?[]const u8 (null = legacy behaviour)
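
To make the skip-and-filter idea above concrete, here is a minimal sketch, not the code in src/xml.zig. The functions skipBody and countRows are hypothetical stand-ins for skipElementBody and the nextRow filter. They work on a plain byte slice and deliberately ignore attributes, self-closing tags, CDATA and comments, all of which the real parser is described as handling. The sketch assumes a Zig standard library that provides std.mem.indexOfScalarPos.

```zig
const std = @import("std");

/// Hypothetical, simplified stand-in for skipElementBody: `start` is the
/// position just after an opening `<tag>`; returns the position just after
/// the matching `</tag>`, or null if the input is truncated or mismatched.
fn skipBody(xml: []const u8, start: usize, tag: []const u8) ?usize {
    var depth: usize = 1;
    var pos = start;
    while (std.mem.indexOfScalarPos(u8, xml, pos, '<')) |lt| {
        const gt = std.mem.indexOfScalarPos(u8, xml, lt, '>') orelse return null;
        const inner = xml[lt + 1 .. gt];
        if (inner.len > 0 and inner[0] == '/') {
            depth -= 1;
            if (depth == 0) {
                // The outermost closing tag must name the element being skipped.
                if (!std.mem.eql(u8, inner[1..], tag)) return null;
                return gt + 1;
            }
        } else {
            depth += 1; // nested opening tag
        }
        pos = gt + 1;
    }
    return null;
}

/// Hypothetical row filter: counts direct children from `start` whose tag
/// equals `row_filter`; a null filter accepts every child (legacy behaviour).
fn countRows(xml: []const u8, start: usize, row_filter: ?[]const u8) usize {
    var count: usize = 0;
    var pos = start;
    while (std.mem.indexOfScalarPos(u8, xml, pos, '<')) |lt| {
        const gt = std.mem.indexOfScalarPos(u8, xml, lt, '>') orelse break;
        const name = xml[lt + 1 .. gt];
        if (name.len == 0 or name[0] == '/') break; // container's closing tag
        const matches = if (row_filter) |f| std.mem.eql(u8, name, f) else true;
        if (matches) count += 1;
        // Matching or not, step over the whole child element.
        pos = skipBody(xml, gt + 1, name) orelse break;
    }
    return count;
}
```

On an RSS-like container holding one title element followed by two item elements, a filter of "item" counts two rows, while a null filter counts all three children. That is the difference between the new and the legacy behaviour.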

src/main.zig

  • ColumnsArgs, ValidateArgs, ParsedArgs — each gains xml_root_input: ?[]const u8 and xml_row_input: ?[]const u8
  • When --xml-root/--xml-row are explicitly provided they flow to all three XML input functions; defaults leave both as null (no change in behaviour)
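
The "null unless explicitly provided" rule can be sketched as follows. Only the field names xml_root_input and xml_row_input come from this PR. The Args struct and the parseXmlFlags helper are hypothetical and much simpler than the real argument parsing in src/main.zig.

```zig
const std = @import("std");

/// Hypothetical mirror of the two fields added to ParsedArgs, ColumnsArgs
/// and ValidateArgs. Both default to null, meaning legacy behaviour.
const Args = struct {
    xml_root_input: ?[]const u8 = null,
    xml_row_input: ?[]const u8 = null,
};

/// Hypothetical helper: scans argv-style tokens and records the value that
/// follows each flag. Flags that never appear leave their field as null.
fn parseXmlFlags(argv: []const []const u8) Args {
    var args = Args{};
    var i: usize = 0;
    while (i + 1 < argv.len) : (i += 1) {
        if (std.mem.eql(u8, argv[i], "--xml-root")) {
            args.xml_root_input = argv[i + 1];
            i += 1; // skip the value just consumed
        } else if (std.mem.eql(u8, argv[i], "--xml-row")) {
            args.xml_row_input = argv[i + 1];
            i += 1;
        }
    }
    return args;
}
```

Keeping the input fields optional, rather than reusing the output default of "results", lets the XML input functions tell "flag not given" apart from "flag given with the default name".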

build.zig

  • test_xml_no_rows: updated to use <results></results> (consistent with default xml_root = "results")
  • Added integration test 114: nested navigation with --xml-root/--xml-row on an RSS-like document
  • Added integration test 115: --validate with --xml-root/--xml-row counts only matching rows

Verification

zig build test   # all tests pass
ziglint src build.zig   # no warnings

Live test with the originally failing command:

curl -s "https://feeds.feedburner.com/TheHackersNews" \
  | sql-pipe -I xml --xml-root channel --xml-row item \
    'SELECT pubDate as Fecha, title as Noticia
     FROM t
     WHERE title LIKE "%Google%" OR title LIKE "%Apple%"
     ORDER BY Fecha DESC LIMIT 5'

Closes #139

Previously these flags only controlled XML output element names.
The parser always used the actual document root and accepted any
element as a row, making it impossible to query nested structures
like RSS feeds (<rss> → <channel> → <item>).

Changes:
- Add XmlParser.skipElementBody() to skip a complete element tree
- Add XmlParser.navigateToRoot() to descend into a named container
- Add row_tag_filter to XmlParser.nextRow() to skip non-matching elements
- Update loadXmlInput, getXmlColumnNames, summarizeXml to accept
  optional xml_root and xml_row parameters (null = legacy behaviour)
- Thread xml_root_input / xml_row_input from CLI args through
  ParsedArgs, ColumnsArgs, and ValidateArgs to all call sites
- Fix test_xml_no_rows to use <results> root (consistent with default)
- Add integration tests 114 and 115 for nested navigation

Closes #139
@vmvarela added the type:bug (Something isn't working) and size:m (Medium, 4 to 8 hours) labels on May 8, 2026
- skipElementBody: validate closing tag name against expected tag (was discarding it)
- skipElementBody: replace fragile peek().? pattern with safe orelse-break idiom
- navigateToRoot: skip text nodes between siblings instead of aborting with a fatal error
- navigateToRoot: improve error message to say 'not found as a direct child of'
- getXmlColumnNames/summarizeXml/loadXmlInput: emit tailored error when --xml-row
  filter matches nothing ('check --xml-row value') vs generic 'no row elements'
- Unit tests: skipElementBody (body skip, nested elements), navigateToRoot
  (fast path, nested child, text nodes), nextRow with row_tag_filter
- Integration tests 116-120: --xml-root alone, --xml-row alone, filter-no-match
  error, --columns with both flags, fast path when --xml-root equals actual root
- Fix test 104 comment (output-only, not input)
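
As an illustration of the peek().? versus orelse-break point above, here is a minimal sketch. The Reader type and skipToTagEnd function are hypothetical and are not the parser in src/xml.zig. Unwrapping with `.?` trips a safety check when the optional is null (end of input), whereas `orelse break` leaves the loop cleanly so the caller can report truncated input.

```zig
const std = @import("std");

/// Hypothetical byte reader; peek() returns null once the input is exhausted.
const Reader = struct {
    data: []const u8,
    pos: usize = 0,

    fn peek(self: *const Reader) ?u8 {
        return if (self.pos < self.data.len) self.data[self.pos] else null;
    }
};

/// Consumes bytes up to and including the next '>'.
/// Returns false if the input ends before a '>' is seen.
fn skipToTagEnd(r: *Reader) bool {
    while (true) {
        // `r.peek().?` here would panic in safe builds on truncated input.
        const c = r.peek() orelse break;
        r.pos += 1;
        if (c == '>') return true;
    }
    return false;
}
```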
@vmvarela merged commit e8cbd32 into master on May 8, 2026
4 checks passed
@vmvarela deleted the issue-139/fix-xml-root-row-input branch on May 8, 2026 at 07:51
Development

Successfully merging this pull request may close these issues.

fix: --xml-root and --xml-row ignored for XML input (not passed to parser)