fix: make --xml-root and --xml-row work for XML input parsing #140
Merged
Previously these flags only controlled XML output element names. The parser always used the actual document root and accepted any element as a row, making it impossible to query nested structures like RSS feeds (<rss> → <channel> → <item>).

Changes:
- Add XmlParser.skipElementBody() to skip a complete element tree
- Add XmlParser.navigateToRoot() to descend into a named container
- Add row_tag_filter to XmlParser.nextRow() to skip non-matching elements
- Update loadXmlInput, getXmlColumnNames, summarizeXml to accept optional xml_root and xml_row parameters (null = legacy behaviour)
- Thread xml_root_input / xml_row_input from CLI args through ParsedArgs, ColumnsArgs, and ValidateArgs to all call sites
- Fix test_xml_no_rows to use <results> root (consistent with default)
- Add integration tests 114 and 115 for nested navigation

Closes #139
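The selection semantics described above can be modelled in a few lines. This is only an illustrative sketch in Python using the standard-library xml.etree.ElementTree, not the Zig code from this PR: the function name select_rows and the sample feed are hypothetical, and the real parser streams the input rather than building a tree.

```python
import xml.etree.ElementTree as ET


def select_rows(xml_text, xml_root=None, xml_row=None):
    """Model of the described behaviour (hypothetical helper, not the PR's code).

    xml_root=None keeps the legacy behaviour: use the actual document root.
    xml_row=None keeps the legacy behaviour: accept any child element as a row.
    """
    doc_root = ET.fromstring(xml_text)
    container = doc_root
    if xml_root is not None and doc_root.tag != xml_root:
        # Only direct children of the actual root are considered.
        container = next((c for c in doc_root if c.tag == xml_root), None)
        if container is None:
            raise ValueError(
                f"'{xml_root}' not found as a direct child of '{doc_root.tag}'"
            )
    # Non-matching siblings are skipped rather than treated as rows.
    return [el for el in container if xml_row is None or el.tag == xml_row]


RSS = """<rss><channel><title>Feed</title>
<item><title>A</title></item>
<item><title>B</title></item>
</channel></rss>"""

titles = [r.findtext("title") for r in select_rows(RSS, "channel", "item")]
print(titles)
```

With both arguments left as None the same call returns the single <channel> element, which mirrors the legacy behaviour that made the nested <item> rows unreachable.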
- skipElementBody: validate closing tag name against expected tag (was discarding it)
- skipElementBody: replace fragile peek().? pattern with safe orelse-break idiom
- navigateToRoot: skip text nodes between siblings instead of raising a fatal error
- navigateToRoot: improve error message to say 'not found as a direct child of'
- getXmlColumnNames/summarizeXml/loadXmlInput: emit tailored error when --xml-row
filter matches nothing ('check --xml-row value') vs generic 'no row elements'
- Unit tests: skipElementBody (body skip, nested elements), navigateToRoot
(fast path, nested child, text nodes), nextRow with row_tag_filter
- Integration tests 116-120: --xml-root alone, --xml-row alone, filter-no-match
error, --columns with both flags, fast path when --xml-root equals actual root
- Fix test 104 comment (output-only, not input)
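The closing-tag validation in skipElementBody can be pictured with a small model. The following is a hedged Python sketch over a pre-tokenised stream, not the Zig implementation: the token tuples, the function name skip_element_body, and the error messages are all assumptions made for illustration.

```python
def skip_element_body(tokens, pos, tag):
    """Skip from just after <tag> to just past its matching </tag>.

    tokens is a list of (kind, value) pairs where kind is "open", "close",
    or anything else (text, CDATA, comment). Returns the index of the next
    unread token. Hypothetical model, not the PR's code.
    """
    stack = [tag]  # names of elements that are currently open
    while pos < len(tokens):
        kind, value = tokens[pos]
        pos += 1
        if kind == "open":
            stack.append(value)
        elif kind == "close":
            expected = stack.pop()
            # Validate the closing tag name instead of discarding it.
            if value != expected:
                raise ValueError(f"expected </{expected}>, found </{value}>")
            if not stack:
                return pos
        # Any other token kind is simply skipped.
    raise ValueError(f"unexpected end of input inside <{tag}>")


tokens = [
    ("open", "a"), ("text", "x"), ("open", "b"), ("close", "b"),
    ("close", "a"), ("open", "c"),
]
next_pos = skip_element_body(tokens, 1, "a")
print(tokens[next_pos])
```

Tracking open element names on a stack is one simple way to handle nesting and name checking together; a depth counter alone would handle nesting but could not report a mismatched closing tag.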
Summary
- --xml-root and --xml-row were silently ignored when parsing XML input; they only controlled output element names
- Adds root navigation (navigateToRoot) and row filtering (nextRow filter) to the XML parser, and threads the flags through all input code paths

Changes
src/xml.zig
- XmlParser.skipElementBody(): new helper that skips a complete element tree (content + closing tag), handling nesting, CDATA, and comments
- XmlParser.navigateToRoot(xml_root): if the actual document root matches xml_root, proceeds directly; otherwise scans direct children of the actual root until the target is found, skipping non-matching siblings
- XmlParser.nextRow(..., row_tag_filter): new optional param; wraps the read loop so non-matching elements are skipped via skipElementBody
- loadXmlInput, getXmlColumnNames, summarizeXml: all accept xml_root: ?[]const u8 and xml_row: ?[]const u8 (null = legacy behaviour)

src/main.zig
- ColumnsArgs, ValidateArgs, ParsedArgs: each gains xml_root_input: ?[]const u8 and xml_row_input: ?[]const u8
- When --xml-root / --xml-row are explicitly provided they flow to all three XML input functions; defaults leave both as null (no change in behaviour)

build.zig
- test_xml_no_rows: updated to use <results></results> (consistent with default xml_root = "results")
- Integration test: --xml-root / --xml-row on an RSS-like document
- Integration test: --validate with --xml-root / --xml-row counts only matching rows

Verification
Live test with the originally failing command:
Closes #139