
feat: add XML input/output format support #133

Merged
vmvarela merged 2 commits into master from issue-99/xml-format
May 7, 2026

Conversation

Owner

@vmvarela vmvarela commented May 7, 2026

Summary

  • Adds xml as a supported input and output format (-I xml / -O xml)
  • New src/xml.zig: hand-written row-based XML parser and writer (no external XML library)
  • New flags --xml-root and --xml-row to customise root/row element names (defaults: results / row)
  • XML is supported by --columns and --validate; --sample rejects it with a clear error

Format

Output (-O xml):

<?xml version="1.0" encoding="UTF-8"?>
<results>
<row><name>Alice</name><age>30</age></row>
</results>
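
As an illustration of the output shape above, here is a minimal Python sketch of a row-based writer. It is not the Zig code in src/xml.zig; the function name `write_xml` and its parameters are hypothetical, and the `root`/`row` arguments merely mirror the --xml-root/--xml-row defaults. How the real writer represents NULL values is not described here, so the sketch does not attempt it.

```python
from xml.sax.saxutils import escape


def write_xml(rows, root="results", row="row"):
    """Serialise an iterable of dicts as row-based XML, one row element per line."""
    # Header: XML declaration followed by the opening root element.
    lines = ['<?xml version="1.0" encoding="UTF-8"?>', f"<{root}>"]
    for record in rows:
        # Each key becomes a column element; escape() handles &, < and >.
        cells = "".join(
            f"<{name}>{escape(str(value))}</{name}>"
            for name, value in record.items()
        )
        lines.append(f"<{row}>{cells}</{row}>")
    # Footer: closing root element.
    lines.append(f"</{root}>")
    return "\n".join(lines) + "\n"


print(write_xml([{"name": "Alice", "age": 30}]), end="")
```

Run as is, the sketch prints the same four lines as the sample above. Column names are used as element names without validation, which a real writer would need to check.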

Input (-I xml): row-based XML in which each direct child of the root element is a row and each child of a row element is a column. The five predefined entities (&amp; &lt; &gt; &quot; &apos;) are decoded. Nested elements are captured as raw XML strings.
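
The input rules can be sketched in Python with the standard library's ElementTree. This is only an approximation of the behaviour described above, not the hand-written parser from this PR: `load_rows` is a hypothetical name, and the reading of "raw XML strings" as "the serialised child elements of the column" is an assumption.

```python
import xml.etree.ElementTree as ET


def load_rows(text):
    """Return one dict per direct child of the root element."""
    root = ET.fromstring(text)
    rows = []
    for row in root:  # each direct child of the root is a row
        record = {}
        for cell in row:  # each child of a row is a column
            if len(cell):
                # Assumption: a column with child elements is stored as the
                # serialised XML of those children (leading text is dropped).
                record[cell.tag] = "".join(
                    ET.tostring(child, encoding="unicode") for child in cell
                )
            else:
                # ElementTree has already decoded the predefined entities.
                record[cell.tag] = cell.text or ""
        rows.append(record)
    return rows


doc = "<results><row><name>A &amp; B</name><age>30</age></row></results>"
print(load_rows(doc))  # [{'name': 'A & B', 'age': '30'}]
```

All values come back as strings; any typing of columns is outside the scope of this sketch.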

Changes

  • src/xml.zig (new): writeXmlHeader, writeXmlRow, writeXmlFooter, XmlParser, loadXmlInput, getXmlColumnNames, summarizeXml
  • src/main.zig: xml added to InputFormat/OutputFormat enums; --xml-root/--xml-row flag parsing; dispatch in run(), runColumns(), runValidate(), runSample()
  • build.zig: tests 57/58 updated (they now use parquet as the unknown format); 6 new XML integration tests (99–104)
  • docs/sql-pipe.1.scd + README.md: updated format lists, new flag docs, XML usage example

Closes #99

- New src/xml.zig: row-based XML parser and writer
  - writeXmlHeader/writeXmlRow/writeXmlFooter for output
  - XmlParser struct for input (line/col error tracking, entity decoding)
  - loadXmlInput, getXmlColumnNames, summarizeXml for all three modes
- main.zig: xml added to InputFormat and OutputFormat enums
  - --xml-root and --xml-row flags to customise element names (defaults: results, row)
  - XML dispatch in run(), runColumns(), runValidate(), runSample() (fatal)
- build.zig: tests 57/58 updated to use parquet as unknown format; 6 new XML integration tests (99-104)
- docs and README: updated format lists, new --xml-root/--xml-row flag docs, XML usage example
@github-actions github-actions Bot added the type:feature New functionality label May 7, 2026
- xml.zig: fix readName to reject digits as NameStartChar (XML spec)
- xml.zig: add numeric character reference support in decodeEntities (&#NNN; / &#xNNN;)
- xml.zig: add 13 unit tests for parser, escaping and output functions
- main.zig: add MissingXmlFlagValue and InvalidXmlName errors with clear messages
- main.zig: validate --xml-root/--xml-row as legal XML names
- main.zig: simplify --sample rejection message
- build.zig: add 9 integration tests covering edge cases (entities, NULL, empty doc, --sample, self-closing, column order, attributes, float-as-int)
- build.zig: wire xml unit tests into test_step and unit-test step
- sqlite.zig: extract shared SQLite helpers from json.zig and xml.zig (DRY)
- sqlite.zig: remove dead c.sqlite3_free call in commitTransaction
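
Two of the fixes in this commit, rejecting digits as the first character of a name and decoding numeric character references, can be illustrated with a short Python sketch. The helper names `is_xml_name` and `decode_entities` are hypothetical, and the name pattern is a simplified ASCII-only approximation of the XML `Name` production, which also permits many non-ASCII ranges. It is not the logic of readName or decodeEntities in xml.zig.

```python
import re

# ASCII-only approximation: a name starts with a letter, "_" or ":" and
# continues with those characters plus digits, "-" and ".".
_NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9_:.\-]*")

_PREDEFINED = {"amp": "&", "lt": "<", "gt": ">", "quot": '"', "apos": "'"}
_REFERENCE = re.compile(r"&(#[0-9]+|#x[0-9A-Fa-f]+|[A-Za-z]+);")


def is_xml_name(name):
    """True if the whole string is a legal name under the simplified pattern."""
    return _NAME.fullmatch(name) is not None


def decode_entities(text):
    """Decode predefined entities and &#NNN; / &#xNNN; references in one pass."""
    def replace(match):
        body = match.group(1)
        if body.startswith("#x"):
            return chr(int(body[2:], 16))  # hexadecimal reference
        if body.startswith("#"):
            return chr(int(body[1:]))  # decimal reference
        # Unknown named entities are left untouched in this sketch.
        return _PREDEFINED.get(body, match.group(0))
    return _REFERENCE.sub(replace, text)


print(is_xml_name("row1"), is_xml_name("1row"))  # True False
print(decode_entities("&#65;&#x42; &amp; &lt;"))  # AB & <
```

Decoding in a single pass matters: `&amp;lt;` must become `&lt;`, not `<`. The sketch does not guard against references beyond U+10FFFF, for which `chr` raises `ValueError`.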
@vmvarela vmvarela merged commit 76981e3 into master May 7, 2026
3 of 4 checks passed
@vmvarela vmvarela deleted the issue-99/xml-format branch May 7, 2026 15:02

Labels

type:feature New functionality


Development

Successfully merging this pull request may close these issues.

XML input and output format support

1 participant