feat: Add CDF/HDF5 and DataOrg file support (#72) #73

Open
sahiljhawar wants to merge 7 commits into main from update-rbmdataset

@sahiljhawar
Collaborator

  • Add new readers for HDF5 and CDF and switch to monthly dataset handling.
  • Introduce dataorg flag and extend preferred_extension to include `cdf` and `h5`.
  • Replace prints/warnings with logging, add per-format caching, and enforce validation rules for dataorg vs pickle usage.
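
The validation rule in the last bullet could look roughly like the sketch below. It is illustrative only: the helper name `validate_options`, the `ALLOWED_EXTENSIONS` tuple, and the specific rule that `dataorg` mode rejects `pickle` are assumptions, not taken from the PR diff.

```python
import logging

logger = logging.getLogger(__name__)

# Assumed set of extensions: "cdf" and "h5" are the additions named in this PR;
# the remaining entries are guesses at the pre-existing values.
ALLOWED_EXTENSIONS = ("pickle", "mat", "nc", "h5", "cdf")


def validate_options(preferred_extension: str, dataorg: bool) -> None:
    """Raise ValueError for unsupported option combinations (hypothetical helper)."""
    if preferred_extension not in ALLOWED_EXTENSIONS:
        raise ValueError(
            f"preferred_extension must be one of {ALLOWED_EXTENSIONS}, "
            f"got {preferred_extension!r}"
        )
    # Assumed rule: dataorg (monthly) mode is not available for pickle files.
    if dataorg and preferred_extension == "pickle":
        raise ValueError(
            "dataorg=True cannot be combined with preferred_extension='pickle'"
        )
    # Status output goes through logging rather than print().
    logger.info("Using extension %r (dataorg=%s)", preferred_extension, dataorg)
```

Raising early, before any file is opened, keeps the error close to the constructor call that supplied the bad combination.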

Copilot AI review requested due to automatic review settings May 7, 2026 12:04
@sahiljhawar sahiljhawar linked an issue May 7, 2026 that may be closed by this pull request
@sahiljhawar sahiljhawar marked this pull request as draft May 7, 2026 12:05
Contributor

Copilot AI left a comment

Pull request overview

This PR extends RBMDataSet’s file-loading capabilities toward “monthly dataset” files by adding HDF5/CDF reader plumbing, introducing a dataorg mode flag, broadening allowed preferred_extension values, and migrating user-facing output from print/warnings to logging.

Changes:

  • Add HDF5 dataset traversal/reading helper and wire monthly-file caching for .nc, .h5, and .cdf into RBMDataSet.
  • Introduce dataorg flag and tighten validation rules for preferred_extension vs dataorg combinations.
  • Replace several print/warnings.warn call sites with logging.
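
A per-format, per-month cache of the kind described in the list above might be sketched as follows. The class name `MonthlyFileCache`, the reader-callable signature, and the assumption of one file per format per month are all hypothetical and not taken from the PR.

```python
import logging
from datetime import datetime
from pathlib import Path
from typing import Any, Callable, Dict, Tuple

logger = logging.getLogger(__name__)

# A reader takes a file path and returns {variable name: data}.
Reader = Callable[[Path], Dict[str, Any]]


class MonthlyFileCache:
    """Hypothetical cache keyed by (extension, year, month)."""

    def __init__(self, readers: Dict[str, Reader]) -> None:
        self._readers = readers
        self._cache: Dict[Tuple[str, int, int], Dict[str, Any]] = {}

    def load(self, path: Path, month: datetime) -> Dict[str, Any]:
        ext = path.suffix.lower()
        key = (ext, month.year, month.month)
        if key in self._cache:
            # Cache hit: return the stored dict without touching the file.
            logger.debug("Cache hit for %s", key)
            return self._cache[key]
        if ext not in self._readers:
            raise ValueError(f"No reader registered for extension {ext!r}")
        logger.info("Reading %s", path)
        data = self._readers[ext](path)
        self._cache[key] = data
        return data
```

Keying on the extension as well as the month keeps an `.nc` and an `.h5` file for the same month from overwriting each other's entries.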

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| swvo/io/RBMDataSet/utils.py | Adds logger usage and introduces HDF5/CDF dataset reader helpers alongside existing NetCDF support. |
| swvo/io/RBMDataSet/RBMDataSet.py | Extends extension validation, adds dataorg/monthly-mode selection, and adds per-month caching loaders for .nc/.h5/.cdf. |

Comment thread swvo/io/RBMDataSet/utils.py Outdated
Comment thread swvo/io/RBMDataSet/utils.py Outdated
Dict[str, Any]: A dictionary where keys are the full variable paths
and values are the corresponding NumPy arrays.
"""
pass
Comment thread swvo/io/RBMDataSet/RBMDataSet.py
Comment thread swvo/io/RBMDataSet/RBMDataSet.py
Comment thread swvo/io/RBMDataSet/RBMDataSet.py
Comment thread swvo/io/RBMDataSet/utils.py
- Log and raise a clearer AttributeError when an unset VariableLiteral attribute is accessed, with a hint to call `update_from_dict()`; add special guidance for `custom`
- Treat ".mat" time variables as POSIX timestamps for monthly datasets and log the assumption
- Map variables prefixed with "custom/" into the `custom` dict when loading NetCDF/HDF5/CDF/mat files
- Add "# ty:ignore" annotations to silence type-checker warnings in interpolation and script code
- Implement CDF reader using `cdflib` to load zVariables and rVariables; warn and return empty dict if file is missing
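
Two of the behaviours listed above, routing `custom/`-prefixed variables into a `custom` dict and raising a more helpful `AttributeError` for unset variables, can be illustrated with a small stand-in class. `MiniDataSet` is hypothetical and is not the real `RBMDataSet`; the message wording and the handling of dunder names are assumptions.

```python
import logging
from typing import Any, Dict

logger = logging.getLogger(__name__)

CUSTOM_PREFIX = "custom/"


class MiniDataSet:
    """Hypothetical stand-in used only to illustrate the two behaviours."""

    def __init__(self) -> None:
        self.custom: Dict[str, Any] = {}

    def update_from_dict(self, variables: Dict[str, Any]) -> None:
        for key, value in variables.items():
            if key.startswith(CUSTOM_PREFIX):
                # "custom/quality" is stored as custom["quality"].
                self.custom[key[len(CUSTOM_PREFIX):]] = value
            else:
                setattr(self, key, value)

    def __getattr__(self, name: str) -> Any:
        # Only reached when normal attribute lookup has already failed.
        if name.startswith("__"):
            # Keep protocol probes (copy, pickle, ...) quiet.
            raise AttributeError(name)
        msg = f"Variable '{name}' is not set; call update_from_dict() first."
        logger.error(msg)
        raise AttributeError(msg)
```

Because `__getattr__` runs only after regular lookup fails, variables that have been set through `update_from_dict()` are returned without any logging overhead.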
@sahiljhawar sahiljhawar marked this pull request as ready for review May 7, 2026 16:40

Development

Successfully merging this pull request may close these issues.

update RBMDataSet

2 participants