Feature/update to dbcan3 #500

Merged
madeline-scyphers merged 6 commits into dev from feature/update-to-dbcan3 on Apr 30, 2026

@madeline-scyphers
Member

feat: Update dbcan to dbcan3 using run_dbcan tool

Using the run_dbcan tooling, update our use of dbcan from
dbcan2 to dbcan3. We use the easysubstrate call to run the
entire run_dbcan pipeline. This initial step consumes only the first
stage's output and does not yet include the CGC or easysubstrate
results in our annotation or summarize steps.

Add parsing for run_dbcan output so it can be incorporated into raw-annotations.tsv.

Add the ability for DRAM to check a DB's version using an added version file. This
is an optional, per-database add-on that is currently used only
with dbcan, to ensure users have updated to dbcan3.
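
A version check of the kind described above could look roughly like the following sketch. It is an illustration only, not the code in this PR: the file name `VERSION`, the helper names `read_db_version` and `check_db_version`, the plain-text `major.minor.patch` format, and the choice to treat a missing file as outdated are all assumptions.

```python
from pathlib import Path
from typing import Optional


def read_db_version(db_dir: Path, version_filename: str = "VERSION") -> Optional[str]:
    """Return the version string stored in db_dir, or None if there is no version file.

    The file name and its plain-text format are hypothetical.
    """
    version_file = db_dir / version_filename
    if not version_file.is_file():
        return None
    return version_file.read_text().strip()


def check_db_version(db_dir: Path, required_major: int,
                     version_filename: str = "VERSION") -> bool:
    """Return True when the database's major version equals required_major.

    Assumption: a database that opts into the check but has no version file
    is treated as outdated, so the user is prompted to update it.
    """
    version = read_db_version(db_dir, version_filename)
    if version is None:
        return False
    # Accept strings such as "3.0.1" or "v3.0.1".
    major = version.lstrip("vV").split(".")[0]
    return major.isdigit() and int(major) == required_major
```

Because the check is opt-in per database, a caller would only invoke `check_db_version` for databases that declare a required version (here, only dbcan) and leave the others untouched.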

@madeline-scyphers added the enhancement (New feature or request) and database (Anything to do with the database formatting, downloading, etc.) labels on Apr 29, 2026
@github-project-automation (bot) moved this to To Sort in DRAM on Apr 29, 2026
@madeline-scyphers merged commit aa1ec17 into dev on Apr 30, 2026
2 checks passed
@github-project-automation (bot) moved this from To Sort to Done in DRAM on Apr 30, 2026
@madeline-scyphers deleted the feature/update-to-dbcan3 branch on April 30, 2026 00:46
tpall added a commit to tpall/DRAM that referenced this pull request May 10, 2026
Local pipeline run on a single OWC fasta with only --use_dbcan exposed two
issues that the original dbcan2 path masked because SQL_DBCAN always
populated formattedOutputchannels.

1. db_search.nf: .collect() on an empty formattedOutputchannels never
   emits, so COMBINE_ANNOTATIONS hung waiting for input. Switched the
   three input channels (fastas, genes, dbcan_output) to .toList(), which
   always emits a list -- matches upstream PR WrightonLabCSU#500's pattern.

2. combine_annotations.py: pd.concat([]) raises ValueError. Two cases:
   - annotations dir empty (no other DB enabled) -> seed combined_data with
     an empty DataFrame carrying just query_id and input_fasta columns.
   - dbcan TSVs all header-only (zero CAZyme hits in the sample) ->
     filter the per-fasta frames first, only concat if any non-empty
     remain.
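
The two pandas cases in item 2 can be sketched as below. This is a minimal illustration under stated assumptions, not the actual contents of combine_annotations.py: the helper name `concat_non_empty` is hypothetical, and only the `query_id` and `input_fasta` column names come from the commit message.

```python
import pandas as pd

# Key columns named in the commit message.
KEY_COLUMNS = ["query_id", "input_fasta"]


def concat_non_empty(frames):
    """Concatenate only the frames that contain rows.

    pd.concat([]) raises ValueError, so when nothing is left after
    filtering, fall back to an empty frame that carries just the key columns.
    """
    non_empty = [df for df in frames if not df.empty]
    if not non_empty:
        return pd.DataFrame(columns=KEY_COLUMNS)
    return pd.concat(non_empty, ignore_index=True)
```

Filtering before concatenating means a header-only TSV (zero CAZyme hits) and a missing TSV take the same path, so downstream code always receives a DataFrame with the key columns present.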

Verified end-to-end: nextflow run . -profile docker --call --annotate
--use_dbcan against the OWC_0000 fixture (9 proteins, 0 CAZymes) now
completes and produces raw-annotations.tsv with the expected gene rows.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>