GPU Normalize Audio for Tdarr

USE AT YOUR OWN RISK

THIS IS WIP

IT MIGHT BURN YOUR HOUSE ;-)

TLDR

GPU Normalize Audio is a Tdarr FlowPlugin that tries to move FFmpeg loudnorm-style audio normalization onto the GPU while matching Tdarr's normal CPU-only Normalize Audio output.

What it does today:

Normalizes every audio stream in a file, not only the first one.
Keeps video, subtitles, chapters, metadata, attachments, and data streams.
Uses CUDA for the loudness stats/apply path and streams decode/encode through FFmpeg.
Avoids huge raw PCM bridge files for normal gpuSourcePort jobs.
Keeps decoded audio parity as the top priority, even when that costs speed.
Release smoke testing covers multiple input codec/layout cases, including AAC stereo, MP3 stereo, AC3 5.1, E-AC-3 5.1, DTS 5.1, and multi-audio inputs.

What we are trying to do:

Match CPU Normalize Audio output first.
Make the GPU path faster over time without cheating parity.
Keep improving long-media speed; the current release beats CPU on tested long combined jobs because the original 5.1 normalization is faster, while generated 2-channel fallback work is still slower by itself.

Plugin UI

Performance (i9 9900k @ 5ghz vs. Nvidia 1050 TI)

Latest release: v1.2.0.

1.2.0 keeps the streaming gpuSourcePort path and fixes generated-stereo fallback scope while preserving exact decoded parity against Tdarr CPU Normalize Audio. By default it normalizes every source audio stream, sorts audio streams by Track Order language priority (eng,en), and ensures the first prioritized language has a 2-channel track. If that language lacks stereo, it generates one normalized stereo fallback from the best same-language source, preferring 5.1/6-channel over 7.1/8-channel. Disable Only Generate 2-Channel For First Language to create generated 2-channel fallback tracks for every non-stereo audio stream/language.

The table below uses a 5.1 source that has no existing stereo track. The GPU job writes both the normalized original 5.1 stream and the generated normalized 2-channel fallback, then both decoded streams are compared against Tdarr CPU Normalize Audio output. SRC_CPU time is split the same way: the original CPU Normalize Audio job for the 5.1 stream, plus a CPU Normalize Audio reference for the generated 2-channel source.

Speed vs CPU uses measured SRC_CPU/GPU wall time: above 1.0x is faster than CPU, below 1.0x is slower than CPU.

60min TLDR: original 5.1 2.241x faster (1027.3s CPU vs 458.3s GPU), generated 2ch 0.674x slower (291.2s CPU vs 432.0s GPU), combined job 1.474x faster (1318.5s CPU vs 894.2s GPU), parity pass for both decoded streams.

60min

Workload	CPU reference	GPU `1.2.0`	Speed vs CPU
Original 5.1	`1027.3s`	`458.3s`	`2.241x`
Generated 2ch	`291.2s`	`432.0s`	`0.674x`
Combined	`1318.5s`	`894.2s`	`1.474x`

30min

Workload	CPU reference	GPU `1.2.0`	Speed vs CPU
Original 5.1	`540.3s`	`231.2s`	`2.337x`
Generated 2ch	`155.5s`	`219.1s`	`0.710x`
Combined	`695.8s`	`452.3s`	`1.539x`

10min

Workload	CPU reference	GPU `1.2.0`	Speed vs CPU
Original 5.1	`168.6s`	`79.0s`	`2.133x`
Generated 2ch	`49.9s`	`76.1s`	`0.656x`
Combined	`218.5s`	`155.7s`	`1.403x`

60s

Workload	CPU reference	GPU `1.2.0`	Speed vs CPU
Original 5.1	`18.3s`	`9.7s`	`1.894x`
Generated 2ch	`5.0s`	`11.3s`	`0.441x`
Combined	`23.3s`	`21.2s`	`1.102x`

30s

Workload	CPU reference	GPU `1.2.0`	Speed vs CPU
Original 5.1	`9.8s`	`6.5s`	`1.504x`
Generated 2ch	`2.6s`	`6.8s`	`0.384x`
Combined	`12.4s`	`13.4s`	`0.926x`

Short clips are still slower because fixed startup, decode/encode, and stream setup overhead dominate. Longer media is the intended target and is where the GPU path now catches and passes CPU on the tested 5.1 plus generated-2ch workload.

Codec smoke coverage for 1.2.0 also passed exact decoded parity for AAC stereo, MP3 stereo, AC3 5.1, E-AC-3 5.1, DTS 5.1, and a multi-audio sample. The multi-audio smoke compares each GPU-normalized source stream against Tdarr CPU Normalize Audio run on an isolated source for that same stream, matching this plugin's per-stream normalization behavior.

2-channel fallback behavior:

Setting	Default	Behavior
`Add Generated 2-Channel Track`	`true`	Adds normalized generated stereo track(s) according to the selected 2-channel scope.
`Track Order`	`eng,en`	Comma-separated language priority used before choosing the generated 2-channel source. The default prefers English, while unlisted languages keep source order after listed languages. This uses source metadata and cannot fix missing or wrong language tags.
`Only Generate 2-Channel For First Language`	`true`	Ensures the first `Track Order` language has stereo. If that language lacks stereo, creates one generated track from the best same-language source, preferring 5.1/6-channel over 7.1/8-channel. Disable it to create one for every non-stereo audio stream/language.

Compared with releases before 1.1.11, the default tested contract when Add Generated 2-Channel Track=true includes generated 2-channel fallback work, and parity is checked for both the original audio stream and the generated fallback. Older release rows below used the previous primary-stream speed table, so treat the timing comparison as release history rather than a strict same-workload benchmark.

Version	60s GPU Time	10min GPU Time	30min GPU Time	Exact Stats Time	Exact Apply Time
`1.2.0`	`21.2s`	`155.7s`	`452.3s`	`261.5s`	`138.1s`
`1.1.13`	`21.5s`	`155.3s`	`456.8s`	`262.2s`	`138.7s`
`1.1.12`	`29.0s`	`196.5s`	`568.4s`	`222.0s`	`169.5s`
`1.1.11`	`30.8s`	`217.2s`	`634.9s`	`252.9s`	`162.8s`
`1.1.10`	`15.2s`	`114.9s`	`334.0s`	`108.5s`	`97.9s`
`1.1.9`	`20.0s`	`139.4s`	`403.0s`	`114.9s`	`112.8s`
`1.1.8`	`27.5s`	`234.2s`	`712.1s`	`114.9s`	`363.3s`
`1.1.7`	`31.2s`	`281.1s`	`841.4s`	`114.9s`	`515.2s`
`1.1.6`	`36.5s`	`no data`	`964.1s`	`114.8s`	`615.0s`
`1.1.5`	`50.1s`	`no data`	`1513.4s`	`114.9s`	`1158.5s`
`1.1.4`	`49.9s`	`no data`	`1507.6s`	`114.8s`	`1158.3s`

Choose 1.2.0 for current parity and corrected generated-stereo first-language behavior. Use 1.1.13 only if you need the previous fallback scope.

Install

Download the latest GitHub release zip and extract it into your Tdarr plugins folder.

Keep the version folder. Do not flatten it.

Correct layout:

FlowPlugins/CommunityFlowPlugins/audio/gpuNormalizeAudio/1.2.0/

Tdarr loads:

FlowPlugins/CommunityFlowPlugins/audio/gpuNormalizeAudio/1.2.0/index.js

Defaults:

Add Generated 2-Channel Track=true
Track Order=eng,en
Only Generate 2-Channel For First Language=true
Max Concurrent Jobs=1
Audio Bitrate=192k
Integrated Loudness I=-18.0
Loudness Range LRA=7.0
True Peak TP=-2.0
Max Gain dB=15
Debug Logging=false

Versions

Version	Use
`1.2.0`	Current release. Fixes generated-stereo first-language scope, keeps streaming `gpuSourcePort`, and matches CPU decoded output for normalized 5.1 plus generated 2-channel fallback on the required matrix.
`1.1.13`	Previous release. Speeds up long generated-stereo fallback jobs again, keeps streaming `gpuSourcePort`, and matches CPU decoded output for normalized 5.1 plus generated 2-channel fallback on the required matrix.
`1.1.12`	Older release. Speeds up long generated-stereo fallback jobs, keeps streaming `gpuSourcePort`, and matches CPU decoded output for normalized 5.1 plus generated 2-channel fallback on the required matrix.
`1.1.11`	Keeps streaming `gpuSourcePort`, matches CPU decoded output for normalized 5.1 plus generated 2-channel fallback on tested long media, and adds language-priority controls for first-language versus all-language 2-channel fallback generation.
`1.1.10`	Uses smaller streaming chunks, stats decode prefetch, and safe unsafe-output-feedback skipping after threshold lock; required parity matrix passed. Faster than CPU on all tested required limiter-heavy primary-stream cases.
`1.1.9`	Parallelizes safe feedback-skip apply windows while preserving exact state and decoded parity; required parity matrix passed. Faster than CPU on tested 10min/30min limiter-heavy media.
`1.1.8`	Skips safe output-feedback accumulation after the feedback threshold is active while preserving normal limiter/output rendering; required parity matrix passed. Still slower than CPU on long limiter-heavy media.
`1.1.7`	Slot-accumulated safe feedback, parallel unsafe feedback, smaller chunks, and asymmetric risk planning speed up limiter-heavy 5.1 jobs; required parity matrix passed. Still slower than CPU on long limiter-heavy media.
`1.1.6`	Segmented exact safe feedback speeds up limiter-heavy 5.1 jobs, required parity matrix passed. Still slower than CPU on long limiter-heavy media.
`1.1.5`	Modularized GPU runtime, required parity matrix passed. Performance is effectively unchanged from `1.1.4`; still slower than CPU on long limiter-heavy media.
`1.1.4`	Faster exact stats and limiter-active GPU path, required parity matrix passed. Still slower than CPU on long limiter-heavy media.
`1.1.3`	Faster limiter-active GPU path, required parity matrix passed. Still slower than CPU on long limiter-heavy media.
`1.1.2`	Streaming two-pass release. Avoids huge raw PCM bridge files and fixed long-case parity.
`1.1.1`	Added guarded GPU normalize concurrency.
`1.1`	Older optimized exact GPU line.
`1.0`	First stable CPU-output matching line.
`0.0.x`	Old pre-stable snapshots. Some are known not to match CPU normalizer output. Use only for rollback/debug.

Name		Name	Last commit message	Last commit date
Latest commit History 77 Commits
FlowPlugins/CommunityFlowPlugins/audio/gpuNormalizeAudio		FlowPlugins/CommunityFlowPlugins/audio/gpuNormalizeAudio
release-notes		release-notes
scripts		scripts
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

GPU Normalize Audio for Tdarr

TLDR

Plugin UI