blockfeed/llama-cpp-mtp-hip

llama-cpp-mtp-hip

Arch Linux PKGBUILD for llama.cpp with MTP speculative decoding, using the HIP/ROCm backend for AMD GPUs. It was created to run the new generation of MTP-capable models, support for which was introduced in llama.cpp PR #22673.

What is MTP?

Multi-Token Prediction (MTP) lets a draft model propose several tokens at once, which the base model then verifies in parallel. When most of the proposed tokens are accepted, each verification pass yields more than one token, so generation can be significantly faster; the actual gain depends on the hardware and the model. Upstream support was added in PR #22673. This package builds the mtp-clean branch by am17an, which provides the complete MTP speculative decoding pipeline for ROCm.

Requirements

  • Arch Linux (or a derivative)
  • ROCm packages: hip-runtime-amd, hipblas, rocblas
  • An AMD GPU with ROCm support
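
As a convenience, the package requirements above can be checked before building. The following is a minimal sketch, not part of this repository: it assumes pacman is on the PATH and relies on `pacman -Qq <pkg>` exiting non-zero when a package is not installed. The helper name `missing_packages` is made up for illustration.

```shell
#!/bin/sh
# Sketch: report which of the ROCm packages listed above are not installed.
# The package names come from the Requirements list; everything else is illustrative.
required="hip-runtime-amd hipblas rocblas"

# missing_packages CHECK_CMD...: print each required package for which
# "CHECK_CMD <pkg>" fails (hypothetical helper, one package name per line).
missing_packages() {
    for pkg in $required; do
        "$@" "$pkg" >/dev/null 2>&1 || printf '%s\n' "$pkg"
    done
}

# Query the local pacman database and join the result onto one line.
missing=$(missing_packages pacman -Qq | tr '\n' ' ')

if [ -n "$missing" ]; then
    echo "Missing packages: $missing"
    echo "Install with: sudo pacman -S --needed $missing"
else
    echo "All required ROCm packages are installed."
fi
```

The check command is passed as an argument so the helper can be exercised without pacman. On a real system, `makepkg -s` still installs whatever dependencies the PKGBUILD declares, so this check is optional.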

Build

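Run this from the directory containing the PKGBUILD. The -s flag installs missing dependencies with pacman, and -i installs the package once it has been built:

makepkg -si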

License

The PKGBUILD and supporting files in this repository are licensed under GPLv3. The upstream llama.cpp source that the package builds is MIT-licensed; see the llama.cpp repository for details.
