Pull requests: ggml-org/llama.cpp
Pull requests list
ggml-webgpu: Command batching
ggml
changes relating to the ggml tensor library for machine learning
WebGPU
#21873
opened Apr 13, 2026 by
reeselevine
Contributor
Draft
ggml-webgpu: Fix dequantization helpers to not pass in pointers
ggml
changes relating to the ggml tensor library for machine learning
merge ready
A maintainer can use this label to indicate that they consider the changes final and ready to merge.
WebGPU
#21872
opened Apr 13, 2026 by
reeselevine
Contributor
common: skip reasoning budget sampler when no budget is requested
#21870
opened Apr 13, 2026 by
berkidem
Contributor
ggml: correct placement of ggml-ext.h
ggml
changes relating to the ggml tensor library for machine learning
#21869
opened Apr 13, 2026 by
ngxson
Contributor
vulkan: add barrier after writetimestamp
ggml
changes relating to the ggml tensor library for machine learning
Vulkan
Issues specific to the Vulkan backend
#21865
opened Apr 13, 2026 by
jeffbolznv
Contributor
metal: add GATED_LINEAR_ATTN support
Apple Metal
https://en.wikipedia.org/wiki/Metal_(API)
ggml
changes relating to the ggml tensor library for machine learning
#21859
opened Apr 13, 2026 by
Ra5hidIslam
Contributor
mtmd: add mtmd_image_tokens_get_decoder_pos() API
examples
testing
Everything test related
#21851
opened Apr 13, 2026 by
ngxson
Contributor
ggml-cuda: make mul_mat_q tile selection tunable per-type and per-arch
ggml
changes relating to the ggml tensor library for machine learning
Nvidia GPU
Issues specific to Nvidia GPUs
testing
Everything test related
#21849
opened Apr 13, 2026 by
aviallon
Contributor
sycl : port multi-column MMVQ from CUDA backend (~75% speculative decoding speedup on Intel Arc)
ggml
changes relating to the ggml tensor library for machine learning
SYCL
https://en.wikipedia.org/wiki/SYCL - GPU programming language
#21845
opened Apr 13, 2026 by
masonmilby
1 task done
cann : add GGML_OP_SET backend support (#21178)
Ascend NPU
issues specific to Ascend NPUs
ggml
changes relating to the ggml tensor library for machine learning
#21841
opened Apr 13, 2026 by
NebulaMao
common, ggml : fix non-ASCII file path handling on Windows
ggml
changes relating to the ggml tensor library for machine learning
#21838
opened Apr 13, 2026 by
Anai-Guo
ggml-rpc: fix 32-bit ARM (ILP32) serialization bugs
ggml
changes relating to the ggml tensor library for machine learning
#21828
opened Apr 12, 2026 by
rovmo
cli : Use acquire/release semantics for stopping logic
examples
#21822
opened Apr 12, 2026 by
matthiasstraka
llama : add --hugepages for HugeTLB-backed weight loading (Linux)
#21821
opened Apr 12, 2026 by
doctorjei
server : reinit speculative ngram state after context shift to fix GGML_ABORT
examples
server
testing
Everything test related
#21815
opened Apr 12, 2026 by
jonpojonpo
server: allow cancel loading model
examples
server
#21814
opened Apr 12, 2026 by
ngxson
Contributor
docs/android.md: Add dependency libandroid-spawn for building on termux
documentation
Improvements or additions to documentation
#21812
opened Apr 12, 2026 by
aafsmarak
TP: fix 0-sized tensor slices, AllReduce fallback
ggml
changes relating to the ggml tensor library for machine learning
Nvidia GPU
Issues specific to Nvidia GPUs
#21808
opened Apr 12, 2026 by
JohannesGaessler
Contributor
llama-bench: fix accumulated load_time in perf timings
examples
#21794
opened Apr 12, 2026 by
abhinavuser
server: (anthropic API) fix prefix caching
examples
server
#21793
opened Apr 12, 2026 by
kvc0