Insights: ggerganov/llama.cpp
Overview
28 Releases published by 1 person
- b4019 published Nov 3, 2024
- b4020 published Nov 3, 2024
- b4023 published Nov 4, 2024
- b4024 published Nov 4, 2024
- b4025 published Nov 4, 2024
- b4026 published Nov 4, 2024
- b4027 published Nov 4, 2024
- b4032 published Nov 4, 2024
- b4033 published Nov 4, 2024
- b4034 published Nov 5, 2024
- b4036 published Nov 6, 2024
- b4037 published Nov 6, 2024
- b4038 published Nov 6, 2024
- b4040 published Nov 7, 2024
- b4041 published Nov 7, 2024
- b4042 published Nov 7, 2024
- b4044 published Nov 7, 2024
- b4048 published Nov 7, 2024
- b4050 published Nov 8, 2024
- b4052 published Nov 8, 2024
- b4053 published Nov 8, 2024
- b4055 published Nov 9, 2024
- b4056 published Nov 9, 2024
- b4057 published Nov 9, 2024
- b4059 published Nov 9, 2024
- b4060 published Nov 9, 2024
- b4061 published Nov 9, 2024
- b4058 published Nov 9, 2024
33 Pull requests merged by 14 people
- metal : reorder write loop in mul mat kernel + style (#10231, merged Nov 9, 2024)
- metal : fix build and some more comments (#10229, merged Nov 9, 2024)
- metal : fix F32 accumulation in FA vec kernel (#10232, merged Nov 9, 2024)
- ggml: fix zero division in `dne` calculation in CUDA COUNT_EQUAL operator when `ne` is small (#10213, merged Nov 9, 2024)
- ggml : optimize llamafile's cpu matrix multiplication for ppc64le (#10156, merged Nov 9, 2024)
- scripts: fix pattern and get n_tokens in one go (#10221, merged Nov 9, 2024)
- metal : opt-in compile flag for BF16 (#10218, merged Nov 8, 2024)
- metal : optimize FA kernels (#10171, merged Nov 8, 2024)
- swift : exclude ggml-metal-embed.metal (#10211, merged Nov 8, 2024)
- server : minor UI fix (#10207, merged Nov 7, 2024)
- server : revamp chat UI with vuejs and daisyui (#10175, merged Nov 7, 2024)
- ggml : add ggml-cpu.h to the public headers (#10204, merged Nov 7, 2024)
- Remove identical wte/etw logic for jais (#10203, merged Nov 7, 2024)
- DRY: Fixes clone functionality (#10192, merged Nov 7, 2024)
- fix q4_0_8_8 format for corrupted tokens issue (#10198, merged Nov 7, 2024)
- Optimize RWKV6 Operator Naming and Implement Multi-core CPU/SYCL Acceleration (#10133, merged Nov 7, 2024)
- metal : add BF16 support (#8439, merged Nov 6, 2024)
- server : remove hack for extra parallel slot (#10187, merged Nov 6, 2024)
- metal : fix from ptr buffer name (#10189, merged Nov 6, 2024)
- ggml : adjust is_first_call init value (#10193, merged Nov 6, 2024)
- metal : add quantized FA support (#10149, merged Nov 6, 2024)
- Add the <|tool_call|> formatting to the granite template (#10177, merged Nov 5, 2024)
- ggml : fix arch check in bf16_to_fp32 (#10164, merged Nov 4, 2024)
- Q6_K AVX improvements (#10118, merged Nov 4, 2024)
- ggml : fix gelu tables initialization (#10172, merged Nov 4, 2024)
- ggml : fix q4xx mat mul, increase ggml_aligned_malloc alignment (#10167, merged Nov 4, 2024)
- server : clarify /slots endpoint, add is_processing (#10162, merged Nov 4, 2024)
- fix build break on arm64 linux (#10166, merged Nov 4, 2024)
- cuda : clear error after changing peer access (#10153, merged Nov 4, 2024)
- [CANN] Fix compile error for CANN backend as get_name has been removed from ggml_backend_buffer_i (#10158, merged Nov 4, 2024)
- ggml : move CPU backend to a separate file (#10144, merged Nov 3, 2024)
- metal : minor fixup in FA kernel (#10143, merged Nov 3, 2024)
- nix: update flake.lock (#10146, merged Nov 3, 2024)
19 Pull requests opened by 14 people
- gguf-py: Improve `GGUFReader` read-only mode performance (#10159, opened Nov 4, 2024)
- Introduce New Lookup-Table (LUT)-Based Matrix Multiplication Method (#10181, opened Nov 5, 2024)
- CUDA: always create events for split buffers (#10185, opened Nov 5, 2024)
- Introduce IQ4_NL_4_4 format and its neon implementation (#10196, opened Nov 6, 2024)
- Draft: vulkan: Add VK_NV_cooperative_matrix2 support for mul_mat and FlashAttention2 (#10206, opened Nov 7, 2024)
- docs: add doxygen documentation (#10209, opened Nov 8, 2024)
- AVX BF16 and single scale quant optimizations (#10212, opened Nov 8, 2024)
- CANN Support Ascend310P to accelerate F32 and F16 LLM Model (#10216, opened Nov 8, 2024)
- ci: add Ascend CANN build (#10217, opened Nov 8, 2024)
- metal : use F16 math in mul_mat kernels (#10220, opened Nov 8, 2024)
- vulkan: Throttle the number of shader compiles during the build step (#10222, opened Nov 8, 2024)
- support for llguidance grammars (#10224, opened Nov 9, 2024)
- vulkan: Fix newly added tests for permuted mul_mat and 1D im2col (#10226, opened Nov 9, 2024)
- llama : use ggml_backend_dev_get_extra_bufts (#10228, opened Nov 9, 2024)
- server : enable KV cache defrag by default (#10233, opened Nov 9, 2024)
- metal : refactor kernel args into structs (#10238, opened Nov 9, 2024)
- server: Add back samplers (#10239, opened Nov 9, 2024)
- server : (web UI) add copy button for code block, fix api key (#10242, opened Nov 9, 2024)
- nix: update flake.lock (#10243, opened Nov 10, 2024)
42 Issues closed by 11 people
- how to add an extra fixed tensor to the token embedding in gpt2 arch (#9198, closed Nov 10, 2024)
- Bug: Assertion '__n < this->size()' failed. (#9636, closed Nov 10, 2024)
- Bug: ggml_metal_init error: zero-length arrays are not permitted in C++ float4x4 lo[D16/NW4]; (#10208, closed Nov 9, 2024)
- Bug: Couldn't get number of tokens from ./llama-cli output! (#10219, closed Nov 9, 2024)
- Bug: ROCM 7900xtx output random garbage with qwen1.5/14B after recent update (#9568, closed Nov 9, 2024)
- Feature Request: OpenVINO backend support request (#9601, closed Nov 9, 2024)
- Do llama.cpp support input_embeds? (#9630, closed Nov 9, 2024)
- Bug: Metal bfloat kernel crash when using Swift package (#10205, closed Nov 8, 2024)
- Bug: Can not load llava projector when running llava-cli (#10191, closed Nov 8, 2024)
- Bug: Name Error when running Llava1.5 examples (#10190, closed Nov 8, 2024)
- Bug: prompt construction changed in commit 958367bf (#10138, closed Nov 8, 2024)
- Bug: Gemma 2 slower with FA (#9243, closed Nov 8, 2024)
- Bug: duplicate vulkan devices being detected on windows (#9516, closed Nov 8, 2024)
- Add theme Rose Pine (#9584, closed Nov 7, 2024)
- Feature Request: Support Jina V3 arch (#9585, closed Nov 7, 2024)
- Bug: passing `tfs_z` crashes the server (#9587, closed Nov 7, 2024)
- Feature Request: Word Llama (#9600, closed Nov 7, 2024)
- Bug: `llama-server` web UI resets the text selection during inference on every token update (#9608, closed Nov 7, 2024)
- Bug: llama.cpp server reports inaccurate n_ctx_per_seq? (#10186, closed Nov 6, 2024)
- Bug: Llava not working on android (#8436, closed Nov 6, 2024)
- Bug: Mac build failed using make (#9157, closed Nov 6, 2024)
- Bug: Templates are swapped for Mistral and Llama 2 in llama-server when using --chat-template (#9583, closed Nov 6, 2024)
- Bug: Unable to load a 3B model, failing to allocate buffer size (#10188, closed Nov 5, 2024)
- Feature Request: langchain with_structured_output support (#10168, closed Nov 5, 2024)
- Bug: "speculative" example is crashing? (#10174, closed Nov 5, 2024)
- Bug: Unable to enable AVX_VNNI instructions (#10116, closed Nov 5, 2024)
- Bug: __AVX2__ missing (#10154, closed Nov 4, 2024)
- Bug: Recent llama.cpp breaks q4_0_4_4 on Arm CPU (#10165, closed Nov 4, 2024)
- Bug: CUDA error: peer access has not been enabled (#10152, closed Nov 4, 2024)
- Bug: b3990 ascend cann build error (#10105, closed Nov 4, 2024)
- Refactor: decide the future of llama_tensor_get_type() (#8736, closed Nov 4, 2024)
- Feature Request: InternVL2 Support? (#8848, closed Nov 4, 2024)
- Feature Request: NPU Support (#9181, closed Nov 4, 2024)
- Bug: MinGW build fails to load models with "error loading model: PrefetchVirtualMemory unavailable" (#9311, closed Nov 4, 2024)
- Error compiling using CUDA on Jetson Orin nx (#9533, closed Nov 4, 2024)
- Bug: Build fails on i386 systems (#9545, closed Nov 4, 2024)
- Bug: KV quantization fails when using vulkan (#9551, closed Nov 4, 2024)
- Feature Request: Support GRIN-MoE by Microsoft (#9552, closed Nov 4, 2024)
- Bug: Unreadable output from android example project (#9555, closed Nov 4, 2024)
22 Issues opened by 19 people
- Bug: missing tensor blk.0.ffn_down_exps.weight when loading mixtral-8x7b-instruct-v0.1.Q5_K_M.gguf (#10244, opened Nov 10, 2024)
- fatal error: 'hip/hip_fp16.h' file not found when building using CMake and ROCm 6.2 (#10236, opened Nov 9, 2024)
- Bug: server GET /props request return json with chat_template with last char replaced by \x00 (#10235, opened Nov 9, 2024)
- Bug: CUBLAS_STATUS_INTERNAL_ERROR when using --gpu-layers on ROCm 6.2 (#10234, opened Nov 9, 2024)
- Bug: Server Slows Down Significantly Over Time, Requires Frequent Reboots (RX 7900 XT) (#10227, opened Nov 9, 2024)
- Bug: image encoding error with malloc memory (#10225, opened Nov 9, 2024)
- bge-multilingual-gemma2: ERROR:hf-to-gguf:Model Gemma2Model is not supported (#10215, opened Nov 8, 2024)
- Bug: not support langchain v0.3 to use tools (#10214, opened Nov 8, 2024)
- Feature Request: Support Airllm (#10202, opened Nov 7, 2024)
- Bug: DLLAMA_VULKAN=1 tag is not linking vulkan (#10201, opened Nov 7, 2024)
- Bug: Nondeterministic results on AMD RDNA3 (ROCm) despite zero temperature and fixed seed (#10197, opened Nov 6, 2024)
- Bug: SYCL crash (#10184, opened Nov 5, 2024)
- ggml : move LLAMAFILE/tinyBLAS into a backend (#10183, opened Nov 5, 2024)
- ggml : refactor ggml-cpu.c into multiple C++ source files (#10180, opened Nov 5, 2024)
- Feature Request: Support BitNet.cpp quantization format (#10179, opened Nov 5, 2024)
- Bug: Failed to convert `OuteAI/OuteTTS-0.1-350M` (#10178, opened Nov 5, 2024)
- Bug: Speculative Decoding "Segmentation fault (core dumped)" (#10176, opened Nov 4, 2024)
- tts : add basic example for text-to-speech (#10173, opened Nov 4, 2024)
- Bug: CANN E89999 (#10161, opened Nov 4, 2024)
- Feature Request: [CANN] backend supports Ascend 310P (#10160, opened Nov 4, 2024)
- Bug: GGML_ASSERT(i01 >= 0 && i01 < ne01) failed (#10157, opened Nov 4, 2024)
- Bug: --log-disable also disables output from the model (#10155, opened Nov 4, 2024)
64 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- add FP8 support to gguf/llama (#10055, commented on Nov 8, 2024 • 10 new comments)
- sampling: add K-Shift sampler (#10048, commented on Nov 9, 2024 • 3 new comments)
- main : add new feature: special commands (#10145, commented on Nov 7, 2024 • 2 new comments)
- backend cpu: add online flow for aarch64 Q4_0 GEMV/GEMM kernels (#9921, commented on Nov 8, 2024 • 1 new comment)
- Bug: Load time on rpc server with multiple machines (#9820, commented on Nov 10, 2024 • 0 new comments)
- android examples add top_p min_keep to new_context (#9828, commented on Nov 10, 2024 • 0 new comments)
- Feature Request: NEON, SVE2, int8mm optimized kernels for IQ4, K quants? (#9827, commented on Nov 10, 2024 • 0 new comments)
- Feature Request: RPC offloading using a local model copy (#10095, commented on Nov 9, 2024 • 0 new comments)
- Bug: Certain RPC Servers cause major slowdown to Host machine (#10047, commented on Nov 9, 2024 • 0 new comments)
- Bug: Ccache causing SYCL backend failed to build on Windows (#9954, commented on Nov 9, 2024 • 0 new comments)
- llama : speed-up grammar sampling (#4218, commented on Nov 9, 2024 • 0 new comments)
- Bug: [vulkan] llama.cpp not work on Raspberry Pi 5 (#9801, commented on Nov 9, 2024 • 0 new comments)
- llama.cpp Windows/ROCm builds are broken? Using shared GPU memory instead of dedicated. (#9964, commented on Nov 8, 2024 • 0 new comments)
- Bug: Failing to build using cmake on tag b3912 (#9913, commented on Nov 8, 2024 • 0 new comments)
- Bug: Model isn't loading (#9563, commented on Nov 8, 2024 • 0 new comments)
- Feature Request: Support for DeciLMForCausalLM (#10028, commented on Nov 8, 2024 • 0 new comments)
- Feature Request: Support for Qwen2-VL (#9246, commented on Nov 8, 2024 • 0 new comments)
- Bug: No improvement for NEON? (#9774, commented on Nov 8, 2024 • 0 new comments)
- Optimization of matrix-vector kernel memory accesses for NVIDIA CUDA High Bandwidth GPUs (#9817, commented on Nov 10, 2024 • 0 new comments)
- Bug: Failed to process regex error with long repeating sequences (#9715, commented on Nov 10, 2024 • 0 new comments)
- [CANN] Bug: Can't compile ggml/src/CMakeFiles/ggml.dir/ggml-cann/acl_tensor.cpp.o (#9560, commented on Nov 10, 2024 • 0 new comments)
- Bug: Slow model loading with mmap (#9244, commented on Nov 10, 2024 • 0 new comments)
- Feature Request: Support llava with different vision/LM backbones (#8574, commented on Nov 10, 2024 • 0 new comments)
- Llama cpp low level python bindings (#1660, commented on Nov 5, 2024 • 0 new comments)
- llama : initial Mamba-2 support (#9126, commented on Nov 4, 2024 • 0 new comments)
- llama: (proposal) propagating the results of `graph_compute` to the user interface (#9525, commented on Nov 10, 2024 • 0 new comments)
- [gguf-py] gguf_reader: numpy 2 newbyteorder fix (#9772, commented on Nov 5, 2024 • 0 new comments)
- fix gguf-py: Conversion error when multiple licenses are configured (#9807, commented on Nov 9, 2024 • 0 new comments)
- [SYCL] Fix build on Windows when ccache enabled (#9954) (#9976, commented on Nov 9, 2024 • 0 new comments)
- [SYCL] pass SYCL CI (#10041, commented on Nov 8, 2024 • 0 new comments)
- metal : GPU "idle-throttling" analysis (#10119, commented on Nov 3, 2024 • 0 new comments)
- Fix docker locale issue (#6267) (#10142, commented on Nov 4, 2024 • 0 new comments)
- Bug: using kv cache quantisation q4_0 seems to cause issues when a context shift is done (#9743, commented on Nov 4, 2024 • 0 new comments)
- Feature Request: [metal] implement FA kernels for quantized KV cache (#9736, commented on Nov 4, 2024 • 0 new comments)
- Bug: ggml_vulkan can only find 1 Vulkan device (#9716, commented on Nov 4, 2024 • 0 new comments)
- Feature Request: Support Codestral Mamba (#8519, commented on Nov 4, 2024 • 0 new comments)
- llama_kv_cache_seq_shift does not work with cache type q4_0 (#5652, commented on Nov 4, 2024 • 0 new comments)
- Bug: gguf pypi package corrupts environment (#9566, commented on Nov 4, 2024 • 0 new comments)
- llama : store token ids in the KV Cache (#9113, commented on Nov 4, 2024 • 0 new comments)
- Feature Request: Support Aya (#10035, commented on Nov 4, 2024 • 0 new comments)
- Bug: gguf tries to access newbyteorder, which was removed in numpy 2.0 (#10127, commented on Nov 4, 2024 • 0 new comments)
- Problem with using llava_surgery_v2.py (#9750, commented on Nov 5, 2024 • 0 new comments)
- Feature Request: Anti-slop / fine tuning of a model output in realtime / on the fly for output quality enhancement (#9748, commented on Nov 5, 2024 • 0 new comments)
- Bug: struct llama_file has two different definitions (breaks ODR) (#9770, commented on Nov 5, 2024 • 0 new comments)
- llama : tool for evaluating quantization results per layer (#2783, commented on Nov 5, 2024 • 0 new comments)
- llama : support Mamba-2 (#7727, commented on Nov 5, 2024 • 0 new comments)
- metal : compile-time kernel args and params (#4085, commented on Nov 5, 2024 • 0 new comments)
- ci : add Apple silicon (M1) macOS runners (#3469, commented on Nov 5, 2024 • 0 new comments)
- ggml : unified CMake build (#6913, commented on Nov 5, 2024 • 0 new comments)
- How can I get log probs in create_chat_completions in llama-cpp? I'm using logprobs=True as an attribute but still not getting log probabilities. (#6423, commented on Nov 5, 2024 • 0 new comments)
- Bug: No text response when "--log-disable" is set (#10002, commented on Nov 5, 2024 • 0 new comments)
- llama_model_load: error loading model: vk::PhysicalDevice::createDevice: ErrorDeviceLost (#9767, commented on Nov 6, 2024 • 0 new comments)
- Bug: Rocm extreme slow down on GFX1100 with release binary (#9765, commented on Nov 6, 2024 • 0 new comments)
- Feature Request: multimodal on android (#9738, commented on Nov 6, 2024 • 0 new comments)
- Bug: llama-quantize --help is not printed (#10122, commented on Nov 6, 2024 • 0 new comments)
- Bug: LLAMA_MAX_LAYERS must be increased to run FatLlama 1.7T (#9909, commented on Nov 6, 2024 • 0 new comments)
- Bug: Cannot edit input before the current line (#9777, commented on Nov 7, 2024 • 0 new comments)
- Feature Request: ANE utilization on Apple Silicon (#9773, commented on Nov 7, 2024 • 0 new comments)
- Potential GPU Usage During CPU Inference (ngl=0) (#9724, commented on Nov 7, 2024 • 0 new comments)
- changelog : `llama-server` REST API (#9291, commented on Nov 7, 2024 • 0 new comments)
- Typo on build.md? (#9793, commented on Nov 8, 2024 • 0 new comments)
- Bug: After update, unable to load GGUF models (#9790, commented on Nov 8, 2024 • 0 new comments)
- Feature Request: Enable overallocation for ggml-vulkan (#9785, commented on Nov 8, 2024 • 0 new comments)
- Feature Request: Support for architecture MambaByte (#9780, commented on Nov 8, 2024 • 0 new comments)