Cuda refactor, multi GPU support #1607

Closed

JohannesGaessler
Collaborator

@JohannesGaessler JohannesGaessler commented May 27, 2023

This PR is quite large. Its primary goal is to lay the groundwork for the implementation of further CUDA kernels for ggml operations. I am also adding multi GPU support because it's easier to integrate now than it would be at a later point.

For Users

Build instructions (Linux):

git clone https://github.com/JohannesGaessler/llama.cpp llama.cpp-johannesgaessler
cd llama.cpp-johannesgaessler                               
git fetch
git switch cuda-refactor-8
make LLAMA_CUBLAS=1

When compiling with LLAMA_CUBLAS=1 the program automatically detects the available NVIDIA devices and splits weights proportional to VRAM. There is not yet a CLI argument for setting the tensor splits. The performance increase on my test systems is relatively low (+70% t/s when going from 1x GTX TITAN X to 4x GTX TITAN X). It's possible that there is still a bug that hampers performance. Please do tell me how well (if at all) it works for you. In any case, this PR should already allow you to pool the VRAM of multiple GPUs to load larger models.
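To illustrate what "splitting weights proportional to VRAM" could look like, here is a minimal sketch. It is not the code from this PR: the helper name `split_rows_by_vram` and the choice to split along tensor rows are assumptions made purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of llama.cpp): given the VRAM of each device,
// return the first row assigned to each device, followed by an end marker.
// Device d then owns rows [bounds[d], bounds[d + 1]).
std::vector<int64_t> split_rows_by_vram(int64_t nrows, const std::vector<size_t> & vram_bytes) {
    assert(nrows >= 0 && !vram_bytes.empty());

    long double total = 0;
    for (size_t v : vram_bytes) {
        total += v;
    }
    assert(total > 0);

    std::vector<int64_t> bounds;
    bounds.reserve(vram_bytes.size() + 1);
    long double cumulative = 0;
    for (size_t v : vram_bytes) {
        // Start of this device's slice: rows scaled by the VRAM share before it.
        bounds.push_back((int64_t) (nrows * (cumulative / total)));
        cumulative += v;
    }
    bounds.push_back(nrows);
    return bounds;
}
```

With two devices whose VRAM is in a 3:1 ratio and 100 rows, the first device would get rows 0 to 74 and the second rows 75 to 99.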

For Developers

This PR is still very much WIP. I will do a refactor to remove artifacts from bad/obsolete design decisions. You can already review the code if you want but many of the flaws are still subject to change. Should be good now.

On master there are separate functions for invoking CUDA kernels. Apart from invoking the actual CUDA kernels they do other things such as copying data between host and device. This PR adds a template ggml_cuda_op that manages

  1. the transfer of data between host and device,
  2. the dequantization of src0 (needed for cuBLAS matrix multiplication),
  3. the broadcasting of src1 across src0 (needed for multiplication),
  4. and multi GPU things.

The actual operations now only need to define how the data should be manipulated.
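The division of labor described above can be sketched roughly as follows. This is a CPU-only stand-in written for illustration, not the actual `ggml_cuda_op`: the names `run_op` and `op_mul_row` are hypothetical, host/device transfers are simulated with plain copies, and dequantization and multi GPU handling are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for device memory; a real implementation would use CUDA allocations.
using buffer = std::vector<float>;

// Op-specific part: only defines how one row of data is manipulated.
inline void op_mul_row(const float * src0, const float * src1, float * dst, size_t ncols) {
    for (size_t c = 0; c < ncols; ++c) {
        dst[c] = src0[c] * src1[c];
    }
}

// Generic wrapper (hypothetical): handles "transfers" and broadcasting,
// then delegates the actual math to the op passed in.
template <typename Op>
buffer run_op(const buffer & src0_host, const buffer & src1_host, size_t ncols, Op op) {
    assert(ncols > 0);
    assert(src0_host.size() % ncols == 0 && src1_host.size() % ncols == 0);
    const size_t nrows0 = src0_host.size() / ncols;
    const size_t nrows1 = src1_host.size() / ncols;
    assert(nrows1 > 0);

    // Simulated host -> device transfer.
    buffer src0_dev = src0_host;
    buffer src1_dev = src1_host;
    buffer dst_dev(src0_host.size());

    // Broadcast src1 across src0 by wrapping the src1 row index.
    for (size_t r = 0; r < nrows0; ++r) {
        op(src0_dev.data() + r * ncols,
           src1_dev.data() + (r % nrows1) * ncols,
           dst_dev.data() + r * ncols,
           ncols);
    }

    // Simulated device -> host transfer.
    return dst_dev;
}
```

For example, `run_op({1, 2, 3, 4}, {10, 100}, 2, op_mul_row)` multiplies each of the two rows of the first argument by the single row `{10, 100}`. A new operation would only need to supply a different row function.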

This PR also moves the entry point for invoking CUDA kernels out of ggml functions such as ggml_compute_forward_mul_mat_q_f32. Instead, it adds a function ggml_cuda_compute_forward that is called from ggml_compute_forward. For this to work I moved ggml_task_type and ggml_compute_params from ggml.c to ggml.h.

This PR adds an int for the layer, an int for the device id, and dedicated device data pointers to ggml_tensor. I need these for bookkeeping. I also changed the backends from GGML_BACKEND_CUDA and GGML_BACKEND_OPENCL to GGML_BACKEND_GPU (tensor data on 1 GPU) and GGML_BACKEND_GPU_SPLIT (tensor data split across all GPUs). Since I think that we don't want to support the simultaneous use of CUDA and OpenCL it's simpler to just use the same backend types for both implementations and to differentiate via defines.

@JohannesGaessler JohannesGaessler added the performance Speed related topics label May 27, 2023
@JohannesGaessler JohannesGaessler marked this pull request as draft May 27, 2023 09:29
@JohannesGaessler JohannesGaessler added enhancement New feature or request hardware Hardware related refactoring Refactoring labels May 27, 2023
Contributor

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

@ENjoyBlue2021
ENjoyBlue2021 commented May 27, 2023

Thanks so much for adding multi GPU support, I was looking forward to it. You are the king, man.
This is extremely useful for increasing my VRAM for 30b.
I'm posting my stats with my 1080 Ti + 1080.
Amazing stuff, it's noticeably faster.

Stats with this version:

./main -m '/media/w/PhoenixSSD/oobabooga/text-generation-webui/models/supercot30b-ggml/ggml-model-q5_1.bin' -n 128 --n-gpu-layers 39 --threads 6 --no-mmap -s 1685192470
WARNING: when using cuBLAS generation results are NOT guaranteed to be reproducible.
main: build = 596 (428c342)
main: seed  = 1685192470
ggml_init_cublas: found 2 CUDA devices:
  1. NVIDIA GeForce GTX 1080 Ti
  2. NVIDIA GeForce GTX 1080
llama.cpp: loading model from /media/w/PhoenixSSD/oobabooga/text-generation-webui/models/supercot30b-ggml/ggml-model-q5_1.bin
llama_model_load_internal: format     = ggjt v2 (pre #1508)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 6656
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 52
llama_model_load_internal: n_layer    = 60
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 9 (mostly Q5_1)
llama_model_load_internal: n_ff       = 17920
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 30B
llama_model_load_internal: ggml ctx size = 23269.21 MB
llama_model_load_internal: mem required  = 10646.42 MB (+ 3124.00 MB per state)
llama_model_load_internal: [cublas] offloading 39 layers to GPU
llama_model_load_internal: [cublas] total VRAM used: 14926 MB
....................................................................................................
llama_init_from_file: kv self size  =  780.00 MB

system_info: n_threads = 6 / 8 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 | 
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = 128, n_keep = 0


llama_print_timings:        load time = 14079.62 ms
llama_print_timings:      sample time =    55.40 ms /   128 runs   (    0.43 ms per token)
llama_print_timings: prompt eval time =   722.20 ms /     2 tokens (  361.10 ms per token)
llama_print_timings:        eval time = 54367.11 ms /   127 runs   (  428.09 ms per token)
llama_print_timings:       total time = 68522.85 ms

For comparison, the old version:

./main -m '/media/w/PhoenixSSD/oobabooga/text-generation-webui/models/supercot30b-ggml/ggml-model-q5_1.bin' -n 128 --n-gpu-layers 17 --threads 6 --no-mmap -s 1685192470
WARNING: when using cuBLAS generation results are NOT guaranteed to be reproducible.
main: build = 583 (7e4ea5b)
main: seed  = 1685192470
llama.cpp: loading model from /media/w/PhoenixSSD/oobabooga/text-generation-webui/models/supercot30b-ggml/ggml-model-q5_1.bin
llama_model_load_internal: format     = ggjt v2 (pre #1508)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 6656
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 52
llama_model_load_internal: n_layer    = 60
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 9 (mostly Q5_1)
llama_model_load_internal: n_ff       = 17920
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 30B
llama_model_load_internal: ggml ctx size = 23269.14 MB
llama_model_load_internal: mem required  = 19066.59 MB (+ 3124.00 MB per state)
llama_model_load_internal: [cublas] offloading 17 layers to GPU
llama_model_load_internal: [cublas] total VRAM used: 6506 MB
....................................................................................................
llama_init_from_file: kv self size  =  780.00 MB

system_info: n_threads = 6 / 8 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 | 
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = 128, n_keep = 0


llama_print_timings:        load time = 11117.69 ms
llama_print_timings:      sample time =    55.35 ms /   128 runs   (    0.43 ms per token)
llama_print_timings: prompt eval time =   896.98 ms /     2 tokens (  448.49 ms per token)
llama_print_timings:        eval time = 82108.70 ms /   127 runs   (  646.53 ms per token)
llama_print_timings:       total time = 93302.20 ms

@SlyEcho
Sponsor Collaborator

SlyEcho commented May 27, 2023

Would it be too much to ask for cross-platform (Nvidia + AMD) support? LMAO.

I will try to check later, I think I can access a machine with 2x 2080 Ti

@JohannesGaessler
Collaborator Author

JohannesGaessler commented May 27, 2023

Performance numbers from my test machine with an i5-4570S, 16 GB of RAM @ 1600 MHz, and a GTX 1070 + a GTX 1050 ti:

| Model | GPU | ms/t | t/s |
|---|---|---|---|
| 7b q4_0 | GTX 1070 | 71.37 | 14.01 |
| 7b q4_0 | GTX 1070 + GTX 1050 ti | 68.66 | 14.56 |
| 13b q4_0 | GTX 1070 | 134.19 | 7.45 |
| 13b q4_0 | GTX 1070 + GTX 1050 ti | 128.13 | 7.80 |
| 33b q4_0 | GTX 1070 | Unusable | Unusable |
| 33b q4_0 | GTX 1070 + GTX 1050 ti | 575.12 | 1.74 |

Numbers for single GPU are obtained using the master branch.

Note: previously I was able to run 33b q4_0 with just the GTX 1070 on master; there may be something on master that has increased RAM usage since.

@SlyEcho
Sponsor Collaborator

SlyEcho commented May 27, 2023

33b

You mean 30B? I can run 30B Q4_0 with my 8 GB card with 20 layers loaded only.


auto & layer = model.layers[i];

std::string layers_i = "layers." + std::to_string(i);

- layer.attention_norm = ml->get_tensor(layers_i + ".attention_norm.weight", {n_embd}, backend);
+ layer.attention_norm = ml->get_tensor(layers_i + ".attention_norm.weight", {n_embd}, i, backend);
Collaborator

we can decide which card we should put the layer on in this function (backend + device_id).

We should make backend like CPU = 0xFFFFFFFF; CUDA=0xEEEEEE00; (256 cards supported); etc.

Collaborator Author

This PR implements two ways to utilize multiple GPUs: tensors can either be put on a single GPU or they can be split across multiple GPUs. I think that tensors that require little computation should be put on a single GPU to reduce data copying. In the function ggml_cuda_load_data single GPU tensors are distributed to GPUs based on which layer they are in. I guess this distribution could also be done here but I don't think it would make a significant difference.
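As a rough illustration of distributing single GPU tensors by layer, the sketch below maps a layer index to a device. It is not the logic of ggml_cuda_load_data: the function name `device_for_layer`, the representation of the split as per-device start fractions, and the handling of tensors outside any layer are all assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not taken from the PR). split_starts[d] is the fraction
// of the model at which device d begins, e.g. {0.0f, 0.75f} for a 3:1 split.
int device_for_layer(int layer, int n_layer, const std::vector<float> & split_starts) {
    assert(n_layer > 0 && !split_starts.empty());
    if (layer < 0) {
        // Assumption: tensors that do not belong to a layer go to the first device.
        return 0;
    }
    const float pos = (float) layer / (float) n_layer;
    int device = 0;
    for (size_t d = 0; d < split_starts.size(); ++d) {
        // Pick the last device whose start fraction is not past this layer.
        if (pos >= split_starts[d]) {
            device = (int) d;
        }
    }
    return device;
}
```

With 60 layers and a 3:1 split, layers 0 to 44 would land on device 0 and layers 45 to 59 on device 1, so all small tensors of one layer stay together and no per-tensor copies between GPUs are needed.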

@JohannesGaessler
Collaborator Author

You mean 30B? I can run 30B Q4_0 with my 8 GB card with 20 layers loaded only.

"30B" seems to be a typo by Meta that has become dominant. In the paper they talk about a "33B" model so that is the term that I'm using.

}

- struct ggml_tensor * get_tensor_for(llama_load_tensor & lt, ggml_backend backend) {
+ struct ggml_tensor * get_tensor_for(llama_load_tensor & lt, int layer, ggml_backend backend) {
Sponsor Collaborator

What if it were like this so the tensors that are not on a layer don't have to specify -1?

Suggested change:
- struct ggml_tensor * get_tensor_for(llama_load_tensor & lt, int layer, ggml_backend backend) {
+ struct ggml_tensor * get_tensor_for(llama_load_tensor & lt, ggml_backend backend, int layer = -1) {

SlyEcho added a commit to SlyEcho/llama.cpp that referenced this pull request May 27, 2023
For forward compatibility ggerganov#1607
@slaren
Collaborator

slaren commented May 27, 2023

I think ggml_cuda_mul and ggml_cuda_mul_mat can be removed from ggml-cuda.h now and made static.

@JohannesGaessler
Collaborator Author

JohannesGaessler commented May 27, 2023

I added a comment to explain the weird device to host memcpy for split tensors. Since I as the person who wrote the code won't know: are there other parts of the code that are unintuitive or difficult to understand?

Contributor

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

@JohannesGaessler
Collaborator Author

I added a CLI argument that lets the user set the tensor split. On my system a less VRAM efficient split of 3:1 seems to do better than 2:1 because it's more efficient in terms of compute:

| Model | GPU | ms/t | t/s |
|---|---|---|---|
| 7b q4_0 | GTX 1070 | 71.37 | 14.01 |
| 7b q4_0 | GTX 1070 + GTX 1050 ti, 2:1 split | 68.66 | 14.56 |
| 7b q4_0 | GTX 1070 + GTX 1050 ti, 3:1 split | 59.03 | 16.94 |
| 13b q4_0 | GTX 1070 | 134.19 | 7.45 |
| 13b q4_0 | GTX 1070 + GTX 1050 ti, 2:1 split | 128.13 | 7.80 |
| 13b q4_0 | GTX 1070 + GTX 1050 ti, 3:1 split | 109.14 | 9.15 |
| 33b q4_0 | GTX 1070 | Unusable | Unusable |
| 33b q4_0 | GTX 1070 + GTX 1050 ti, 2:1 split | 575.12 | 1.74 |
| 33b q4_0 | GTX 1070 + GTX 1050 ti, 3:1 split | 571.10 | 1.75 |
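For readers wondering how a user-supplied ratio such as "3:1" might be turned into per-device fractions, here is a small sketch. The function name `parse_split` and the colon-separated input format are assumptions for illustration only; the actual CLI argument and its syntax are not shown in this thread.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical parser: turns a ratio string like "3:1" into fractions that
// sum to 1. The ':' separator is an assumption, chosen to match the notation
// used in the table above.
std::vector<float> parse_split(const std::string & arg) {
    std::vector<float> parts;
    std::stringstream ss(arg);
    std::string item;
    float total = 0.0f;
    while (std::getline(ss, item, ':')) {
        const float v = std::stof(item); // throws on non-numeric input
        assert(v >= 0.0f);
        parts.push_back(v);
        total += v;
    }
    assert(!parts.empty() && total > 0.0f);
    for (float & p : parts) {
        p /= total; // normalize so the fractions sum to 1
    }
    return parts;
}
```

A 3:1 split then corresponds to fractions 0.75 and 0.25, meaning the GTX 1070 would hold three quarters of each split tensor even though its share of the total VRAM is smaller than that.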

@KerfuffleV2
Collaborator

-n N, --n-predict N   number of tokens to predict (default: -1, -1 = infinity)

Please create an issue or discussion topic in the Q&A section when you have general questions instead of asking in pull requests.

Comments in pull requests are primarily for discussing that specific pull.

@JohannesGaessler
Collaborator Author

Both CUDA and OpenCL seem to be working correctly in short tests. However, right now I'm too tired to test this PR in detail. I'll do it tomorrow morning.

@JohannesGaessler
Collaborator Author

As far as I can tell everything is working correctly.

@KerfuffleV2
Collaborator

KerfuffleV2 commented Jun 5, 2023

ggml_init_cublas: found 1 CUDA devices:
  1. NVIDIA GeForce GTX 1060 6GB

Works for me on Linux, performance for generating tokens with main seems the same as master. Identical results with the same seed also. Perplexity (and I'd assume just cuBLAS prompt evaluation in general) is noticeably slower though. (Output abbreviated below.)

master

$ taskset -c 0-5 ./perplexity -f /path/wiki.test.raw -m /path/llama-7b.ggmlv3.q4_0.bin -t 6 -ngl 0
system_info: n_threads = 6 / 12 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 | 
perplexity: calculating perplexity over 655 chunks, batch_size=512
perplexity: 3.66 seconds per pass - ETA 39 minutes
[1]4.4545,[2]4.9402,[3]5.8278

$ taskset -c 0-5 ./perplexity -f /path/wiki.test.raw -m /path/llama-7b.ggmlv3.q4_0.bin -t 6 -ngl 33
system_info: n_threads = 6 / 12 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 | 
perplexity: calculating perplexity over 655 chunks, batch_size=512
perplexity: 3.37 seconds per pass - ETA 36 minutes
[1]4.4547,[2]4.9404

this pull

$ taskset -c 0-5 ./perplexity -f /path/wiki.test.raw -m /path/llama-7b.ggmlv3.q4_0.bin -t 6 -ngl 0
system_info: n_threads = 6 / 12 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 | 
perplexity: calculating perplexity over 655 chunks, batch_size=512
perplexity: 5.47 seconds per pass - ETA 59 minutes
[1]4.4545,[2]4.9402

$ taskset -c 0-5 ./perplexity -f /path/wiki.test.raw -m /path/llama-7b.ggmlv3.q4_0.bin -t 6 -ngl 33
system_info: n_threads = 6 / 12 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 | 
perplexity: calculating perplexity over 655 chunks, batch_size=512
perplexity: 5.12 seconds per pass - ETA 55 minutes
[1]4.4545,[2]4.9402

@JohannesGaessler
Collaborator Author

Thank you for pointing this out, I forgot to check it. The reason is most likely that on master there are some f16 f32 matrix multiplications where f32 is converted to f16 and then cuBLAS is used to do f16 matrix multiplication. In this PR however the f16 data is converted to f32 and then cuBLAS is used to do f32 matrix multiplication. Originally I did not want to touch the f16 f32 code at all but it caused issues for CPU layers in combination with >1 GPU.

As a user I don't really care about prompt processing speed because it's so fast anyways. As a developer waiting longer for perplexity calculations could be an issue though. I would be fine with delaying the merging of this PR if it's desired to fix prompt processing speed beforehand; otherwise I will at a later point make another PR to fix it.

@JohannesGaessler
Collaborator Author

I forgot to mention: the f16 f32 matrix multiplications are mostly related to the KV cache and they're done on permuted matrices. To use regular cuBLAS matrix multiplication you first need to copy the matrices to make them contiguous but I'm currently trying to develop a kernel that operates directly on the permuted memory layout. Ideally that will then be faster than master anyways (for GPU layers).
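To make the point about permuted matrices concrete, here is a minimal sketch of a strided 2D view and the copy needed to make it contiguous. The type `view2d` and the helper functions are hypothetical and much simpler than ggml's tensor layout; they only illustrate why a transposed view cannot be handed directly to a routine that expects contiguous rows.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A 2D view over a flat buffer; strides are counted in elements.
struct view2d {
    const float * data;
    size_t rows, cols;
    size_t row_stride, col_stride;
};

// Transposing only swaps shape and strides; no data is moved.
view2d transpose(const view2d & v) {
    return {v.data, v.cols, v.rows, v.col_stride, v.row_stride};
}

// A view is contiguous (row-major) if elements of a row are adjacent
// and rows follow each other without gaps.
bool is_contiguous(const view2d & v) {
    return v.col_stride == 1 && v.row_stride == v.cols;
}

// The extra copy that a contiguous-only matrix multiplication would require.
std::vector<float> make_contiguous(const view2d & v) {
    assert(v.data != nullptr);
    std::vector<float> out(v.rows * v.cols);
    for (size_t r = 0; r < v.rows; ++r) {
        for (size_t c = 0; c < v.cols; ++c) {
            out[r * v.cols + c] = v.data[r * v.row_stride + c * v.col_stride];
        }
    }
    return out;
}
```

A kernel that reads through `row_stride` and `col_stride` directly, as in the inner loop above, would avoid materializing `out` at all, which is the idea behind operating on the permuted layout.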

@KerfuffleV2
Collaborator

As a user I don't really care about prompt processing speed because it's so fast anyways. As a developer waiting longer for perplexity calculations could be an issue though.

This is just the opinion of a random person, but while prompt processing is certainly fast compared to inference, on relatively old hardware something like an 800 token prompt still takes a noticeable time to evaluate, especially on larger models like 33b+. Roughly doubling that time from, I don't know, 15-20 sec to 30-40 sec is quite noticeable.

Also, with the advances in context size technology and some models being fine-tuned to allow >2048 sized context, longer prompts are likely to get more common in the next months or so.

I've also been experimenting with various things and using the perplexity tool to check the effect, so making it twice as slow (and it's already pretty slow for big models) will definitely make me sad.

@JohannesGaessler
Collaborator Author

One workaround that might work would be to re-add the implementation on master and to only enable it when there is a single GPU.

@JohannesGaessler
Collaborator Author

I pushed a fix that re-adds the old code for single GPU prompt processing. Prompt processing seems to still be slightly slower but only by ~20%. I'm not sure what's causing the remaining difference; investigating and optimizing the performance would probably take some time. In any case, @KerfuffleV2 , can you please check the performance on your system?

@KerfuffleV2
Collaborator

Definitely helps.

-ngl 0

master: 3.66 seconds per pass - ETA 39 minutes
before: 5.47 seconds per pass - ETA 59 minutes
now: 4.16 seconds per pass - ETA 45 minutes

-ngl 33

master: 3.37 seconds per pass - ETA 36 minutes
before: 5.12 seconds per pass - ETA 55 minutes
now: 3.74 seconds per pass - ETA 40 minutes

@JohannesGaessler
Collaborator Author

Alright, my current status is this: I'm working on another PR based on this one that adds GPU acceleration for the entire LLaMa model rather than just matrix multiplication and component-wise multiplication. On my hardware this WIP version is already faster than master and this PR when it comes to token generation. I would like to get a working version for that first, and then optimize performance afterwards. Optimizing the performance of this PR now seems like it would be a bad investment of my time since I think I will be able to get better performance with a fundamentally different implementation anyways. So I think the two options for this PR are to either merge it and accept a performance regression for prompt processing or to wait until I have the next PR ready which I think will be faster than master across the board.

@KerfuffleV2
Collaborator

Sounds reasonable.

I'm working on another PR based on this one that adds GPU acceleration for the entire LLaMa model rather than just matrix multiplication and component-wise multiplication.

Feel free to ignore this if it's not the appropriate time/place to ask, but I'm curious if this is going to involve requiring significantly more VRAM?

@JohannesGaessler
Collaborator Author

llama.cpp master currently reserves 1 GB of scratch buffer memory for 33b and smaller models. Translating this 1:1 to VRAM would be simple but inefficient because these scratch buffers are much larger than they need to be. Also you only need that much memory initially during prompt processing and there the amount required is proportional to the batch size. However, for low-end cards a large batch size probably has diminishing returns anyways. You also need some VRAM for the KV cache but I haven't worked out how much that will be exactly.

My current testing targets are an RTX 3090, a GTX 1070, and a GTX 1050 ti. If I can get universally better performance for all of these that's great, otherwise I plan to add something like a --low-vram option.

@ggerganov
Owner

So I think the two options for this PR are to either merge it and accept a performance regression for prompt processing or to wait until I have the next PR ready which I think will be faster than master across the board.

The perplexity computation speed is quite important to me, so if the fix is not trivial, I prefer to wait for the other PR that will hopefully not degrade the performance. The multi-GPU addition is really great, but from a practical point of view, we need the perplexity speed right now, even more so because of the #1684 efforts that are about to get merged.

I'll leave this PR open for now - if I get the time, I might try to fix the regression and merge it (if the other PR is not ready yet)

@JohannesGaessler
Collaborator Author

JohannesGaessler commented Jun 5, 2023

I did a quick test with my WIP PR. The perplexity speed is 2.23 ms/t vs. 3.07 ms/t on master (numbers are for 7b q4_0). The PR has GPU acceleration for add, SiLU, RMS norm, cpy, reshape, view, transpose, and RoPE. As a third option I could prioritize getting that PR in a state that can be merged and then deliver kernels for the rest of the operations in a later PR.

Edit: for 33b q4_0 the speed of the WIP PR is 6.50 ms/t vs. 9.35 ms/t on master.

@ggerganov
Owner

As a third option I could prioritize getting that PR in a state that can be merged and then deliver kernels for the rest of the operations in a later PR.

Up to you - whichever way you prefer

@JohannesGaessler
Collaborator Author

I'll do it.

@JohannesGaessler
Collaborator Author

Obsolete due to the merging of #1703

YellowRoseCx added a commit to YellowRoseCx/koboldcpp-rocm that referenced this pull request Aug 25, 2023
commit 3416c98
Merge: 5eb17f0 4c4e435
Author: YellowRoseCx <[email protected]>
Date:   Fri Aug 25 13:46:56 2023 -0500

    Merge remote-tracking branch 'upstream/concedo'

commit 5eb17f0
Author: YellowRoseCx <[email protected]>
Date:   Fri Aug 25 13:38:21 2023 -0500

    ROCm Port update

    * use hipblas based on cublas
    * Update Makefile for the Cuda kernels
    * Expand arch list and make it overrideable
    * Fix multi GPU on multiple amd architectures with rocblas_initialize() (#5)
    * add hipBLAS to README
    * new build arg LLAMA_CUDA_MMQ_Y
    * fix half2 decomposition
    * Add intrinsics polyfills for AMD
    * AMD assembly optimized __dp4a
    * Allow overriding CC_TURING
    * use "ROCm" instead of "CUDA"
    * ignore all build dirs
    * Add Dockerfiles
    * fix llama-bench
    * fix -nommq help for non CUDA/HIP

    ---------

    Co-Authored-By: YellowRoseCx <[email protected]>
    Co-Authored-By: ardfork <[email protected]>
    Co-Authored-By: funnbot <[email protected]>
    Co-Authored-By: Engininja2 <[email protected]>
    Co-Authored-By: Kerfuffle <[email protected]>
    Co-Authored-By: jammm <[email protected]>
    Co-Authored-By: jdecourval <[email protected]>

commit b34f4bd
Author: YellowRoseCx <[email protected]>
Date:   Sat Aug 19 17:12:52 2023 -0500

    Update README.md

commit 7d11961
Author: YellowRoseCx <[email protected]>
Date:   Mon Aug 14 23:03:12 2023 -0500

    remove force DMMV

commit cd61aa0
Author: YellowRoseCx <[email protected]>
Date:   Sat Aug 12 17:24:31 2023 -0500

    restore main_gpu parameter

commit 4a042f3
Author: Henri Vasserman <[email protected]>
Date:   Sat Aug 12 10:51:46 2023 +0300

    gfx1100 support

    ---------

    Co-authored-by: ardfork <[email protected]>
    Co-authored-by: jammm <[email protected]>
    Co-authored-by: jdecourval <[email protected]>

commit 8913bc6
Author: Henri Vasserman <[email protected]>
Date:   Fri Aug 11 10:16:02 2023 +0300

    Allow overriding CC_TURING

commit e77a4c3
Author: Henri Vasserman <[email protected]>
Date:   Fri Aug 11 10:00:07 2023 +0300

    Merge 'origin/master' into hipblas

commit cc4c4e3
Author: Engininja2 <[email protected]>
Date:   Fri Aug 11 09:43:14 2023 +0300

    New __dp4a assembly

    Now compatible with gfx900 and faster as well.

commit 1a03b70
Author: Henri Vasserman <[email protected]>
Date:   Fri Aug 11 09:30:28 2023 +0300

    Undo mess

    ---------

    Co-authored-by: ardfork <[email protected]>

commit 4366ff9
Author: DannyDaemonic <[email protected]>
Date:   Thu Aug 10 13:11:36 2023 -0700

    Handle `ENABLE_VIRTUAL_TERMINAL_PROCESSING` more gracefully on earlier versions of Windows.

commit 811ff85
Author: Christian Demsar <[email protected]>
Date:   Thu Aug 10 10:28:27 2023 -0400

    Add --n-predict -2 for stopping generation on full context (ggerganov#2565)

commit 37c9717
Author: Martin Krasser <[email protected]>
Date:   Thu Aug 10 12:16:38 2023 +0200

    Fix grammar-based sampling issue in server (ggerganov#2566)

commit d18ecd5
Author: YellowRoseCx <[email protected]>
Date:   Thu Aug 10 13:19:41 2023 -0500

    make mmq gen faster for amd

commit 243894a
Author: Henri Vasserman <[email protected]>
Date:   Thu Aug 10 12:14:40 2023 +0300

    ws fix

commit ac2f14d
Author: Engininja2 <[email protected]>
Date:   Thu Aug 10 12:11:27 2023 +0300

    AMD assembly optimized __dp4a

    Doesn't seem to work for gfx900, so commented out.

commit 9dba0c9
Author: Henri Vasserman <[email protected]>
Date:   Thu Aug 10 12:09:28 2023 +0300

    Fix merge

    ---------

    Co-authored-by: ardfork <[email protected]>
    Co-authored-by: Kerfuffle <[email protected]>

commit f570b5c
Author: YellowRoseCx <[email protected]>
Date:   Wed Aug 9 22:11:20 2023 -0500

    Revert "revert cuda changes as they are bugggy"

    This reverts commit 1541bf8.

commit 1541bf8
Author: Concedo <[email protected]>
Date:   Wed Aug 9 22:36:41 2023 +0800

    revert cuda changes as they are bugggy

commit bacc202
Author: YellowRoseCx <[email protected]>
Date:   Wed Aug 9 20:37:17 2023 -0500

    Merge remote-tracking branch 'upstream/concedo'

commit b7cb4cf
Author: YellowRoseCx <[email protected]>
Date:   Wed Aug 9 20:00:52 2023 -0500

    additional fixes

commit fadae72
Merge: 518eb2a 8f8ab6c
Author: YellowRoseCx <[email protected]>
Date:   Wed Aug 9 18:45:50 2023 -0500

    Merge branch 'hipblas' into develop4Main

commit 518eb2a
Merge: bda0215 cae6a84
Author: YellowRoseCx <[email protected]>
Date:   Wed Aug 9 18:32:10 2023 -0500

    Merge remote-tracking branch 'upstream/concedo' into develop2Main

commit bda0215
Author: YellowRoseCx <[email protected]>
Date:   Wed Aug 9 18:17:54 2023 -0500

    update makefile to multisystem path

commit 8f8ab6c
Author: YellowRoseCx <[email protected]>
Date:   Wed Aug 9 18:05:03 2023 -0500

    hipLDFLAG Path change Unix to multisystem in Makefile

    changed the hardcoded linux distro hipblas LD path from -L/opt/rocm/lib to use the defined ROCM_PATH variable to be flexible with ROCm on non-Linux OS

commit 610ba4c
Merge: 4024f91 25d43e0
Author: Henri Vasserman <[email protected]>
Date:   Wed Aug 9 23:54:58 2023 +0300

    Merge 'origin/master' into hipblas

commit 4024f91
Author: Henri Vasserman <[email protected]>
Date:   Wed Aug 9 01:56:44 2023 +0300

    Add intrinsics polyfills for AMD

    ---------

    Co-authored-by: ardfork <[email protected]>
    Co-authored-by: funnbot <[email protected]>
    Co-authored-by: Engininja2 <[email protected]>

commit ab62128
Merge: d91456a f5bfea0
Author: Henri Vasserman <[email protected]>
Date:   Wed Aug 9 00:37:01 2023 +0300

    Merge 'origin/master' into hipblas

commit ee9fa2a
Author: YellowRoseCx <[email protected]>
Date:   Wed Aug 2 01:53:58 2023 -0500

    Update Makefile

commit d91456a
Author: ardfork <[email protected]>
Date:   Mon Jul 31 20:35:00 2023 +0300

    fix half2 decomposition

commit c1cb70d
Author: Henri Vasserman <[email protected]>
Date:   Mon Jul 31 19:56:44 2023 +0300

    new build arg LLAMA_CUDA_MMQ_Y

commit c1664a0
Merge: 4336231 0728c5a
Author: Henri Vasserman <[email protected]>
Date:   Mon Jul 31 19:32:27 2023 +0300

    Merge 'origin/master' into hipblas

commit 848558d
Author: YellowRoseCx <[email protected]>
Date:   Sun Jul 30 20:02:52 2023 -0500

    import vars logic fix

commit b650b84
Author: YellowRoseCx <[email protected]>
Date:   Sun Jul 30 00:21:36 2023 -0500

    Update easy_KCPP-ROCm_install.sh

commit 8573a67
Author: YellowRoseCx <[email protected]>
Date:   Sat Jul 29 21:31:12 2023 -0500

    remove duplicate code and fix typo

    remove duplicate tooltip

commit 430986e
Author: YellowRoseCx <[email protected]>
Date:   Sat Jul 29 21:07:34 2023 -0500

    hide "missing" if all are built

    move tooltip functions to helper functions section. hides the string "Missing: ..." from showing if all backends are available
    " if len(runopts)==6 else + "

commit dd0db72
Author: YellowRoseCx <[email protected]>
Date:   Sat Jul 29 20:52:31 2023 -0500

    hide "missing" if all are built

    move tooltip functions to helper functions section. hides the string "Missing: ..." from showing if all backends are available

commit 43fffb6
Merge: 0ed65a4 b40550c
Author: YellowRoseCx <[email protected]>
Date:   Sat Jul 29 19:13:15 2023 -0500

    Merge branch 'concedo'

commit 0ed65a4
Author: YellowRoseCx <[email protected]>
Date:   Sat Jul 29 18:34:21 2023 -0500

    Hide unavailable backends & Add tooltip over backend count

    Hides unavailable backends from the user and if the program is launched without any backends made, it shows an error message to them stating no backends were found and to make them using the 'make' command

    Add tooltip when hovering over backend count label

    hovering over the new label that shows the backend count will explain what the numbers are, and show the users which backends are not available or built

commit 2a26398
Merge: cee2e9d 31486eb
Author: YellowRoseCx <[email protected]>
Date:   Sat Jul 29 15:16:33 2023 -0500

    Merge remote-tracking branch 'upstream/concedo'

commit 4336231
Author: Henri Vasserman <[email protected]>
Date:   Sat Jul 29 18:35:56 2023 +0300

    add hipBLAS to README

    ---------

    Co-authored-by: ardfork <[email protected]>

commit f8e3fc6
Author: Henri Vasserman <[email protected]>
Date:   Sat Jul 29 14:16:46 2023 +0300

    rocblas init stuff

commit d2ade63
Merge: cde52d6 8a88e58
Author: Henri Vasserman <[email protected]>
Date:   Sat Jul 29 12:59:48 2023 +0300

    Merge 'origin/master' into hipblas

commit cee2e9d
Author: YellowRoseCx <[email protected]>
Date:   Wed Jul 26 23:36:55 2023 -0500

    Only Show Available Backends in GUI

    Hides unavailable backends from the user and if the program is launched without any backends made, it shows an error message to them stating no backends were found and to make them using the 'make' command

commit 7863610
Author: YellowRoseCx <[email protected]>
Date:   Wed Jul 26 13:27:22 2023 -0500

    Update easy_KCPP-ROCm_install.sh

commit 731cd6e
Author: YellowRoseCx <[email protected]>
Date:   Tue Jul 25 22:39:50 2023 -0500

    Create easy_rocm_install.sh

commit f154685
Merge: cbdc1f3 94e0a06
Author: YellowRoseCx <[email protected]>
Date:   Tue Jul 25 22:25:10 2023 -0500

    Merge branch 'concedo_experimentalMAIN'

commit cbdc1f3
Merge: 5b838d4 9731682
Author: YellowRoseCx <[email protected]>
Date:   Mon Jul 24 16:53:21 2023 -0500

    Merge remote-tracking branch 'upstream/concedo'

commit cde52d6
Merge: 8e8054a 84e09a7
Author: Henri Vasserman <[email protected]>
Date:   Mon Jul 24 12:22:58 2023 +0300

    Merge 'origin/master' into hipblas

commit 8e8054a
Author: Henri Vasserman <[email protected]>
Date:   Mon Jul 24 12:20:49 2023 +0300

    Add rocblas to build files

commit 1f6294d
Author: YellowRoseCx <[email protected]>
Date:   Mon Jul 24 03:52:01 2023 -0500

    Fix multi GPU on multiple amd architectures with rocblas_initialize() (#5)

    * initialize rocblas

commit 5b838d4
Author: YellowRoseCx <[email protected]>
Date:   Mon Jul 24 03:10:35 2023 -0500

    amd multigpu full layer offload w/o vram scratch

commit 9bfb2fd
Merge: b379f9d 66328fc
Author: YellowRoseCx <[email protected]>
Date:   Mon Jul 24 03:07:44 2023 -0500

    Merge branch 'concedo_experimental'

commit b379f9d
Author: YellowRoseCx <[email protected]>
Date:   Mon Jul 24 03:07:00 2023 -0500

    Revert "amd multigpu full layer offload w/o vram scratch"

    This reverts commit 9adfc8e.

commit 9adfc8e
Author: YellowRoseCx <[email protected]>
Date:   Mon Jul 24 02:56:40 2023 -0500

    amd multigpu full layer offload w/o vram scratch

commit 05c792e
Author: YellowRoseCx <[email protected]>
Date:   Mon Jul 24 00:18:48 2023 -0500

    initialize rocblas

commit ade68d0
Merge: 521ad6b 56995ca
Author: YellowRoseCx <[email protected]>
Date:   Sun Jul 23 20:25:05 2023 -0500

    Merge remote-tracking branch 'upstream/concedo'

commit 521ad6b
Author: YellowRoseCx <[email protected]>
Date:   Thu Jul 20 21:42:33 2023 -0500

    lazy import_var error handling for saves

commit 9553e52
Merge: cac6650 f036109
Author: YellowRoseCx <[email protected]>
Date:   Thu Jul 20 19:59:41 2023 -0500

    Merge remote-tracking branch 'upstream/concedo'

commit cac6650
Author: YellowRoseCx <[email protected]>
Date:   Mon Jul 17 23:05:02 2023 -0500

    Makefile fix! Allows hip/clblast build together

commit 3db70b5
Merge: 2ec4466 7568d1a
Author: Henri Vasserman <[email protected]>
Date:   Tue Jul 18 01:54:17 2023 +0300

    Merge 'origin/master' into hipblas

commit f208670
Author: YellowRoseCx <[email protected]>
Date:   Fri Jul 14 02:56:03 2023 -0500

    improve error handling with gpu names

commit 860e738
Author: YellowRoseCx <[email protected]>
Date:   Fri Jul 14 00:33:03 2023 -0500

    Show GPU names in GUI, Only show GPUs that exist

    changed the pre-set 1,2,3 and 1,2,3,all settings that the GPU selector had and replaced them with a function that grabs the GPU names and sets the names as the values for the selector boxes.

commit 2ec4466
Author: Henri Vasserman <[email protected]>
Date:   Thu Jul 13 13:44:02 2023 +0300

    Update build flags.

    GGML_CUDA_DMMV_Y is now GGML_CUDA_MMV_Y
    so update your build instructions.

    GGML_CUDA_FORCE_DMMV is always enabled.

    ---------

    Co-authored-by: YellowRoseCx <[email protected]>

commit cd36b18
Merge: afcb8fe 1cbf561
Author: Henri Vasserman <[email protected]>
Date:   Thu Jul 13 13:03:01 2023 +0300

    Merge 'origin/master' into hipblas

commit ac7ebc3
Author: YellowRoseCx <[email protected]>
Date:   Wed Jul 12 18:32:18 2023 -0500

    add hipBLAS name scheme to GUI and update README

commit 7f85cc5
Author: YellowRoseCx <[email protected]>
Date:   Wed Jul 12 17:35:54 2023 -0500

    update makefile and ggml.c

commit 6ca3499
Author: YellowRoseCx <[email protected]>
Date:   Wed Jul 12 15:43:45 2023 -0500

    ggml.c fix

commit 770e674
Merge: 2b289cd 5941514
Author: YellowRoseCx <[email protected]>
Date:   Wed Jul 12 15:24:36 2023 -0500

    Merge remote-tracking branch 'upstream/concedo'

commit 2b289cd
Author: YellowRoseCx <[email protected]>
Date:   Wed Jul 12 14:30:00 2023 -0500

    Update c-cpp.yml

commit 5dae95a
Author: YellowRoseCx <[email protected]>
Date:   Wed Jul 12 14:28:51 2023 -0500

    Update c-cpp.yml

commit b37cd73
Author: YellowRoseCx <[email protected]>
Date:   Wed Jul 12 14:27:04 2023 -0500

    Create c-cpp.yml to test Actions

commit afcb8fe
Author: Henri Vasserman <[email protected]>
Date:   Tue Jul 11 18:09:27 2023 +0300

    Add new config option

commit 8c2c497
Merge: e610466 2347463
Author: Henri Vasserman <[email protected]>
Date:   Tue Jul 11 17:53:54 2023 +0300

    Merge 'origin/master' into hipblas

commit e610466
Author: Henri Vasserman <[email protected]>
Date:   Tue Jul 11 17:53:14 2023 +0300

    Expand arch list and make it overrideable

commit 80e4e54
Merge: 7735c5a 1d16309
Author: Henri Vasserman <[email protected]>
Date:   Mon Jul 10 02:09:28 2023 +0300

    Merge 'origin/master' into hipblas

commit 8432e9d
Author: YellowRoseCx <[email protected]>
Date:   Sun Jul 9 16:55:30 2023 -0500

    Update Makefile

commit b58c189
Author: YellowRoseCx <[email protected]>
Date:   Sun Jul 9 16:20:00 2023 -0500

    Add multi-gpu CuBLAS support to new GUI

commit 0c1c71b
Author: YellowRoseCx <[email protected]>
Date:   Sat Jul 8 07:56:57 2023 -0500

    Update Makefile

commit f864f60
Author: Johannes Gäßler <[email protected]>
Date:   Sat Jul 8 00:25:15 2023 +0200

    CUDA: add __restrict__ to mul mat vec kernels (ggerganov#2140)

commit 4539bc2
Author: YellowRoseCx <[email protected]>
Date:   Sat Jul 8 01:36:14 2023 -0500

    update makefile for changes

commit 912e31e
Merge: 74e2703 ddaa4f2
Author: YellowRoseCx <[email protected]>
Date:   Fri Jul 7 23:15:37 2023 -0500

    Merge remote-tracking branch 'upstream/concedo'

commit 74e2703
Merge: cf65429 f9108ba
Author: YellowRoseCx <[email protected]>
Date:   Wed Jul 5 15:16:49 2023 -0500

    Merge branch 'LostRuins:concedo' into main

commit 7735c5a
Merge: c3e3733 7ee76e4
Author: Henri Vasserman <[email protected]>
Date:   Tue Jul 4 17:09:16 2023 +0300

    Merge 'origin/master' into hipblas

commit cf65429
Author: YellowRoseCx <[email protected]>
Date:   Mon Jul 3 16:56:40 2023 -0500

    print cuda or opencl based on what's used

commit 72c16d2
Author: YellowRoseCx <[email protected]>
Date:   Mon Jul 3 16:45:39 2023 -0500

    Revert "fix my mistake that broke other arches"

    This reverts commit 777aed5.

commit 777aed5
Author: YellowRoseCx <[email protected]>
Date:   Mon Jul 3 15:53:32 2023 -0500

    fix my mistake that broke other arches

commit 27780a9
Author: YellowRoseCx <[email protected]>
Date:   Sun Jul 2 16:03:27 2023 -0500

    rocm fixes

commit f52c7d4
Author: YellowRoseCx <[email protected]>
Date:   Sun Jul 2 16:02:58 2023 -0500

    Revert "rocm fixes"

    This reverts commit 2fe9927.

commit 2fe9927
Author: YellowRoseCx <[email protected]>
Date:   Sun Jul 2 15:58:21 2023 -0500

    rocm fixes

commit efe7560
Author: YellowRoseCx <[email protected]>
Date:   Sun Jul 2 15:55:43 2023 -0500

    Revert "move HIPBLAS definitions into ggml-cuda.h"

    This reverts commit bf49a93.

commit 4fc0181
Author: YellowRoseCx <[email protected]>
Date:   Sun Jul 2 15:55:36 2023 -0500

    Revert "move hipblas definitions to header files"

    This reverts commit 2741ffb.

commit 89eb576
Merge: 2741ffb 3d2907d
Author: YellowRoseCx <[email protected]>
Date:   Sun Jul 2 14:44:13 2023 -0500

    Merge branch 'LostRuins:concedo' into main

commit c3e3733
Author: Henri Vasserman <[email protected]>
Date:   Sun Jul 2 15:51:31 2023 +0300

    ROCm fixes

commit 15db19a
Merge: 04419f1 46088f7
Author: Henri Vasserman <[email protected]>
Date:   Sun Jul 2 15:39:57 2023 +0300

    Merge 'origin/master' into hipblas

commit 2741ffb
Author: YellowRoseCx <[email protected]>
Date:   Sat Jul 1 17:07:42 2023 -0500

    move hipblas definitions to header files

commit bf49a93
Author: YellowRoseCx <[email protected]>
Date:   Sat Jul 1 16:38:50 2023 -0500

    move HIPBLAS definitions into ggml-cuda.h

commit 540f4e0
Merge: 2c3b46f eda663f
Author: YellowRoseCx <[email protected]>
Date:   Sat Jul 1 14:58:32 2023 -0500

    Merge remote-tracking branch 'upstream/concedo'

commit 2c3b46f
Author: YellowRoseCx <[email protected]>
Date:   Thu Jun 29 18:43:43 2023 -0500

    changes to fix build

commit c9e1103
Author: YellowRoseCx <[email protected]>
Date:   Thu Jun 29 18:20:07 2023 -0500

    Update ggml_v2-cuda-legacy.cu for ROCM

commit b858fc5
Author: YellowRoseCx <[email protected]>
Date:   Thu Jun 29 17:49:39 2023 -0500

    changes to work with upstream

commit 69a0c25
Merge: 096f0b0 1347d3a
Author: YellowRoseCx <[email protected]>
Date:   Thu Jun 29 16:59:06 2023 -0500

    Merge remote-tracking branch 'upstream/concedo'

commit 04419f1
Merge: bb16eff d3494bb
Author: Henri Vasserman <[email protected]>
Date:   Wed Jun 28 23:30:10 2023 +0300

    Merge 'origin/master' into hipblas

commit bb16eff
Author: YellowRoseCx <[email protected]>
Date:   Wed Jun 28 15:27:10 2023 -0500

    headers fix; add kquants_iter for hipblas and add gfx803 (#1)

    * kquants_iter for hipblas and add gfx803
    * Update CMakeLists.txt with hipblas kquants_iter and DMMV_F16
    * remove dmmv_f16 for now

commit 096f0b0
Author: YellowRoseCx <[email protected]>
Date:   Wed Jun 28 15:27:02 2023 -0500

    revert unnecessary hipblas conditionals

commit d81e81a
Author: YellowRoseCx <[email protected]>
Date:   Wed Jun 28 14:48:23 2023 -0500

    Update Makefile hipblas nvcc correction

commit c8ae945
Merge: c1e5c83 0be54f7
Author: Henri Vasserman <[email protected]>
Date:   Tue Jun 27 10:50:37 2023 +0300

    Merge 'origin/master' into hipblas

commit 2579ecf
Merge: abed427 d2034ce
Author: YellowRoseCx <[email protected]>
Date:   Sun Jun 25 17:50:04 2023 -0500

    Merge branch 'LostRuins:concedo' into main

commit c1e5c83
Merge: 35a6031 447ccbe
Author: Henri Vasserman <[email protected]>
Date:   Sun Jun 25 21:40:05 2023 +0300

    Merge 'origin/master' into hipblas

commit 35a6031
Merge: df7346c 66a2555
Author: Henri Vasserman <[email protected]>
Date:   Sun Jun 25 10:57:48 2023 +0300

    Merge 'origin/master' into hipblas

commit abed427
Author: YellowRoseCx <[email protected]>
Date:   Sat Jun 24 19:16:30 2023 -0500

    reorganize If statements to include proper headers

commit 06c3bf0
Merge: ea6d320 8342fe8
Author: YellowRoseCx <[email protected]>
Date:   Sat Jun 24 16:57:20 2023 -0500

    Merge branch 'LostRuins:concedo' into main

commit ea6d320
Author: YellowRoseCx <[email protected]>
Date:   Fri Jun 23 01:53:28 2023 -0500

    Update README.md

commit 4d56ad8
Author: YellowRoseCx <[email protected]>
Date:   Thu Jun 22 16:19:43 2023 -0500

    Update README.md

commit 21f9308
Author: YellowRoseCx <[email protected]>
Date:   Thu Jun 22 15:42:05 2023 -0500

    kquants_iter for hipblas and add gfx803

commit df7346c
Merge: 5dd2fbe 7487137
Author: Henri Vasserman <[email protected]>
Date:   Thu Jun 22 20:51:09 2023 +0300

    Merge 'origin/master' into hipblas

commit b6ff890
Merge: eb094f0 e6ddb15
Author: YellowRoseCx <[email protected]>
Date:   Thu Jun 22 12:42:09 2023 -0500

    Merge branch 'LostRuins:concedo' into main

commit eb094f0
Author: YellowRoseCx <[email protected]>
Date:   Wed Jun 21 23:59:18 2023 -0500

    lowvram parameter description

commit 3a5dfeb
Merge: 665cc11 b1f00fa
Author: YellowRoseCx <[email protected]>
Date:   Wed Jun 21 16:53:03 2023 -0500

    Merge branch 'LostRuins:concedo' into koboldcpp-rocm

commit 665cc11
Author: YellowRoseCx <[email protected]>
Date:   Wed Jun 21 01:13:19 2023 -0500

    add lowvram parameter

commit 222cbbb
Author: YellowRoseCx <[email protected]>
Date:   Tue Jun 20 19:03:28 2023 -0500

    add additional hipblas conditions for cublas

commit e1f9581
Author: YellowRoseCx <[email protected]>
Date:   Tue Jun 20 16:51:59 2023 -0500

    Add hip def for cuda v2

commit 3bff5c0
Merge: a7e74b3 266d47a
Author: YellowRoseCx <[email protected]>
Date:   Tue Jun 20 13:38:06 2023 -0500

    Merge branch 'LostRuins:concedo' into koboldcpp-rocm

commit a7e74b3
Author: YellowRoseCx <[email protected]>
Date:   Mon Jun 19 22:04:18 2023 -0500

    Update README.md

commit 5e99b3c
Author: YellowRoseCx <[email protected]>
Date:   Mon Jun 19 22:03:42 2023 -0500

    Update Makefile

commit 9190b17
Author: YellowRoseCx <[email protected]>
Date:   Mon Jun 19 21:47:10 2023 -0500

    Update README.md

commit 5dd2fbe
Merge: 67e229b 20568fe
Author: Henri Vasserman <[email protected]>
Date:   Tue Jun 20 01:23:12 2023 +0300

    Merge 'origin/master' into hipblas

commit 2780ea2
Author: YellowRoseCx <[email protected]>
Date:   Sun Jun 18 15:48:00 2023 -0500

    Update Makefile

commit 04a3e64
Author: YellowRoseCx <[email protected]>
Date:   Sun Jun 18 14:33:39 2023 -0500

    remove extra line

commit cccbca9
Author: YellowRoseCx <[email protected]>
Date:   Sun Jun 18 14:31:17 2023 -0500

    attempt adding ROCM hipblas

commit a44a1d4
Author: YellowRoseCx <[email protected]>
Date:   Sun Jun 18 14:31:01 2023 -0500

    attempt adding ROCM hipblas

commit b088184
Author: YellowRoseCx <[email protected]>
Date:   Sun Jun 18 14:30:54 2023 -0500

    attempt adding ROCM hipblas

commit 67e229b
Merge: 6f7c156 b241649
Author: Henri Vasserman <[email protected]>
Date:   Sun Jun 18 00:36:54 2023 +0300

    Merge 'origin/master' into hipblas

commit 6f7c156
Merge: 61df8e9 fc45a81
Author: Henri Vasserman <[email protected]>
Date:   Sat Jun 17 16:53:22 2023 +0300

    Merge 'origin/master' into hipblas

commit 61df8e9
Author: Henri Vasserman <[email protected]>
Date:   Wed Jun 14 22:46:10 2023 +0300

    add cudaMemset

commit a836529
Merge: 85f902d 254a7a7
Author: Henri Vasserman <[email protected]>
Date:   Wed Jun 14 22:41:55 2023 +0300

    Merge 'origin/master' into hipblas

commit 85f902d
Merge: 4362e80 b50b570
Author: Henri Vasserman <[email protected]>
Date:   Thu Jun 8 10:50:28 2023 +0300

    Merge 'origin/master' into hipblas

commit 4362e80
Merge: fa5b3d7 17366df
Author: Henri Vasserman <[email protected]>
Date:   Tue Jun 6 23:14:40 2023 +0300

    Merge 'origin/master' into hipblas

commit fa5b3d7
Author: Henri Vasserman <[email protected]>
Date:   Tue Jun 6 18:47:00 2023 +0300

    fix makefile.

commit 1ba4ce4
Author: Henri Vasserman <[email protected]>
Date:   Tue Jun 6 18:41:08 2023 +0300

    Revert "warp size fixes"

    It seems like 32 is faster, at least for me, and it won't cause so many conflicts.

    This reverts commit 5d6eb72.

commit 5d6eb72
Author: Henri Vasserman <[email protected]>
Date:   Tue Jun 6 18:32:41 2023 +0300

    warp size fixes

commit 33091a9
Merge: 9fdaa1d 2d43387
Author: Henri Vasserman <[email protected]>
Date:   Tue Jun 6 16:19:23 2023 +0300

    Merge 'origin/master' into hipblas

commit 9fdaa1d
Author: Henri Vasserman <[email protected]>
Date:   Sat May 27 19:17:53 2023 +0300

    Add more defs

    For forward compatibility ggerganov#1607

commit a4648c1
Merge: 4c8b3fb 0ecb1bb
Author: Henri Vasserman <[email protected]>
Date:   Sat May 27 18:22:39 2023 +0300

    Merge 'origin/master' into hipblas

commit 4c8b3fb
Author: Henri Vasserman <[email protected]>
Date:   Fri May 26 01:08:53 2023 +0300

    add configurable vars

commit 30d921a
Author: Henri Vasserman <[email protected]>
Date:   Fri May 26 01:03:56 2023 +0300

    and makefile

commit a593a4f
Author: Henri Vasserman <[email protected]>
Date:   Fri May 26 00:55:28 2023 +0300

    Add missing parameters

commit 174bf6a
Merge: f80ce7a 1fcdcc2
Author: Henri Vasserman <[email protected]>
Date:   Fri May 26 00:44:23 2023 +0300

    Merge 'origin/master' into hipblas

commit f80ce7a
Merge: 600ace3 ac7876a
Author: Henri Vasserman <[email protected]>
Date:   Thu May 25 00:02:50 2023 +0300

    Merge branch 'origin/master' into hipblas

commit 600ace3
Author: Henri Vasserman <[email protected]>
Date:   Sat May 20 23:42:20 2023 +0300

    update warp size

commit b19fefe
Author: Henri Vasserman <[email protected]>
Date:   Sat May 20 23:28:08 2023 +0300

    Forwardcompat

commit c66115b
Merge: a0b2d5f b8ee340
Author: Henri Vasserman <[email protected]>
Date:   Sat May 20 18:29:31 2023 +0300

    Merge 'origin/master' into hipblas

commit a0b2d5f
Merge: 8bab456 2a5ee02
Author: Henri Vasserman <[email protected]>
Date:   Tue May 16 17:08:29 2023 +0300

    Merge 'origin/master' into hipblas

commit 8bab456
Merge: 2956630 b5c9295
Author: Henri Vasserman <[email protected]>
Date:   Mon May 15 00:01:12 2023 +0300

    Merge 'origin/master' into hipblas

commit 2956630
Merge: 0fe6384 f048af0
Author: Henri Vasserman <[email protected]>
Date:   Sat May 13 13:12:52 2023 +0300

    Merge 'origin/master' into hipblas

commit 0fe6384
Author: Henri Vasserman <[email protected]>
Date:   Fri May 12 17:22:11 2023 +0300

    fix makefile

commit 605560d
Merge: 127f68e 089b1c9
Author: Henri Vasserman <[email protected]>
Date:   Fri May 12 16:12:53 2023 +0300

    Merge 'origin/master' into hipblas

commit 127f68e
Merge: 070cbcc b608b55
Author: Henri Vasserman <[email protected]>
Date:   Thu May 11 20:21:27 2023 +0300

    Merge 'origin/master' into hipblas

commit 070cbcc
Author: Henri Vasserman <[email protected]>
Date:   Sun May 7 18:10:56 2023 +0300

    occupancy function

commit a3296d5
Merge: 0aefa6a e129551
Author: Henri Vasserman <[email protected]>
Date:   Sun May 7 18:06:04 2023 +0300

    Merge 'origin/master' into hipblas

commit 0aefa6a
Merge: baeb482 1b0fd45
Author: Henri Vasserman <[email protected]>
Date:   Sun May 7 12:24:41 2023 +0300

    Merge 'origin/master' into hipblas

commit baeb482
Author: Henri Vasserman <[email protected]>
Date:   Sun May 7 12:24:12 2023 +0300

    Revert to default copy

commit 289073a
Merge: 1107194 173d0e6
Author: Henri Vasserman <[email protected]>
Date:   Sat May 6 19:59:41 2023 +0300

    Merge 'origin/master' into hipblas

commit 1107194
Merge: 04c0d48 a3b85b2
Author: Henri Vasserman <[email protected]>
Date:   Sat May 6 00:38:20 2023 +0300

    Merge 'origin/master' into hipblas

commit 04c0d48
Author: Henri Vasserman <[email protected]>
Date:   Thu May 4 12:31:16 2023 +0300

    Move all HIP stuff to ggml-cuda.cu

commit d83cfba
Merge: b67cc50 799fdc1
Author: Henri Vasserman <[email protected]>
Date:   Thu May 4 11:31:16 2023 +0300

    Merge 'origin/master' into hipblas

commit b67cc50
Merge: fcbc262 e216aa0
Author: Henri Vasserman <[email protected]>
Date:   Wed May 3 15:04:51 2023 +0300

    Merge 'origin/master' into hipblas

commit fcbc262
Merge: c73def1 f4cef87
Author: Henri Vasserman <[email protected]>
Date:   Mon May 1 22:45:29 2023 +0300

    Merge 'origin/master' into hipblas

commit c73def1
Merge: d8ea75e f0d70f1
Author: Henri Vasserman <[email protected]>
Date:   Sun Apr 30 18:40:42 2023 +0300

    Merge 'origin/master' into hipblas

commit d8ea75e
Merge: d194586 334637e
Author: Henri Vasserman <[email protected]>
Date:   Sat Apr 29 11:25:51 2023 +0300

    Merge 'origin/master' into hipblas

commit d194586
Merge: 2ab9d11 7f15c5c
Author: Henri Vasserman <[email protected]>
Date:   Fri Apr 28 23:03:52 2023 +0300

    Merge 'origin/master' into hipblas

commit 2ab9d11
Merge: 3b4a531 04aaae1
Author: Henri Vasserman <[email protected]>
Date:   Fri Apr 28 16:30:05 2023 +0300

    Merge 'origin/master' into hipblas

commit 3b4a531
Merge: a1caa48 0b2da20
Author: Henri Vasserman <[email protected]>
Date:   Fri Apr 28 10:08:41 2023 +0300

    Merge 'origin/master' into hipblas

commit a1caa48
Author: Henri Vasserman <[email protected]>
Date:   Fri Apr 28 10:08:21 2023 +0300

    add more cuda defines

    This is so 'slaren/cuda-f16f32' would merge.

commit ecc0565
Author: Henri Vasserman <[email protected]>
Date:   Fri Apr 28 01:58:27 2023 +0300

    only .cu file needs to be compiled as device

commit ef51e9e
Merge: d571d16 4afcc37
Author: Henri Vasserman <[email protected]>
Date:   Wed Apr 26 12:46:26 2023 +0300

    Merge branch 'ggerganov:master' into hipblas

commit d571d16
Merge: 608aa33 dd0eabc
Author: Henri Vasserman <[email protected]>
Date:   Tue Apr 25 21:15:33 2023 +0300

    Merge 'origin/master' into hipblas

commit 608aa33
Author: Henri Vasserman <[email protected]>
Date:   Tue Apr 25 21:15:04 2023 +0300

    change default GPU arch to match CMake

commit 3a004b2
Author: Henri Vasserman <[email protected]>
Date:   Mon Apr 24 02:24:54 2023 +0300

    add rpath

commit db7a012
Merge: 3677235 284685f
Author: Henri Vasserman <[email protected]>
Date:   Sun Apr 23 21:49:28 2023 +0300

    Merge 'origin/master' into hipblas

commit 3677235
Author: Henri Vasserman <[email protected]>
Date:   Sat Apr 22 23:28:00 2023 +0300

    More build file changes

commit d3e1984
Author: Henri Vasserman <[email protected]>
Date:   Fri Apr 21 03:32:06 2023 +0300

    add rpath

commit 0e005f7
Author: Henri Vasserman <[email protected]>
Date:   Fri Apr 21 02:13:00 2023 +0300

    Build file changes

    Now HIP Clang is not required, the CMake scripts will configure the
    needed compiler, which can be system clang++. Also other code can
    still use GCC, but CMake will force the clang to link.

commit 54a63c1
Author: Henri Vasserman <[email protected]>
Date:   Thu Apr 20 22:19:22 2023 +0300

    Update Makefile for the Cuda kernels

commit 0fd8363
Author: Henri Vasserman <[email protected]>
Date:   Thu Apr 20 02:04:00 2023 +0300

    use hipblas based on cublas

LostRuins added a commit to LostRuins/koboldcpp that referenced this pull request Aug 28, 2023
* koboldcpp-ROCm Port

commit 3416c98
Merge: 5eb17f0 4c4e435
Author: YellowRoseCx <[email protected]>
Date:   Fri Aug 25 13:46:56 2023 -0500

    Merge remote-tracking branch 'upstream/concedo'

commit 5eb17f0
Author: YellowRoseCx <[email protected]>
Date:   Fri Aug 25 13:38:21 2023 -0500

    ROCm Port update

    * use hipblas based on cublas
    * Update Makefile for the Cuda kernels
    * Expand arch list and make it overrideable
    * Fix multi GPU on multiple amd architectures with rocblas_initialize() (#5)
    * add hipBLAS to README
    * new build arg LLAMA_CUDA_MMQ_Y
    * fix half2 decomposition
    * Add intrinsics polyfills for AMD
    * AMD assembly optimized __dp4a
    * Allow overriding CC_TURING
    * use "ROCm" instead of "CUDA"
    * ignore all build dirs
    * Add Dockerfiles
    * fix llama-bench
    * fix -nommq help for non CUDA/HIP

    ---------

    Co-Authored-By: YellowRoseCx <[email protected]>
    Co-Authored-By: ardfork <[email protected]>
    Co-Authored-By: funnbot <[email protected]>
    Co-Authored-By: Engininja2 <[email protected]>
    Co-Authored-By: Kerfuffle <[email protected]>
    Co-Authored-By: jammm <[email protected]>
    Co-Authored-By: jdecourval <[email protected]>

commit b34f4bd
Author: YellowRoseCx <[email protected]>
Date:   Sat Aug 19 17:12:52 2023 -0500

    Update README.md

commit 7d11961
Author: YellowRoseCx <[email protected]>
Date:   Mon Aug 14 23:03:12 2023 -0500

    remove force DMMV

commit cd61aa0
Author: YellowRoseCx <[email protected]>
Date:   Sat Aug 12 17:24:31 2023 -0500

    restore main_gpu parameter

commit 4a042f3
Author: Henri Vasserman <[email protected]>
Date:   Sat Aug 12 10:51:46 2023 +0300

    gfx1100 support

    ---------

    Co-authored-by: ardfork <[email protected]>
    Co-authored-by: jammm <[email protected]>
    Co-authored-by: jdecourval <[email protected]>

commit 8913bc6
Author: Henri Vasserman <[email protected]>
Date:   Fri Aug 11 10:16:02 2023 +0300

    Allow overriding CC_TURING

commit e77a4c3
Author: Henri Vasserman <[email protected]>
Date:   Fri Aug 11 10:00:07 2023 +0300

    Merge 'origin/master' into hipblas

commit cc4c4e3
Author: Engininja2 <[email protected]>
Date:   Fri Aug 11 09:43:14 2023 +0300

    New __dp4a assembly

    Now compatible with gfx900 and faster as well.

commit 1a03b70
Author: Henri Vasserman <[email protected]>
Date:   Fri Aug 11 09:30:28 2023 +0300

    Undo mess

    ---------

    Co-authored-by: ardfork <[email protected]>

commit 4366ff9
Author: DannyDaemonic <[email protected]>
Date:   Thu Aug 10 13:11:36 2023 -0700

    Handle `ENABLE_VIRTUAL_TERMINAL_PROCESSING` more gracefully on earlier versions of Windows.

commit 811ff85
Author: Christian Demsar <[email protected]>
Date:   Thu Aug 10 10:28:27 2023 -0400

    Add --n-predict -2 for stopping generation on full context (ggerganov#2565)

commit 37c9717
Author: Martin Krasser <[email protected]>
Date:   Thu Aug 10 12:16:38 2023 +0200

    Fix grammar-based sampling issue in server (ggerganov#2566)

commit d18ecd5
Author: YellowRoseCx <[email protected]>
Date:   Thu Aug 10 13:19:41 2023 -0500

    make mmq gen faster for amd

commit 243894a
Author: Henri Vasserman <[email protected]>
Date:   Thu Aug 10 12:14:40 2023 +0300

    ws fix

commit ac2f14d
Author: Engininja2 <[email protected]>
Date:   Thu Aug 10 12:11:27 2023 +0300

    AMD assembly optimized __dp4a

    Doesn't seem to work for gfx900, so commented out.

commit 9dba0c9
Author: Henri Vasserman <[email protected]>
Date:   Thu Aug 10 12:09:28 2023 +0300

    Fix merge

    ---------

    Co-authored-by: ardfork <[email protected]>
    Co-authored-by: Kerfuffle <[email protected]>

commit f570b5c
Author: YellowRoseCx <[email protected]>
Date:   Wed Aug 9 22:11:20 2023 -0500

    Revert "revert cuda changes as they are buggy"

    This reverts commit 1541bf8.

commit 1541bf8
Author: Concedo <[email protected]>
Date:   Wed Aug 9 22:36:41 2023 +0800

    revert cuda changes as they are buggy

commit bacc202
Author: YellowRoseCx <[email protected]>
Date:   Wed Aug 9 20:37:17 2023 -0500

    Merge remote-tracking branch 'upstream/concedo'

commit b7cb4cf
Author: YellowRoseCx <[email protected]>
Date:   Wed Aug 9 20:00:52 2023 -0500

    additional fixes

commit fadae72
Merge: 518eb2a 8f8ab6c
Author: YellowRoseCx <[email protected]>
Date:   Wed Aug 9 18:45:50 2023 -0500

    Merge branch 'hipblas' into develop4Main

commit 518eb2a
Merge: bda0215 cae6a84
Author: YellowRoseCx <[email protected]>
Date:   Wed Aug 9 18:32:10 2023 -0500

    Merge remote-tracking branch 'upstream/concedo' into develop2Main

commit bda0215
Author: YellowRoseCx <[email protected]>
Date:   Wed Aug 9 18:17:54 2023 -0500

    update makefile to multisystem path

commit 8f8ab6c
Author: YellowRoseCx <[email protected]>
Date:   Wed Aug 9 18:05:03 2023 -0500

    hipLDFLAG Path change Unix to multisystem in Makefile

    changed the hardcoded linux distro hipblas LD path from -L/opt/rocm/lib to use the defined ROCM_PATH variable to be flexible with ROCm on non-Linux OS

commit 610ba4c
Merge: 4024f91 25d43e0
Author: Henri Vasserman <[email protected]>
Date:   Wed Aug 9 23:54:58 2023 +0300

    Merge 'origin/master' into hipblas

commit 4024f91
Author: Henri Vasserman <[email protected]>
Date:   Wed Aug 9 01:56:44 2023 +0300

    Add intrinsics polyfills for AMD

    ---------

    Co-authored-by: ardfork <[email protected]>
    Co-authored-by: funnbot <[email protected]>
    Co-authored-by: Engininja2 <[email protected]>

commit ab62128
Merge: d91456a f5bfea0
Author: Henri Vasserman <[email protected]>
Date:   Wed Aug 9 00:37:01 2023 +0300

    Merge 'origin/master' into hipblas

commit ee9fa2a
Author: YellowRoseCx <[email protected]>
Date:   Wed Aug 2 01:53:58 2023 -0500

    Update Makefile

commit d91456a
Author: ardfork <[email protected]>
Date:   Mon Jul 31 20:35:00 2023 +0300

    fix half2 decomposition

commit c1cb70d
Author: Henri Vasserman <[email protected]>
Date:   Mon Jul 31 19:56:44 2023 +0300

    new build arg LLAMA_CUDA_MMQ_Y

commit c1664a0
Merge: 4336231 0728c5a
Author: Henri Vasserman <[email protected]>
Date:   Mon Jul 31 19:32:27 2023 +0300

    Merge 'origin/master' into hipblas

commit 848558d
Author: YellowRoseCx <[email protected]>
Date:   Sun Jul 30 20:02:52 2023 -0500

    import vars logic fix

commit b650b84
Author: YellowRoseCx <[email protected]>
Date:   Sun Jul 30 00:21:36 2023 -0500

    Update easy_KCPP-ROCm_install.sh

commit 8573a67
Author: YellowRoseCx <[email protected]>
Date:   Sat Jul 29 21:31:12 2023 -0500

    remove duplicate code and fix typo

    remove duplicate tooltip

commit 430986e
Author: YellowRoseCx <[email protected]>
Date:   Sat Jul 29 21:07:34 2023 -0500

    hide "missing" if all are built

    move tooltip functions to helper functions section. hides the string "Missing: ..." from showing if all backends are available
    " if len(runopts)==6 else + "

commit dd0db72
Author: YellowRoseCx <[email protected]>
Date:   Sat Jul 29 20:52:31 2023 -0500

    hide "missing" if all are built

    move tooltip functions to helper functions section. hides the string "Missing: ..." from showing if all backends are available

commit 43fffb6
Merge: 0ed65a4 b40550c
Author: YellowRoseCx <[email protected]>
Date:   Sat Jul 29 19:13:15 2023 -0500

    Merge branch 'concedo'

commit 0ed65a4
Author: YellowRoseCx <[email protected]>
Date:   Sat Jul 29 18:34:21 2023 -0500

    Hide unavailable backends & Add tooltip over backend count

    Hides unavailable backends from the user and if the program is launched without any backends made, it shows an error message to them stating no backends were found and to make them using the 'make' command

    Add tooltip when hovering over backend count label

    hovering over the new label that shows the backend count will explain what the numbers are, and show the users which backends are not available or built

commit 2a26398
Merge: cee2e9d 31486eb
Author: YellowRoseCx <[email protected]>
Date:   Sat Jul 29 15:16:33 2023 -0500

    Merge remote-tracking branch 'upstream/concedo'

commit 4336231
Author: Henri Vasserman <[email protected]>
Date:   Sat Jul 29 18:35:56 2023 +0300

    add hipBLAS to README

    ---------

    Co-authored-by: ardfork <[email protected]>

commit f8e3fc6
Author: Henri Vasserman <[email protected]>
Date:   Sat Jul 29 14:16:46 2023 +0300

    rocblas init stuff

commit d2ade63
Merge: cde52d6 8a88e58
Author: Henri Vasserman <[email protected]>
Date:   Sat Jul 29 12:59:48 2023 +0300

    Merge 'origin/master' into hipblas

commit cee2e9d
Author: YellowRoseCx <[email protected]>
Date:   Wed Jul 26 23:36:55 2023 -0500

    Only Show Available Backends in GUI

    Hides unavailable backends from the user. If the program is launched without any backends built, it shows an error message stating that no backends were found and that they must be built with the 'make' command

commit 7863610
Author: YellowRoseCx <[email protected]>
Date:   Wed Jul 26 13:27:22 2023 -0500

    Update easy_KCPP-ROCm_install.sh

commit 731cd6e
Author: YellowRoseCx <[email protected]>
Date:   Tue Jul 25 22:39:50 2023 -0500

    Create easy_rocm_install.sh

commit f154685
Merge: cbdc1f3 94e0a06
Author: YellowRoseCx <[email protected]>
Date:   Tue Jul 25 22:25:10 2023 -0500

    Merge branch 'concedo_experimentalMAIN'

commit cbdc1f3
Merge: 5b838d4 9731682
Author: YellowRoseCx <[email protected]>
Date:   Mon Jul 24 16:53:21 2023 -0500

    Merge remote-tracking branch 'upstream/concedo'

commit cde52d6
Merge: 8e8054a 84e09a7
Author: Henri Vasserman <[email protected]>
Date:   Mon Jul 24 12:22:58 2023 +0300

    Merge 'origin/master' into hipblas

commit 8e8054a
Author: Henri Vasserman <[email protected]>
Date:   Mon Jul 24 12:20:49 2023 +0300

    Add rocblas to build files

commit 1f6294d
Author: YellowRoseCx <[email protected]>
Date:   Mon Jul 24 03:52:01 2023 -0500

    Fix multi GPU on multiple amd architectures with rocblas_initialize() (#5)

    * initialize rocblas

commit 5b838d4
Author: YellowRoseCx <[email protected]>
Date:   Mon Jul 24 03:10:35 2023 -0500

    amd multigpu full layer offload w/o vram scratch

commit 9bfb2fd
Merge: b379f9d 66328fc
Author: YellowRoseCx <[email protected]>
Date:   Mon Jul 24 03:07:44 2023 -0500

    Merge branch 'concedo_experimental'

commit b379f9d
Author: YellowRoseCx <[email protected]>
Date:   Mon Jul 24 03:07:00 2023 -0500

    Revert "amd multigpu full layer offload w/o vram scratch"

    This reverts commit 9adfc8e.

commit 9adfc8e
Author: YellowRoseCx <[email protected]>
Date:   Mon Jul 24 02:56:40 2023 -0500

    amd multigpu full layer offload w/o vram scratch

commit 05c792e
Author: YellowRoseCx <[email protected]>
Date:   Mon Jul 24 00:18:48 2023 -0500

    initialize rocblas

commit ade68d0
Merge: 521ad6b 56995ca
Author: YellowRoseCx <[email protected]>
Date:   Sun Jul 23 20:25:05 2023 -0500

    Merge remote-tracking branch 'upstream/concedo'

commit 521ad6b
Author: YellowRoseCx <[email protected]>
Date:   Thu Jul 20 21:42:33 2023 -0500

    lazy import_var error handling for saves

commit 9553e52
Merge: cac6650 f036109
Author: YellowRoseCx <[email protected]>
Date:   Thu Jul 20 19:59:41 2023 -0500

    Merge remote-tracking branch 'upstream/concedo'

commit cac6650
Author: YellowRoseCx <[email protected]>
Date:   Mon Jul 17 23:05:02 2023 -0500

    Makefile fix! Allows hip/clblast build together

commit 3db70b5
Merge: 2ec4466 7568d1a
Author: Henri Vasserman <[email protected]>
Date:   Tue Jul 18 01:54:17 2023 +0300

    Merge 'origin/master' into hipblas

commit f208670
Author: YellowRoseCx <[email protected]>
Date:   Fri Jul 14 02:56:03 2023 -0500

    improve error handling with gpu names

commit 860e738
Author: YellowRoseCx <[email protected]>
Date:   Fri Jul 14 00:33:03 2023 -0500

    Show GPU names in GUI, Only show GPUs that exist

    Replaced the preset 1,2,3 and 1,2,3,all settings of the GPU selector with a function that fetches the GPU names and uses them as the values for the selector boxes.

commit 2ec4466
Author: Henri Vasserman <[email protected]>
Date:   Thu Jul 13 13:44:02 2023 +0300

    Update build flags.

    GGML_CUDA_DMMV_Y is now GGML_CUDA_MMV_Y
    so update your build instructions.

    GGML_CUDA_FORCE_DMMV is always enabled.

    ---------

    Co-authored-by: YellowRoseCx <[email protected]>

commit cd36b18
Merge: afcb8fe 1cbf561
Author: Henri Vasserman <[email protected]>
Date:   Thu Jul 13 13:03:01 2023 +0300

    Merge 'origin/master' into hipblas

commit ac7ebc3
Author: YellowRoseCx <[email protected]>
Date:   Wed Jul 12 18:32:18 2023 -0500

    add hipBLAS name scheme to GUI and update README

commit 7f85cc5
Author: YellowRoseCx <[email protected]>
Date:   Wed Jul 12 17:35:54 2023 -0500

    update makefile and ggml.c

commit 6ca3499
Author: YellowRoseCx <[email protected]>
Date:   Wed Jul 12 15:43:45 2023 -0500

    ggml.c fix

commit 770e674
Merge: 2b289cd 5941514
Author: YellowRoseCx <[email protected]>
Date:   Wed Jul 12 15:24:36 2023 -0500

    Merge remote-tracking branch 'upstream/concedo'

commit 2b289cd
Author: YellowRoseCx <[email protected]>
Date:   Wed Jul 12 14:30:00 2023 -0500

    Update c-cpp.yml

commit 5dae95a
Author: YellowRoseCx <[email protected]>
Date:   Wed Jul 12 14:28:51 2023 -0500

    Update c-cpp.yml

commit b37cd73
Author: YellowRoseCx <[email protected]>
Date:   Wed Jul 12 14:27:04 2023 -0500

    Create c-cpp.yml to test Actions

commit afcb8fe
Author: Henri Vasserman <[email protected]>
Date:   Tue Jul 11 18:09:27 2023 +0300

    Add new config option

commit 8c2c497
Merge: e610466 2347463
Author: Henri Vasserman <[email protected]>
Date:   Tue Jul 11 17:53:54 2023 +0300

    Merge 'origin/master' into hipblas

commit e610466
Author: Henri Vasserman <[email protected]>
Date:   Tue Jul 11 17:53:14 2023 +0300

    Expand arch list and make it overrideable

commit 80e4e54
Merge: 7735c5a 1d16309
Author: Henri Vasserman <[email protected]>
Date:   Mon Jul 10 02:09:28 2023 +0300

    Merge 'origin/master' into hipblas

commit 8432e9d
Author: YellowRoseCx <[email protected]>
Date:   Sun Jul 9 16:55:30 2023 -0500

    Update Makefile

commit b58c189
Author: YellowRoseCx <[email protected]>
Date:   Sun Jul 9 16:20:00 2023 -0500

    Add multi-gpu CuBLAS support to new GUI

commit 0c1c71b
Author: YellowRoseCx <[email protected]>
Date:   Sat Jul 8 07:56:57 2023 -0500

    Update Makefile

commit f864f60
Author: Johannes Gäßler <[email protected]>
Date:   Sat Jul 8 00:25:15 2023 +0200

    CUDA: add __restrict__ to mul mat vec kernels (ggerganov#2140)

commit 4539bc2
Author: YellowRoseCx <[email protected]>
Date:   Sat Jul 8 01:36:14 2023 -0500

    update makefile for changes

commit 912e31e
Merge: 74e2703 ddaa4f2
Author: YellowRoseCx <[email protected]>
Date:   Fri Jul 7 23:15:37 2023 -0500

    Merge remote-tracking branch 'upstream/concedo'

commit 74e2703
Merge: cf65429 f9108ba
Author: YellowRoseCx <[email protected]>
Date:   Wed Jul 5 15:16:49 2023 -0500

    Merge branch 'LostRuins:concedo' into main

commit 7735c5a
Merge: c3e3733 7ee76e4
Author: Henri Vasserman <[email protected]>
Date:   Tue Jul 4 17:09:16 2023 +0300

    Merge 'origin/master' into hipblas

commit cf65429
Author: YellowRoseCx <[email protected]>
Date:   Mon Jul 3 16:56:40 2023 -0500

    print cuda or opencl based on what's used

commit 72c16d2
Author: YellowRoseCx <[email protected]>
Date:   Mon Jul 3 16:45:39 2023 -0500

    Revert "fix my mistake that broke other arches"

    This reverts commit 777aed5.

commit 777aed5
Author: YellowRoseCx <[email protected]>
Date:   Mon Jul 3 15:53:32 2023 -0500

    fix my mistake that broke other arches

commit 27780a9
Author: YellowRoseCx <[email protected]>
Date:   Sun Jul 2 16:03:27 2023 -0500

    rocm fixes

commit f52c7d4
Author: YellowRoseCx <[email protected]>
Date:   Sun Jul 2 16:02:58 2023 -0500

    Revert "rocm fixes"

    This reverts commit 2fe9927.

commit 2fe9927
Author: YellowRoseCx <[email protected]>
Date:   Sun Jul 2 15:58:21 2023 -0500

    rocm fixes

commit efe7560
Author: YellowRoseCx <[email protected]>
Date:   Sun Jul 2 15:55:43 2023 -0500

    Revert "move HIPBLAS definitions into ggml-cuda.h"

    This reverts commit bf49a93.

commit 4fc0181
Author: YellowRoseCx <[email protected]>
Date:   Sun Jul 2 15:55:36 2023 -0500

    Revert "move hipblas definitions to header files"

    This reverts commit 2741ffb.

commit 89eb576
Merge: 2741ffb 3d2907d
Author: YellowRoseCx <[email protected]>
Date:   Sun Jul 2 14:44:13 2023 -0500

    Merge branch 'LostRuins:concedo' into main

commit c3e3733
Author: Henri Vasserman <[email protected]>
Date:   Sun Jul 2 15:51:31 2023 +0300

    ROCm fixes

commit 15db19a
Merge: 04419f1 46088f7
Author: Henri Vasserman <[email protected]>
Date:   Sun Jul 2 15:39:57 2023 +0300

    Merge 'origin/master' into hipblas

commit 2741ffb
Author: YellowRoseCx <[email protected]>
Date:   Sat Jul 1 17:07:42 2023 -0500

    move hipblas definitions to header files

commit bf49a93
Author: YellowRoseCx <[email protected]>
Date:   Sat Jul 1 16:38:50 2023 -0500

    move HIPBLAS definitions into ggml-cuda.h

commit 540f4e0
Merge: 2c3b46f eda663f
Author: YellowRoseCx <[email protected]>
Date:   Sat Jul 1 14:58:32 2023 -0500

    Merge remote-tracking branch 'upstream/concedo'

commit 2c3b46f
Author: YellowRoseCx <[email protected]>
Date:   Thu Jun 29 18:43:43 2023 -0500

    changes to fix build

commit c9e1103
Author: YellowRoseCx <[email protected]>
Date:   Thu Jun 29 18:20:07 2023 -0500

    Update ggml_v2-cuda-legacy.cu for ROCM

commit b858fc5
Author: YellowRoseCx <[email protected]>
Date:   Thu Jun 29 17:49:39 2023 -0500

    changes to work with upstream

commit 69a0c25
Merge: 096f0b0 1347d3a
Author: YellowRoseCx <[email protected]>
Date:   Thu Jun 29 16:59:06 2023 -0500

    Merge remote-tracking branch 'upstream/concedo'

commit 04419f1
Merge: bb16eff d3494bb
Author: Henri Vasserman <[email protected]>
Date:   Wed Jun 28 23:30:10 2023 +0300

    Merge 'origin/master' into hipblas

commit bb16eff
Author: YellowRoseCx <[email protected]>
Date:   Wed Jun 28 15:27:10 2023 -0500

    headers fix; add kquants_iter for hipblas and add gfx803 (#1)

    * kquants_iter for hipblas and add gfx803
    * Update CMakeLists.txt with hipblas kquants_iter and DMMV_F16
    * remove dmmv_f16 for now

commit 096f0b0
Author: YellowRoseCx <[email protected]>
Date:   Wed Jun 28 15:27:02 2023 -0500

    revert unnecessary hipblas conditionals

commit d81e81a
Author: YellowRoseCx <[email protected]>
Date:   Wed Jun 28 14:48:23 2023 -0500

    Update Makefile hipblas nvcc correction

commit c8ae945
Merge: c1e5c83 0be54f7
Author: Henri Vasserman <[email protected]>
Date:   Tue Jun 27 10:50:37 2023 +0300

    Merge 'origin/master' into hipblas

commit 2579ecf
Merge: abed427 d2034ce
Author: YellowRoseCx <[email protected]>
Date:   Sun Jun 25 17:50:04 2023 -0500

    Merge branch 'LostRuins:concedo' into main

commit c1e5c83
Merge: 35a6031 447ccbe
Author: Henri Vasserman <[email protected]>
Date:   Sun Jun 25 21:40:05 2023 +0300

    Merge 'origin/master' into hipblas

commit 35a6031
Merge: df7346c 66a2555
Author: Henri Vasserman <[email protected]>
Date:   Sun Jun 25 10:57:48 2023 +0300

    Merge 'origin/master' into hipblas

commit abed427
Author: YellowRoseCx <[email protected]>
Date:   Sat Jun 24 19:16:30 2023 -0500

    reorganize If statements to include proper headers

commit 06c3bf0
Merge: ea6d320 8342fe8
Author: YellowRoseCx <[email protected]>
Date:   Sat Jun 24 16:57:20 2023 -0500

    Merge branch 'LostRuins:concedo' into main

commit ea6d320
Author: YellowRoseCx <[email protected]>
Date:   Fri Jun 23 01:53:28 2023 -0500

    Update README.md

commit 4d56ad8
Author: YellowRoseCx <[email protected]>
Date:   Thu Jun 22 16:19:43 2023 -0500

    Update README.md

commit 21f9308
Author: YellowRoseCx <[email protected]>
Date:   Thu Jun 22 15:42:05 2023 -0500

    kquants_iter for hipblas and add gfx803

commit df7346c
Merge: 5dd2fbe 7487137
Author: Henri Vasserman <[email protected]>
Date:   Thu Jun 22 20:51:09 2023 +0300

    Merge 'origin/master' into hipblas

commit b6ff890
Merge: eb094f0 e6ddb15
Author: YellowRoseCx <[email protected]>
Date:   Thu Jun 22 12:42:09 2023 -0500

    Merge branch 'LostRuins:concedo' into main

commit eb094f0
Author: YellowRoseCx <[email protected]>
Date:   Wed Jun 21 23:59:18 2023 -0500

    lowvram parameter description

commit 3a5dfeb
Merge: 665cc11 b1f00fa
Author: YellowRoseCx <[email protected]>
Date:   Wed Jun 21 16:53:03 2023 -0500

    Merge branch 'LostRuins:concedo' into koboldcpp-rocm

commit 665cc11
Author: YellowRoseCx <[email protected]>
Date:   Wed Jun 21 01:13:19 2023 -0500

    add lowvram parameter

commit 222cbbb
Author: YellowRoseCx <[email protected]>
Date:   Tue Jun 20 19:03:28 2023 -0500

    add additional hipblas conditions for cublas

commit e1f9581
Author: YellowRoseCx <[email protected]>
Date:   Tue Jun 20 16:51:59 2023 -0500

    Add hip def for cuda v2

commit 3bff5c0
Merge: a7e74b3 266d47a
Author: YellowRoseCx <[email protected]>
Date:   Tue Jun 20 13:38:06 2023 -0500

    Merge branch 'LostRuins:concedo' into koboldcpp-rocm

commit a7e74b3
Author: YellowRoseCx <[email protected]>
Date:   Mon Jun 19 22:04:18 2023 -0500

    Update README.md

commit 5e99b3c
Author: YellowRoseCx <[email protected]>
Date:   Mon Jun 19 22:03:42 2023 -0500

    Update Makefile

commit 9190b17
Author: YellowRoseCx <[email protected]>
Date:   Mon Jun 19 21:47:10 2023 -0500

    Update README.md

commit 5dd2fbe
Merge: 67e229b 20568fe
Author: Henri Vasserman <[email protected]>
Date:   Tue Jun 20 01:23:12 2023 +0300

    Merge 'origin/master' into hipblas

commit 2780ea2
Author: YellowRoseCx <[email protected]>
Date:   Sun Jun 18 15:48:00 2023 -0500

    Update Makefile

commit 04a3e64
Author: YellowRoseCx <[email protected]>
Date:   Sun Jun 18 14:33:39 2023 -0500

    remove extra line

commit cccbca9
Author: YellowRoseCx <[email protected]>
Date:   Sun Jun 18 14:31:17 2023 -0500

    attempt adding ROCM hipblas

commit a44a1d4
Author: YellowRoseCx <[email protected]>
Date:   Sun Jun 18 14:31:01 2023 -0500

    attempt adding ROCM hipblas

commit b088184
Author: YellowRoseCx <[email protected]>
Date:   Sun Jun 18 14:30:54 2023 -0500

    attempt adding ROCM hipblas

commit 67e229b
Merge: 6f7c156 b241649
Author: Henri Vasserman <[email protected]>
Date:   Sun Jun 18 00:36:54 2023 +0300

    Merge 'origin/master' into hipblas

commit 6f7c156
Merge: 61df8e9 fc45a81
Author: Henri Vasserman <[email protected]>
Date:   Sat Jun 17 16:53:22 2023 +0300

    Merge 'origin/master' into hipblas

commit 61df8e9
Author: Henri Vasserman <[email protected]>
Date:   Wed Jun 14 22:46:10 2023 +0300

    add cudaMemset

commit a836529
Merge: 85f902d 254a7a7
Author: Henri Vasserman <[email protected]>
Date:   Wed Jun 14 22:41:55 2023 +0300

    Merge 'origin/master' into hipblas

commit 85f902d
Merge: 4362e80 b50b570
Author: Henri Vasserman <[email protected]>
Date:   Thu Jun 8 10:50:28 2023 +0300

    Merge 'origin/master' into hipblas

commit 4362e80
Merge: fa5b3d7 17366df
Author: Henri Vasserman <[email protected]>
Date:   Tue Jun 6 23:14:40 2023 +0300

    Merge 'origin/master' into hipblas

commit fa5b3d7
Author: Henri Vasserman <[email protected]>
Date:   Tue Jun 6 18:47:00 2023 +0300

    fix makefile.

commit 1ba4ce4
Author: Henri Vasserman <[email protected]>
Date:   Tue Jun 6 18:41:08 2023 +0300

    Revert "warp size fixes"

    It seems like 32 is faster, at least for me, and it won't cause so many conflicts.

    This reverts commit 5d6eb72.

commit 5d6eb72
Author: Henri Vasserman <[email protected]>
Date:   Tue Jun 6 18:32:41 2023 +0300

    warp size fixes

commit 33091a9
Merge: 9fdaa1d 2d43387
Author: Henri Vasserman <[email protected]>
Date:   Tue Jun 6 16:19:23 2023 +0300

    Merge 'origin/master' into hipblas

commit 9fdaa1d
Author: Henri Vasserman <[email protected]>
Date:   Sat May 27 19:17:53 2023 +0300

    Add more defs

    For forward compatibility ggerganov#1607

commit a4648c1
Merge: 4c8b3fb 0ecb1bb
Author: Henri Vasserman <[email protected]>
Date:   Sat May 27 18:22:39 2023 +0300

    Merge 'origin/master' into hipblas

commit 4c8b3fb
Author: Henri Vasserman <[email protected]>
Date:   Fri May 26 01:08:53 2023 +0300

    add configurable vars

commit 30d921a
Author: Henri Vasserman <[email protected]>
Date:   Fri May 26 01:03:56 2023 +0300

    and makefile

commit a593a4f
Author: Henri Vasserman <[email protected]>
Date:   Fri May 26 00:55:28 2023 +0300

    Add missing parameters

commit 174bf6a
Merge: f80ce7a 1fcdcc2
Author: Henri Vasserman <[email protected]>
Date:   Fri May 26 00:44:23 2023 +0300

    Merge 'origin/master' into hipblas

commit f80ce7a
Merge: 600ace3 ac7876a
Author: Henri Vasserman <[email protected]>
Date:   Thu May 25 00:02:50 2023 +0300

    Merge branch 'origin/master' into hipblas

commit 600ace3
Author: Henri Vasserman <[email protected]>
Date:   Sat May 20 23:42:20 2023 +0300

    update warp size

commit b19fefe
Author: Henri Vasserman <[email protected]>
Date:   Sat May 20 23:28:08 2023 +0300

    Forwardcompat

commit c66115b
Merge: a0b2d5f b8ee340
Author: Henri Vasserman <[email protected]>
Date:   Sat May 20 18:29:31 2023 +0300

    Merge 'origin/master' into hipblas

commit a0b2d5f
Merge: 8bab456 2a5ee02
Author: Henri Vasserman <[email protected]>
Date:   Tue May 16 17:08:29 2023 +0300

    Merge 'origin/master' into hipblas

commit 8bab456
Merge: 2956630 b5c9295
Author: Henri Vasserman <[email protected]>
Date:   Mon May 15 00:01:12 2023 +0300

    Merge 'origin/master' into hipblas

commit 2956630
Merge: 0fe6384 f048af0
Author: Henri Vasserman <[email protected]>
Date:   Sat May 13 13:12:52 2023 +0300

    Merge 'origin/master' into hipblas

commit 0fe6384
Author: Henri Vasserman <[email protected]>
Date:   Fri May 12 17:22:11 2023 +0300

    fix makefile

commit 605560d
Merge: 127f68e 089b1c9
Author: Henri Vasserman <[email protected]>
Date:   Fri May 12 16:12:53 2023 +0300

    Merge 'origin/master' into hipblas

commit 127f68e
Merge: 070cbcc b608b55
Author: Henri Vasserman <[email protected]>
Date:   Thu May 11 20:21:27 2023 +0300

    Merge 'origin/master' into hipblas

commit 070cbcc
Author: Henri Vasserman <[email protected]>
Date:   Sun May 7 18:10:56 2023 +0300

    occupancy function

commit a3296d5
Merge: 0aefa6a e129551
Author: Henri Vasserman <[email protected]>
Date:   Sun May 7 18:06:04 2023 +0300

    Merge 'origin/master' into hipblas

commit 0aefa6a
Merge: baeb482 1b0fd45
Author: Henri Vasserman <[email protected]>
Date:   Sun May 7 12:24:41 2023 +0300

    Merge 'origin/master' into hipblas

commit baeb482
Author: Henri Vasserman <[email protected]>
Date:   Sun May 7 12:24:12 2023 +0300

    Revert to default copy

commit 289073a
Merge: 1107194 173d0e6
Author: Henri Vasserman <[email protected]>
Date:   Sat May 6 19:59:41 2023 +0300

    Merge 'origin/master' into hipblas

commit 1107194
Merge: 04c0d48 a3b85b2
Author: Henri Vasserman <[email protected]>
Date:   Sat May 6 00:38:20 2023 +0300

    Merge 'origin/master' into hipblas

commit 04c0d48
Author: Henri Vasserman <[email protected]>
Date:   Thu May 4 12:31:16 2023 +0300

    Move all HIP stuff to ggml-cuda.cu

commit d83cfba
Merge: b67cc50 799fdc1
Author: Henri Vasserman <[email protected]>
Date:   Thu May 4 11:31:16 2023 +0300

    Merge 'origin/master' into hipblas

commit b67cc50
Merge: fcbc262 e216aa0
Author: Henri Vasserman <[email protected]>
Date:   Wed May 3 15:04:51 2023 +0300

    Merge 'origin/master' into hipblas

commit fcbc262
Merge: c73def1 f4cef87
Author: Henri Vasserman <[email protected]>
Date:   Mon May 1 22:45:29 2023 +0300

    Merge 'origin/master' into hipblas

commit c73def1
Merge: d8ea75e f0d70f1
Author: Henri Vasserman <[email protected]>
Date:   Sun Apr 30 18:40:42 2023 +0300

    Merge 'origin/master' into hipblas

commit d8ea75e
Merge: d194586 334637e
Author: Henri Vasserman <[email protected]>
Date:   Sat Apr 29 11:25:51 2023 +0300

    Merge 'origin/master' into hipblas

commit d194586
Merge: 2ab9d11 7f15c5c
Author: Henri Vasserman <[email protected]>
Date:   Fri Apr 28 23:03:52 2023 +0300

    Merge 'origin/master' into hipblas

commit 2ab9d11
Merge: 3b4a531 04aaae1
Author: Henri Vasserman <[email protected]>
Date:   Fri Apr 28 16:30:05 2023 +0300

    Merge 'origin/master' into hipblas

commit 3b4a531
Merge: a1caa48 0b2da20
Author: Henri Vasserman <[email protected]>
Date:   Fri Apr 28 10:08:41 2023 +0300

    Merge 'origin/master' into hipblas

commit a1caa48
Author: Henri Vasserman <[email protected]>
Date:   Fri Apr 28 10:08:21 2023 +0300

    add more cuda defines

    This is so 'slaren/cuda-f16f32' would merge.

commit ecc0565
Author: Henri Vasserman <[email protected]>
Date:   Fri Apr 28 01:58:27 2023 +0300

    only .cu file needs to be compiled as device

commit ef51e9e
Merge: d571d16 4afcc37
Author: Henri Vasserman <[email protected]>
Date:   Wed Apr 26 12:46:26 2023 +0300

    Merge branch 'ggerganov:master' into hipblas

commit d571d16
Merge: 608aa33 dd0eabc
Author: Henri Vasserman <[email protected]>
Date:   Tue Apr 25 21:15:33 2023 +0300

    Merge 'origin/master' into hipblas

commit 608aa33
Author: Henri Vasserman <[email protected]>
Date:   Tue Apr 25 21:15:04 2023 +0300

    change default GPU arch to match CMake

commit 3a004b2
Author: Henri Vasserman <[email protected]>
Date:   Mon Apr 24 02:24:54 2023 +0300

    add rpath

commit db7a012
Merge: 3677235 284685f
Author: Henri Vasserman <[email protected]>
Date:   Sun Apr 23 21:49:28 2023 +0300

    Merge 'origin/master' into hipblas

commit 3677235
Author: Henri Vasserman <[email protected]>
Date:   Sat Apr 22 23:28:00 2023 +0300

    More build file changes

commit d3e1984
Author: Henri Vasserman <[email protected]>
Date:   Fri Apr 21 03:32:06 2023 +0300

    add rpath

commit 0e005f7
Author: Henri Vasserman <[email protected]>
Date:   Fri Apr 21 02:13:00 2023 +0300

    Build file changes

    Now HIP Clang is not required, the CMake scripts will configure the
    needed compiler, which can be system clang++. Also other code can
    still use GCC, but CMake will force the clang to link.

commit 54a63c1
Author: Henri Vasserman <[email protected]>
Date:   Thu Apr 20 22:19:22 2023 +0300

    Update Makefile for the Cuda kernels

commit 0fd8363
Author: Henri Vasserman <[email protected]>
Date:   Thu Apr 20 02:04:00 2023 +0300

    use hipblas based on cublas

* Merge Fixes

* readme merge fix

* remove old ggmlv2 changes

* bring ggml v2_cuda up to date with AMD changes

* Revert ggml v2_cuda changes because they weren't needed

This reverts commit 3385dd4.

* avoid launching subprocesses to get device names for now, but other than that seems to be working

---------

Co-authored-by: Concedo <[email protected]>
Labels
enhancement (New feature or request), hardware (Hardware related), high priority (Very important issue), performance (Speed related topics), refactoring (Refactoring)
Development

Successfully merging this pull request may close these issues.

[User] How to specify which cuda device to use programmably