Local LLM experiments: benchmark results on an RTX 3080 Ti
I benchmarked the following LLM models with llama-bench on an RTX 3080 Ti:
- llama-3-korean-bllossom-8B
- llama-3.1-korean-reasoning-8B
- UNIVA-Deepseek-llama3.1-Bllossom-8B
- Deepseek-r1-distill-llama-8B
- DeepSeek-R1-Distill-Qwen-14B
- DeepSeek-R1-Distill-Qwen-32B
 
llama-bench prints its results as a table like the one below:
| model | size | params | backend | ngl | test | t/s | 
|---|---|---|---|---|---|---|
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | CUDA | 99 | pp512 | 3730.08 ± 65.93 | 
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | CUDA | 99 | tg1000 | 91.75 ± 1.07 | 
The columns mean the following:
- Prompt processing (pp): processing a prompt in batches (-p)
- Text generation (tg): generating text token by token (-n)
- n-gpu-layers (ngl): number of layers offloaded to the GPU (-ngl)
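For post-processing, the markdown rows that llama-bench prints can be parsed back into numbers. Below is a minimal sketch assuming the default 7-column table shown above; `parse_bench_row` is a hypothetical helper of my own, not part of llama.cpp:

```python
def parse_bench_row(row: str) -> dict:
    """Parse one markdown result row printed by llama-bench.

    Assumes the 7-column layout: model, size, params, backend,
    ngl, test, t/s (where t/s is "mean ± std").
    """
    cells = [c.strip() for c in row.strip().strip("|").split("|")]
    model, size, params, backend, ngl, test, tps = cells
    mean, std = (float(x) for x in tps.split("±"))
    return {
        "model": model, "size": size, "params": params,
        "backend": backend, "ngl": int(ngl), "test": test,
        "tps_mean": mean, "tps_std": std,
    }

row = "| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | CUDA | 99 | pp512 | 3730.08 ± 65.93 |"
print(parse_bench_row(row)["tps_mean"])  # 3730.08
```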
 
llama-3-Korean-Bllossom-8B-Q4_K_M.gguf
A fine-tuned Llama 3 model with 8B parameters.
- MLP-KTLim/llama-3-Korean-Bllossom-8B-Q4_K_M.gguf
 
Benchmarking while varying ngl:

```shell
llama-bench -m llama-3-Korean-Bllossom-8B-Q4_K_M.gguf -ngl 10,20,30,40,50 -n 1000
```
Device 0: NVIDIA GeForce RTX 3080 Ti, compute capability 8.6, VMM: yes
| model | size | params | backend | ngl | test | t/s | 
|---|---|---|---|---|---|---|
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | CUDA | 10 | pp512 | 1303.36 ± 16.36 | 
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | CUDA | 10 | tg1000 | 10.85 ± 0.02 | 
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | CUDA | 20 | pp512 | 1719.75 ± 69.73 | 
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | CUDA | 20 | tg1000 | 16.87 ± 0.04 | 
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | CUDA | 30 | pp512 | 2906.49 ± 23.43 | 
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | CUDA | 30 | tg1000 | 39.91 ± 0.16 | 
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | CUDA | 40 | pp512 | 3483.66 ± 259.95 | 
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | CUDA | 40 | tg1000 | 89.85 ± 2.06 | 
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | CUDA | 50 | pp512 | 3419.22 ± 348.84 | 
| llama 8B Q4_K - Medium | 4.58 GiB | 8.03 B | CUDA | 50 | tg1000 | 89.79 ± 0.37 | 
Summary:
- On the RTX 3080 Ti, it responds at a quite usable speed from around ngl=40.
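One way to read the flat numbers from ngl=40 upward: Llama 3 8B has only 32 repeating transformer layers, so any ngl at or above 32 already offloads the whole model. A back-of-the-envelope VRAM estimate for partial offload (a heuristic of my own, assuming the file size is spread evenly over the layers and ignoring the KV cache and CUDA buffers):

```python
def offload_vram_gib(model_gib: float, n_layers: int, ngl: int) -> float:
    """Rough VRAM needed to offload `ngl` of `n_layers` transformer
    layers, assuming weights are spread evenly across the layers."""
    return model_gib * min(ngl, n_layers) / n_layers

# Bllossom 8B Q4_K_M: 4.58 GiB file, 32 layers.
for ngl in (10, 20, 30, 40):
    print(ngl, round(offload_vram_gib(4.58, 32, ngl), 2))
```

By this estimate, ngl=40 and ngl=50 request exactly the same amount of offload as ngl=32, which matches the nearly identical pp/tg numbers above.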
 
lemon-mint/LLaMa-3.1-Korean-Reasoning-8B-Instruct-Q8
Llama 3.1 8B is a model with 32 layers.
- https://huggingface.co/lemon-mint/LLaMa-3.1-Korean-Reasoning-8B-Instruct
- https://huggingface.co/lemon-mint/LLaMa-3.1-Korean-Reasoning-8B-Instruct-Q8_0-GGUF
 
Here I used the lemon-mint/llama-3.1-korean-reasoning-8b-instruct-q8_0.gguf model.

```shell
llama-bench -m Bllossom/lemon-mint/llama-3.1-korean-reasoning-8b-instruct-q8_0.gguf -ngl 25,30,35,40,45
```
Device 0: NVIDIA GeForce RTX 3080 Ti, compute capability 8.6, VMM: yes
| model | size | params | backend | ngl | test | t/s | 
|---|---|---|---|---|---|---|
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | CUDA | 25 | pp512 | 1784.23 ± 93.34 | 
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | CUDA | 25 | tg1000 | 14.80 ± 0.06 | 
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | CUDA | 30 | pp512 | 2786.34 ± 31.32 | 
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | CUDA | 30 | tg1000 | 26.87 ± 0.30 | 
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | CUDA | 35 | pp512 | 3733.38 ± 187.10 | 
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | CUDA | 35 | tg1000 | 73.87 ± 3.13 | 
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | CUDA | 40 | pp512 | 3797.38 ± 166.76 | 
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | CUDA | 40 | tg1000 | 74.09 ± 3.33 | 
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | CUDA | 45 | pp512 | 3791.58 ± 82.35 | 
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | CUDA | 45 | tg1000 | 74.12 ± 3.20 | 
Summary
- The RTX 3080 Ti handles 8B models reasonably well.
- As with Bllossom 8B, ngl=40 is about right.
 
UNIVA-Deepseek-llama3.1-Bllossom-8B
The DeepSeek-Bllossom series consists of models further trained to fix the language-mixing and degraded multilingual performance of the original DeepSeek-R1-Distill series. DeepSeek-llama3.1-Bllossom-8B is built on the DeepSeek-R1-distill-Llama-8B model and was developed to improve reasoning performance in Korean.
6-bit

```shell
llama-bench -m UNIVA-DeepSeek-llama3.1-Bllossom-8B-Q6_K.gguf -ngl 20,23,25,27,30 -n 1000
```
Device 0: NVIDIA GeForce RTX 3080 Ti, compute capability 8.6, VMM: yes
| model | size | params | backend | ngl | test | t/s | 
|---|---|---|---|---|---|---|
| llama 8B Q6_K | 6.14 GiB | 8.03 B | CUDA | 20 | pp512 | 1543.16 ± 24.32 | 
| llama 8B Q6_K | 6.14 GiB | 8.03 B | CUDA | 20 | tg1000 | 13.13 ± 0.11 | 
| llama 8B Q6_K | 6.14 GiB | 8.03 B | CUDA | 23 | pp512 | 1765.23 ± 58.73 | 
| llama 8B Q6_K | 6.14 GiB | 8.03 B | CUDA | 23 | tg1000 | 16.08 ± 0.07 | 
| llama 8B Q6_K | 6.14 GiB | 8.03 B | CUDA | 25 | pp512 | 2027.43 ± 43.47 | 
| llama 8B Q6_K | 6.14 GiB | 8.03 B | CUDA | 25 | tg1000 | 19.04 ± 0.30 | 
| llama 8B Q6_K | 6.14 GiB | 8.03 B | CUDA | 27 | pp512 | 2249.32 ± 57.11 | 
| llama 8B Q6_K | 6.14 GiB | 8.03 B | CUDA | 27 | tg1000 | 23.01 ± 0.82 | 
| llama 8B Q6_K | 6.14 GiB | 8.03 B | CUDA | 30 | pp512 | 3001.55 ± 29.89 | 
| llama 8B Q6_K | 6.14 GiB | 8.03 B | CUDA | 30 | tg1000 | 33.67 ± 0.20 | 
```shell
(Deepseek_R1) qkboo:~$ llama-bench -m /mnt/e/LLM_Run/UNIVA-DeepSeek-llama3.1-Bllossom-8B-Q6_K.gguf -ngl 30,33,35,37,40 -n 1000
```
Device 0: NVIDIA GeForce RTX 3080 Ti, compute capability 8.6, VMM: yes
| model | size | params | backend | ngl | test | t/s | 
|---|---|---|---|---|---|---|
| llama 8B Q6_K | 6.14 GiB | 8.03 B | CUDA | 30 | pp512 | 3011.60 ± 50.04 | 
| llama 8B Q6_K | 6.14 GiB | 8.03 B | CUDA | 30 | tg1000 | 34.08 ± 1.11 | 
| llama 8B Q6_K | 6.14 GiB | 8.03 B | CUDA | 33 | pp512 | 3895.08 ± 25.09 | 
| llama 8B Q6_K | 6.14 GiB | 8.03 B | CUDA | 33 | tg1000 | 76.81 ± 4.94 | 
| llama 8B Q6_K | 6.14 GiB | 8.03 B | CUDA | 35 | pp512 | 3933.71 ± 32.81 | 
| llama 8B Q6_K | 6.14 GiB | 8.03 B | CUDA | 35 | tg1000 | 77.27 ± 6.96 | 
| llama 8B Q6_K | 6.14 GiB | 8.03 B | CUDA | 37 | pp512 | 3883.86 ± 20.62 | 
| llama 8B Q6_K | 6.14 GiB | 8.03 B | CUDA | 37 | tg1000 | 77.30 ± 4.44 | 
| llama 8B Q6_K | 6.14 GiB | 8.03 B | CUDA | 40 | pp512 | 3909.77 ± 14.13 | 
| llama 8B Q6_K | 6.14 GiB | 8.03 B | CUDA | 40 | tg1000 | |
8-bit

```shell
$ llama-bench -m UNIVA-DeepSeek-llama3.1-Bllossom-8B-Q8_0.gguf -ngl 17,23,27,30,33 -n 1000
```
Device 0: NVIDIA GeForce RTX 3080 Ti, compute capability 8.6, VMM: yes
| model | size | params | backend | ngl | test | t/s | 
|---|---|---|---|---|---|---|
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | CUDA | 17 | pp512 | 1152.58 ± 20.30 | 
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | CUDA | 17 | tg1000 | 8.79 ± 0.06 | 
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | CUDA | 23 | pp512 | 1653.79 ± 44.44 | 
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | CUDA | 23 | tg1000 | 12.79 ± 0.08 | 
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | CUDA | 27 | pp512 | 2170.69 ± 66.22 | 
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | CUDA | 27 | tg1000 | 18.02 ± 0.10 | 
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | CUDA | 30 | pp512 | 2997.54 ± 36.25 | 
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | CUDA | 30 | tg1000 | 26.93 ± 0.28 | 
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | CUDA | 33 | pp512 | 4311.76 ± 17.63 | 
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | CUDA | 33 | tg1000 | 80.54 ± 2.72 | 
```shell
$ llama-bench -m /mnt/e/LLM_Run/UNIVA-DeepSeek-llama3.1-Bllossom-8B-Q8_0.gguf -ngl 47,53,57,60,65 -n 1000
```
Device 0: NVIDIA GeForce RTX 3080 Ti, compute capability 8.6, VMM: yes
| model | size | params | backend | ngl | test | t/s | 
|---|---|---|---|---|---|---|
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | CUDA | 47 | pp512 | 4252.55 ± 170.94 | 
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | CUDA | 47 | tg1000 | 79.03 ± 8.48 | 
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | CUDA | 53 | pp512 | 4341.45 ± 181.79 | 
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | CUDA | 53 | tg1000 | 80.21 ± 8.60 | 
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | CUDA | 57 | pp512 | 4470.11 ± 27.91 | 
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | CUDA | 57 | tg1000 | 80.12 ± 6.18 | 
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | CUDA | 60 | pp512 | 4542.52 ± 23.46 | 
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | CUDA | 60 | tg1000 | 80.92 ± 9.37 | 
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | CUDA | 65 | pp512 | 4502.80 ± 57.29 | 
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | CUDA | 65 | tg1000 | 81.02 ± 10.89 | 
DeepSeek-R1-Distill-Llama-8B-Q8_0.gguf
This is the famous DeepSeek R1; I used unsloth's distilled version.
- unsloth.ai/blog/deepseek-r1
- https://unsloth.ai/blog/deepseekr1-dynamic
- https://huggingface.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF
 
DeepSeek R1 uses 61 layers.

```shell
$ llama-bench -m DeepSeek-R1-Distill-Llama-8B-GGUF/DeepSeek-R1-Distill-Llama-8B-Q8_0.gguf -ngl 10,20,30
```
Device 0: NVIDIA GeForce RTX 3080 Ti, compute capability 8.6, VMM: yes
| model | size | params | backend | ngl | test | t/s | 
|---|---|---|---|---|---|---|
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | CUDA | 10 | pp512 | 849.57 ± 12.77 | 
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | CUDA | 10 | tg1000 | 6.34 ± 0.06 | 
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | CUDA | 20 | pp512 | 1279.56 ± 22.85 | 
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | CUDA | 20 | tg1000 | 10.41 ± 0.08 | 
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | CUDA | 30 | pp512 | 2712.69 ± 96.48 | 
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | CUDA | 30 | tg1000 | 26.45 ± 0.42 | 
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | CUDA | 40 | pp512 | 3581.72 ± 261.82 | 
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | CUDA | 40 | tg1000 | 72.33 ± 1.53 | 
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | CUDA | 50 | pp512 | 3653.35 ± 292.75 | 
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | CUDA | 50 | tg1000 | 73.69 ± 2.39 | 
Summary
- On the RTX 3080 Ti it responds well at ngl=40.
- As expected for 8B parameters, it behaves much like the earlier Llama 3 Bllossom and Llama 3.1 8B models.
 
DeepSeek-R1-Distill-Llama-8B_korean_reasoning
https://huggingface.co/mradermacher/DeepSeek-R1-Distill-Llama-8B_korean_reasoning-GGUF
6-bit

```shell
$ llama-bench -m DeepSeek_R1_Distill/Llama-8B/DeepSeek-R1-Distill-Llama-8B_korean_reasoning.Q6_K.gguf -ngl 17,25,30,35,40,45 -n 1000
```
Device 0: NVIDIA GeForce RTX 3080 Ti, compute capability 8.6, VMM: yes
| model | size | params | backend | ngl | test | t/s | 
|---|---|---|---|---|---|---|
| llama 8B Q6_K | 6.14 GiB | 8.03 B | CUDA | 17 | pp512 | 1420.67 ± 56.23 | 
| llama 8B Q6_K | 6.14 GiB | 8.03 B | CUDA | 17 | tg1000 | 10.87 ± 0.45 | 
| llama 8B Q6_K | 6.14 GiB | 8.03 B | CUDA | 25 | pp512 | 2126.29 ± 80.18 | 
| llama 8B Q6_K | 6.14 GiB | 8.03 B | CUDA | 25 | tg1000 | 18.29 ± 0.83 | 
| llama 8B Q6_K | 6.14 GiB | 8.03 B | CUDA | 30 | pp512 | 3136.95 ± 97.13 | 
| llama 8B Q6_K | 6.14 GiB | 8.03 B | CUDA | 30 | tg1000 | 33.18 ± 1.54 | 
| llama 8B Q6_K | 6.14 GiB | 8.03 B | CUDA | 37 | pp512 | 3670.82 ± 41.77 | 
| llama 8B Q6_K | 6.14 GiB | 8.03 B | CUDA | 37 | tg1000 | 77.20 ± 1.17 | 
| llama 8B Q6_K | 6.14 GiB | 8.03 B | CUDA | 40 | pp512 | 3711.66 ± 33.40 | 
| llama 8B Q6_K | 6.14 GiB | 8.03 B | CUDA | 40 | tg1000 | 77.59 ± 1.12 | 
| llama 8B Q6_K | 6.14 GiB | 8.03 B | CUDA | 42 | pp512 | 3725.29 ± 18.83 | 
| llama 8B Q6_K | 6.14 GiB | 8.03 B | CUDA | 42 | tg1000 | 77.39 ± 1.52 | 
| llama 8B Q6_K | 6.14 GiB | 8.03 B | CUDA | 45 | pp512 | 3690.92 ± 26.38 | 
| llama 8B Q6_K | 6.14 GiB | 8.03 B | CUDA | 45 | tg1000 | 77.49 ± 1.37 | 
On the RTX 3080 Ti:
- pp seems noticeably faster than the Bllossom version.
- tg is similar.
- ngl around 40 seems about right.
 
8-bit

```shell
$ llama-bench -m DeepSeek_R1_Distill/Llama-8B/DeepSeek-R1-Distill-Llama-8B_korean_reasoning.Q8_0.gguf -ngl 25,29,35,39,42 -n 1000
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3080 Ti, compute capability 8.6, VMM: yes
```
| model | size | params | backend | ngl | test | t/s | 
|---|---|---|---|---|---|---|
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | CUDA | 25 | pp512 | 1811.10 ± 51.70 | 
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | CUDA | 25 | tg1000 | 14.25 ± 0.66 | 
(Interrupted with Ctrl-C after the ngl=25 results.)
DeepSeek-R1-Distill-Qwen-14B
8-bit quantization

```shell
$ llama-bench -m DeepSeek_R1_Distill/unsloth/DeepSeek-R1-Distill-Qwen-14B-Q8_0.gguf -ngl 25,28,30,33,35 -n 1000
```
Device 0: NVIDIA GeForce RTX 3080 Ti, compute capability 8.6, VMM: yes
| model | size | params | backend | ngl | test | t/s | 
|---|---|---|---|---|---|---|
| qwen2 14B Q8_0 | 14.62 GiB | 14.77 B | CUDA | 25 | pp512 | 649.52 ± 7.97 | 
| qwen2 14B Q8_0 | 14.62 GiB | 14.77 B | CUDA | 25 | tg1000 | 4.73 ± 0.03 | 
| qwen2 14B Q8_0 | 14.62 GiB | 14.77 B | CUDA | 28 | pp512 | 593.29 ± 188.35 | 
6-bit quantization

```shell
$ llama-bench -m DeepSeek_R1_Distill/unsloth/DeepSeek-R1-Distill-Qwen-14B-Q6_K.gguf -ngl 15,18,20,25,30 -n 1000
```
Device 0: NVIDIA GeForce RTX 3080 Ti, compute capability 8.6, VMM: yes
| model | size | params | backend | ngl | test | t/s | 
|---|---|---|---|---|---|---|
| qwen2 14B Q6_K | 11.29 GiB | 14.77 B | CUDA | 15 | pp512 | 490.09 ± 191.57 | 
| qwen2 14B Q6_K | 11.29 GiB | 14.77 B | CUDA | 15 | tg1000 | 4.52 ± 0.04 | 
| qwen2 14B Q6_K | 11.29 GiB | 14.77 B | CUDA | 18 | pp512 | 629.45 ± 14.33 | 
| qwen2 14B Q6_K | 11.29 GiB | 14.77 B | CUDA | 18 | tg1000 | 4.93 ± 0.03 | 
| qwen2 14B Q6_K | 11.29 GiB | 14.77 B | CUDA | 20 | pp512 | 685.08 ± 14.48 | 
| qwen2 14B Q6_K | 11.29 GiB | 14.77 B | CUDA | 25 | pp512 | 787.79 ± 18.55 | 
```shell
$ llama-bench -m DeepSeek_R1_Distill/unsloth/DeepSeek-R1-Distill-Qwen-14B-Q5_K_M.gguf -ngl 20,25,30,35 -n 1000
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3080 Ti, compute capability 8.6, VMM: yes
```
| model | size | params | backend | ngl | test | t/s | 
|---|---|---|---|---|---|---|
| qwen2 14B Q5_K - Medium | 9.78 GiB | 14.77 B | CUDA | 20 | pp512 | 735.40 ± 7.36 | 
| qwen2 14B Q5_K - Medium | 9.78 GiB | 14.77 B | CUDA | 20 | tg1000 | 5.74 ± 0.12 | 
| qwen2 14B Q5_K - Medium | 9.78 GiB | 14.77 B | CUDA | 25 | pp512 | 829.91 ± 7.98 | 
| qwen2 14B Q5_K - Medium | 9.78 GiB | 14.77 B | CUDA | 25 | tg1000 | 6.77 ± 0.16 | 
(Interrupted with Ctrl-C after the ngl=25 results.)
DeepSeek-R1-Distill-Qwen-32B
A 32-billion-parameter version distilled from DeepSeek R1 into Qwen-32B.
- unsloth/DeepSeek-R1-Distill-Qwen-32B-Q3
- unsloth/DeepSeek-R1-Distill-Qwen-32B-Q2
 
DeepSeek-R1-Distill-Qwen-32B-Q3_K_M.gguf
```shell
$ llama-bench -m unsloth/DeepSeek-R1-Distill-Qwen-32B-Q3_K_M.gguf -ngl 27,30,33,35 -n 1000
```
Device 0: NVIDIA GeForce RTX 3080 Ti, compute capability 8.6, VMM: yes
| model | size | params | backend | ngl | test | t/s | 
|---|---|---|---|---|---|---|
| qwen2 32B Q3_K - Medium | 14.84 GiB | 32.76 B | CUDA | 27 | pp512 | 392.53 ± 2.94 | 
| qwen2 32B Q3_K - Medium | 14.84 GiB | 32.76 B | CUDA | 27 | tg1000 | 3.76 ± 0.02 | 
| qwen2 32B Q3_K - Medium | 14.84 GiB | 32.76 B | CUDA | 30 | pp512 | 411.41 ± 4.29 | 
| qwen2 32B Q3_K - Medium | 14.84 GiB | 32.76 B | CUDA | 30 | tg1000 | 4.04 ± 0.02 | 
| qwen2 32B Q3_K - Medium | 14.84 GiB | 32.76 B | CUDA | 33 | pp512 | 362.17 ± 93.15 | 
| qwen2 32B Q3_K - Medium | 14.84 GiB | 32.76 B | CUDA | 33 | tg1000 | 4.11 ± 0.01 | 
| qwen2 32B Q3_K - Medium | 14.84 GiB | 32.76 B | CUDA | 35 | pp512 | 427.65 ± 24.95 | 
| qwen2 32B Q3_K - Medium | 14.84 GiB | 32.76 B | CUDA | 35 | tg1000 | 4.44 ± 0.05 | 
Summary
- Throughput clearly drops to roughly a tenth of the 8B models', so it is hard to use for interactive prompt testing.
- Benchmarking each ngl setting also takes far too long; I didn't time it, but it seems to take over 20 minutes.
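As a quick check on that slowdown, using the tg1000 numbers from the tables above (the 8B korean_reasoning Q6_K at ngl=40 versus this 32B Q3_K_M at ngl=35), the generation slowdown works out to somewhat more than 10x:

```python
# tg1000 tokens/s copied from the benchmark tables above.
tg_8b_q6 = 77.59    # DeepSeek-R1-Distill-Llama-8B_korean_reasoning Q6_K, ngl=40
tg_32b_q3 = 4.44    # DeepSeek-R1-Distill-Qwen-32B Q3_K_M, ngl=35
print(round(tg_8b_q6 / tg_32b_q3, 1))  # 17.5
```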
 
DeepSeek-R1-Distill-Qwen-32B-Q2_K.gguf
```shell
$ llama-bench -m DeepSeek_R1_Distill/unsloth/DeepSeek-R1-Distill-Qwen-32B-Q2_K.gguf -ngl 25,28,30,33,35 -n 1000
```
Device 0: NVIDIA GeForce RTX 3080 Ti, compute capability 8.6, VMM: yes
| model | size | params | backend | ngl | test | t/s | 
|---|---|---|---|---|---|---|
| qwen2 32B Q2_K - Medium | 11.46 GiB | 32.76 B | CUDA | 25 | pp512 | 360.50 ± 104.74 | 
| qwen2 32B Q2_K - Medium | 11.46 GiB | 32.76 B | CUDA | 25 | tg1000 | 4.49 ± 0.07 | 
| qwen2 32B Q2_K - Medium | 11.46 GiB | 32.76 B | CUDA | 28 | pp512 | 422.67 ± 6.60 | 
| qwen2 32B Q2_K - Medium | 11.46 GiB | 32.76 B | CUDA | 28 | tg1000 | 4.83 ± 0.03 | 
| qwen2 32B Q2_K - Medium | 11.46 GiB | 32.76 B | CUDA | 30 | pp512 | 466.25 ± 3.99 | 
Summary
- 32B parameters is too much for the RTX 3080 Ti.
- Models with 8B parameters are quite workable.
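A sanity check on this conclusion: the RTX 3080 Ti has 12 GiB of VRAM, so comparing that against the GGUF file sizes from the tables above (a rough rule of thumb of my own, assuming full offload needs at least the file size in VRAM, before KV cache) shows which quantizations can fit entirely on the GPU:

```python
VRAM_GIB = 12.0  # RTX 3080 Ti

# GGUF file sizes copied from the benchmark tables above.
sizes_gib = {
    "llama 8B Q4_K_M": 4.58,
    "llama 8B Q8_0": 7.95,
    "qwen2 14B Q8_0": 14.62,
    "qwen2 32B Q3_K_M": 14.84,
}
for name, gib in sizes_gib.items():
    print(name, "fits" if gib < VRAM_GIB else "does not fit")
```

The 14B Q8_0 and 32B files simply do not fit, so part of the model stays on the CPU no matter what ngl is, which lines up with their single-digit tg throughput.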
 
A table of suitable ngl values on the RTX 3080 Ti:
| model | size | params | backend | ngl | test | t/s | 
|---|---|---|---|---|---|---|
| llama3-Korean-Bllossom-8B-Q4_K | 4.58 GiB | 8.03 B | CUDA | 40 | pp512 | 3483.66 ± 259.95 | 
| llama3-Korean-Bllossom-8B-Q4_K | 4.58 GiB | 8.03 B | CUDA | 40 | tg1000 | 89.85 ± 2.06 | 
| llama-3.1-korean-reasoning-8b-instruct-q8_0 | 7.95 GiB | 8.03 B | CUDA | 35 | pp512 | 3733.38 ± 187.10 | 
| llama-3.1-korean-reasoning-8b-instruct-q8_0 | 7.95 GiB | 8.03 B | CUDA | 35 | tg1000 | 73.87 ± 3.13 | 