V2EX › All replies by tubanwu › Page 1 of 14

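7 days ago

Replied to the topic created by Ai2You › Promotions › Giveaway: 5 ready-made GPT Plus accounts from 智友社 via a stable channel (99% stability in a full-month test)

+1, 60 a month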

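8 days ago

Replied to the topic created by TGOcc › Mac mini › My Air M5 32G is about to start smoking from running local models. No new hardware was released in June; will a Mac mini M5 or M5 Pro come out in September? Anyone planning to jump on it?

Just buy a used M1 Max with 64 GB and play around with that.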

23 days ago

Replied to the topic created by Brightt › iPhone › How is your iPhone's camera button holding up?

I use it often and it hasn't broken yet.

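May 21

Replied to the topic created by memos › Buy Buy Buy › Thoughts after choosing between Apple and Xiaomi: a ten-year Xiaomi fan turned hater

Whatever else you say, HyperOS 3 is far smoother than iOS 26.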

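April 30

Replied to the topic created by KaiWuBOSS › Local LLM › I built a tool that takes a 30B model on an 8 GB GPU from 3 tok/s to 21 tok/s; notes on my technical findings

@KaiWuBOSS It won't run:
Local LLM Deployer vv0.3.1 · llama.cpp b8864
by llmbbs.ai · Local AI tech community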

[1/6] Probing hardware...
GPU: NVIDIA GeForce RTX 5060 (SM120, 8151 MB VRAM, 448 GB/s)
RAM: 31 GB DDR4
OS: windows amd64
⚠️ CUDA 13.2 detected — known bug with low-bit quantization
If you see garbled output, downgrade driver to CUDA 13.1
Warning: RTX 50 series with CUDA 13.2 detected
Kaiwu will use CUDA 12.4 binary for stability.

[2/6] Selecting configuration...
Model: Qwen3.6 35B A3B Claude 4.7 Opus Reasoning Distilled (moe, 22B total / 1B active)
Quant: Q22 (13.5 GB)
Mode: moe_partial
Accel: Flash Attention + SWA-Full (hybrid arch)

[3/6] Checking files...
Using bundled iso3 binary: llama-server-cuda.exe
Binary: llama-server-cuda.exe [cached]
Model: Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled.i1-IQ3_XS.gguf [cached]

[4/6] Preflight check...
✓ VRAM sufficient

[5/6] Warmup benchmark...
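First run on RTX 50 series, compiling CUDA kernels (about 60s, only needed once)...
✓ CUDA kernel compilation complete; later launches will start instantly
⚠ JIT warmup failed: exit status 0xc0000135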
Probe 1: ctx=128K ... OOM
Probe 2: ctx=64K ... OOM
Probe 3: ctx=32K ... OOM
Probe 4: ctx=16K ... OOM
Probe 5: ctx=8K ... OOM
⚠️ Warmup failed: all ctx probes failed (tried down to 4K)
Using default parameters

[6/6] Starting server...
Waiting for llama-server to be ready (port 11434)...
⚠️ Insufficient VRAM, retrying with context reduced to 4K...
Waiting for llama-server to be ready (port 11434)...
Error: failed to start llama-server: failed to start 2 times in a row; cannot run even at the minimum context (4K)

NVIDIA GeForce RTX 5060: 8151 MB VRAM
Model Qwen3.6 35B A3B Claude 4.7 Opus Reasoning Distilled: ~13813 MB
KV cache (4K, q4_0): ~80 MB
Estimated total required: ~14917 MB

Shortfall: 6766 MB

Suggestions:
1. Choose a smaller quantization (Q4_K_M or Q2_K)
2. Choose a smaller model

Usage:
kaiwu run <model> [flags]

Flags:
--bench Run benchmark after starting
--ctx-size int Manually set the context size (0 = auto)
--fast Skip warmup, use cached profile
-h, --help help for run
--host string Listen address (default 127.0.0.1; use 0.0.0.0 to expose on the LAN) (default "127.0.0.1")
--llama-server string Use a custom llama-server binary (full path)
--mode string Mode: speed/balanced/context (defaults to the last choice)
--reset Clear the cache and rerun warmup to probe for optimal parameters
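The numbers at the end of that log can be re-checked with a few lines of Python. This is a hypothetical sketch, not kaiwu's actual code, and it rests on three assumptions of mine: the KV cache grows linearly with context (80 MB at 4K, as printed), the 1024 MB gap between the printed total and model + KV cache is a fixed runtime overhead, and the whole model has to sit in VRAM. Under those assumptions it reproduces the ~14917 MB total and the 6766 MB shortfall, and shows why halving the context from 128K down to 4K can never succeed: the weights alone (~13813 MB) already exceed the 8151 MB card.

```python
from typing import Optional

# Hypothetical sketch that re-checks the VRAM budget printed in the log.
# Assumptions (NOT taken from kaiwu's source):
#   - KV cache grows linearly with context length (80 MB at 4K, per the log)
#   - a fixed 1024 MB runtime overhead, inferred from 14917 - 13813 - 80
#   - all weights plus the KV cache must fit in VRAM (no CPU offload)

VRAM_MB = 8151       # RTX 5060, from the log
MODEL_MB = 13813     # model file estimate, from the log
KV_MB_AT_4K = 80     # KV cache at 4K context (q4_0), from the log
OVERHEAD_MB = 1024   # assumption: inferred from the log's totals


def estimate_total_mb(ctx_k: int) -> int:
    """Estimated VRAM need in MB for a context of ctx_k * 1024 tokens."""
    kv_mb = KV_MB_AT_4K * ctx_k // 4  # scale linearly from the 4K figure
    return MODEL_MB + kv_mb + OVERHEAD_MB


def probe_context(start_k: int = 128, min_k: int = 4) -> Optional[int]:
    """Halve the context until the estimate fits in VRAM; None if min_k fails too."""
    ctx_k = start_k
    while ctx_k >= min_k:
        total = estimate_total_mb(ctx_k)
        fits = total <= VRAM_MB
        print(f"ctx={ctx_k}K -> ~{total} MB ({'fits' if fits else 'OOM'})")
        if fits:
            return ctx_k
        ctx_k //= 2
    return None


if __name__ == "__main__":
    usable = probe_context()
    shortfall = estimate_total_mb(4) - VRAM_MB
    print("usable ctx:", usable, "| shortfall at 4K:", shortfall, "MB")
```

If these assumptions hold, the final estimate contradicts the earlier "✓ VRAM sufficient" preflight line; the preflight presumably counted on `moe_partial` keeping part of the expert weights in system RAM, which the final check apparently did not.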

April 22

Replied to the topic created by iknewtoday › Q&A › What non-money-making things have you all vibe-coded?

@andyskaura Thanks, bro

April 22

Replied to the topic created by iknewtoday › Q&A › What non-money-making things have you all vibe-coded?

@andyskaura Looking forward to your Windows version next week

April 22

Replied to the topic created by beimenjun › Share & Create › iOS photo backup app 「🍉西瓜备份」 (Watermelon Backup) is live, and it's free

Thanks, legend

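January 20, 2025

Replied to the topic created by tubanwu › Broadband Syndrome › [Help] What is the cheapest broadband plan in Chengdu?

@Vendettar #30 I found it on the Sichuan Unicom Tieba forum; not sure if it's still available.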