
Why you should stop using Ollama

    catazshadow · 1 day ago · 3014 views
    Original article:
    https://sleepingrobots.com/dreams/stop-using-ollama/

    “Ollama wrapped that work in a nice CLI, raised VC money on the back of it, spent over a year refusing to credit it, forked it badly, shipped a closed-source app alongside it, and then pivoted the whole thing toward cloud services. At every decision point where they could have been good open-source citizens, they chose the path that made them look more self-sufficient to investors.”

    In short: open-source thieves, and they also try to lock users in.

    The article recommends these instead:

    llama.cpp is the engine. It has an OpenAI-compatible API server (llama-server), a built-in web UI, full control over context windows and sampling parameters, and consistently better throughput than Ollama. In February 2026, Gerganov’s ggml.ai joined Hugging Face to ensure the long-term sustainability of the project. It’s truly community-driven, MIT-licensed, and under active development with 450+ contributors.
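
    As a quick illustration (mine, not from the article): because llama-server speaks the OpenAI API, a stock OpenAI client can talk to it directly, sampling parameters included. The launch command, port, and model name below are assumptions for a typical default setup, not values taken from the post.

```python
# Hedged sketch: chatting with a local llama-server through its OpenAI-compatible API.
# Assumes llama-server is already running, e.g.:
#   llama-server -m ./models/your-model.gguf --port 8080
# The port and model name are placeholders for whatever you actually run.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="local-gguf",   # llama-server serves the model it was started with
    messages=[{"role": "user", "content": "Summarize why local inference is useful."}],
    temperature=0.7,      # sampling parameters stay fully under your control
    top_p=0.9,
    max_tokens=256,
)
print(resp.choices[0].message.content)
```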

    llama-swap handles multi-model orchestration, loading, unloading, and hot-swapping models on demand behind a single API endpoint. Pair it with LiteLLM and you get a unified OpenAI-compatible proxy that routes across multiple backends with proper model aliasing.
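
    A minimal client-side sketch of what that buys you, assuming such a proxy is already configured and listening locally; the port and the model aliases are placeholders I made up, not anything from the article.

```python
# Hedged sketch: one OpenAI-compatible endpoint, multiple models behind it.
# Assumes a llama-swap (or LiteLLM) proxy is already running and configured;
# the port and aliases below are illustrative only.
import requests

PROXY = "http://localhost:8080/v1/chat/completions"

def ask(model: str, prompt: str) -> str:
    """Send a chat request; the proxy loads/swaps the right backend for `model`."""
    r = requests.post(PROXY, json={
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }, timeout=300)  # generous timeout: swapping in a model can take a while
    r.raise_for_status()
    return r.json()["choices"][0]["message"]["content"]

# Different aliases, same endpoint: the proxy decides which llama.cpp
# instance (or other backend) actually serves each request.
print(ask("small-coder", "Write a haiku about GPUs."))
print(ask("big-general", "Explain the KV cache in one paragraph."))
```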

    LM Studio gives you a GUI if that’s what you want. It uses llama.cpp under the hood, exposes all the knobs, and supports any GGUF model without lock-in. Jan is another open-source desktop app with a clean chat interface and local-first design. Msty offers a polished GUI with multi-model support and built-in RAG. koboldcpp is another option with a web UI and extensive configuration options.
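
    For completeness, a hedged sketch of how these GUI apps fit the same workflow: LM Studio (and Jan) can expose a local OpenAI-compatible server, so plain HTTP is enough to enumerate whatever GGUF models are loaded. Port 1234 is only LM Studio's usual default, assumed here.

```python
# Hedged sketch: listing models from a local OpenAI-compatible GUI server.
# Port 1234 is an assumption (LM Studio's usual default), not from the article.
import requests

resp = requests.get("http://localhost:1234/v1/models", timeout=10)
resp.raise_for_status()
for model in resp.json().get("data", []):
    print(model["id"])   # each locally loaded GGUF model shows up by id
```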

    Red Hat’s ramalama is worth a look too, a container-native model runner that explicitly credits its upstream dependencies front and center. Exactly what Ollama should have done from the start.
    Addendum #1  ·  1 day ago
    Also, Ollama's performance is worse.
    11 replies    2026-04-19 12:26:34 +08:00
    anbabubabiluya · #1 · 1 day ago via Android
    Can anyone recommend a deployment setup? I also find Ollama too slow. My GPU is a 5060 Ti 16G, and ideally it would run directly on Windows.
    tool2dx · #2 · 1 day ago   ❤️ 1
    @anbabubabiluya Ollama isn't slow. My GPU is worse than yours, only 12G of VRAM, but the machine has two GPUs, so 24G of VRAM combined. Running the qwen3.6 35b-q4 build on Ollama, if tuning keeps it from overflowing VRAM, it runs at full speed. By default it overflows VRAM by 8% and throughput drops to 1/6, painfully slow.
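
A rough back-of-the-envelope check of that VRAM claim, with my own illustrative numbers rather than anything measured in the thread:

```python
# Rough check (my own arithmetic, not from the reply above):
# does a 4-bit-quantized model fit in VRAM? Numbers are illustrative only.
def q4_weight_gb(params_billion: float) -> float:
    # ~4.5 bits per weight for typical q4 GGUF quants (scales included), in GB
    return params_billion * 1e9 * 4.5 / 8 / 1e9

for params, vram in [(35, 16), (35, 24)]:
    weights = q4_weight_gb(params)
    fits = weights + 3 < vram   # leave a few GB for KV cache and activations
    print(f"{params}B @ q4 ≈ {weights:.1f} GB weights, {vram} GB VRAM -> "
          f"{'fits' if fits else 'overflows (expect a big slowdown)'}")
```
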
    catazshadow (OP) · #3 · 1 day ago
    @anbabubabiluya LM Studio seems to work for that.
    ebushicao · #4 · 1 day ago
    I switched from Ollama to LM Studio a while ago and it's genuinely much better. By comparison, Ollama really is just a mediocre toy.
    r6cb · #5 · 1 day ago
    @anbabubabiluya #1 Try installing vLLM under WSL.
    woctordho · #6 · 1 day ago via Android
    @anbabubabiluya Just use llama.cpp.
    metalvest · #7 · 1 day ago via Android
    For ordinary users none of this matters; ease of use is what counts. Just look at why Doubao has so many users.
    rammiah · #8 · 1 day ago
    Does lm-studio support running on a server? Ollama's main draw is how easy it makes pulling models; both ModelScope and HF work.
    01802 · #9 · 1 day ago via Android
    I haven't been using Ollama lately; for convenience, even koboldcpp does the job.
    catazshadow (OP) · #10 · 1 day ago via Android
    @rammiah On a server, just use llama-swap to drive llama.cpp.
    julyclyde · #11 · 1 day ago
    @rammiah I can't help feeling Ollama is a bit like Docker.