Is anyone familiar with ClickHouse? How well does ClickHouse support distributed deployment?

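I need to query a table holding a large volume of data. The query logic is not complex: just simple where, order by, group by, sum, avg, and count queries. The table currently holds close to 50 billion rows and will grow to 1 trillion within half a year.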

The current solution uses Spark. I know ClickHouse is well suited to OLAP queries and is fast, but can ClickHouse cope with 1 trillion rows? And does ClickHouse support distributed deployment well?

I don't know ClickHouse in much depth, so I'd appreciate pointers from anyone experienced.
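
To put the growth described in the question into numbers, here is a rough back-of-the-envelope sketch. The row counts come from the question; the 182-day window and the assumption of uniform growth are mine, so treat the result as an order of magnitude only.

```python
# Figures from the question. The 182-day window ("half a year") and
# uniform growth are assumptions made for illustration.
CURRENT_ROWS = 50_000_000_000        # close to 50 billion rows today
TARGET_ROWS = 1_000_000_000_000      # 1 trillion rows in half a year
GROWTH_DAYS = 182


def average_ingest_rate(current_rows: int, target_rows: int, days: int) -> float:
    """Average rows per second needed to grow from current_rows to target_rows."""
    seconds = days * 24 * 60 * 60
    return (target_rows - current_rows) / seconds


rate = average_ingest_rate(CURRENT_ROWS, TARGET_ROWS, GROWTH_DAYS)
print(f"average ingest rate: {rate:,.0f} rows/s")
print(f"growth factor: {TARGET_ROWS // CURRENT_ROWS}x")
```

Under these assumptions the table has to absorb about 60,000 rows per second on average, so sustained write throughput matters for capacity planning as well as query speed.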

clickhouse

distributed

big data

6 replies

amoia50

1 day ago via iPhone

Operating a distributed ClickHouse cluster is fairly hard. Just go straight to Doris or StarRocks.

zqr10159

1 day ago

Just use Doris.

sealinfree

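22 hours 58 minutes ago via iPhone

I store logs, 57.3 billion rows so far, on 7 ClickHouse instances. They are all virtual machines on the same all-flash 7525. Queries return in milliseconds. I wrote the database backend layer myself, with no other middleware. The experience has been very good, upgrades have not had many pitfalls, and it has been running in production for more than three years.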

chenxytw

18 hours 10 minutes ago

Check the `Distributed table engine` in ClickHouse.
1 trillion rows is not big for ClickHouse; the key is how your data is partitioned.
Which columns split your data, what are the minimum and maximum row counts of one part, and how many parts will your typical query read?
For `order by` and `group by`, you also need to pay attention to the columns you use.
ClickHouse is not good at filtering on non-primary-key columns; look up what ClickHouse calls a `mark`.
If your query samples from or sorts across many marks, performance will be very poor.
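
The point about marks can be illustrated with some rough arithmetic. This is a simplified sketch, not a model of the real query planner: it assumes a single MergeTree table with the default `index_granularity` of 8192 rows per granule (one mark each), ignores partitions, parts, and shards, and uses a hypothetical query that matches 10 million rows.

```python
INDEX_GRANULARITY = 8192  # MergeTree default: rows per granule, one mark each


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division."""
    return -(-a // b)


def total_granules(total_rows: int, granularity: int = INDEX_GRANULARITY) -> int:
    """Number of granules (marks) needed to cover the whole table."""
    return ceil_div(total_rows, granularity)


def granules_for_key_range(matching_rows: int, total_rows: int,
                           granularity: int = INDEX_GRANULARITY) -> int:
    """Filter on a primary-key prefix: matching rows sit next to each other,
    so only the granules covering that range are read. The +1 allows for the
    range not being aligned to a granule boundary."""
    needed = ceil_div(matching_rows, granularity) + 1
    return min(needed, total_granules(total_rows, granularity))


def granules_for_non_key_filter(total_rows: int,
                                granularity: int = INDEX_GRANULARITY) -> int:
    """Filter on a column outside the primary key, with no skip index:
    every granule has to be scanned."""
    return total_granules(total_rows, granularity)


TOTAL_ROWS = 1_000_000_000_000   # 1 trillion rows, from the question
MATCHING_ROWS = 10_000_000       # hypothetical query selectivity

by_key = granules_for_key_range(MATCHING_ROWS, TOTAL_ROWS)
by_scan = granules_for_non_key_filter(TOTAL_ROWS)
print(f"granules read via primary key: {by_key:,}")
print(f"granules read via full scan:   {by_scan:,}")
```

Under these assumptions, the same 10 million matching rows cost on the order of a thousand granules when the filter follows the sort key, versus more than a hundred million when it does not. That is why the choice of `ORDER BY` columns matters more than the raw row count.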

red13

15 hours 36 minutes ago

@sealinfree Could you share the server specs?

sealinfree

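6 hours 51 minutes ago

@red13 Each virtual machine has 8 cores and 24 GB of RAM. The server is a PowerEdge R7525, CPU: AMD EPYC 7H12. Server load stays around 10% year-round. The server also hosts other workloads, about 20 virtual machines in total, and only 7 of them run ClickHouse.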