Description
Search before asking
- I searched the issues and found no similar issues.
Linkis Component
linkis-engineconn-plugins
What happened
English:
Linkis currently lacks native support for ClickHouse, a leading high-performance OLAP engine. ClickHouse is widely adopted by top-tier companies (ByteDance, Tencent, Alibaba, Meituan, JD.com) for real-time analytics, user behavior analysis, and log analysis because of its query performance (10-100x faster than traditional MPP databases).
Market Demand:
- Performance Benchmark: ClickHouse is recognized as the OLAP performance standard, with single-table queries 10-100x faster than traditional MPP databases
- Real-time Data Ingestion: Supports millions of rows per second write throughput
- Columnar Storage: 10:1 compression ratio, significantly reducing storage costs
- High Market Penetration: Widely used in finance (BOC, CMB, Ping An), telecom (China Mobile, China Unicom), and internet sectors
- Active Community: 34k+ GitHub stars, one of the fastest-growing OLAP databases
Strategic Value:
ClickHouse complements existing engines (Doris, Presto/Trino) by forming a golden triangle of performance-concurrency-flexibility:
- ClickHouse: Best for single-table wide-table queries and extreme performance
- Doris: Best for multi-dimensional analysis and BI reports
- Presto/Trino: Best for data lake queries and federated queries
What you expected to happen
English:
Linkis should provide a ClickHouse engine plugin with the following capabilities:
- SQL Query Support:
  - Standard SQL syntax support
  - Distributed table and local table query support
  - MergeTree family table engine support
  - Materialized view support
- Data Operations:
  - INSERT operations for batch data loading
  - SELECT queries with complex aggregations
  - JOIN operations across tables
  - Support for ClickHouse-specific functions
- Connection Management:
  - Support for JDBC and HTTP protocols
  - Connection pooling and reuse
  - Support for distributed clusters
  - Authentication and authorization
- Performance Optimization:
  - Query result streaming to avoid OOM
  - Support for sampling queries
  - Query timeout and cancellation
  - Resource usage monitoring
- Integration with Linkis:
  - Unified task submission interface
  - Resource management integration
  - Permission control integration
  - Metadata management support
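The "query result streaming to avoid OOM" capability could look roughly like the following on the consuming side. This is only a sketch in Python, not the proposed plugin's implementation: it assumes results arrive in ClickHouse's JSONEachRow format (one JSON object per line), and `iter_rows`, `iter_batches`, and the sample rows are hypothetical names and data made up for illustration.

```python
import io
import json
from itertools import islice
from typing import Iterable, Iterator, List


def iter_rows(body: Iterable[str]) -> Iterator[dict]:
    """Lazily decode a JSONEachRow body: one JSON object per line."""
    for line in body:
        line = line.strip()
        if line:  # skip blank lines
            yield json.loads(line)


def iter_batches(rows: Iterator[dict], batch_size: int) -> Iterator[List[dict]]:
    """Group rows into lists of at most batch_size rows.

    Only one batch is materialized at a time, so memory use is bounded
    by batch_size rather than by the size of the full result set.
    """
    while True:
        batch = list(islice(rows, batch_size))
        if not batch:
            return
        yield batch


# Stand-in for a streamed response body (made-up data, not real output).
fake_body = io.StringIO(
    '{"user_id": 1, "event_count": 3}\n'
    '{"user_id": 2, "event_count": 5}\n'
    '{"user_id": 3, "event_count": 1}\n'
)

batches = list(iter_batches(iter_rows(fake_body), batch_size=2))
```

In a real engine the body would be a network stream read incrementally, and each batch would be handed to the result writer before the next one is read.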
How to reproduce
English:
Current situation:
- Users must set up ClickHouse JDBC connections manually
- There is no dedicated engine plugin for ClickHouse operations
- Users cannot leverage Linkis's unified task submission and resource management
- Support for ClickHouse-specific features is limited
Use case example:
-- User wants to query ClickHouse for real-time analytics
SELECT
    toDate(event_time) AS date,
    user_id,
    count() AS event_count,
    uniq(session_id) AS session_count
FROM events_distributed
WHERE event_time >= today() - 7
GROUP BY date, user_id
ORDER BY date DESC, event_count DESC
LIMIT 1000;

-- Advanced features such as sampling are not supported
SELECT count() FROM large_table SAMPLE 0.1;
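
One detail worth keeping in mind for sampling support: with a relative coefficient such as `SAMPLE 0.1`, ClickHouse does not correct aggregate values automatically, so a `count()` over the sample has to be scaled by the inverse of the ratio to approximate the full-table value. A minimal sketch of that arithmetic follows; `scale_sampled_count` is a hypothetical helper and the input number is made up. It also assumes the table was created with a sampling key, which `SAMPLE` requires.

```python
def scale_sampled_count(sampled_count: int, sample_ratio: float) -> int:
    """Extrapolate a count() computed under SAMPLE <ratio> to the full table."""
    if not 0 < sample_ratio <= 1:
        raise ValueError("sample_ratio must be in (0, 1]")
    return round(sampled_count / sample_ratio)


# Made-up input: a 10% sample that counted 12,345 rows.
estimate = scale_sampled_count(12_345, 0.1)
```

The result is an estimate, not an exact count; its accuracy depends on how evenly the sampling key distributes rows.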