干货|以 100GB SSB 性能测试为例,通过 ByteHouse 云数仓开启你的数据分析之路

技术

picture.image

从传统数仓、湖仓一体再到云数仓,技术快速迭代,云数仓已经成为数字化基础设施中的关键“底座”。 如何才能丝滑使用一款云数仓产品,本篇文章用五个步骤教你搞定!

picture.image

文 | Aurora

来自火山引擎ByteHouse团队

picture.image

近年来,随着数据“爆炸式”的增长,越来越多的数据被产生、收集和存储。而挖掘海量数据中的真实价值,从其中提取商机并洞见未来,则成了现代企业和组织不可忽视的命题。

随着数据量级和复杂度的增大,数据分析处理的技术架构也在不断演进。在面对海量数据分析时,传统 OLAP 技术架构中的痛点变得越来越明显,如扩容缩容耗时长,导致资源利用率偏低,成本居高不下;以及运维配置复杂,需要专业的技术人员介入等。

为了解决这类问题,云数仓的概念应运而生。和传统数仓架构不同的是,云原生数仓借助于云平台的基础资源, 实现了资源的动态扩缩容,并最大化利用资源, 从而达到 Pay as you go 按实际用量付费的模式。

ByteHouse 作为云原生的数据平台,从架构层面入手,通过存储和计算分离的云原生架构完美适配云上基础设施。在字节跳动内部, ByteHouse 已经支持 80% 的分析应用场景,包括用户增长业务、广告、A/B 测试等 。除了极致的分析性能之外,ByteHouse 开箱即用, 按实际使用付费的特性也极大地降低了企业和个人的上手门槛, 能够在短短数分钟内体验到数据分析的魅力。

Talk is cheap, 接下来就让我们通过一个实战案例来体验下 ByteHouse 云数仓的强大功能。

picture.image

本章节通过使用 ByteHouse 云数仓进行 SSB 基准测试,在带领读者了解产品性能的同时,也一并熟悉产品中各个模块的功能,开启你的数据分析之路,通过分析海量数据,加速数据洞察。

ByteHouse 的架构总览如下:

picture.image

/ SSB 基准测试 /

SSB(Star Schema Benchmark)是由麻省州立大学波士顿校区的研究员定义的基于现实商业应用的数据模型。SSB 是在 TPC-H 标准的基础上改进而成,主要将 TPC-H 中的雪花模型改成了更为通用的的星型模型,将基准查询从复杂的 Ad-hoc 查询改成了结构更加固定的 OLAP 查询,从而主要用于模拟测试 OLAP 引擎和轻量数仓场景下的查询性能。由于 SSB 基准测试较为中立,并贴近现实的商业场景,因此在学界及工业界有广泛的应用。

SSB 基准测试中对应的表结构如下所示,可以看到 SSB 主要采用星型模型,其中包含了 1 个事实表 lineorder 和 4 个维度表 customer, part, dwdate 以及 supplier,每张维度表通过 Primary Key 和事实表进行关联。测试通过执行 13 条 SQL 进行查询,包含了多表关联,group by,复杂条件等多种组合。更多详细信息请参考 SSB 文献。

picture.image

/ 步骤一:官网注册并开通 ByteHouse /

访问 ByteHouse 云数仓火山引擎官网,注册火山引擎账户,完成实名认证后,即可登录到产品控制台。开通产品进行测试,目前 ByteHouse 支持包年包月和按量付费两种模式的实例,便于您根据业务需求进行选择。

picture.image

/ 步骤二:创建计算组 /

登录到控制台后,可以看到数据库表管理、数据加载、SQL 工作表、计算组、查询历史和角色管理等几大模块。分别具有如下作用:

  • 数据库表管理:用于创建和管理数据库、数据表以及视图等数据对象

  • 数据加载:用于从不同的离线和实时数据源如对象存储、Kafka 等地写入数据

  • SQL 工作表:在界面上编辑、管理并运行 SQL 查询

  • 计算组:创建和管理虚拟的计算资源,用于执行数据查询等操作

  • 查询历史:用于查看 SQL 的历史执行记录、状态和查询详情等

picture.image

为了方便进行后续的建库建表和查询等操作,首先在 ByteHouse 控制台创建型号为 L 的计算组,如下图所示:

picture.image

计算组是 Bytehouse 中的计算资源集群,可按需进行横向扩展。计算组提供所需的资源如 CPU、内存及临时存储等,用于执行数据查询 DQL、DML 等操作。ByteHouse 计算组能够实现弹性扩缩容,读写分离、存算分离等,并且能对资源进行细粒度的权限控制。

/ 步骤三:创建数据库表 /

在控制台页面中创建名为 ssb_100 的数据库。

picture.image

创建完毕后,进入到 SQL 工作表模块,通过如下建表语句建立四个数据表(事实表),并保存对应的 SQL 语句。


        
  `CREATE TABLE ssb_100.customer`
  `(`
  `C_CUSTKEY UInt32,`
  `C_NAME String,`
  `C_ADDRESS String,`
  `C_CITY LowCardinality(String),`
  `C_NATION LowCardinality(String),`
  `C_REGION LowCardinality(String),`
  `C_PHONE String,`
  `C_MKTSEGMENT LowCardinality(String),`
  `C_PLACEHOLDER Nullable(String)`
  `)`
  `ENGINE = CnchMergeTree ORDER BY (C_CUSTKEY);`
 
  `CREATE TABLE ssb_100.lineorder`
  `(`
  `LO_ORDERKEY UInt32,`
  `LO_LINENUMBER UInt8,`
  `LO_CUSTKEY UInt32,`
  `LO_PARTKEY UInt32,`
  `LO_SUPPKEY UInt32,`
  `LO_ORDERDATE Date,`
  `LO_ORDERPRIORITY LowCardinality(String),`
  `LO_SHIPPRIORITY UInt8,`
  `LO_QUANTITY UInt8,`
  `LO_EXTENDEDPRICE UInt32,`
  `LO_ORDTOTALPRICE UInt32,`
  `LO_DISCOUNT UInt8,`
  `LO_REVENUE UInt32,`
  `LO_SUPPLYCOST UInt32,`
  `LO_TAX UInt8,`
  `LO_COMMITDATE Date,`
  `LO_SHIPMODE LowCardinality(String),`
  `LO_PLACEHOLDER Nullable(String)`
  `)`
  `ENGINE = CnchMergeTree PARTITION BY toYear(LO_ORDERDATE) ORDER BY (LO_ORDERDATE, LO_ORDERKEY);`
 
  `CREATE TABLE ssb_100.part`
  `(`
  `P_PARTKEY UInt32,`
  `P_NAME String,`
  `P_MFGR LowCardinality(String),`
  `P_CATEGORY LowCardinality(String),`
  `P_BRAND LowCardinality(String),`
  `P_COLOR LowCardinality(String),`
  `P_TYPE LowCardinality(String),`
  `P_SIZE UInt8,`
  `P_CONTAINER LowCardinality(String),`
  `P_PLACEHOLDER Nullable(String)`
  `)`
  `ENGINE = CnchMergeTree ORDER BY P_PARTKEY;`
 
  `CREATE TABLE ssb_100.supplier`
  `(`
  `S_SUPPKEY UInt32,`
  `S_NAME String,`
  `S_ADDRESS String,`
  `S_CITY LowCardinality(String),`
  `S_NATION LowCardinality(String),`
  `S_REGION LowCardinality(String),`
  `S_PHONE String,`
  `S_PLACEHOLDER Nullable(String)`
  `)`
  `ENGINE = CnchMergeTree ORDER BY S_SUPPKEY;`
 
  `CREATE TABLE ssb_100.dwdate`
  `(`
  `D_DATEKEY UInt32,`
  `D_DATE String,`
  `D_DAYOFWEEK String, -- defined in Section 2.6 as Size 8, but Wednesday is 9 letters`
  `D_MONTH String,`
  `D_YEAR UInt32,`
  `D_YEARMONTHNUM UInt32,`
  `D_YEARMONTH String,`
  `D_DAYNUMINWEEK UInt32,`
  `D_DAYNUMINMONTH UInt32,`
  `D_DAYNUMINYEAR UInt32,`
  `D_MONTHNUMINYEAR UInt32,`
  `D_WEEKNUMINYEAR UInt32,`
  `D_SELLINGSEASON String,`
  `D_LASTDAYINWEEKFL UInt32,`
  `D_LASTDAYINMONTHFL UInt32,`
  `D_HOLIDAYFL UInt32,`
  `D_WEEKDAYFL UInt32,`
  `S_PLACEHOLDER Nullable(String)`
  `)`
  `ENGINE=CnchMergeTree() ORDER BY (D_DATEKEY);`
 
      

SQL 执行完毕后,在控制台左侧对应的数据对象页面会展示出创建完成的五个工作表,分别为 customer,dwdate,lineorder以及part 和 supplier。

picture.image

/ 步骤四:从对象存储中导入 SSB 数据 /

通过预先生成 SSB_100 GB 的数据集并存储在对象存储(如 AWS S3 或者 火山引擎 TOS),我们可以方便且快速的将数据导入到 ByteHouse 中进行分析。本次实践中通过配置 火山引擎 TOS 的数据源对数据进行导入。

首先在数据加载模块,新建对象存储数据源,并配置对应的秘钥连接火山引擎对象存储。

picture.image

picture.image

连接新的数据源后,选择 bytehouse-shared-dataset 的储存桶和ssb_100/lineorder.csv 相应的路径

picture.image

picture.image

选择之前建的数据库ssb_100和对应标表lineorder,然后按创建。重复步骤为其他四个工作表数据加载。

picture.image

数据源中存储的数据条数如下所示。用于导入完成后,对数据表的行数进行统计,进行准确性校验。

| Table | Bucket Path 路径 | Rows | Size 大小 | | lineorder | ssb_100/lineorder.csv | 600,038,145 | 246.30 GB | | customer | ssb_100/customer.csv | 3,000,000 | | part | ssb_100/part.csv | 1,400,000 | | supplier | ssb_100/supplier.csv | 200,000 | | dwdate | ssb_100/dwdate.csv | 2,556 |

创建导入任务完成后,点击“开始”启动导入任务,任务启动后会在几秒钟内分配资源并初始化导入任务,并在导入过程中展示预估的时间和导入进度。在导入任务的执行详情中,可以查看导入状态、导入详细日志、配置信息等。

picture.image

picture.image

/ 步骤五:数据处理及分析 /

1. 原始查询测试

通过执行 SSB 的 13 条查询语句,对于多表关联和排序等场景进行性能测试。查询语句如下所示:


        
  `-- pre-warm`
  `select * from ssb_100.customer order by C_CUSTKEY desc limit 100;`
  `select * from ssb_100.dwdate order by D_DATEKEY desc limit 100;`
  `select * from ssb_100.lineorder order by LO_ORDERKEY desc limit 100;`
  `select * from ssb_100.part order by P_PARTKEY desc limit 100;`
  `select * from ssb_100.supplier order by S_SUPPKEY desc limit 100;`
  `select * from ssb_100.lineorder_flat order by LO_ORDERKEY desc limit 100;`
 
  `-- Q1.1`
  `select sum(LO_EXTENDEDPRICE*LO_DISCOUNT) as revenue`
  `from ssb_100.lineorder`
  `where toYear(LO_ORDERDATE) = 1993`
  `and LO_DISCOUNT between 1 and 3`
  `and LO_QUANTITY < 25;`
 
  `-- Q1.2`
  `select sum(LO_EXTENDEDPRICE*LO_DISCOUNT) as revenue`
  `from ssb_100.lineorder`
  `where toYYYYMM(LO_ORDERDATE) = 199401`
  `and LO_DISCOUNT between 4 and 6`
  `and LO_QUANTITY between 26 and 35;`
 
  `-- Q1.3`
  `select sum(LO_EXTENDEDPRICE*LO_DISCOUNT) as revenue`
  `from ssb_100.lineorder`
  `where toISOWeek(LO_ORDERDATE) = 6`
  `and toYear(LO_ORDERDATE)= 1994`
  `and LO_DISCOUNT between 5 and 7`
  `and LO_QUANTITY between 26 and 35;`
 
  `-- Q2.1`
  `select sum(LO_REVENUE), toYear(LO_ORDERDATE) AS d_year, P_BRAND`
  `from ssb_100.lineorder, ssb_100.part, ssb_100.supplier`
  `where LO_PARTKEY = P_PARTKEY and LO_SUPPKEY = S_SUPPKEY`
  `and P_CATEGORY = 'MFGR#53' and S_REGION = 'AMERICA'`
  `GROUP BY d_year, P_BRAND;`
 
  `-- Q2.2`
  `SELECT sum(LO_REVENUE), toYear(LO_ORDERDATE) AS year, P_BRAND`
  `FROM ssb_100.lineorder, ssb_100.part, ssb_100.supplier`
  `WHERE LO_PARTKEY = P_PARTKEY and LO_SUPPKEY = S_SUPPKEY`
  `and P_BRAND >= 'MFGR#2221' and P_BRAND <= 'MFGR#2228' and S_REGION = 'ASIA'`
  `GROUP BY year, P_BRAND`
  `ORDER BY year, P_BRAND;`
 
  `-- Q2.3`
  `SELECT sum(LO_REVENUE), toYear(LO_ORDERDATE) AS year, P_BRAND`
  `FROM ssb_100.lineorder, ssb_100.part, ssb_100.supplier`
  `WHERE LO_PARTKEY = P_PARTKEY and LO_SUPPKEY = S_SUPPKEY` 
  `and P_BRAND = 'MFGR#2239'and S_REGION = 'EUROPE'`
  `GROUP BY year, P_BRAND`
  `ORDER BY year, P_BRAND;`
 
  `-- Q3.1`
  `SELECT C_NATION, S_NATION, toYear(LO_ORDERDATE) AS year, sum(LO_REVENUE) AS revenue`
  `FROM ssb_100.lineorder, ssb_100.part, ssb_100.supplier, ssb_100.customer`
  `WHERE LO_PARTKEY = P_PARTKEY and LO_SUPPKEY = S_SUPPKEY and LO_CUSTKEY = C_CUSTKEY`
  `and C_REGION = 'ASIA' AND S_REGION = 'ASIA' AND year >= 1992 AND year <= 1997`
  `GROUP BY C_NATION, S_NATION, year`
  `ORDER BY year ASC, revenue DESC;`
 
  `-- Q3.2` 
  `SELECT C_CITY, S_CITY, toYear(LO_ORDERDATE) AS year, sum(LO_REVENUE) AS revenue`
  `FROM ssb_100.lineorder, ssb_100.part, ssb_100.supplier, ssb_100.customer`
  `WHERE LO_PARTKEY = P_PARTKEY and LO_SUPPKEY = S_SUPPKEY and LO_CUSTKEY = C_CUSTKEY`
  `and C_NATION = 'UNITED STATES' AND S_NATION = 'UNITED STATES' AND year >= 1992 AND year <= 1997`
  `GROUP BY C_CITY, S_CITY, year`
  `ORDER BY year ASC, revenue DESC;`
 
  `-- Q3.3`
  `SELECT C_CITY, S_CITY, toYear(LO_ORDERDATE) AS year, sum(LO_REVENUE) AS revenue`
  `FROM ssb_100.lineorder, ssb_100.part, ssb_100.supplier, ssb_100.customer`
  `WHERE LO_PARTKEY = P_PARTKEY and LO_SUPPKEY = S_SUPPKEY and LO_CUSTKEY = C_CUSTKEY`
  `and (C_CITY = 'UNITED KI1' OR C_CITY = 'UNITED KI5') AND (S_CITY = 'UNITED KI1' OR S_CITY = 'UNITED KI5') AND year >= 1992 AND year <= 1997`
  `GROUP BY C_CITY, S_CITY, year`
  `ORDER BY year ASC, revenue DESC;`
 
  `-- Q3.4`
  `SELECT C_CITY, S_CITY, toYear(LO_ORDERDATE) AS year, sum(LO_REVENUE) AS revenue`
  `FROM ssb_100.lineorder, ssb_100.part, ssb_100.supplier, ssb_100.customer`
  `WHERE LO_PARTKEY = P_PARTKEY and LO_SUPPKEY = S_SUPPKEY and LO_CUSTKEY = C_CUSTKEY`
  `and (C_CITY = 'UNITED KI1' OR C_CITY = 'UNITED KI5') AND (S_CITY = 'UNITED KI1' OR S_CITY = 'UNITED KI5') AND toYYYYMM(LO_ORDERDATE) = 199712`
  `GROUP BY C_CITY, S_CITY, year`
  `ORDER BY year ASC, revenue DESC;`
 
  `-- Q4.1`
  `SELECT toYear(LO_ORDERDATE) AS year, C_NATION, sum(LO_REVENUE - LO_SUPPLYCOST) AS profit`
  `FROM ssb_100.lineorder, ssb_100.part, ssb_100.supplier, ssb_100.customer`
  `WHERE LO_PARTKEY = P_PARTKEY and LO_SUPPKEY = S_SUPPKEY and LO_CUSTKEY = C_CUSTKEY`
  `and C_REGION = 'AMERICA' AND S_REGION = 'AMERICA' AND (P_MFGR = 'MFGR#1' OR P_MFGR = 'MFGR#2')`
  `GROUP BY year, C_NATION`
  `ORDER BY year ASC, C_NATION ASC;`
 
  `-- Q4.2`
  `SELECT toYear(LO_ORDERDATE) AS year, S_NATION, P_CATEGORY, sum(LO_REVENUE - LO_SUPPLYCOST) AS profit`
  `FROM ssb_100.lineorder, ssb_100.part, ssb_100.supplier, ssb_100.customer`
  `WHERE LO_PARTKEY = P_PARTKEY and LO_SUPPKEY = S_SUPPKEY and LO_CUSTKEY = C_CUSTKEY`
  `and C_REGION = 'AMERICA' AND S_REGION = 'AMERICA' AND (year = 1997 OR year = 1998) AND (P_MFGR = 'MFGR#1' OR P_MFGR = 'MFGR#2')`
  `GROUP BY year, S_NATION, P_CATEGORY`
  `ORDER BY year ASC, S_NATION ASC, P_CATEGORY ASC;`
 
  `-- Q4.3`
  `SELECT toYear(LO_ORDERDATE) AS year, S_CITY, P_BRAND, sum(LO_REVENUE - LO_SUPPLYCOST) AS profit`
  `FROM ssb_100.lineorder, ssb_100.part, ssb_100.supplier`
  `WHERE LO_PARTKEY = P_PARTKEY and LO_SUPPKEY = S_SUPPKEY`
  `and S_NATION = 'UNITED STATES' AND (year = 1997 OR year = 1998) AND P_CATEGORY = 'MFGR#14'`
  `GROUP BY year, S_CITY, P_BRAND`
  `ORDER BY year ASC, S_CITY ASC, P_BRAND ASC;`
 
      

2. 打平表测试

为了方便对 SSB 数据集进行测试,我们可以通过改写 SSB,将星型模型打平转换为大宽表进行分析。

注:为了确保打平表的执行,需要配置参数 SET max_memory_usage = 20000000000; 此外需要在 ByteHouse 控制台中配置查询超时为 3600s (我的账户 > 查询配置 > 查询超时),避免执行超时导致的失败。


        
  `SET max_memory_usage = 20000000000;`
  `SET send_timeout = 3600;`
  `SET receive_timeout = 3600;`
 
  `CREATE TABLE IF NOT EXISTS ssb_100.lineorder_flat`
  `engine = CnchMergeTree`
  `partition by toYear(LO_ORDERDATE)`
  `order by (LO_ORDERDATE, LO_ORDERKEY) as`
  `select`
  `L.LO_ORDERKEY as LO_ORDERKEY,`
  `L.LO_LINENUMBER as LO_LINENUMBER,`
  `L.LO_CUSTKEY as LO_CUSTKEY,`
  `L.LO_PARTKEY as LO_PARTKEY,`
  `L.LO_SUPPKEY as LO_SUPPKEY,`
  `L.LO_ORDERDATE as LO_ORDERDATE,`
  `L.LO_ORDERPRIORITY as LO_ORDERPRIORITY,`
  `L.LO_SHIPPRIORITY as LO_SHIPPRIORITY,`
  `L.LO_QUANTITY as LO_QUANTITY,`
  `L.LO_EXTENDEDPRICE as LO_EXTENDEDPRICE,`
  `L.LO_ORDTOTALPRICE as LO_ORDTOTALPRICE,`
  `L.LO_DISCOUNT as LO_DISCOUNT,`
  `L.LO_REVENUE as LO_REVENUE,`
  `L.LO_SUPPLYCOST as LO_SUPPLYCOST,`
  `L.LO_TAX as LO_TAX,`
  `L.LO_COMMITDATE as LO_COMMITDATE,`
  `L.LO_SHIPMODE as LO_SHIPMODE,`
  `C.C_NAME as C_NAME,`
  `C.C_ADDRESS as C_ADDRESS,`
  `C.C_CITY as C_CITY,`
  `C.C_NATION as C_NATION,`
  `C.C_REGION as C_REGION,`
  `C.C_PHONE as C_PHONE,`
  `C.C_MKTSEGMENT as C_MKTSEGMENT,`
  `S.S_NAME as S_NAME,`
  `S.S_ADDRESS as S_ADDRESS,`
  `S.S_CITY as S_CITY,`
  `S.S_NATION as S_NATION,`
  `S.S_REGION as S_REGION,`
  `S.S_PHONE as S_PHONE,`
  `P.P_NAME as P_NAME,`
  `P.P_MFGR as P_MFGR,`
  `P.P_CATEGORY as P_CATEGORY,`
  `P.P_BRAND as P_BRAND,`
  `P.P_COLOR as P_COLOR,`
  `P.P_TYPE as P_TYPE,`
  `P.P_SIZE as P_SIZE,`
  `P.P_CONTAINER as P_CONTAINER`
  `from ssb_100.lineorder as L`
  `inner join ssb_100.customer as C on C.C_CUSTKEY = L.LO_CUSTKEY`
  `inner join ssb_100.supplier as S on S.S_SUPPKEY = L.LO_SUPPKEY`
  `inner join ssb_100.part as P on P.P_PARTKEY = L.LO_PARTKEY;`
 
      

建表完成后,通过执行查询语句进行 SSB 性能测试,如下所示:


        
  `-- F1.1`
  `SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue`
  `FROM ssb_100.lineorder_flat`
  `WHERE toYear(LO_ORDERDATE) = 1993`
  `AND LO_DISCOUNT BETWEEN 1 AND 3` 
  `AND LO_QUANTITY < 25;`
 
  `-- F1.2`
  `SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue`
  `FROM ssb_100.lineorder_flat`
  `WHERE toYYYYMM(LO_ORDERDATE) = 199401`
  `AND LO_DISCOUNT BETWEEN 4 AND 6`
  `AND LO_QUANTITY BETWEEN 26 AND 35;`
 
  `-- F1.3`
  `SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue`
  `FROM ssb_100.lineorder_flat`
  `WHERE toISOWeek(LO_ORDERDATE) = 6`
  `AND toYear(LO_ORDERDATE) = 1994`
  `AND LO_DISCOUNT BETWEEN 5 AND 7`
  `AND LO_QUANTITY BETWEEN 26 AND 35;`
 
  `-- F2.1`
  `SELECT sum(LO_REVENUE), toYear(LO_ORDERDATE) AS year, P_BRAND`
  `FROM ssb_100.lineorder_flat`
  `WHERE P_CATEGORY = 'MFGR#12' AND S_REGION = 'AMERICA'`
  `GROUP BY year, P_BRAND`
  `ORDER BY year, P_BRAND;`
 
  `-- F2.2`
  `SELECT sum(LO_REVENUE), toYear(LO_ORDERDATE) AS year, P_BRAND`
  `FROM ssb_100.lineorder_flat`
  `WHERE P_BRAND >= 'MFGR#2221' AND P_BRAND <= 'MFGR#2228' AND S_REGION = 'ASIA'`
  `GROUP BY year, P_BRAND`
  `ORDER BY year, P_BRAND;`
 
  `-- F2.3`
  `SELECT sum(LO_REVENUE), toYear(LO_ORDERDATE) AS year, P_BRAND`
  `FROM ssb_100.lineorder_flat`
  `WHERE P_BRAND = 'MFGR#2239' AND S_REGION = 'EUROPE'`
  `GROUP BY year, P_BRAND`
  `ORDER BY year, P_BRAND;`
 
  `-- F3.1`
  `SELECT C_NATION, S_NATION, toYear(LO_ORDERDATE) AS year, sum(LO_REVENUE) AS revenue`
  `FROM ssb_100.lineorder_flat`
  `WHERE C_REGION = 'ASIA' AND S_REGION = 'ASIA' AND year >= 1992 AND year <= 1997`
  `GROUP BY C_NATION, S_NATION, year`
  `ORDER BY year ASC, revenue DESC;`
 
  `-- F3.2`
  `SELECT C_CITY, S_CITY, toYear(LO_ORDERDATE) AS year, sum(LO_REVENUE) AS revenue`
  `FROM ssb_100.lineorder_flat`
  `WHERE C_NATION = 'UNITED STATES' AND S_NATION = 'UNITED STATES' AND year >= 1992 AND year <= 1997`
  `GROUP BY C_CITY, S_CITY, year`
  `ORDER BY year ASC, revenue DESC;`
 
  `-- F3.3`
  `SELECT C_CITY, S_CITY, toYear(LO_ORDERDATE) AS year, sum(LO_REVENUE) AS revenue`
  `FROM ssb_100.lineorder_flat`
  `WHERE (C_CITY = 'UNITED KI1' OR C_CITY = 'UNITED KI5') AND (S_CITY = 'UNITED KI1' OR S_CITY = 'UNITED KI5') AND year >= 1992 AND year <= 1997`
  `GROUP BY C_CITY, S_CITY, year`
  `ORDER BY year ASC, revenue DESC;`
 
  `-- F3.4`
  `SELECT C_CITY, S_CITY, toYear(LO_ORDERDATE) AS year, sum(LO_REVENUE) AS revenue`
  `FROM ssb_100.lineorder_flat`
  `WHERE (C_CITY = 'UNITED KI1' OR C_CITY = 'UNITED KI5') AND (S_CITY = 'UNITED KI1' OR S_CITY = 'UNITED KI5') AND toYYYYMM(LO_ORDERDATE) = 199712`
  `GROUP BY C_CITY, S_CITY, year`
  `ORDER BY year ASC, revenue DESC;`
 
  `-- F4.1`
  `SELECT toYear(LO_ORDERDATE) AS year, C_NATION, sum(LO_REVENUE - LO_SUPPLYCOST) AS profit`
  `FROM ssb_100.lineorder_flat`
  `WHERE C_REGION = 'AMERICA' AND S_REGION = 'AMERICA' AND (P_MFGR = 'MFGR#1' OR P_MFGR = 'MFGR#2')`
  `GROUP BY year, C_NATION`
  `ORDER BY year ASC, C_NATION ASC;`
 
  `-- F4.2`
  `SELECT toYear(LO_ORDERDATE) AS year, S_NATION, P_CATEGORY, sum(LO_REVENUE - LO_SUPPLYCOST) AS profit`
  `FROM ssb_100.lineorder_flat`
  `WHERE C_REGION = 'AMERICA' AND S_REGION = 'AMERICA' AND (year = 1997 OR year = 1998) AND (P_MFGR = 'MFGR#1' OR P_MFGR = 'MFGR#2')`
  `GROUP BY year, S_NATION, P_CATEGORY`
  `ORDER BY year ASC, S_NATION ASC, P_CATEGORY ASC;`
 
  `-- F4.3`
  `SELECT toYear(LO_ORDERDATE) AS year, S_CITY, P_BRAND, sum(LO_REVENUE - LO_SUPPLYCOST) AS profit`
  `FROM ssb_100.lineorder_flat`
  `WHERE S_NATION = 'UNITED STATES' AND (year = 1997 OR year = 1998) AND P_CATEGORY = 'MFGR#14'`
  `GROUP BY year, S_CITY, P_BRAND`
  `ORDER BY year ASC, S_CITY ASC, P_BRAND ASC;`
 
      

picture.image

执行完毕后,统计查询结果如下所示:

注:查询结果因配置参数和资源配置的不同,耗时也有差异,欢迎联系 ByteHouse 进行查询优化。

| Query | Duration(ms) | Query(flat) | Duration(ms) | | Q1.1 | 280 | F1.1 | 710 | | Q1.2 | 130 | F1.2 | 170 | | Q1.3 | 130 | F1.3 | 200 | | Q2.1 | 670 | F2.1 | 240 | | Q2.2 | 740 | F2.2 | 230 | | Q2.3 | 520 | F2.3 | 210 | | Q3.1 | 620 | F3.1 | 290 | | Q3.2 | 600 | F3.2 | 270 | | Q3.3 | 590 | F3.3 | 310 | | Q3.4 | 320 | F3.4 | 140 | | Q4.1 | 800 | F4.1 | 270 | | Q4.2 | 570 | F4.2 | 240 | | Q4.3 | 520 | F4.3 | 270 |

picture.image

查询完成后,在 ByteHouse 计算组详情页面可以查看工作负载,包括总查询条数和CPU/Mem 利用率等,从而确认计算资源的使用情况。

picture.image

根据本次 压测进行预估,消耗计算和存储资源如下表所示,由于 ByteHouse 云数仓版本按使用量计费的能力,在空闲时支持自动关闭计算组并不收取闲置费用,从而能够极大的节省资源。测试完成后,预 估的总体消耗约为 31.23 元。

| | 配置 | 用量 | 预估月消耗 | | 计算资源 | L size: 32核128GB,1 节点 | 1 小时 | 11.23 元 | | 存储资源 | 208GB,存储一个月 | 21 元 |

/ ByteHouse大促进行中 /

picture.image

产品介绍

火山引擎ByteHouse

统一的大数据分析平台。目前提供企业版和云数仓两种版本,企业版是基于开源的企业级分析型数据库,支持用户交互式分析PB级别数据,通过多种自研表引擎,灵活支持各类数据分析和应用;云数仓版作为云原生的数据分析平台,实现统一的离线和实时数据分析,并通过弹性扩展的计算层和分布式存储层,有效降低 企业大数据分析。后台回复数字“6”了解产品

picture.image

picture.image

picture.image

picture.image

picture.image

17
0
0
0
相关产品
评论
未登录
看完啦,登录分享一下感受吧~
暂无评论