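Author: Haaahei
Original source:
To make sure DM can run stably in production, we plan to drill its high-availability mechanisms. The drill covers the following items: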
Item: dm-worker HA

Verification point: when a dm-worker goes down, is its replication task transferred to another dm-worker?

Steps:

Simulate a dm-worker failure:

```
date; kill -9 <pid>; mv <deploy dir> <deploy dir>-1 # record the time, force-kill the dm-worker process, and rename the deploy directory so that it is not restarted automatically
```

Observe how the task is switched over.

Record the relevant data: switchover time, task status, and replication lag.

Conclusion:

Is the replication task transferred?

```
[2021/08/17 13:28:04.712 +08:00] [WARN] [:60] ["grpc: ateTransport failed to connect to {172.17.201.115:8262 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 172.17.201.115:8262: connect: connection refused". "] [component="embed etcd"] ... [2021/08/17 13:28:51.576 +08:00] [WARN] [:60] ["grpc: ateTransport failed to connect to {172.17.201.115:8262 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 172.17.201.115:8262: connect: connection refused". "] [component="embed etcd"] [2021/08/17 13:28:54.876 +08:00] [WARN] [:60] ["grpc: ateTransport failed to connect to {172.17.201.115:8262 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 172.17.201.115:8262: connect: connection refused". "] [component="embed etcd"] [2021/08/17 13:28:57.913 +08:00] [WARN] [:60] ["grpc: ateTransport failed to connect to {172.17.201.115:8262 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 172.17.201.115:8262: connect: connection refused". 
"] [component="embed etcd"] [2021/08/17 13:28:58.159 +08:00] [INFO] [:216] ["receive dm-worker keep alive event"] [operation=DELETE] [kv=/dm-worker/a/646d2d3137322e31372e3230312e3131352d38323632] [2021/08/17 13:28:58.163 +08:00] [INFO] [:1506] ["receive worker status change event"] [component=scheduler] [delete=true] [event="{"worker-name":"dm-172.17.201.115-8262","join-time":"0001-01-01T00:00:00Z"}"] [2021/08/17 13:28:58.165 +08:00] [INFO] [:1662] ["unbound the worker for source"] [component=scheduler] [bound="{"source":"ds-mysql_report","worker":"dm-172.17.201.115-8262"}"] [event="{"worker-name":"dm-172.17.201.115-8262","join-time":"0001-01-01T00:00:00Z"}"] [2021/08/17 13:28:58.165 +08:00] [INFO] [:1838] ["found free worker when source bound"] [component=scheduler] [worker=dm-172.18.78.254-8265] [source=ds-mysql_report] [2021/08/17 13:28:58.168 +08:00] [INFO] [:1876] ["bound the source to worker"] [component=scheduler] [bound="{"source":"ds-mysql_report","worker":"dm-172.18.78.254-8265"}"]
```

After about 60 s, a new dm-worker successfully took over the replication task, and `query-status` showed a normal replication state.
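The takeover time can also be read off the dm-master log instead of being estimated. The following is a minimal Python sketch, not part of DM: it takes two timestamps copied from the log excerpt above, computes the time between them, and hex-decodes the etcd key from the keep-alive `DELETE` event to confirm which worker dropped out. The helper names `parse_log_time` and `decode_worker_key` are made up for illustration. Treating the first `connection refused` warning in the excerpt as the start of the outage is an assumption, because the output of `date` at kill time is not part of the log.

```python
from datetime import datetime

# Timestamps copied from the dm-master log excerpt above.
FIRST_REFUSED = "2021/08/17 13:28:04.712 +08:00"   # first "connection refused" warning in the excerpt
SOURCE_REBOUND = "2021/08/17 13:28:58.168 +08:00"  # "bound the source to worker"

# etcd key from the "receive dm-worker keep alive event" [operation=DELETE] entry.
KEEPALIVE_KEY = "/dm-worker/a/646d2d3137322e31372e3230312e3131352d38323632"


def parse_log_time(stamp: str) -> datetime:
    """Parse a timestamp in the format used by the log lines above."""
    return datetime.strptime(stamp, "%Y/%m/%d %H:%M:%S.%f %z")


def decode_worker_key(key: str) -> str:
    """Hex-decode the last path segment of a keep-alive key (hypothetical helper)."""
    return bytes.fromhex(key.rsplit("/", 1)[-1]).decode("ascii")


elapsed = (parse_log_time(SOURCE_REBOUND) - parse_log_time(FIRST_REFUSED)).total_seconds()
worker = decode_worker_key(KEEPALIVE_KEY)
print(f"{worker} was replaced after about {elapsed:.1f} s")
```

On this excerpt the gap is roughly 53 s, which is in line with the "about 60 s" estimate, and the decoded key matches the worker name `dm-172.17.201.115-8262` that appears in the scheduler messages.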
Replication task status:

```
[2021/08/17 13:28:58.168 +08:00] [INFO] [:581] ["receive source bound"] [bound="{"source":"ds-mysql_report","worker":"dm-172.18.78.254-8265"}"] ["is deleted"=false] [2021/08/17 13:28:58.170 +08:00] [WARN] [:826] ["session variable 'time_zone' is overwritten by default UTC timezone."] [time_zone=+00:00] [2021/08/17 13:28:58.170 +08:00] [INFO] [:836] ["will start a new worker"] [sourceID=ds-mysql_report] [2021/08/17 13:28:58.170 +08:00] [INFO] [:120] [initialized] [component="worker controller"] [cfg="{"enable-gtid":true,"auto-fix-gtid":false,"relay-dir":"relay-dir","meta-dir":"","flavor":"mysql","charset":"","enable-relay":false,"relay-binlog-name":"","relay-binlog-gtid":"","source-id":"ds-mysql_report","from":{"host":"172.16.150.53","port":15381,"user":"dm_sync","max-allowed-packet":null,"session":{"time_zone":"+00:00"},"security":null},"purge":{"interval":3600,"expires":0,"remain-space":15},"checker":{"check-enable":true,"backoff-rollback":{"Duration":"5m0s"},"backoff-max":{"Duration":"5m0s"}},"server-id":429548349,"case-sensitive":false,"filters":null}"] [2021/08/17 13:28:58.170 +08:00] [INFO] [:135] ["start running"] [component="worker controller"] [2021/08/17 13:28:58.270 +08:00] [INFO] [:310] ["enter EnableHandleSubtasks"] [component="worker controller"] [2021/08/17 13:28:58.272 +08:00] [WARN] [:826] ["session variable 'time_zone' is overwritten by default UTC timezone."] [time_zone=+00:00] [2021/08/17 13:28:58.272 +08:00] [WARN] [:826] ["session variable 'time_zone' is overwritten by default UTC timezone."] [time_zone=+00:00] [2021/08/17 13:28:58.273 +08:00] [INFO] [:326] ["starting to handle mysql source"] [component="worker controller"] 
[sourceCfg="{"enable-gtid":true,"auto-fix-gtid":false,"relay-dir":"relay-dir","meta-dir":"","flavor":"mysql","charset":"","enable-relay":false,"relay-binlog-name":"","relay-binlog-gtid":"","source-id":"ds-mysql_report","from":{"host":"172.16.150.53","port":15381,"user":"dm_sync","max-allowed-packet":null,"session":{"time_zone":"+00:00"},"security":null},"purge":{"interval":3600,"expires":0,"remain-space":15},"checker":{"check-enable":true,"backoff-rollback":{"Duration":"5m0s"},"backoff-max":{"Duration":"5m0s"}},"server-id":429548349,"case-sensitive":false,"filters":null}"] [subTasks="{"dm-mysql_report":{"is-sharding":false,"shard-mode":"","online-ddl-scheme":"gh-ost","case-sensitive":false,"name":"dm-mysql_report","mode":"incremental","ignore-checking-items":["dump_privilege"],"source-id":"ds-mysql_report","server-id":429548349,"flavor":"mysql","meta-schema":"dm_meta","heartbeat-update-interval":1,"heartbeat-report-interval":10,"enable-heartbeat":false,"meta":{"BinLogName":"","BinLogPos":0,"BinLogGTID":"34474b1e-2bf3-11e8-8515-00163e1040fb:1-1278385,767d5889-e08e-11ea-bf83-00163e0e3732:1,7fbb40a3-8240-11eb-8cda-00163e17fb0e:1-168290280,803ffea1-7b9d-11e9-87b0-00163e0e3732:1-435424493,a2d27a7f-de3c-11e7-82cd-00163e1040fb:1-304056,a33473fb-de3c-11e7-8140-00163e0e6470:1-68232043,bfbebe4d-1582-11e9-8e63-00163e082a23:1-36011466,e033b7c4-7b9d-11e9-8e45-00163e097eeb:1-608218207"},"timezone":"","relay-dir":"relay-dir","use-relay":false,"from":{"host":"172.16.150.53","port":15381,"user":"dm_sync","max-allowed-packet":null,"session":{"time_zone":"+00:00"},"security":null},"to":{"host":"172.21.35.233","port":15381,"user":"dm_load","max-allowed-packet":null,"session":{"tidb_txn_mode":"optimistic","time_zone":"+00:00"},"security":null},"route-rules":[{"schema-pattern":"reverse_flow","table-pattern":"","target-schema":"reverse_center","target-table":""}],"filter-rules":[],"mapping-rule":[],"black-white-list":null,"block-allow-list":{"do-tables":[{"db-name":"reverse_flow","tbl-nam
e":"rc_reverse_record_integration"}],"do-dbs":["reverse_flow"],"ignore-tables":null,"ignore-dbs":null},"mydumper-path":"./bin/mydumper","threads":1,"chunk-filesize":"64","statement-size":0,"rows":1000,"where":"","skip-tz-utc":true,"extra-args":"--consistency none","pool-size":8,"dir":"./dm-mysql_report.dm-mysql_report","meta-file":"","worker-count":128,"batch":100,"queue-size":1024,"checkpoint-flush-interval":30,"max-retry":0,"auto-fix-gtid":false,"enable-gtid":true,"disable-detect":false,"safe-mode":false,"enable-ansi-quotes":false,"log-level":"","log-file":"","log-format":"","log-rotate":"","pprof-addr":"","status-addr":"","config-file":"","clean-dump-file":false,"ansi-quotes":false}}"] [2021/08/17 13:28:58.273 +08:00] [INFO] [:333] ["start to create subtask"] [component="worker controller"] [sourceID=ds-mysql_report] [task=dm-mysql_report] [2021/08/17 13:28:58.273 +08:00] [INFO] [:426] ["subtask created"] [component="worker controller"] [config="{"is-sharding":false,"shard-mode":"","online-ddl-scheme":"gh-ost","case-sensitive":false,"name":"dm-mysql_report","mode":"incremental","ignore-checking-items":["dump_privilege"],"source-id":"ds-mysql_report","server-id":429548349,"flavor":"mysql","meta-schema":"dm_meta","heartbeat-update-interval":1,"heartbeat-report-interval":10,"enable-heartbeat":false,"meta":{"BinLogName":"","BinLogPos":0,"BinLogGTID":"34474b1e-2bf3-11e8-8515-00163e1040fb:1-1278385,767d5889-e08e-11ea-bf83-00163e0e3732:1,7fbb40a3-8240-11eb-8cda-00163e17fb0e:1-168290280,803ffea1-7b9d-11e9-87b0-00163e0e3732:1-435424493,a2d27a7f-de3c-11e7-82cd-00163e1040fb:1-304056,a33473fb-de3c-11e7-8140-00163e0e6470:1-68232043,bfbebe4d-1582-11e9-8e63-00163e082a23:1-36011466,e033b7c4-7b9d-11e9-8e45-00163e097eeb:1-608218207"},"timezone":"","relay-dir":"relay-dir","use-relay":false,"from":{"host":"172.16.150.53","port":15381,"user":"dm_sync","max-allowed-packet":null,"session":{"time_zone":"+00:00"},"security":null},"to":{"host":"172.21.35.233","port":15381,"user":"dm_load"
,"max-allowed-packet":null,"session":{"tidb_txn_mode":"optimistic","time_zone":"+00:00"},"security":null},"route-rules":[{"schema-pattern":"reverse_flow","table-pattern":"","target-schema":"reverse_center","target-table":""}],"filter-rules":[],"mapping-rule":[],"black-white-list":null,"block-allow-list":{"do-tables":[{"db-name":"reverse_flow","tbl-name":"rc_reverse_record_integration"}],"do-dbs":["reverse_flow"],"ignore-tables":null,"ignore-dbs":null},"mydumper-path":"./bin/mydumper","threads":1,"chunk-filesize":"64","statement-size":0,"rows":1000,"where":"","skip-tz-utc":true,"extra-args":"--consistency none","pool-size":8,"dir":"./dm-mysql_report.dm-mysql_report","meta-file":"","worker-count":128,"batch":100,"queue-size":1024,"checkpoint-flush-interval":30,"max-retry":0,"auto-fix-gtid":false,"enable-gtid":true,"disable-detect":false,"safe-mode":false,"enable-ansi-quotes":false,"log-level":"","log-file":"","log-format":"","log-rotate":"","pprof-addr":"","status-addr":"","config-file":"","clean-dump-file":false,"ansi-quotes":false}"] [2021/08/17 13:28:58.273 +08:00] [INFO] [:3024] ["use timezone"] [task=dm-mysql_report] [unit="binlog replication"] [location=UTC] [2021/08/17 13:28:58.891 +08:00] [INFO] [:599] ["detect server type"] [task=dm-mysql_report] [unit="binlog replication"] [scope=upstream] [type=MySQL] [2021/08/17 13:28:58.891 +08:00] [INFO] [:618] ["detect server version"] [task=dm-mysql_report] [unit="binlog replication"] [scope=upstream] [version=5.7.20-log] [2021/08/17 13:28:58.894 +08:00] [INFO] [:599] ["detect server type"] [task=dm-mysql_report] [unit="binlog replication"] [scope=downstream] [type=TiDB] [2021/08/17 13:28:58.894 +08:00] [INFO] [:618] ["detect server version"] [task=dm-mysql_report] [unit="binlog replication"] [scope=downstream] [version=4.0.13] [2021/08/17 13:28:59.422 +08:00] [INFO] [:699] ["create checkpoint schema"] [task=dm-mysql_report] [unit="binlog replication"] [component="remote checkpoint"] [statement="CREATE SCHEMA IF NOT 
EXISTS dm_meta
"] [2021/08/17 13:28:59.426 +08:00] [INFO] [:723] ["create checkpoint table"] [task=dm-mysql_report] [unit="binlog replication"] [component="remote checkpoint"] [statements="["CREATE TABLE IF NOT EXISTS dm_meta
. dm-mysql_report_syncer_checkpoint
(ntttid VARCHAR(32) NOT NULL,ntttcp_schema VARCHAR(128) NOT NULL,ntttcp_table VARCHAR(128) NOT NULL,ntttbinlog_name VARCHAR(128),ntttbinlog_pos INT UNSIGNED,ntttbinlog_gtid TEXT,ntttexit_safe_binlog_name VARCHAR(128) DEFAULT '',ntttexit_safe_binlog_pos INT UNSIGNED DEFAULT 0,ntttexit_safe_binlog_gtid TEXT,nttttable_info JSON NOT NULL,ntttis_global BOOLEAN,ntttcreate_time timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,ntttupdate_time timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,ntttUNIQUE KEY uk_id_schema_table (id, cp_schema, cp_table)ntt)"]"] [2021/08/17 13:28:59.429 +08:00] [INFO] [:785] ["fetch global checkpoint from DB"] [task=dm-mysql_report] [unit="binlog replication"] [component="remote checkpoint"] ["global checkpoint"="position: (mysql-bin.001906, 820109405), gtid-set: 34474b1e-2bf3-11e8-8515-00163e1040fb:1-1278385,767d5889-e08e-11ea-bf83-00163e0e3732:1,7fbb40a3-8240-11eb-8cda-00163e17fb0e:1-344070205,803ffea1-7b9d-11e9-87b0-00163e0e3732:1-435424493,a2d27a7f-de3c-11e7-82cd-00163e1040fb:1-304056,a33473fb-de3c-11e7-8140-00163e0e6470:1-68232043,bfbebe4d-1582-11e9-8e63-00163e082a23:1-36011466,e033b7c4-7b9d-11e9-8e45-00163e097eeb:1-608218207(flushed position: (mysql-bin.001906, 820109405), gtid-set: 34474b1e-2bf3-11e8-8515-00163e1040fb:1-1278385,767d5889-e08e-11ea-bf83-00163e0e3732:1,7fbb40a3-8240-11eb-8cda-00163e17fb0e:1-344070205,803ffea1-7b9d-11e9-87b0-00163e0e3732:1-435424493,a2d27a7f-de3c-11e7-82cd-00163e1040fb:1-304056,a33473fb-de3c-11e7-8140-00163e0e6470:1-68232043,bfbebe4d-1582-11e9-8e63-00163e082a23:1-36011466,e033b7c4-7b9d-11e9-8e45-00163e097eeb:1-608218207)"] [2021/08/17 13:28:59.431 +08:00] [INFO] [:226] ["start to run"] [subtask=dm-mysql_report] [unit=Sync] [2021/08/17 13:28:59.431 +08:00] [INFO] [:351] ["handling subtask enabled"] [component="worker controller"] [2021/08/17 13:28:59.432 +08:00] [INFO] [:1342] ["replicate binlog from checkpoint"] [task=dm-mysql_report] [unit="binlog replication"] [checkpoint="position: 
(mysql-bin.001906, 820109405), gtid-set: 34474b1e-2bf3-11e8-8515-00163e1040fb:1-1278385,767d5889-e08e-11ea-bf83-00163e0e3732:1,7fbb40a3-8240-11eb-8cda-00163e17fb0e:1-344070205,803ffea1-7b9d-11e9-87b0-00163e0e3732:1-435424493,a2d27a7f-de3c-11e7-82cd-00163e1040fb:1-304056,a33473fb-de3c-11e7-8140-00163e0e6470:1-68232043,bfbebe4d-1582-11e9-8e63-00163e082a23:1-36011466,e033b7c4-7b9d-11e9-8e45-00163e097eeb:1-608218207"] [2021/08/17 13:28:59.440 +08:00] [INFO] [:72] ["last slave connection"] [task=dm-mysql_report] [unit="binlog replication"] ["connection ID"=31610609] [2021/08/17 13:28:59.440 +08:00] [INFO] [:100] ["change count"] [task=dm-mysql_report] [unit="binlog replication"] ["previous count"=0] ["new count"=0] [2021/08/17 13:28:59.440 +08:00] [INFO] [:100] ["change count"] [task=dm-mysql_report] [unit="binlog replication"] ["previous count"=0] ["new count"=1] [2021/08/17 13:28:59.440 +08:00] [INFO] [:59] ["enable safe-mode because of task initialization"] [task=dm-mysql_report] [unit="binlog replication"] ["duration in seconds"=60] [2021/08/17 13:29:00.075 +08:00] [INFO] [:1690] ["meet heartbeat event and then flush jobs"] [task=dm-mysql_report] [unit="binlog replication"] [2021/08/17 13:29:00.075 +08:00] [INFO] [:2746] ["flush all jobs"] [task=dm-mysql_report] [unit="binlog replication"] ["global checkpoint"="position: (mysql-bin.001906, 820109405), gtid-set: 34474b1e-2bf3-11e8-8515-00163e1040fb:1-1278385,767d5889-e08e-11ea-bf83-00163e0e3732:1,7fbb40a3-8240-11eb-8cda-00163e17fb0e:1-344070205,803ffea1-7b9d-11e9-87b0-00163e0e3732:1-435424493,a2d27a7f-de3c-11e7-82cd-00163e1040fb:1-304056,a33473fb-de3c-11e7-8140-00163e0e6470:1-68232043,bfbebe4d-1582-11e9-8e63-00163e082a23:1-36011466,e033b7c4-7b9d-11e9-8e45-00163e097eeb:1-608218207(flushed position: (mysql-bin.001906, 820109405), gtid-set: 
34474b1e-2bf3-11e8-8515-00163e1040fb:1-1278385,767d5889-e08e-11ea-bf83-00163e0e3732:1,7fbb40a3-8240-11eb-8cda-00163e17fb0e:1-344070205,803ffea1-7b9d-11e9-87b0-00163e0e3732:1-435424493,a2d27a7f-de3c-11e7-82cd-00163e1040fb:1-304056,a33473fb-de3c-11e7-8140-00163e0e6470:1-68232043,bfbebe4d-1582-11e9-8e63-00163e082a23:1-36011466,e033b7c4-7b9d-11e9-8e45-00163e097eeb:1-608218207)"] [2021/08/17 13:29:00.080 +08:00] [INFO] [:1003] ["flushed checkpoint"] [task=dm-mysql_report] [unit="binlog replication"] [checkpoint="position: (mysql-bin.001906, 820109405), gtid-set: 34474b1e-2bf3-11e8-8515-00163e1040fb:1-1278385,767d5889-e08e-11ea-bf83-00163e0e3732:1,7fbb40a3-8240-11eb-8cda-00163e17fb0e:1-344070205,803ffea1-7b9d-11e9-87b0-00163e0e3732:1-435424493,a2d27a7f-de3c-11e7-82cd-00163e1040fb:1-304056,a33473fb-de3c-11e7-8140-00163e0e6470:1-68232043,bfbebe4d-1582-11e9-8e63-00163e082a23:1-36011466,e033b7c4-7b9d-11e9-8e45-00163e097eeb:1-608218207(flushed position: (mysql-bin.001906, 820109405), gtid-set: 34474b1e-2bf3-11e8-8515-00163e1040fb:1-1278385,767d5889-e08e-11ea-bf83-00163e0e3732:1,7fbb40a3-8240-11eb-8cda-00163e17fb0e:1-344070205,803ffea1-7b9d-11e9-87b0-00163e0e3732:1-435424493,a2d27a7f-de3c-11e7-82cd-00163e1040fb:1-304056,a33473fb-de3c-11e7-8140-00163e0e6470:1-68232043,bfbebe4d-1582-11e9-8e63-00163e082a23:1-36011466,e033b7c4-7b9d-11e9-8e45-00163e097eeb:1-608218207)"] [2021/08/17 13:29:13.098 +08:00] [INFO] [:753] [request=QueryStatus] [payload="name:"dm-mysql_report" "] [2021/08/17 13:29:13.098 +08:00] [INFO] [:509] ["will open a connection to get master status"] [component="worker controller"] ["upstream config"="{"host":"172.16.150.53","port":15381,"user":"dm_sync","max-allowed-packet":null,"session":{"time_zone":"+00:00"},"security":null}"] [2021/08/17 13:29:29.443 +08:00] [INFO] [:2627] ["binlog replication progress"] [task=dm-mysql_report] [unit="binlog replication"] ["total binlog size"=12632410] ["last binlog size"=0] ["cost time"=30] [bytes/Second=421080] ["unsynced 
binlog size"=0] ["estimate time to catch up"=0]
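```

After the new dm-worker took over, the replication task ran normally. Because the switchover takes about 60 s, replication lag during a failover is at least about 60 s.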
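One way to back up "ran normally" with numbers is to read the `binlog replication progress` entry at the end of the dm-worker log above. The sketch below is a small Python illustration, not a DM tool: it pulls the numeric fields out of that entry (joined onto one line here) and checks that nothing is left unsynced. The regular expression and the helper name `parse_numeric_fields` are assumptions made for this example, and the rate formula is only a consistency check on the logged numbers, not a statement about how DM computes it internally.

```python
import re

# "binlog replication progress" entry copied from the dm-worker log above
# (joined onto one line).
PROGRESS_LINE = (
    '["binlog replication progress"] [task=dm-mysql_report] '
    '[unit="binlog replication"] ["total binlog size"=12632410] '
    '["last binlog size"=0] ["cost time"=30] [bytes/Second=421080] '
    '["unsynced binlog size"=0] ["estimate time to catch up"=0]'
)

# Matches ["quoted name"=123] and [bare_name=123]; non-numeric fields are skipped.
FIELD = re.compile(r'\[(?:"([^"]+)"|([^\]="\s]+))=(\d+)\]')


def parse_numeric_fields(line: str) -> dict:
    """Return every numeric key=value field of a log line as a dict (hypothetical helper)."""
    return {quoted or bare: int(value) for quoted, bare, value in FIELD.findall(line)}


fields = parse_numeric_fields(PROGRESS_LINE)

# The task has caught up when no binlog is left unsynced.
caught_up = fields["unsynced binlog size"] == 0 and fields["estimate time to catch up"] == 0

# Consistency check: bytes replicated in the interval divided by the interval length.
rate = (fields["total binlog size"] - fields["last binlog size"]) // fields["cost time"]

print(f"caught up: {caught_up}, recomputed rate: {rate} bytes/s")
```

For this entry the recomputed rate equals the logged `bytes/Second` value, and both the unsynced size and the catch-up estimate are zero, so about 30 s after the takeover the new worker had no backlog.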
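After the downed dm-worker is brought back, does it start and rejoin the cluster automatically? It rejoins the cluster automatically; the dm-master leader tries to restart the downed dm-worker.

```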
[2021/08/17 13:30:28.796 +08:00] [WARN] [:60] ["grpc: ateTransport failed to connect to {172.17.201.115:8262 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 172.17.201.115:8262: connect: connection refused". "] [component="embed etcd"] [2021/08/17 13:30:31.625 +08:00] [WARN] [:60] ["grpc: ateTransport failed to connect to {172.17.201.115:8262 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 172.17.201.115:8262: connect: connection refused". "] [component="embed etcd"] [2021/08/17 13:30:35.190 +08:00] [WARN] [:60] ["grpc: ateTransport failed to connect to {172.17.201.115:8262 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 172.17.201.115:8262: connect: connection refused". "] [component="embed etcd"] [2021/08/17 13:30:37.523 +08:00] [INFO] [:2206] [payload="name:"dm-172.17.201.115-8262" address:"172.17.201.115:8262" "] [request=RegisterWorker] [2021/08/17 13:30:37.523 +08:00] [WARN] [:836] ["add the same worker again"] [component=scheduler] ["worker info"="{"name":"dm-172.17.201.115-8262","addr":"172.17.201.115:8262"}"] [2021/08/17 13:30:37.523 +08:00] [INFO] [:309] ["register worker successfully"] [name=dm-172.17.201.115-8262] [address=172.17.201.115:8262] [2021/08/17 13:30:37.529 +08:00] [INFO] [:216] ["receive dm-worker keep alive event"] [operation=PUT] [kv=/dm-worker/a/646d2d3137322e31372e3230312e3131352d38323632] [2021/08/17 13:30:37.529 +08:00] [INFO] [:1506] ["receive worker status change event"] [component=scheduler] [delete=false] [event="{"worker-name":"dm-172.17.201.115-8262","join-time":"2021-08-17T13:30:37.524837339+08:00"}"] [2021/08/17 13:30:37.529 +08:00] [INFO] [:1739] ["no unbound sources need to bound"] [component=scheduler] [worker="{"name":"dm-172.17.201.115-8262","addr":"172.17.201.115:8262"}"]
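```

Simulate a dm-master failure:

```
date; kill -9 <pid>; mv <deploy dir> <deploy dir>-1 # record the time, force-kill the dm-master process, and rename the deploy directory so that it is not restarted automatically
```

Observe how the leader is switched over.

Record the relevant data: leader switchover time, status of all tasks, and replication lag.

Conclusion:

Is a leader elected normally?

```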
[2021/08/17 14:17:36.240 +08:00] [WARN] [:436] ["lost TCP streaming connection with remote peer"] [component="embed etcd"] [stream-reader-type="stream MsgApp v2"] [local-member-id=db326cb7fba547f5] [remote-peer-id=201495974e8233cd] [error="unexpected EOF"] [2021/08/17 14:17:36.240 +08:00] [WARN] [:68] ["peer became inactive (message send to peer failed)"] [component="embed etcd"] [peer-id=201495974e8233cd] [error="failed to read 201495974e8233cd on stream MsgApp v2 (unexpected EOF)"] [2021/08/17 14:17:36.240 +08:00] [WARN] [:436] ["lost TCP streaming connection with remote peer"] [component="embed etcd"] [stream-reader-type="stream Message"] [local-member-id=db326cb7fba547f5] [remote-peer-id=201495974e8233cd] [error="unexpected EOF"] [2021/08/17 14:17:36.241 +08:00] [WARN] [:60] ["grpc: ateTransport failed to connect to {172.17.201.115:8261 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 172.17.201.115:8261: connect: connection refused". "] [component="embed etcd"] [2021/08/17 14:17:37.241 +08:00] [WARN] [:60] ["grpc: ateTransport failed to connect to {172.17.201.115:8261 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 172.17.201.115:8261: connect: connection refused". 
"] [component="embed etcd"] [2021/08/17 14:17:38.097 +08:00] [WARN] [:193] ["lost TCP streaming connection with remote peer"] [component="embed etcd"] [stream-writer-type="stream Message"] [local-member-id=db326cb7fba547f5] [remote-peer-id=201495974e8233cd] [2021/08/17 14:17:38.855 +08:00] [WARN] [:163] ["apply request took too long"] [component="embed etcd"] [took=2.096232825s] [expected-duration=100ms] [prefix="read-only range "] [request="key:"/dm-master/bound-worker/646d2d3137322e31372e3230312e3131362d38323634" "] [response=] [error="etcdserver: leader changed"] [2021/08/17 14:17:38.857 +08:00] [WARN] [:315] ["failed to reach the peer URL"] [component="embed etcd"] [address= 172.17.201.115:8291/version ] [remote-member-id=201495974e8233cd] [error="Get 172.17.201.115:8291/version : dial tcp 172.17.201.115:8291: connect: connection refused"] [2021/08/17 14:17:38.857 +08:00] [WARN] [:168] ["failed to get version"] [component="embed etcd"] [remote-member-id=201495974e8233cd] [error="Get 172.17.201.115:8291/version : dial tcp 172.17.201.115:8291: connect: connection refused"] [2021/08/17 14:17:38.958 +08:00] [WARN] [:60] ["grpc: ateTransport failed to connect to {172.17.201.115:8261 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 172.17.201.115:8261: connect: connection refused". "] [component="embed etcd"] [2021/08/17 14:17:39.763 +08:00] [WARN] [:193] ["lost TCP streaming connection with remote peer"] [component="embed etcd"] [stream-writer-type="stream MsgApp v2"] [local-member-id=db326cb7fba547f5] [remote-peer-id=201495974e8233cd] [2021/08/17 14:17:41.719 +08:00] [WARN] [:60] ["grpc: ateTransport failed to connect to {172.17.201.115:8261 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 172.17.201.115:8261: connect: connection refused". 
"] [component="embed etcd"] [2021/08/17 14:17:42.859 +08:00] [WARN] [:315] ["failed to reach the peer URL"] [component="embed etcd"] [address= 172.17.201.115:8291/version ] [remote-member-id=201495974e8233cd] [error="Get 172.17.201.115:8291/version : dial tcp 172.17.201.115:8291: connect: connection refused"] [2021/08/17 14:17:42.859 +08:00] [WARN] [:168] ["failed to get version"] [component="embed etcd"] [remote-member-id=201495974e8233cd] [error="Get 172.17.201.115:8291/version : dial tcp 172.17.201.115:8291: connect: connection refused"] [2021/08/17 14:17:45.053 +08:00] [WARN] [:60] ["grpc: ateTransport failed to connect to {172.17.201.115:8261 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 172.17.201.115:8261: connect: connection refused". "] [component="embed etcd"] [2021/08/17 14:17:45.855 +08:00] [WARN] [:746] ["timed out waiting for read index response (local node might have slow network)"] [component="embed etcd"] [timeout=7s] [2021/08/17 14:17:45.855 +08:00] [WARN] [:163] ["apply request took too long"] [component="embed etcd"] [took=9.033455931s] [expected-duration=100ms] [prefix="read-only range "] [request="key:"/dm-master/bound-worker/646d2d3137322e31372e3230312e3131352d38323633" "] [response=] [error="etcdserver: request timed out"] [2021/08/17 14:17:45.855 +08:00] [WARN] [:163] ["apply request took too long"] [component="embed etcd"] [took=9.085679831s] [expected-duration=100ms] [prefix="read-only range "] [request="key:"/dm-master/bound-worker/646d2d3137322e31382e37382e3235342d38323636" "] [response=] [error="etcdserver: request timed out"] [2021/08/17 14:17:45.855 +08:00] [WARN] [:163] ["apply request took too long"] [component="embed etcd"] [took=9.085819911s] [expected-duration=100ms] [prefix="read-only range "] [request="key:"/dm-master/bound-worker/646d2d3137322e31382e37382e3235342d38323632" "] [response=] [error="etcdserver: request timed out"] [2021/08/17 14:17:45.856 +08:00] [WARN] [:163] ["apply 
request took too long"] [component="embed etcd"] [took=6.99962841s] [expected-duration=100ms] [prefix="read-only range "] [request="key:"/dm-master/relay-worker/646d2d3137322e31372e3230312e3131362d38323634" "] [response="range_response_count:0 size:5"] [] [2021/08/17 14:17:46.860 +08:00] [WARN] [:315] ["failed to reach the peer URL"] [component="embed etcd"] [address= 172.17.201.115:8291/version ] [remote-member-id=201495974e8233cd] [error="Get 172.17.201.115:8291/version : dial tcp 172.17.201.115:8291: connect: connection refused"] [2021/08/17 14:17:46.860 +08:00] [WARN] [:168] ["failed to get version"] [component="embed etcd"] [remote-member-id=201495974e8233cd] [error="Get 172.17.201.115:8291/version : dial tcp 172.17.201.115:8291: connect: connection refused"] [2021/08/17 14:17:47.820 +08:00] [WARN] [:60] ["grpc: ateTransport failed to connect to {172.17.201.115:8261 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 172.17.201.115:8261: connect: connection refused". 
"] [component="embed etcd"] [2021/08/17 14:17:49.716 +08:00] [WARN] [:70] ["prober detected unhealthy status"] [component="embed etcd"] [round-tripper-name=ROUND_TRIPPER_SNAPSHOT] [remote-peer-id=201495974e8233cd] [rtt=331.518µs] [error="dial tcp 172.17.201.115:8291: connect: connection refused"] [2021/08/17 14:17:50.305 +08:00] [WARN] [:70] ["prober detected unhealthy status"] [component="embed etcd"] [round-tripper-name=ROUND_TRIPPER_RAFT_MESSAGE] [remote-peer-id=201495974e8233cd] [rtt=593.425µs] [error="dial tcp 172.17.201.115:8291: connect: connection refused"] [2021/08/17 14:17:50.862 +08:00] [WARN] [:315] ["failed to reach the peer URL"] [component="embed etcd"] [address= 172.17.201.115:8291/version ] [remote-member-id=201495974e8233cd] [error="Get 172.17.201.115:8291/version : dial tcp 172.17.201.115:8291: connect: connection refused"] [2021/08/17 14:17:50.862 +08:00] [WARN] [:168] ["failed to get version"] [component="embed etcd"] [remote-member-id=201495974e8233cd] [error="Get 172.17.201.115:8291/version : dial tcp 172.17.201.115:8291: connect: connection refused"] [2021/08/17 14:17:50.909 +08:00] [WARN] [:60] ["grpc: ateTransport failed to connect to {172.17.201.115:8261 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 172.17.201.115:8261: connect: connection refused". "] [component="embed etcd"] [2021/08/17 14:17:53.418 +08:00] [WARN] [:60] ["grpc: ateTransport failed to connect to {172.17.201.115:8261 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 172.17.201.115:8261: connect: connection refused". 
"] [component="embed etcd"] [2021/08/17 14:17:54.717 +08:00] [WARN] [:70] ["prober detected unhealthy status"] [component="embed etcd"] [round-tripper-name=ROUND_TRIPPER_SNAPSHOT] [remote-peer-id=201495974e8233cd] [rtt=331.518µs] [error="dial tcp 172.17.201.115:8291: connect: connection refused"] [2021/08/17 14:17:54.863 +08:00] [WARN] [:315] ["failed to reach the peer URL"] [component="embed etcd"] [address= 172.17.201.115:8291/version ] [remote-member-id=201495974e8233cd] [error="Get 172.17.201.115:8291/version : dial tcp 172.17.201.115:8291: connect: connection refused"] [2021/08/17 14:17:54.863 +08:00] [WARN] [:168] ["failed to get version"] [component="embed etcd"] [remote-member-id=201495974e8233cd] [error="Get 172.17.201.115:8291/version : dial tcp 172.17.201.115:8291: connect: connection refused"] [2021/08/17 14:17:55.305 +08:00] [WARN] [:70] ["prober detected unhealthy status"] [component="embed etcd"] [round-tripper-name=ROUND_TRIPPER_RAFT_MESSAGE] [remote-peer-id=201495974e8233cd] [rtt=593.425µs] [error="dial tcp 172.17.201.115:8291: connect: connection refused"] [2021/08/17 14:17:56.854 +08:00] [WARN] [:60] ["grpc: ateTransport failed to connect to {172.17.201.115:8261 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 172.17.201.115:8261: connect: connection refused". 
"] [component="embed etcd"] [2021/08/17 14:17:58.864 +08:00] [WARN] [:315] ["failed to reach the peer URL"] [component="embed etcd"] [address= 172.17.201.115:8291/version ] [remote-member-id=201495974e8233cd] [error="Get 172.17.201.115:8291/version : dial tcp 172.17.201.115:8291: connect: connection refused"] [2021/08/17 14:17:58.865 +08:00] [WARN] [:168] ["failed to get version"] [component="embed etcd"] [remote-member-id=201495974e8233cd] [error="Get 172.17.201.115:8291/version : dial tcp 172.17.201.115:8291: connect: connection refused"] [2021/08/17 14:17:59.717 +08:00] [WARN] [:70] ["prober detected unhealthy status"] [component="embed etcd"] [round-tripper-name=ROUND_TRIPPER_SNAPSHOT] [remote-peer-id=201495974e8233cd] [rtt=331.518µs] [error="dial tcp 172.17.201.115:8291: connect: connection refused"] [2021/08/17 14:17:59.859 +08:00] [WARN] [:60] ["grpc: ateTransport failed to connect to {172.17.201.115:8261 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 172.17.201.115:8261: connect: connection refused". "] [component="embed etcd"] [2021/08/17 14:18:00.305 +08:00] [WARN] [:70] ["prober detected unhealthy status"] [component="embed etcd"] [round-tripper-name=ROUND_TRIPPER_RAFT_MESSAGE] [remote-peer-id=201495974e8233cd] [rtt=593.425µs] [error="dial tcp 172.17.201.115:8291: connect: connection refused"] [2021/08/17 14:18:02.469 +08:00] [WARN] [:60] ["grpc: ateTransport failed to connect to {172.17.201.115:8261 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 172.17.201.115:8261: connect: connection refused". 
"] [component="embed etcd"] [2021/08/17 14:18:02.866 +08:00] [WARN] [:315] ["failed to reach the peer URL"] [component="embed etcd"] [address= 172.17.201.115:8291/version ] [remote-member-id=201495974e8233cd] [error="Get 172.17.201.115:8291/version : dial tcp 172.17.201.115:8291: connect: connection refused"] [2021/08/17 14:18:02.866 +08:00] [WARN] [:168] ["failed to get version"] [component="embed etcd"] [remote-member-id=201495974e8233cd] [error="Get 172.17.201.115:8291/version : dial tcp 172.17.201.115:8291: connect: connection refused"] [2021/08/17 14:18:04.717 +08:00] [WARN] [:70] ["prober detected unhealthy status"] [component="embed etcd"] [round-tripper-name=ROUND_TRIPPER_SNAPSHOT] [remote-peer-id=201495974e8233cd] [rtt=331.518µs] [error="dial tcp 172.17.201.115:8291: connect: connection refused"] [2021/08/17 14:18:05.305 +08:00] [WARN] [:70] ["prober detected unhealthy status"] [component="embed etcd"] [round-tripper-name=ROUND_TRIPPER_RAFT_MESSAGE] [remote-peer-id=201495974e8233cd] [rtt=593.425µs] [error="dial tcp 172.17.201.115:8291: connect: connection refused"] [2021/08/17 14:18:05.898 +08:00] [WARN] [:60] ["grpc: ateTransport failed to connect to {172.17.201.115:8261 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 172.17.201.115:8261: connect: connection refused". 
"] [component="embed etcd"] [2021/08/17 14:18:06.867 +08:00] [WARN] [:315] ["failed to reach the peer URL"] [component="embed etcd"] [address= 172.17.201.115:8291/version ] [remote-member-id=201495974e8233cd] [error="Get 172.17.201.115:8291/version : dial tcp 172.17.201.115:8291: connect: connection refused"] [2021/08/17 14:18:06.867 +08:00] [WARN] [:168] ["failed to get version"] [component="embed etcd"] [remote-member-id=201495974e8233cd] [error="Get 172.17.201.115:8291/version : dial tcp 172.17.201.115:8291: connect: connection refused"] [2021/08/17 14:18:08.588 +08:00] [WARN] [:60] ["grpc: ateTransport failed to connect to {172.17.201.115:8261 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 172.17.201.115:8261: connect: connection refused". "] [component="embed etcd"] [2021/08/17 14:18:09.717 +08:00] [WARN] [:70] ["prober detected unhealthy status"] [component="embed etcd"] [round-tripper-name=ROUND_TRIPPER_SNAPSHOT] [remote-peer-id=201495974e8233cd] [rtt=331.518µs] [error="dial tcp 172.17.201.115:8291: connect: connection refused"] [2021/08/17 14:18:10.306 +08:00] [WARN] [:70] ["prober detected unhealthy status"] [component="embed etcd"] [round-tripper-name=ROUND_TRIPPER_RAFT_MESSAGE] [remote-peer-id=201495974e8233cd] [rtt=593.425µs] [error="dial tcp 172.17.201.115:8291: connect: connection refused"] [2021/08/17 14:18:10.868 +08:00] [WARN] [:315] ["failed to reach the peer URL"] [component="embed etcd"] [address= 172.17.201.115:8291/version ] [remote-member-id=201495974e8233cd] [error="Get 172.17.201.115:8291/version : dial tcp 172.17.201.115:8291: connect: connection refused"] [2021/08/17 14:18:10.868 +08:00] [WARN] [:168] ["failed to get version"] [component="embed etcd"] [remote-member-id=201495974e8233cd] [error="Get 172.17.201.115:8291/version : dial tcp 172.17.201.115:8291: connect: connection refused"] [2021/08/17 14:18:11.740 +08:00] [WARN] [:60] ["grpc: ateTransport failed to connect to {172.17.201.115:8261 <nil> 
0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 172.17.201.115:8261: connect: connection refused". "] [component="embed etcd"] [2021/08/17 14:18:14.586 +08:00] [WARN] [:60] ["grpc: ateTransport failed to connect to {172.17.201.115:8261 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 172.17.201.115:8261: connect: connection refused". "] [component="embed etcd"] [2021/08/17 14:18:14.717 +08:00] [WARN] [:70] ["prober detected unhealthy status"] [component="embed etcd"] [round-tripper-name=ROUND_TRIPPER_SNAPSHOT] [remote-peer-id=201495974e8233cd] [rtt=331.518µs] [error="dial tcp 172.17.201.115:8291: connect: connection refused"] [2021/08/17 14:18:14.869 +08:00] [WARN] [:315] ["failed to reach the peer URL"] [component="embed etcd"] [address= 172.17.201.115:8291/version ] [remote-member-id=201495974e8233cd] [error="Get 172.17.201.115:8291/version : dial tcp 172.17.201.115:8291: connect: connection refused"] [2021/08/17 14:18:14.869 +08:00] [WARN] [:168] ["failed to get version"] [component="embed etcd"] [remote-member-id=201495974e8233cd] [error="Get 172.17.201.115:8291/version : dial tcp 172.17.201.115:8291: connect: connection refused"] [2021/08/17 14:18:15.306 +08:00] [WARN] [:70] ["prober detected unhealthy status"] [component="embed etcd"] [round-tripper-name=ROUND_TRIPPER_RAFT_MESSAGE] [remote-peer-id=201495974e8233cd] [rtt=593.425µs] [error="dial tcp 172.17.201.115:8291: connect: connection refused"] [2021/08/17 14:18:17.707 +08:00] [WARN] [:60] ["grpc: ateTransport failed to connect to {172.17.201.115:8261 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 172.17.201.115:8261: connect: connection refused". 
"] [component="embed etcd"] [2021/08/17 14:18:18.870 +08:00] [WARN] [:315] ["failed to reach the peer URL"] [component="embed etcd"] [address= 172.17.201.115:8291/version ] [remote-member-id=201495974e8233cd] [error="Get 172.17.201.115:8291/version : dial tcp 172.17.201.115:8291: connect: connection refused"] [2021/08/17 14:18:18.870 +08:00] [WARN] [:168] ["failed to get version"] [component="embed etcd"] [remote-member-id=201495974e8233cd] [error="Get 172.17.201.115:8291/version : dial tcp 172.17.201.115:8291: connect: connection refused"] [2021/08/17 14:18:19.718 +08:00] [WARN] [:70] ["prober detected unhealthy status"] [component="embed etcd"] [round-tripper-name=ROUND_TRIPPER_SNAPSHOT] [remote-peer-id=201495974e8233cd] [rtt=331.518µs] [error="dial tcp 172.17.201.115:8291: connect: connection refused"] [2021/08/17 14:18:19.876 +08:00] [INFO] [:2206] [payload="leader:true master:true names:"master-3" "] [request=ListMember] [2021/08/17 14:18:19.876 +08:00] [INFO] [:2221] ["will forward after a short interval"] [from=master-3] [to=master-1] [request=ListMember] [2021/08/17 14:18:20.306 +08:00] [WARN] [:70] ["prober detected unhealthy status"] [component="embed etcd"] [round-tripper-name=ROUND_TRIPPER_RAFT_MESSAGE] [remote-peer-id=201495974e8233cd] [rtt=593.425µs] [error="dial tcp 172.17.201.115:8291: connect: connection refused"] [2021/08/17 14:18:20.388 +08:00] [WARN] [:60] ["grpc: ateTransport failed to connect to {172.17.201.115:8261 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 172.17.201.115:8261: connect: connection refused". 
"] [component="embed etcd"] [2021/08/17 14:18:22.871 +08:00] [WARN] [:315] ["failed to reach the peer URL"] [component="embed etcd"] [address= 172.17.201.115:8291/version ] [remote-member-id=201495974e8233cd] [error="Get 172.17.201.115:8291/version : dial tcp 172.17.201.115:8291: connect: connection refused"] [2021/08/17 14:18:22.871 +08:00] [WARN] [:168] ["failed to get version"] [component="embed etcd"] [remote-member-id=201495974e8233cd] [error="Get 172.17.201.115:8291/version : dial tcp 172.17.201.115:8291: connect: connection refused"] [2021/08/17 14:18:23.242 +08:00] [WARN] [:60] ["grpc: ateTransport failed to connect to {172.17.201.115:8261 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 172.17.201.115:8261: connect: connection refused". "] [component="embed etcd"] [2021/08/17 14:18:24.718 +08:00] [WARN] [:70] ["prober detected unhealthy status"] [component="embed etcd"] [round-tripper-name=ROUND_TRIPPER_SNAPSHOT] [remote-peer-id=201495974e8233cd] [rtt=331.518µs] [error="dial tcp 172.17.201.115:8291: connect: connection refused"] [2021/08/17 14:18:25.306 +08:00] [WARN] [:70] ["prober detected unhealthy status"] [component="embed etcd"] [round-tripper-name=ROUND_TRIPPER_RAFT_MESSAGE] [remote-peer-id=201495974e8233cd] [rtt=593.425µs] [error="dial tcp 172.17.201.115:8291: connect: connection refused"] [2021/08/17 14:18:25.676 +08:00] [WARN] [:60] ["grpc: ateTransport failed to connect to {172.17.201.115:8261 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 172.17.201.115:8261: connect: connection refused". 
"] [component="embed etcd"] [2021/08/17 14:18:26.872 +08:00] [WARN] [:315] ["failed to reach the peer URL"] [component="embed etcd"] [address= 172.17.201.115:8291/version ] [remote-member-id=201495974e8233cd] [error="Get 172.17.201.115:8291/version : dial tcp 172.17.201.115:8291: connect: connection refused"] [2021/08/17 14:18:26.873 +08:00] [WARN] [:168] ["failed to get version"] [component="embed etcd"] [remote-member-id=201495974e8233cd] [error="Get 172.17.201.115:8291/version : dial tcp 172.17.201.115:8291: connect: connection refused"] [2021/08/17 14:18:28.384 +08:00] [WARN] [:60] ["grpc: ateTransport failed to connect to {172.17.201.115:8261 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 172.17.201.115:8261: connect: connection refused". "] [component="embed etcd"] [2021/08/17 14:18:29.718 +08:00] [WARN] [:70] ["prober detected unhealthy status"] [component="embed etcd"] [round-tripper-name=ROUND_TRIPPER_SNAPSHOT] [remote-peer-id=201495974e8233cd] [rtt=331.518µs] [error="dial tcp 172.17.201.115:8291: connect: connection refused"] [2021/08/17 14:18:30.306 +08:00] [WARN] [:70] ["prober detected unhealthy status"] [component="embed etcd"] [round-tripper-name=ROUND_TRIPPER_RAFT_MESSAGE] [remote-peer-id=201495974e8233cd] [rtt=593.425µs] [error="dial tcp 172.17.201.115:8291: connect: connection refused"] [2021/08/17 14:18:30.874 +08:00] [WARN] [:315] ["failed to reach the peer URL"] [component="embed etcd"] [address= 172.17.201.115:8291/version ] [remote-member-id=201495974e8233cd] [error="Get 172.17.201.115:8291/version : dial tcp 172.17.201.115:8291: connect: connection refused"] [2021/08/17 14:18:30.874 +08:00] [WARN] [:168] ["failed to get version"] [component="embed etcd"] [remote-member-id=201495974e8233cd] [error="Get 172.17.201.115:8291/version : dial tcp 172.17.201.115:8291: connect: connection refused"] [2021/08/17 14:18:31.067 +08:00] [WARN] [:60] ["grpc: ateTransport failed to connect to {172.17.201.115:8261 <nil> 
0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 172.17.201.115:8261: connect: connection refused". "] [component="embed etcd"] [2021/08/17 14:18:34.088 +08:00] [WARN] [:60] ["grpc: ateTransport failed to connect to {172.17.201.115:8261 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 172.17.201.115:8261: connect: connection refused". "] [component="embed etcd"] [2021/08/17 14:18:34.718 +08:00] [WARN] [:70] ["prober detected unhealthy status"] [component="embed etcd"] [round-tripper-name=ROUND_TRIPPER_SNAPSHOT] [remote-peer-id=201495974e8233cd] [rtt=331.518µs] [error="dial tcp 172.17.201.115:8291: connect: connection refused"] [2021/08/17 14:18:34.875 +08:00] [WARN] [:315] ["failed to reach the peer URL"] [component="embed etcd"] [address= 172.17.201.115:8291/version ] [remote-member-id=201495974e8233cd] [error="Get 172.17.201.115:8291/version : dial tcp 172.17.201.115:8291: connect: connection refused"] [2021/08/17 14:18:34.875 +08:00] [WARN] [:168] ["failed to get version"] [component="embed etcd"] [remote-member-id=201495974e8233cd] [error="Get 172.17.201.115:8291/version : dial tcp 172.17.201.115:8291: connect: connection refused"] [2021/08/17 14:18:35.307 +08:00] [WARN] [:70] ["prober detected unhealthy status"] [component="embed etcd"] [round-tripper-name=ROUND_TRIPPER_RAFT_MESSAGE] [remote-peer-id=201495974e8233cd] [rtt=593.425µs] [error="dial tcp 172.17.201.115:8291: connect: connection refused"] [2021/08/17 14:18:37.289 +08:00] [WARN] [:60] ["grpc: ateTransport failed to connect to {172.17.201.115:8261 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 172.17.201.115:8261: connect: connection refused". 
"] [component="embed etcd"] [2021/08/17 14:18:38.876 +08:00] [WARN] [:315] ["failed to reach the peer URL"] [component="embed etcd"] [address= 172.17.201.115:8291/version ] [remote-member-id=201495974e8233cd] [error="Get 172.17.201.115:8291/version : dial tcp 172.17.201.115:8291: connect: connection refused"] [2021/08/17 14:18:38.876 +08:00] [WARN] [:168] ["failed to get version"] [component="embed etcd"] [remote-member-id=201495974e8233cd] [error="Get 172.17.201.115:8291/version : dial tcp 172.17.201.115:8291: connect: connection refused"] [2021/08/17 14:18:39.719 +08:00] [WARN] [:70] ["prober detected unhealthy status"] [component="embed etcd"] [round-tripper-name=ROUND_TRIPPER_SNAPSHOT] [remote-peer-id=201495974e8233cd] [rtt=331.518µs] [error="dial tcp 172.17.201.115:8291: connect: connection refused"] [2021/08/17 14:18:40.003 +08:00] [INFO] [:292] ["get response from election observe"] [component=election] [key=/dm-master/leader/1ba57a3843f19704] [value="{"id":"master-2","addr":"172.18.78.254:8261"}"] [2021/08/17 14:18:40.003 +08:00] [INFO] [:316] ["current member is not the leader"] [component=election] ["current member"="{"id":"master-3","addr":"172.17.201.116:8261"}"] [leader="{"id":"master-2","addr":"172.18.78.254:8261"}"] [2021/08/17 14:18:40.003 +08:00] [INFO] [:97] ["get new leader"] [leader=master-2] ["current member"=master-3]
```
The election completed normally and took about 60 s.
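The election time can be read off the log timestamps. The following is a minimal sketch, not part of the original drill: `parse_ts` and `seconds_between` are hypothetical helper names, and the sample entries are abridged copies of log lines from this report. The sample only covers the part of the outage visible in this excerpt, so it yields less than the full 60 s; for a complete measurement the start point would be the time printed by `date` when the process was killed.

```python
import re
from datetime import datetime

# Leading timestamp of a DM log entry, e.g. "[2021/08/17 14:18:40.003 +08:00]".
TS_RE = re.compile(r"^\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{2}:\d{2})\]")


def parse_ts(line):
    """Parse the leading timestamp of one log entry into an aware datetime."""
    match = TS_RE.match(line)
    if match is None:
        raise ValueError("no leading timestamp in log entry")
    return datetime.strptime(match.group(1), "%Y/%m/%d %H:%M:%S.%f %z")


def seconds_between(lines, start_marker, end_marker):
    """Seconds from the first entry containing start_marker to the
    first later entry containing end_marker, or None if not found."""
    start = None
    for line in lines:
        if start is None:
            if start_marker in line:
                start = parse_ts(line)
        elif end_marker in line:
            return (parse_ts(line) - start).total_seconds()
    return None


# Abridged entries taken from the dm-master log in this report.
SAMPLE = [
    '[2021/08/17 14:18:06.867 +08:00] [WARN] [:315] ["failed to reach the peer URL"] [component="embed etcd"]',
    '[2021/08/17 14:18:19.876 +08:00] [INFO] [:2221] ["will forward after a short interval"] [from=master-3] [to=master-1]',
    '[2021/08/17 14:18:40.003 +08:00] [INFO] [:97] ["get new leader"] [leader=master-2] ["current member"=master-3]',
]

elapsed = seconds_between(SAMPLE, "failed to reach the peer URL", "get new leader")
print(f"{elapsed:.3f} s from first sampled failure to new leader")
```

Matching on message text keeps the sketch independent of the source-location field, which is truncated in these logs.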
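Replication task behavior during the election (delay, status, etc.): tasks that were replicating were not affected and showed no delay. The replication log contains related ERROR and WARN messages, which can be ignored.
```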
[2021/08/17 14:17:36.269 +08:00] [ERROR] [:596] ["WatchSourceBound received an error"] [error="etcdserver: mvcc: required revision has been compacted"] [2021/08/17 14:17:36.269 +08:00] [ERROR] [:635] ["WatchRelayConfig received an error"] [error="etcdserver: mvcc: required revision has been compacted"] [2021/08/17 14:17:36.269 +08:00] [ERROR] [:675] ["WatchSubTaskStage received an error"] [error="etcdserver: mvcc: required revision has been compacted"] [2021/08/17 14:17:38.857 +08:00] [INFO] [:274] ["enter DisableRelay"] [component="worker controller"] [2021/08/17 14:17:38.858 +08:00] [WARN] [:278] ["already disabled relay"] [component="worker controller"] [2021/08/17 14:17:38.858 +08:00] [INFO] [:149] ["ignore same keepalive TTL change"] [TTL=60] [2021/08/17 14:17:38.858 +08:00] [WARN] [:826] ["session variable 'time_zone' is overwritten by default UTC timezone."] [time_zone=+00:00] [2021/08/17 14:17:38.858 +08:00] [WARN] [:826] ["session variable 'time_zone' is overwritten by default UTC timezone."] [time_zone=+00:00] [2021/08/17 14:17:38.859 +08:00] [INFO] [:473] ["resume sub task"] [component="worker controller"] [task=dm-rds_master] [2021/08/17 14:17:38.859 +08:00] [ERROR] [:585] ["fail to operate subtask stage"] [stage="{"expect":2,"source":"ds-rds_master","task":"dm-rds_master"}"] [task=dm-rds_master] [error="[code=40051:class=dm-worker:scope=internal:level=high], Message: current stage is Running but not paused, invalid"] [errorVerbose="[code=40051:class=dm-worker:scope=internal:level=high], Message: current stage is Running but not paused, 
invalidngithub/pingcap/dm/pkg/terror.(*Error).Generatent/home/jenkins/agent/workspace/build_dm_multi_branch_v2.0.4/go/src/github/pingcap/dm/pkg/:265ngithub/pingcap/dm/dm/worker.(*SubTask).Resument/home/jenkins/agent/workspace/build_dm_multi_branch_v2.0.4/go/src/github/pingcap/dm/dm/:486ngithub/pingcap/dm/dm/worker.(*Worker).OperateSubTasknt/home/jenkins/agent/workspace/build_dm_multi_branch_v2.0.4/go/src/github/pingcap/dm/dm/:474ngithub/pingcap/dm/dm/worker.(*Worker).operateSubTaskStagent/home/jenkins/agent/workspace/build_dm_multi_branch_v2.0.4/go/src/github/pingcap/dm/dm/:706ngithub/pingcap/dm/dm/worker.(*Worker).resetSubtaskStagent/home/jenkins/agent/workspace/build_dm_multi_branch_v2.0.4/go/src/github/pingcap/dm/dm/:582ngithub/pingcap/dm/dm/worker.(*Worker).observeSubtaskStagent/home/jenkins/agent/workspace/build_dm_multi_branch_v2.0.4/go/src/github/pingcap/dm/dm/:631ngithub/pingcap/dm/dm/worker.(*Worker).EnableHandleSubtasks.func1nt/home/jenkins/agent/workspace/build_dm_multi_branch_v2.0.4/go/src/github/pingcap/dm/dm/:itnt/usr/local/go/src/runtime/asm_amd64.s:1357"] [2021/08/17 14:17:45.859 +08:00] [WARN] [:826] ["session variable 'time_zone' is overwritten by default UTC timezone."] [time_zone=+00:00] [2021/08/17 14:17:47.316 +08:00] [INFO] [:1003] ["flushed checkpoint"] [task=dm-rds_master] [unit="binlog replication"] [checkpoint="position: (mysql-bin.013427, 319445109), gtid-set: 
007c9203-833d-11e7-aff9-6c92bf21bbe1:1-931221440,11efeebc-747e-11e5-8527-d89d672b3674:1-36643616,1d258984-586b-11e8-9e16-7cd30ae3fcb8:1-1251269710,20dc5615-747e-11e5-8528-1051721b3701:1-39135349,22fb37d8-09ec-11e9-a38f-7cd30a5a2712:1-11655067164,4e09d36c-bb22-11e8-a1cb-7cd30ad38860:1-530901004,8940bea7-54b1-11e6-bb21-6c92bf31607b:1-1096743131,89cc53a8-e019-11e7-8d82-7cd30adae7d0:1-213160520,96767ef0-54b1-11e6-bb21-6c92bf31493f:1-325177501,9b5f2a07-a37c-11e8-8798-7cd30adae88c:1-140532061,a3408fb3-7479-11e5-850a-008cfae41260:1-6695,a7b9cbdc-e019-11e7-8d83-7cd30adbc3c8:1-1582021685,cc6f5f0f-014d-11e6-9b5c-6c92bf21d7b1:1-329680794,ce41dd8e-a37c-11e8-8799-7cd30ac427e8:1-985022222,d17b141f-e217-11ea-aa81-b8599f52a93c:1-4745467924,db92643a-014d-11e6-9b5c-6c92bf21bb19:1-4,f650defc-09eb-11e9-a38e-7cd30ae4109e:1-166445680(flushed position: (mysql-bin.013427, 319445109), gtid-set: 007c9203-833d-11e7-aff9-6c92bf21bbe1:1-931221440,11efeebc-747e-11e5-8527-d89d672b3674:1-36643616,1d258984-586b-11e8-9e16-7cd30ae3fcb8:1-1251269710,20dc5615-747e-11e5-8528-1051721b3701:1-39135349,22fb37d8-09ec-11e9-a38f-7cd30a5a2712:1-11655067164,4e09d36c-bb22-11e8-a1cb-7cd30ad38860:1-530901004,8940bea7-54b1-11e6-bb21-6c92bf31607b:1-1096743131,89cc53a8-e019-11e7-8d82-7cd30adae7d0:1-213160520,96767ef0-54b1-11e6-bb21-6c92bf31493f:1-325177501,9b5f2a07-a37c-11e8-8798-7cd30adae88c:1-140532061,a3408fb3-7479-11e5-850a-008cfae41260:1-6695,a7b9cbdc-e019-11e7-8d83-7cd30adbc3c8:1-1582021685,cc6f5f0f-014d-11e6-9b5c-6c92bf21d7b1:1-329680794,ce41dd8e-a37c-11e8-8799-7cd30ac427e8:1-985022222,d17b141f-e217-11ea-aa81-b8599f52a93c:1-4745467924,db92643a-014d-11e6-9b5c-6c92bf21bb19:1-4,f650defc-09eb-11e9-a38e-7cd30ae4109e:1-166445680)"] [2021/08/17 14:18:01.661 +08:00] [INFO] [:2627] ["binlog replication progress"] [task=dm-rds_master] [unit="binlog replication"] ["total binlog size"=461082248942] ["last binlog size"=461058611217] ["cost time"=30] [bytes/Second=787924] ["unsynced binlog size"=0] ["estimate time to catch 
up"=0] [2021/08/17 14:18:01.661 +08:00] [INFO] [:2654] ["binlog replication status"] [task=dm-rds_master] [unit="binlog replication"] [total_events=461442238] [total_tps=562] [tps=162] [master_position="(mysql-bin.013427, 329839142)"] [master_gtid=007c9203-833d-11e7-aff9-6c92bf21bbe1:1-931221440,11efeebc-747e-11e5-8527-d89d672b3674:1-36643616,1d258984-586b-11e8-9e16-7cd30ae3fcb8:1-1251269710,20dc5615-747e-11e5-8528-1051721b3701:1-39135349,22fb37d8-09ec-11e9-a38f-7cd30a5a2712:1-11655067164,4e09d36c-bb22-11e8-a1cb-7cd30ad38860:1-530901004,8940bea7-54b1-11e6-bb21-6c92bf31607b:1-1096743131,89cc53a8-e019-11e7-8d82-7cd30adae7d0:1-213160520,96767ef0-54b1-11e6-bb21-6c92bf31493f:1-325177501,9b5f2a07-a37c-11e8-8798-7cd30adae88c:1-140532061,a3408fb3-7479-11e5-850a-008cfae41260:1-6695,a7b9cbdc-e019-11e7-8d83-7cd30adbc3c8:1-1582021685,cc6f5f0f-014d-11e6-9b5c-6c92bf21d7b1:1-329680794,ce41dd8e-a37c-11e8-8799-7cd30ac427e8:1-985022222,d17b141f-e217-11ea-aa81-b8599f52a93c:1-4745480546,db92643a-014d-11e6-9b5c-6c92bf21bb19:1-4,f650defc-09eb-11e9-a38e-7cd30ae4109e:1-166445680] [checkpoint="position: (mysql-bin.013427, 329839142), gtid-set: 007c9203-833d-11e7-aff9-6c92bf21bbe1:1-931221440,11efeebc-747e-11e5-8527-d89d672b3674:1-36643616,1d258984-586b-11e8-9e16-7cd30ae3fcb8:1-1251269710,20dc5615-747e-11e5-8528-1051721b3701:1-39135349,22fb37d8-09ec-11e9-a38f-7cd30a5a2712:1-11655067164,4e09d36c-bb22-11e8-a1cb-7cd30ad38860:1-530901004,8940bea7-54b1-11e6-bb21-6c92bf31607b:1-1096743131,89cc53a8-e019-11e7-8d82-7cd30adae7d0:1-213160520,96767ef0-54b1-11e6-bb21-6c92bf31493f:1-325177501,9b5f2a07-a37c-11e8-8798-7cd30adae88c:1-140532061,a3408fb3-7479-11e5-850a-008cfae41260:1-6695,a7b9cbdc-e019-11e7-8d83-7cd30adbc3c8:1-1582021685,cc6f5f0f-014d-11e6-9b5c-6c92bf21d7b1:1-329680794,ce41dd8e-a37c-11e8-8799-7cd30ac427e8:1-985022222,d17b141f-e217-11ea-aa81-b8599f52a93c:1-4745480546,db92643a-014d-11e6-9b5c-6c92bf21bb19:1-4,f650defc-09eb-11e9-a38e-7cd30ae4109e:1-166445680(flushed position: (mysql-bin.013427, 
319445109), gtid-set: 007c9203-833d-11e7-aff9-6c92bf21bbe1:1-931221440,11efeebc-747e-11e5-8527-d89d672b3674:1-36643616,1d258984-586b-11e8-9e16-7cd30ae3fcb8:1-1251269710,20dc5615-747e-11e5-8528-1051721b3701:1-39135349,22fb37d8-09ec-11e9-a38f-7cd30a5a2712:1-11655067164,4e09d36c-bb22-11e8-a1cb-7cd30ad38860:1-530901004,8940bea7-54b1-11e6-bb21-6c92bf31607b:1-1096743131,89cc53a8-e019-11e7-8d82-7cd30adae7d0:1-213160520,96767ef0-54b1-11e6-bb21-6c92bf31493f:1-325177501,9b5f2a07-a37c-11e8-8798-7cd30adae88c:1-140532061,a3408fb3-7479-11e5-850a-008cfae41260:1-6695,a7b9cbdc-e019-11e7-8d83-7cd30adbc3c8:1-1582021685,cc6f5f0f-014d-11e6-9b5c-6c92bf21d7b1:1-329680794,ce41dd8e-a37c-11e8-8799-7cd30ac427e8:1-985022222,d17b141f-e217-11ea-aa81-b8599f52a93c:1-4745467924,db92643a-014d-11e6-9b5c-6c92bf21bb19:1-4,f650defc-09eb-11e9-a38e-7cd30ae4109e:1-166445680)"] [2021/08/17 14:18:17.523 +08:00] [INFO] [:1003] ["flushed checkpoint"] [task=dm-rds_master] [unit="binlog replication"] [checkpoint="position: (mysql-bin.013427, 337456334), gtid-set: 007c9203-833d-11e7-aff9-6c92bf21bbe1:1-931221440,11efeebc-747e-11e5-8527-d89d672b3674:1-36643616,1d258984-586b-11e8-9e16-7cd30ae3fcb8:1-1251269710,20dc5615-747e-11e5-8528-1051721b3701:1-39135349,22fb37d8-09ec-11e9-a38f-7cd30a5a2712:1-11655067164,4e09d36c-bb22-11e8-a1cb-7cd30ad38860:1-530901004,8940bea7-54b1-11e6-bb21-6c92bf31607b:1-1096743131,89cc53a8-e019-11e7-8d82-7cd30adae7d0:1-213160520,96767ef0-54b1-11e6-bb21-6c92bf31493f:1-325177501,9b5f2a07-a37c-11e8-8798-7cd30adae88c:1-140532061,a3408fb3-7479-11e5-850a-008cfae41260:1-6695,a7b9cbdc-e019-11e7-8d83-7cd30adbc3c8:1-1582021685,cc6f5f0f-014d-11e6-9b5c-6c92bf21d7b1:1-329680794,ce41dd8e-a37c-11e8-8799-7cd30ac427e8:1-985022222,d17b141f-e217-11ea-aa81-b8599f52a93c:1-4745489146,db92643a-014d-11e6-9b5c-6c92bf21bb19:1-4,f650defc-09eb-11e9-a38e-7cd30ae4109e:1-166445680(flushed position: (mysql-bin.013427, 337456334), gtid-set: 
007c9203-833d-11e7-aff9-6c92bf21bbe1:1-931221440,11efeebc-747e-11e5-8527-d89d672b3674:1-36643616,1d258984-586b-11e8-9e16-7cd30ae3fcb8:1-1251269710,20dc5615-747e-11e5-8528-1051721b3701:1-39135349,22fb37d8-09ec-11e9-a38f-7cd30a5a2712:1-11655067164,4e09d36c-bb22-11e8-a1cb-7cd30ad38860:1-530901004,8940bea7-54b1-11e6-bb21-6c92bf31607b:1-1096743131,89cc53a8-e019-11e7-8d82-7cd30adae7d0:1-213160520,96767ef0-54b1-11e6-bb21-6c92bf31493f:1-325177501,9b5f2a07-a37c-11e8-8798-7cd30adae88c:1-140532061,a3408fb3-7479-11e5-850a-008cfae41260:1-6695,a7b9cbdc-e019-11e7-8d83-7cd30adbc3c8:1-1582021685,cc6f5f0f-014d-11e6-9b5c-6c92bf21d7b1:1-329680794,ce41dd8e-a37c-11e8-8799-7cd30ac427e8:1-985022222,d17b141f-e217-11ea-aa81-b8599f52a93c:1-4745489146,db92643a-014d-11e6-9b5c-6c92bf21bb19:1-4,f650defc-09eb-11e9-a38e-7cd30ae4109e:1-166445680)"] [2021/08/17 14:18:31.661 +08:00] [INFO] [:2627] ["binlog replication progress"] [task=dm-rds_master] [unit="binlog replication"] ["total binlog size"=461093711235] ["last binlog size"=461082248942] ["cost time"=30] [bytes/Second=382076] ["unsynced binlog size"=0] ["estimate time to catch up"=0] [2021/08/17 14:18:31.661 +08:00] [INFO] [:2654] ["binlog replication status"] [task=dm-rds_master] [unit="binlog replication"] [total_events=461447300] [total_tps=562] [tps=168] [master_position="(mysql-bin.013427, 341302136)"] 
[master_gtid=007c9203-833d-11e7-aff9-6c92bf21bbe1:1-931221440,11efeebc-747e-11e5-8527-d89d672b3674:1-36643616,1d258984-586b-11e8-9e16-7cd30ae3fcb8:1-1251269710,20dc5615-747e-11e5-8528-1051721b3701:1-39135349,22fb37d8-09ec-11e9-a38f-7cd30a5a2712:1-11655067164,4e09d36c-bb22-11e8-a1cb-7cd30ad38860:1-530901004,8940bea7-54b1-11e6-bb21-6c92bf31607b:1-1096743131,89cc53a8-e019-11e7-8d82-7cd30adae7d0:1-213160520,96767ef0-54b1-11e6-bb21-6c92bf31493f:1-325177501,9b5f2a07-a37c-11e8-8798-7cd30adae88c:1-140532061,a3408fb3-7479-11e5-850a-008cfae41260:1-6695,a7b9cbdc-e019-11e7-8d83-7cd30adbc3c8:1-1582021685,cc6f5f0f-014d-11e6-9b5c-6c92bf21d7b1:1-329680794,ce41dd8e-a37c-11e8-8799-7cd30ac427e8:1-985022222,d17b141f-e217-11ea-aa81-b8599f52a93c:1-4745492663,db92643a-014d-11e6-9b5c-6c92bf21bb19:1-4,f650defc-09eb-11e9-a38e-7cd30ae4109e:1-166445680] [checkpoint="position: (mysql-bin.013427, 341300826), gtid-set: 007c9203-833d-11e7-aff9-6c92bf21bbe1:1-931221440,11efeebc-747e-11e5-8527-d89d672b3674:1-36643616,1d258984-586b-11e8-9e16-7cd30ae3fcb8:1-1251269710,20dc5615-747e-11e5-8528-1051721b3701:1-39135349,22fb37d8-09ec-11e9-a38f-7cd30a5a2712:1-11655067164,4e09d36c-bb22-11e8-a1cb-7cd30ad38860:1-530901004,8940bea7-54b1-11e6-bb21-6c92bf31607b:1-1096743131,89cc53a8-e019-11e7-8d82-7cd30adae7d0:1-213160520,96767ef0-54b1-11e6-bb21-6c92bf31493f:1-325177501,9b5f2a07-a37c-11e8-8798-7cd30adae88c:1-140532061,a3408fb3-7479-11e5-850a-008cfae41260:1-6695,a7b9cbdc-e019-11e7-8d83-7cd30adbc3c8:1-1582021685,cc6f5f0f-014d-11e6-9b5c-6c92bf21d7b1:1-329680794,ce41dd8e-a37c-11e8-8799-7cd30ac427e8:1-985022222,d17b141f-e217-11ea-aa81-b8599f52a93c:1-4745492663,db92643a-014d-11e6-9b5c-6c92bf21bb19:1-4,f650defc-09eb-11e9-a38e-7cd30ae4109e:1-166445680(flushed position: (mysql-bin.013427, 337456334), gtid-set: 
007c9203-833d-11e7-aff9-6c92bf21bbe1:1-931221440,11efeebc-747e-11e5-8527-d89d672b3674:1-36643616,1d258984-586b-11e8-9e16-7cd30ae3fcb8:1-1251269710,20dc5615-747e-11e5-8528-1051721b3701:1-39135349,22fb37d8-09ec-11e9-a38f-7cd30a5a2712:1-11655067164,4e09d36c-bb22-11e8-a1cb-7cd30ad38860:1-530901004,8940bea7-54b1-11e6-bb21-6c92bf31607b:1-1096743131,89cc53a8-e019-11e7-8d82-7cd30adae7d0:1-213160520,96767ef0-54b1-11e6-bb21-6c92bf31493f:1-325177501,9b5f2a07-a37c-11e8-8798-7cd30adae88c:1-140532061,a3408fb3-7479-11e5-850a-008cfae41260:1-6695,a7b9cbdc-e019-11e7-8d83-7cd30adbc3c8:1-1582021685,cc6f5f0f-014d-11e6-9b5c-6c92bf21d7b1:1-329680794,ce41dd8e-a37c-11e8-8799-7cd30ac427e8:1-985022222,d17b141f-e217-11ea-aa81-b8599f52a93c:1-4745489146,db92643a-014d-11e6-9b5c-6c92bf21bb19:1-4,f650defc-09eb-11e9-a38e-7cd30ae4109e:1-166445680)"] [2021/08/17 14:18:40.034 +08:00] [INFO] [:581] ["receive source bound"] [bound="{"source":"ds-rds_master","worker":"dm-172.17.201.115-8263"}"] ["is deleted"=false] [2021/08/17 14:18:40.036 +08:00] [WARN] [:826] ["session variable 'time_zone' is overwritten by default UTC timezone."] [time_zone=+00:00] [2021/08/17 14:18:40.036 +08:00] [INFO] [:830] ["mysql source is being handled"] [sourceID=ds-rds_master] [2021/08/17 14:18:40.036 +08:00] [INFO] [:310] ["enter EnableHandleSubtasks"] [component="worker controller"] [2021/08/17 14:18:40.036 +08:00] [WARN] [:314] ["already enabled handling subtasks"] [component="worker controller"] [2021/08/17 14:18:47.719 +08:00] [INFO] [:1003] ["flushed checkpoint"] [task=dm-rds_master] [unit="binlog replication"] [checkpoint="position: (mysql-bin.013427, 345689091), gtid-set: 
007c9203-833d-11e7-aff9-6c92bf21bbe1:1-931221440,11efeebc-747e-11e5-8527-d89d672b3674:1-36643616,1d258984-586b-11e8-9e16-7cd30ae3fcb8:1-1251269710,20dc5615-747e-11e5-8528-1051721b3701:1-39135349,22fb37d8-09ec-11e9-a38f-7cd30a5a2712:1-11655067164,4e09d36c-bb22-11e8-a1cb-7cd30ad38860:1-530901004,8940bea7-54b1-11e6-bb21-6c92bf31607b:1-1096743131,89cc53a8-e019-11e7-8d82-7cd30adae7d0:1-213160520,96767ef0-54b1-11e6-bb21-6c92bf31493f:1-325177501,9b5f2a07-a37c-11e8-8798-7cd30adae88c:1-140532061,a3408fb3-7479-11e5-850a-008cfae41260:1-6695,a7b9cbdc-e019-11e7-8d83-7cd30adbc3c8:1-1582021685,cc6f5f0f-014d-11e6-9b5c-6c92bf21d7b1:1-329680794,ce41dd8e-a37c-11e8-8799-7cd30ac427e8:1-985022222,d17b141f-e217-11ea-aa81-b8599f52a93c:1-4745496851,db92643a-014d-11e6-9b5c-6c92bf21bb19:1-4,f650defc-09eb-11e9-a38e-7cd30ae4109e:1-166445680(flushed position: (mysql-bin.013427, 345689091), gtid-set: 007c9203-833d-11e7-aff9-6c92bf21bbe1:1-931221440,11efeebc-747e-11e5-8527-d89d672b3674:1-36643616,1d258984-586b-11e8-9e16-7cd30ae3fcb8:1-1251269710,20dc5615-747e-11e5-8528-1051721b3701:1-39135349,22fb37d8-09ec-11e9-a38f-7cd30a5a2712:1-11655067164,4e09d36c-bb22-11e8-a1cb-7cd30ad38860:1-530901004,8940bea7-54b1-11e6-bb21-6c92bf31607b:1-1096743131,89cc53a8-e019-11e7-8d82-7cd30adae7d0:1-213160520,96767ef0-54b1-11e6-bb21-6c92bf31493f:1-325177501,9b5f2a07-a37c-11e8-8798-7cd30adae88c:1-140532061,a3408fb3-7479-11e5-850a-008cfae41260:1-6695,a7b9cbdc-e019-11e7-8d83-7cd30adbc3c8:1-1582021685,cc6f5f0f-014d-11e6-9b5c-6c92bf21d7b1:1-329680794,ce41dd8e-a37c-11e8-8799-7cd30ac427e8:1-985022222,d17b141f-e217-11ea-aa81-b8599f52a93c:1-4745496851,db92643a-014d-11e6-9b5c-6c92bf21bb19:1-4,f650defc-09eb-11e9-a38e-7cd30ae4109e:1-166445680)"]
```
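The "no delay" conclusion can be checked against the `binlog replication progress` entries in the dm-worker log. The sketch below is an illustration under assumptions: `replication_lag` is a hypothetical helper, and the two sample entries are copied from the log above with the line wrapping removed. It only reads the fields that the log already prints and makes no claim about their units.

```python
import re

# Fields printed by the "binlog replication progress" entries.
UNSYNCED_RE = re.compile(r'\["unsynced binlog size"=(\d+)\]')
CATCH_UP_RE = re.compile(r'\["estimate time to catch up"=(\d+)\]')


def replication_lag(lines):
    """Return (timestamp, unsynced binlog size, catch-up estimate)
    for every progress entry found in the given log lines."""
    result = []
    for line in lines:
        if '"binlog replication progress"' not in line:
            continue
        unsynced = UNSYNCED_RE.search(line)
        catch_up = CATCH_UP_RE.search(line)
        if unsynced and catch_up:
            # Characters 1..23 hold "YYYY/MM/DD hh:mm:ss.mmm".
            result.append((line[1:24], int(unsynced.group(1)), int(catch_up.group(1))))
    return result


# Two progress entries from the dm-worker log, logged during the election window.
SAMPLE = [
    '[2021/08/17 14:18:01.661 +08:00] [INFO] [:2627] ["binlog replication progress"] '
    '[task=dm-rds_master] [unit="binlog replication"] ["total binlog size"=461082248942] '
    '["last binlog size"=461058611217] ["cost time"=30] [bytes/Second=787924] '
    '["unsynced binlog size"=0] ["estimate time to catch up"=0]',
    '[2021/08/17 14:18:31.661 +08:00] [INFO] [:2627] ["binlog replication progress"] '
    '[task=dm-rds_master] [unit="binlog replication"] ["total binlog size"=461093711235] '
    '["last binlog size"=461082248942] ["cost time"=30] [bytes/Second=382076] '
    '["unsynced binlog size"=0] ["estimate time to catch up"=0]',
]

lag = replication_lag(SAMPLE)
caught_up = all(unsynced == 0 for _, unsynced, _ in lag)
print(lag, caught_up)
```

Both sampled entries report an unsynced binlog size of 0, which is consistent with the conclusion that replication kept up while the new leader was being elected.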
After the machine hosting dm-master comes back up, does dm-master start automatically and rejoin the cluster?
```
[2021/08/17 14:19:34.892 +08:00] [WARN] [:315] ["failed to reach the peer URL"] [component="embed etcd"] [address= 172.17.201.115:8291/version ] [remote-member-id=201495974e8233cd] [error="Get 172.17.201.115:8291/version : dial tcp 172.17.201.115:8291: connect: connection refused"]
[2021/08/17 14:19:34.892 +08:00] [WARN] [:168] ["failed to get version"] [component="embed etcd"] [remote-member-id=201495974e8233cd] [error="Get 172.17.201.115:8291/version : dial tcp 172.17.201.115:8291: connect: connection refused"]
[2021/08/17 14:19:35.309 +08:00] [WARN] [:70] ["prober detected unhealthy status"] [component="embed etcd"] [round-tripper-name=ROUND_TRIPPER_RAFT_MESSAGE] [remote-peer-id=201495974e8233cd] [rtt=593.425µs] [error="dial tcp 172.17.201.115:8291: connect: connection refused"]
[2021/08/17 14:19:38.496 +08:00] [WARN] [:277] ["established TCP streaming connection with remote peer"] [component="embed etcd"] [stream-writer-type="stream MsgApp v2"] [local-member-id=db326cb7fba547f5] [remote-peer-id=201495974e8233cd]
[2021/08/17 14:19:38.496 +08:00] [WARN] [:277] ["established TCP streaming connection with remote peer"] [component="embed etcd"] [stream-writer-type="stream Message"] [local-member-id=db326cb7fba547f5] [remote-peer-id=201495974e8233cd]
[2021/08/17 14:19:49.428 +08:00] [INFO] [:2206] [payload="leader:true master:true names:"master-3" "] [request=ListMember]
```
Yes, it rejoins the cluster automatically; the dm-master leader attempts to restart the failed dm-master.
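Simulate dm-master and dm-worker going down at the same time
```
date; kill -9 m_pid; kill -9 w_pid; mv <master deploy dir> <master deploy dir>-1; mv <worker deploy dir> <worker deploy dir>-1 # force-kill the dm-master and dm-worker PIDs, then rename both deploy directories so the processes cannot restart automatically
```
Observe the leader switch and the task transfer.
Record the relevant data: leader switch time, status of all tasks, and replication delay.
Conclusion:
dm-master elected a new leader successfully, which took about 60 s.
Replication tasks that were still running were not affected.
Tasks on the dm-worker that went down at the same time could not be transferred to another dm-worker in the free state; they only started again after the failed dm-worker was restarted (v2.0.6 and earlier).
If the dm-worker cannot be started, the following steps are required:
**Risk:** dm-master and dm-worker are currently co-located in production, which is risky: the dm-master leader node must not host dm-worker tasks. A bug has been filed.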
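To make the co-location risk concrete, the sketch below lists the hosts that run both a dm-master and a dm-worker. It is an illustration only: `colocated_hosts` is a hypothetical helper, the master-2 and master-3 addresses come from the election log, the master-1 address is inferred from the connection-refused entries, and the worker addresses are assumed from the worker names that appear in the logs.

```python
from collections import defaultdict


def colocated_hosts(masters, workers):
    """Return the hosts that run both dm-master and dm-worker,
    mapped to the component names deployed on each of them."""
    by_host = defaultdict(lambda: {"masters": [], "workers": []})
    for name, addr in masters.items():
        by_host[addr.rsplit(":", 1)[0]]["masters"].append(name)
    for name, addr in workers.items():
        by_host[addr.rsplit(":", 1)[0]]["workers"].append(name)
    return {host: comps for host, comps in by_host.items()
            if comps["masters"] and comps["workers"]}


MASTERS = {
    "master-1": "172.17.201.115:8261",  # inferred: the member that refused connections
    "master-2": "172.18.78.254:8261",   # from the election log
    "master-3": "172.17.201.116:8261",  # from the election log
}
WORKERS = {
    # Addresses assumed from the worker names seen in the logs.
    "dm-172.17.201.115-8262": "172.17.201.115:8262",
    "dm-172.17.201.115-8263": "172.17.201.115:8263",
    "dm-172.18.78.254-8265": "172.18.78.254:8265",
}

shared = colocated_hosts(MASTERS, WORKERS)
for host in sorted(shared):
    print(host, shared[host])
```

Under these assumptions, two of the three dm-master hosts also run dm-workers, so losing either machine takes out a master and its workers together, which is the scenario exercised in this drill.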
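Use the rolling upgrade command to upgrade the DM cluster to v2.0.6
```
tiup dm upgrade <dm-cluster> v2.0.6
```
Conclusion:
The upgrade is performed as a rolling upgrade, in the order dm-master → dm-worker → prometheus → grafana.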
Published: 2024-02-04 10:42:34
Article link: https://www.4u4v.net/it/170705337854856.html