Mysql Fabric高可用集群分片功能测试

Abbie ·
更新时间:2024-09-20
· 853 次阅读

  一、MySQL Fabric高可用集群中一台数据库崩溃了,不影响数据的完整性   1.测试前准备   a)   查看group_id-1集群组的服务器状态   mysqlfabric group lookup_servers group_id-1   返回结果: Command : { success = True return = [{'status': 'PRIMARY', 'server_uuid': '7a45f71d-7934-11e4-9e8c-782bcb74823a', 'mode': 'READ_WRITE', 'weight': 1.0, 'address': '192.168.1.71:3306'}, {'status': 'SECONDARY', 'server_uuid': '9cf162ca-7934-11e4-9e8d-782bcb1b6b98', 'mode': 'READ_ONLY', 'weight': 1.0, 'address': '192.168.1.76:3306'}, {'status': 'SECONDARY', 'server_uuid': 'ae94200b-7932-11e4-9e81-a4badb30e16b', 'mode': 'READ_ONLY', 'weight': 1.0, 'address': '192.168.1.70:3306'}] activities = }   此时primary机为机器:192.168.1.71:3306,group_id-1集群组正常   b)   查看group_id-1集群组中的数据 mysql -P 3306 -h 192.168.1.76 -u root -e "select * from test.subscribers" mysql -P 3306 -h 192.168.1.71 -u root -e "select * from test.subscribers" mysql -P 3306 -h 192.168.1.70 -u root -e "select * from test.subscribers" 返回结果如下: +--------+------------+-----------+ | sub_no | first_name | last_name | +--------+------------+-----------+ | 500 | Billy | Joel | | 1500 | Arthur | Askey | | 5000 | Billy | Fish | | 17542 | Bobby | Ball | | 22 | Billy | Bob | | 8372 | Banana | Man | | 93846 | Bill | Ben | | 15050 | John | Smith | +--------+------------+-----------+   c)   利用python接口查看数据库test的数据表subscribers分片后的全部数据   python read_table_ha.py   返回结果如下: (u'Billy', u'Bob') (u'Billy', u'Fish') (u'Billy', u'Joel') (u'Arthur', u'Askey') (u'Banana', u'Man') (u'Billy', u'Fish') (u'Bill', u'Ben') (u'Jimmy', u'White') (u'John', u'Smith') (u'Bobby', u'Ball')   2.开始测试   a)  激活故障自动切换   即使Fabric选出了master角色,但当这个master角色宕机时,fanric不会自动将secondary角色切换为master角色,所以需要将HA集群组的配置改为自动切换角色   mysqlfabric group activate group_id-1   b)  停止group_id-1组中的master角色,即数据库实例192.168.1.71:3306,再查看fabric的状态   mysqlfabric group lookup_servers group_id-1   返回结果: Command : { success = True return = [{'status': 'FAULTY', 'server_uuid': '7a45f71d-7934-11e4-9e8c-782bcb74823a', 'mode': 'READ_WRITE', 'weight': 1.0, 'address': '192.168.1.71:3306'}, {'status': 'SECONDARY', 'server_uuid': '9cf162ca-7934-11e4-9e8d-782bcb1b6b98', 'mode': 'READ_ONLY', 'weight': 1.0, 'address': '192.168.1.76:3306'}, {'status': 'PRIMARY', 'server_uuid': 'ae94200b-7932-11e4-9e81-a4badb30e16b', 'mode': 'READ_WRITE', 'weight': 1.0, 'address': '192.168.1.70:3306'}] activities = }   其中,数据库实例192.168.1.71:3306的状态为FAULTY,mysql fabric自动检测到了HA组的故障,并自动选举了slave角色为primary角色

  c)    查看group_id-1集群组中的数据   mysql -P 3306 -h 192.168.1.76 -u root -e "select * from test.subscribers"   mysql -P 3306 -h 192.168.1.70 -u root -e "select * from test.subscribers"   返回结果: +--------+------------+-----------+ | sub_no | first_name | last_name | +--------+------------+-----------+ | 500 | Billy | Joel | | 1500 | Arthur | Askey | | 5000 | Billy | Fish | | 17542 | Bobby | Ball | | 22 | Billy | Bob | | 8372 | Banana | Man | | 93846 | Bill | Ben | | 15050 | John | Smith | +--------+------------+-----------+   d)    利用python接口查看数据库test的数据表subscribers分片后的全部数据   python read_table_ha.py   返回结果:   (u'Billy', u'Bob')   (u'Billy', u'Fish')   (u'Billy', u'Joel')   (u'Arthur', u'Askey')   (u'Banana', u'Man')   (u'Billy', u'Fish')   (u'Bill', u'Ben')   (u'Jimmy', u'White')   (u'John', u'Smith')   (u'Bobby', u'Ball')   结论:由以上测试返回数据表明,当HA组(group_id-1)中的master角色崩溃,mysql fabric会自动选举一个slave角色为master,并可正常查询原来的数据,因此当mysql fabric高可用集群中某一台数据库崩溃掉,并不影响数据的完整性   PS:手工恢复FAULT状态的数据库实例   mysql fabric高可用集群中的服务器状态有四种:primary、secondary、faulty、spare   关闭primary,primary不能从组中移除,需要使用mysqlfabric group demote group_id-1来关闭组中的primary,不会重新选举一个新的master,同时也不会关闭故障检测   a)    当其中的一个数据库实例崩溃,可以将这个数据库实例从组中删除,启动这个数据库实例后再重新添加,例如:   mysqlfabric group remove group_id-1 7a45f71d-7934-11e4-9e8c-782bcb74823a   mysqlfabric group add group_id-1 192.168.1.71:3306   mysqlfabric group lookup_servers group_id-1   返回结果:   Command :   { success = True   return = [{'status': 'SECONDARY', 'server_uuid': '7a45f71d-7934-11e4-9e8c-782bcb74823a', 'mode': 'READ_ONLY', 'weight': 1.0, 'address': '192.168.1.71:3306'}, {'status': 'SECONDARY', 'server_uuid': '9cf162ca-7934-11e4-9e8d-782bcb1b6b98', 'mode': 'READ_ONLY', 'weight': 1.0, 'address': '192.168.1.76:3306'}, {'status': 'PRIMARY', 'server_uuid': 'ae94200b-7932-11e4-9e81-a4badb30e16b', 'mode': 'READ_WRITE', 'weight': 1.0, 'address': '192.168.1.70:3306'}]   activities =   }   b)    当HA组中全部的服务器宕机,实例重启后,fabric的状态不会自动恢复,需要先执行命令:mysqlfabric group demote group_id-1,因为不能直接修改server的状态由faulty为secondary,需要将状态修改为spare,在改为secondary,如下:   mysqlfabric group demote group_id-1   mysqlfabric group deactivate group_id-1   mysqlfabric server set_status 7a45f71d-7934-11e4-9e8c-782bcb74823a spare   mysqlfabric server set_status 7a45f71d-7934-11e4-9e8c-782bcb74823a secondary   ###当所有的数据库实例都为secondary状态时,可选举master角色,命令如下:   mysqlfabric group promote group_id-1   二、MySQL Fabric高可用集群存储均衡   1.测试前准备   a)    测试脚本 cat test_add_subs_shards.py import mysql.connector from mysql.connector import fabric import math def add_subscriber(conn, sub_no, first_name, last_name): conn.set_property(tables=["test.subscribers"], key=sub_no, mode=fabric.MODE_READWRITE) cur = conn.cursor() cur.execute( "INSERT INTO subscribers VALUES (%s, %s, %s)", (sub_no, first_name, last_name) ) conn = mysql.connector.connect( fabric={"host" : "localhost", "port" : 32274, "username": "admin", "password" : "admin"}, user="root", database="test", password="", autocommit=True ) conn.set_property(tables=["test.subscribers"], scope=fabric.SCOPE_LOCAL) for num in range(10): add_subscriber(conn, "%s" % num, "k%s" % num, "kw%s" % num)    2.开始hash分片测试   a)插入10条记录,当group_id-1组的三台服务器正常,group_id-2组只有一台服务器正常运作时,   group_id-1组的数据如下:

  b)    测试插入1万条记录,group_id-1组和group_id-2组都有三台服务器正常运作   分片后,group_id-1组插入的数据行数如下:   mysql -P 3306 -h 192.168.1.70 -u root -e "select count(*) from test.subscribers"   返回结果:   +----------+   | count(*) |   +----------+   | 7138 |   +----------+   分片后,group_id-2组插入的数据行数如下:   mysql -P 3309 -h 192.168.1.76 -u root -e "select count(*) from test.subscribers"   返回结果:   +----------+   | count(*) |   +----------+   | 2903 |   +----------+   c)     测试插入10万条记录,group_id-1组和group_id-2组都有三台服务器正常运作   分片后,group_id-1组插入的数据行数如下:   mysql -P 3306 -h 192.168.1.70 -u root -e "select count(*) from test.subscribers"   返回结果:   +----------+   | count(*) |   +----------+   | 78719 |   +----------+   mysql -P 3309 -h 192.168.1.76 -u root -e "select count(*) from test.subscribers"   返回结果:   +----------+   | count(*) |   +----------+   | 31321 |   +----------+   d)    测试插入100万条记录,group_id-1组和group_id-2组都有三台服务器正常运作分片后,group_id-1组插入的数据行数如下:   mysql -P 3306 -h 192.168.1.70 -u root -e "select count(*) from test.subscribers"   返回结果:   +----------+   | count(*) |   +----------+   | 794287 |   +----------+   分片后,group_id-2组插入的数据行数如下:   mysql -P 3309 -h 192.168.1.76 -u root -e "select count(*) from test.subscribers"   返回结果:   +----------+   | count(*) |   +----------+   | 315752 |   +----------+   添加一个新HA组(group_id-3),对HA组(group_id-1)进行分片,操作如下:   a)    在机器:192.168.1.71 192.168.1.76 192.168.1.230上各起一个数据库实例,添加为组group_id-3   mysqlfabric group add group_id-3 192.168.1.71:3313   mysqlfabric group add group_id-3 192.168.1.76:3315   mysqlfabric group add group_id-3 192.168.1.230:3317   选举primary角色   mysqlfabric group promote group_id-3   b)    重置HA组group_id-3所有的数据库实例   mysql -P 3313 -h 192.168.1.71 -u root -e "reset master"   mysql -P 3315 -h 192.168.1.76 -u root -e "reset master"   mysql -P 3317 -h 192.168.1.230 -u root -e "reset master"   c)    查看HA组(group_id-1)的shard_id值,在储存机查看fabric数据库   mysql -h 127.0.0.1 -P3306 -u root -e 'select * from fabric.shards'   返回结果:   +----------+------------+---------+   | shard_id | group_id | state |   +----------+------------+---------+   | 4 | group_id-1 | ENABLED |   | 5 | group_id-2 | ENABLED |   +----------+------------+---------+   d)    对HA组(group_id-1)进行分片分离,将   mysqlfabric sharding split_shard 4 group_id-3   ps:如果这里执行分片不成功,错误如下:   Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'could not find next log; the first event '' at 4, the last event read from './binlog.000003' at 621, the last byte read from './binlog.000003' at 621.'   解决方法:   将master角色的数据dump出来,如下:   mysqldump -h 192.168.1.76 -u root -P3315 --all-databases --flush-privileges --single-transaction --flush-logs --triggers --routines --events --hex-blob >/data/database/full_backup_master.sql   停掉slave机   mysql -P 3317 -h 192.168.1.230 -u root -e "stop slave"   mysql -P 3317 -h 192.168.1.230 -u root -e "reset master"   mysql -P 3317 -h 192.168.1.230 -u root </data/full_backup_master.sql   mysql -P 3317 -h 192.168.1.230 -u root -e "start slave"   再次执行分片,会报以下错误:   Procedure :   { uuid = 4da230c2-31c3-4242-bd88-ccafd51bfac1,   finished = True,   success = False,   return = BackupError: ('Error while restoring the backup using the mysql client , %s', "ERROR 1840 (HY000) at line 24 in file: 'MySQL_192.168.1.76_3306.sql': @@GLOBAL.GTID_PURGED can only be set when @@GLOBAL.GTID_EXECUTED is empty. "),   activities =   }   解决方法:   在HA组(group_id-3)的全部数据库实例上执行“reset master”,再分片即可   e)    现在HA组有group_id-1、group_id-2、group_id-3,尝试插入100条记录   group_id-1的插入的数据行数有:   mysql -P 3306 -h 192.168.1.70 -u root -e "select count(*) from test.subscribers where sub_no between 20000 and 20100"   返回结果:   +----------+   | count(*) |   +----------+   | 70 |   +----------+   group_id-2、group_id-3的插入的数据行数有:   mysql -P 3309 -h 192.168.1.76 -u root -e "select count(*) from test.subscribers where sub_no between 20000 and 20100"   mysql -P 3313 -h 192.168.1.230 -u root -e "select count(*) from test.subscribers where sub_no between 20000 and 20100"   返回结果:   +----------+   | count(*) |   +----------+   | 30 |   +----------+   用select语句可知,HA组group_id-2、group_id-3插入的数据是一样的   3.hash分片测试结论   在插入1万条记录时,HA组(group_id-1)插入了7138条记录,HA组(group_id-2)插入了2903条记录   在插入10万条记录时,HA组(group_id-1)插入了78719条记录,HA组(group_id-2)插入了31321条记录   在插入100万条记录时,HA组(group_id-1)插入了794287条记录,HA组(group_id-2)插入了315752条记录   因此,对mysql fabric高集群进行hash分片后,分片比例为7:3,负载并不是十分均衡   三、调查PHP、Erlang等语言如何连接MySQL Fabric   目前Fabric提供了Php, Java, Python三大类的API,大家可以通过编程的方式来操作Fabric。   python脚本如上面所示   Connector/Python:http://dev.mysql.com/doc/mysql-utilities/1.5/en/connector-python-fabric.html   Connector/J:http://dev.mysql.com/doc/mysql-utilities/1.5/en/connector-j-fabric.html   Connector:http://php.net/manual/zh/book.mysqlnd-ms.php   四、mysqlfabric简单性能测试   1、重新初始化环境,对数据重新分片,如上   2、测试mysqlfabric的ips

3、测试mysqlfabric的qbs

  结论   从以上数据说明,mysqlfabric的ips和qbs的性能是随着进程数的增加而呈正比例上升的,上升的幅度不大,以上测试采用的是默认的mysql配置,可以调整innodb_buffuer_size这个参数来增加mysql的性能,得出来的不尽如人意。   此外,在测试mysqlfabric的过程中,还有以下几点问题需要解决:   a.    分片时,发生以下错误   [ERROR] 1420771468.040900 - Executor-24 - Error in _check_shard_information.   Traceback (most recent call last):   File "/usr/local/lib/python2.7/site-packages/mysql/fabric/executor.py", line 378, in execute   self.__result = self.__action(*self.__args, **self.__kwargs)   File "/usr/local/lib/python2.7/site-packages/mysql/fabric/events.py", line 80, in wrapped   return wrapped.function(*args, **kwargs)   File "/usr/local/lib/python2.7/site-packages/mysql/fabric/services/resharding.py", line 250, in _check_shard_information   upper_bound   File "/usr/local/lib/python2.7/site-packages/mysql/fabric/sharding.py", line 979, in split_value   upper_bound = int(upper_bound, 16)   TypeError: int() can't convert non-string with explicit base   Procedure :   { uuid = 859e1cdd-24fd-4ccd-9363-186d7df91971,   finished = True,   success = False,   return = TypeError: int() can't convert non-string with explicit base,   activities =   }   对于python来说,是进制之间不能正常转换,因此,在添加分片的数据库的数据,不能只添加自增长id的数据   b.mysqlfabric进程数的限制   配置文件/etc/mysql/fabric.cfg,当threads=100、executors=49,会发现执行mysqlfabric manage start &报错   Procedure :   { uuid = fea93aaf-4f01-43ae-b1ca-89d4abee6b2e,   finished = True,   success = False,   return = ConfigurationError: Too many threads requested. Session threads (100), Executor threads (49), Control threads (1) and Failure Detector threads (0). The maximum number of threads allowed is (150). Increase the maximum number of connections in the state store in order to increase this limit.,   activities =   }   由源代码可知,进程数与执行数之和不能大于150-1   c.mysqlfabric程序本身的不稳定性   mysqlfabrci程序本身容错性比较弱,当操作失误时,有时会卡住不执行,然后需要将进程kill掉重新启动,使用stop命令停不掉;另外,当在分片时,进程突然中断,重启mysqlfabric后,不会继续执行上次分片的位置,而是需要重新分片。   d.mysqlfabric同时进行多进程插入查询时,会报错   在进行测试时,当同时插入数据的进程数或查询的进程数大于40,会报以下错误 Process Process-25: Traceback (most recent call last): File "/usr/local/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap self.run() File "/usr/local/lib/python2.7/multiprocessing/process.py", line 114, in run self._target(*self._args, **self._kwargs) File "thread_tps_test.py", line 43, in run_test conn = create_conn_by_fabric() File "thread_tps_test.py", line 23, in create_conn_by_fabric user="root", password="", autocommit=True,database='test' File "/usr/local/lib/python2.7/site-packages/mysql/connector/__init__.py", line 144, in connect return mysql.connector.fabric.connect(*args, **kwargs) File "/usr/local/lib/python2.7/site-packages/mysql/connector/fabric/__init__.py", line 53, in connect return MySQLFabricConnection(**kwargs) File "/usr/local/lib/python2.7/site-packages/mysql/connector/fabric/connection.py", line 809, in __init__ self.store_config(**kwargs) File "/usr/local/lib/python2.7/site-packages/mysql/connector/fabric/connection.py", line 948, in store_config self._configure_fabric(config['fabric']) File "/usr/local/lib/python2.7/site-packages/mysql/connector/fabric/connection.py", line 929, in _configure_fabric self._fabric.seed() File "/usr/local/lib/python2.7/site-packages/mysql/connector/fabric/connection.py", line 332, in seed fabinst.connect() File "/usr/local/lib/python2.7/site-packages/mysql/connector/fabric/connection.py", line 763, in connect self._proxy = self._xmlrpc_get_proxy() File "/usr/local/lib/python2.7/site-packages/mysql/connector/fabric/connection.py", line 747, in _xmlrpc_get_proxy proxy._some_nonexisting_method() # pylint: disable=W0212 File "/usr/local/lib/python2.7/xmlrpclib.py", line 1224, in __call__ return self.__send(self.__name, args) File "/usr/local/lib/python2.7/xmlrpclib.py", line 1578, in __request verbose=self.__verbose File "/usr/local/lib/python2.7/site-packages/mysql/connector/fabric/connection.py", line 272, in request raise InterfaceError("Connection with Fabric failed: " + msg) InterfaceError: Connection with Fabric failed: <urlopen error [Errno 110] Connection timed out>   5.    测试mysqlfabric性能的脚本   测试mysqlfabric ips __author__ = 'kw107301' import mysql.connector from mysql.connector import fabric import random import sys from multiprocessing import Process import time table_count=1000 requests = int(sys.argv[1]) type = int(sys.argv[2]) threads = int(sys.argv[3]) print "requests:%d, type:%d, threads:%d"%(requests,type,threads) print time.strftime('%Y-%m-%d %H:%M:%S',time.localtime(time.time())) cur = None def create_conn_by_fabric(): conn = mysql.connector.connect( fabric={"host" : "localhost", "port" : 32274, "username": "admin", "password" : "admin" }, user="root", password="", autocommit=True,database='test' ) conn.set_property(tables=["test.subscribers"], scope=fabric.SCOPE_LOCAL, mode=fabric.MODE_READONLY) return conn def get_result(conn): sub_no = random.randint(1, table_count) if type == 1: conn = create_conn_by_fabric() conn.set_property(tables=["test.subscribers"], key=sub_no, mode=fabric.MODE_READONLY) """conn.reset_cache()""" cur = conn.cursor() cur.execute( "INSERT INTO subscribers VALUES (%s, 'Billy', 'Fish')", (sub_no,first_name, last_name) ) def run_test(): conn = create_conn_by_fabric() for i in range(requests/threads): get_result(conn) if __name__ == '__main__': thd = [] s_time = time.time() for i in range(threads): t = Process(target=run_test) thd.append(t) t.start() for m in thd: m.join() print "test finished, total cost %.2f second, TPS is %.2f"%((time.time()-s_time),requests/(time.time()-s_time))   测试mysqlfabric qbs import mysql.connector from mysql.connector import fabric import random import sys from multiprocessing import Process import time table_count=10000 requests = int(sys.argv[1]) type = int(sys.argv[2]) threads = int(sys.argv[3]) print "requests:%d, type:%d, threads:%d"%(requests,type,threads) print time.strftime('%Y-%m-%d %H:%M:%S',time.localtime(time.time())) cur = None def create_conn_by_fabric(): conn = mysql.connector.connect( fabric={"host" : "localhost", "port" : 32274, "username": "admin", "password" : "admin" }, user="root", password="", autocommit=True,database='test' ) conn.set_property(tables=["test.subscribers"], scope=fabric.SCOPE_LOCAL, mode=fabric.MODE_READONLY) return conn def get_result(conn): sub_no = random.randint(0, table_count) if type == 1: conn = create_conn_by_fabric() conn.set_property(tables=["test.subscribers"], key=sub_no, mode=fabric.MODE_READONLY) cur = conn.cursor() cur.execute( "SELECT * FROM subscribers " "WHERE sub_no = %s", (sub_no, ) ) def run_test(): conn = create_conn_by_fabric() for i in range(requests/threads): get_result(conn) if __name__ == '__main__': thd = [] s_time = time.time() for i in range(threads): t = Process(target=run_test) thd.append(t) t.start() for m in thd: m.join() print "test finished, total cost %.2f second, QPS is %.2f"%((time.time()-s_time),requests/(time.time()-s_time))



集群 功能测试 fabric 测试 Mysql

需要 登录 后方可回复, 如果你还没有账号请 注册新账号