Deploying an OpenStack Rocky High-Availability Cluster on CentOS 7 (Part 1 of 3)

Odele · Updated 2024-09-20

1. Preliminary Preparation

1.1 Hardware and Software Preparation

Three controller hosts (three NICs each) + three compute nodes (two NICs each, also serving as distributed-storage OSD hosts) + one storage node (two NICs, running the Ceph MON).

1.1.1 Components

Component    Version                                    Notes
CentOS       CentOS Linux release 7.7.1908 (Core)
OpenStack    Rocky
Ceph         ceph version 12.2.12 luminous (stable)
1.1.2 Network Topology (Logical)

(figure: logical network topology)

1. The controller nodes (cont01, cont02, cont03) run MariaDB Galera, HAProxy, Pacemaker, Corosync, Keystone, Glance, Horizon, the Nova/Neutron/Cinder control-plane components, and the other OpenStack base services.

2. The compute nodes (comp01, comp02, comp03) run nova-compute, neutron-openvswitch-agent, ceph-osd, and related services.

3. The storage control node (mon01) runs ceph-mon and ceph-mgr (for a production environment, three nodes are recommended for this role as well).

4. Controller node networks:

Management network -> eth0: 192.168.10.0/24, used for managing the cluster host OS, APIs, and other administrative traffic.

Tenant tunnel (VM) network -> eth1: 10.10.10.0/24, used for data exchange between guest OS instances inside the cluster, carried over VXLAN/GRE or similar.

External network -> eth2: mainly used by compute nodes to reach the internet and for external floating IPs.

Note: here the management network combines the Management network and the API network.

5. Compute node networks:

Management network -> eth0: 192.168.10.0/24, used for managing the cluster host OS, APIs, and other administrative traffic.

Tenant tunnel network -> eth1: 10.10.10.0/24, used for data exchange between guest OS instances inside the cluster, carried over VXLAN/GRE or similar.

6. HAProxy provides high availability at the front end, MariaDB Galera serves as the database, and Ceph provides the distributed storage cluster.

7. Stateless services, such as the xxx-api processes, run in active/active mode; stateful services, such as neutron-xxx-agent and cinder-volume, are better run active/passive (because HAProxy sits in front, successive client requests may be forwarded to different controllers, and a request that lands on a node holding no state for it can fail); services with their own clustering mechanisms, such as RabbitMQ and memcached, use their native clustering.
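To make the distinction concrete, here is a minimal, illustrative HAProxy sketch of the two patterns. It assumes the VIP 192.168.10.20 created in section 5.9, Keystone's standard public port 5000, and the port-9200 Galera health check set up in section 3; the haproxy.cfg actually used by this deployment is built in section 6.

```bash
# Sketch only -- merge into /etc/haproxy/haproxy.cfg as appropriate.
cat >> /etc/haproxy/haproxy.cfg <<'EOF'
# Stateless API service: active/active, requests balanced across all controllers
listen keystone_public
    bind 192.168.10.20:5000
    balance roundrobin
    server cont01 192.168.10.21:5000 check inter 2000 rise 2 fall 5
    server cont02 192.168.10.22:5000 check inter 2000 rise 2 fall 5
    server cont03 192.168.10.23:5000 check inter 2000 rise 2 fall 5

# Galera: effectively active/passive -- write to one node, keep the others as backups
listen mariadb_galera
    bind 192.168.10.20:3306
    balance source
    option httpchk
    server cont01 192.168.10.21:3306 check port 9200 inter 2000 rise 2 fall 5
    server cont02 192.168.10.22:3306 backup check port 9200 inter 2000 rise 2 fall 5
    server cont03 192.168.10.23:3306 backup check port 9200 inter 2000 rise 2 fall 5
EOF
```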

1.1.3 Physical Topology (partial, for reference)

(figure: physical topology)

1.2 配置hosts文件(集群所有电脑配置) [root@cont02:/root]# vim /etc/hosts 192.168.10.21 cont01 192.168.10.22 cont02 192.168.10.23 cont03 192.168.10.20 VirtualIP 192.168.10.24 mon01 192.168.10.19 comp01 192.168.10.18 comp02 192.168.10.17 comp03 [root@cont02:/root]# scp /etc/hosts cont02:/etc/ [root@cont02:/root]# scp /etc/hosts cont03:/etc/ [root@cont02:/root]# scp /etc/hosts mon01:/etc/ [root@cont02:/root]# scp /etc/hosts comp01:/etc/ [root@cont02:/root]# scp /etc/hosts comp02:/etc/ [root@cont02:/root]# scp /etc/hosts comp03:/etc/ 1.3 节点间设置免密互相登陆 [root@cont01:/root]# ssh-keygen ssh-copy-id cont02 ssh-copy-id cont03 ssh-copy-id mon01 ssh-copy-id comp01 ssh-copy-id comp02 ssh-copy-id comp03 [root@cont02:/root]# ssh-keygen ssh-copy-id cont01 ssh-copy-id cont03 ssh-copy-id mon01 ssh-copy-id comp01 ssh-copy-id comp02 ssh-copy-id comp03 [root@cont03:/root]# ssh-keygen ssh-copy-id cont01 ssh-copy-id cont02 ssh-copy-id mon01 ssh-copy-id comp01 ssh-copy-id comp02 ssh-copy-id comp03 1.4 配置NTP服务 1.4.1 在控制节点进行配置(cont01、cont02、cont03) [root@cont02:/root]# yum install system-config-date -y [root@cont02:/root]# vim /etc/chrony.conf 3 server 0.centos.pool.ntp.org iburst 4 server 192.168.10.21 iburst 5 server 192.168.10.22 iburst 6 server 192.168.10.23 iburst 25 # Allow NTP client access from local network. 26 allow 192.168.0.0/16 [root@cont02:/root]# scp /etc/chrony.conf cont01:/etc/chrony.conf [root@cont02:/root]# scp /etc/chrony.conf cont03:/etc/chrony.conf [root@cont01:/root]# systemctl enable chronyd && systemctl restart chronyd [root@cont01:/root]# systemctl status chronyd [root@cont02:/root]# systemctl enable chronyd && systemctl restart chronyd [root@cont02:/root]# systemctl status chronyd [root@cont03:/root]# systemctl enable chronyd && systemctl restart chronyd [root@cont03:/root]# systemctl status chronyd [root@cont01:/root]# chronyc sources 210 Number of sources = 3 MS Name/IP address Stratum Poll Reach LastRx Last sample =============================================================================== ^? cont02 0 7 0 - +0ns[ +0ns] +/- 0ns ^? cont03 0 7 0 - +0ns[ +0ns] +/- 0ns ^* 119.28.206.193 2 6 37 21 +197us[+1847us] +/- 55ms [root@cont02:/root]# chronyc sources 210 Number of sources = 3 MS Name/IP address Stratum Poll Reach LastRx Last sample =============================================================================== ^? cont01 0 7 0 - +0ns[ +0ns] +/- 0ns ^? cont03 0 7 0 - +0ns[ +0ns] +/- 0ns ^* 162.159.200.123 3 6 37 51 +2273us[ +143us] +/- 121ms [root@cont03:/root]# chronyc sources 210 Number of sources = 3 MS Name/IP address Stratum Poll Reach LastRx Last sample =============================================================================== ^? cont01 0 8 0 - +0ns[ +0ns] +/- 0ns ^? cont02 0 8 0 - +0ns[ +0ns] +/- 0ns ^? 
tick.ntp.infomaniak.ch 0 8 0 - +0ns[ +0ns] +/- 0ns [root@cont02:/root]# ssh cont01 'date'; ssh cont03 'date' ; date Tue Feb 11 20:32:16 CST 2020 Tue Feb 11 20:32:16 CST 2020 Tue Feb 11 20:32:16 CST 2020 1.4.2 在其他节点上安装NTP软件(mon01、comp01、comp02、comp03) [root@mon01:/root]# yum install chrony -y [root@comp01:/root]# yum install chrony -y [root@comp02:/root]# yum install chrony -y [root@comp03:/root]# yum install chrony -y [root@mon01:/root]# vim /etc/chrony.conf 3 server 192.168.10.22 iburst 4 server 192.168.10.21 iburst 5 server 192.168.10.23 iburst 6 #server 3.centos.pool.ntp.org iburst [root@comp01:/root]# vim /etc/chrony.conf 3 server 192.168.10.22 iburst 4 server 192.168.10.21 iburst 5 server 192.168.10.23 iburst 6 #server 3.centos.pool.ntp.org iburst [root@comp02:/root]# vim /etc/chrony.conf 3 server 192.168.10.22 iburst 4 server 192.168.10.21 iburst 5 server 192.168.10.23 iburst 6 #server 3.centos.pool.ntp.org iburst [root@comp03:/root]# vim /etc/chrony.conf 3 server 192.168.10.22 iburst 4 server 192.168.10.21 iburst 5 server 192.168.10.23 iburst 6 #server 3.centos.pool.ntp.org iburst [root@mon01:/root]# systemctl enable chronyd && systemctl restart chronyd [root@mon01:/root]# systemctl status chronyd [root@comp01:/root]# systemctl enable chronyd && systemctl restart chronyd [root@comp01:/root]# systemctl status chronyd [root@comp02:/root]# systemctl enable chronyd && systemctl restart chronyd [root@comp02:/root]# systemctl status chronyd [root@comp03:/root]# systemctl enable chronyd && systemctl restart chronyd [root@comp03:/root]# systemctl status chronyd [root@mon01:/root]# chronyc sources 210 Number of sources = 3 MS Name/IP address Stratum Poll Reach LastRx Last sample =============================================================================== ^? cont02 0 7 0 - +0ns[ +0ns] +/- 0ns ^? cont01 0 7 0 - +0ns[ +0ns] +/- 0ns ^? cont03 0 7 0 - +0ns[ +0ns] +/- 0ns [root@mon01:/root]# timedatectl Local time: Wed 2020-01-15 11:22:23 CST Universal time: Wed 2020-01-15 03:22:23 UTC RTC time: Wed 2020-01-15 03:22:24 Time zone: Asia/Shanghai (CST, +0800) NTP enabled: yes NTP synchronized: no RTC in local TZ: no DST active: n/a [root@comp01:/root]# chronyc sources 210 Number of sources = 3 MS Name/IP address Stratum Poll Reach LastRx Last sample =============================================================================== ^? cont02 0 7 0 - +0ns[ +0ns] +/- 0ns ^? cont01 0 7 0 - +0ns[ +0ns] +/- 0ns ^? cont03 0 7 0 - +0ns[ +0ns] +/- 0ns [root@comp01:/root]# timedatectl Local time: Wed 2020-01-15 11:22:24 CST Universal time: Wed 2020-01-15 03:22:24 UTC RTC time: Wed 2020-01-15 03:22:24 Time zone: Asia/Shanghai (CST, +0800) NTP enabled: yes NTP synchronized: no RTC in local TZ: no DST active: n/a [root@comp02:/root]# chronyc sources 210 Number of sources = 3 MS Name/IP address Stratum Poll Reach LastRx Last sample =============================================================================== ^? cont02 0 7 0 - +0ns[ +0ns] +/- 0ns ^? cont01 0 7 0 - +0ns[ +0ns] +/- 0ns ^? cont03 0 7 0 - +0ns[ +0ns] +/- 0ns [root@comp02:/root]# timedatectl Local time: Wed 2020-01-15 11:22:24 CST Universal time: Wed 2020-01-15 03:22:24 UTC RTC time: Wed 2020-01-15 03:22:24 Time zone: Asia/Shanghai (CST, +0800) NTP enabled: yes NTP synchronized: no RTC in local TZ: no DST active: n/a 1.5 系统初始设置 1.5.1 所有节点关闭防火墙和selinux systemctl stop firewalld && systemctl disable firewalld setenforce 0 sed -i 's/SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config

1.5.2 Edit limits.conf and sysctl.conf on all nodes

```
cat >> /etc/security/limits.conf <<EOF
...
EOF

cat >> /etc/sysctl.conf <<EOF
fs.file-max=655350
net.ipv4.ip_local_port_range = 1025 65000
net.ipv4.tcp_tw_recycle = 1
net.ipv4.ip_nonlocal_bind = 1
EOF
sysctl -p
```

2. Deploying the MariaDB Galera Cluster

2.1 Install MariaDB on the controller nodes (cont01, cont02, cont03)

Galera here refers to Galera Cluster, a shared-nothing, highly redundant high-availability solution designed for databases. Galera MariaDB is a MariaDB cluster with the Galera plugin integrated; because Galera is natively multi-master, a Galera MariaDB cluster is not a traditional primary/standby setup but a multi-primary architecture.
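A quick way to observe the multi-primary behaviour once the cluster is bootstrapped in 2.1.6 is to write on one node and read it back from another. This is only a sketch; it assumes the root password set in 2.1.3 and that remote root login was permitted there:

```bash
mysql -h cont01 -uroot -p'typora#2019' -e "CREATE DATABASE IF NOT EXISTS galera_smoke_test;"
mysql -h cont02 -uroot -p'typora#2019' -e "SHOW DATABASES LIKE 'galera_smoke_test';"   # visible immediately
mysql -h cont03 -uroot -p'typora#2019' -e "DROP DATABASE galera_smoke_test;"           # cleanup, replicated everywhere
```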

2.1.1 控制节点添加mariadb-10.4.11的yum源 [root@cont01:/root]# vim /etc/yum.repos.d/mariadb.repo [mariadb] name = MariaDB baseurl = http://mirror.netinch.com/pub/mariadb//mariadb-10.4.11/yum/centos7-amd64/ gpgcheck = 0 enabled = 1 [root@cont01:/root]# yum makecache [root@cont02:/root]# cat > /etc/yum.repos.d/mariadb.repo << EOF [mariadb] name = MariaDB baseurl = http://mirror.netinch.com/pub/mariadb//mariadb-10.4.11/yum/centos7-amd64/ gpgcheck = 0 enabled = 1 EOF [root@cont02:/root]# yum makecache [root@cont02:/root]# scp /etc/yum.repos.d/mariadb.repo cont03:/etc/yum.repos.d/ [root@cont03:/root]# yum makecache 2.1.2 控制节点yum安装mariadb软件包并开启服务 [root@cont01:/root]# yum -y install MariaDB-server MariaDB-client galera-4 rsync xinetd wget ntpdate [root@cont01:/root]# systemctl enable mariadb && systemctl start mariadb [root@cont01:/root]# systemctl status mariadb [root@cont02:/root]# yum -y install MariaDB-server MariaDB-client galera-4 rsync xinetd wget ntpdate [root@cont02:/root]# systemctl enable mariadb && systemctl start mariadb [root@cont02:/root]# systemctl status mariadb [root@cont03:/root]# yum -y install MariaDB-server MariaDB-client galera-4 rsync xinetd wget ntpdate [root@cont03:/root]# systemctl enable mariadb && systemctl start mariadb [root@cont03:/root]# systemctl status mariadb 2.1.3 控制节点数据库安全设置、登陆数据库设置密码并授权 [root@cont02:/root]# mysql_secure_installation 注:Disallow root login remotely? [Y/n] n 密码:typora#2019 [root@cont02:/root]# mysql MariaDB [(none)]> SET password=PASSWORD('typora#2019'); //密码设置过的可以省略 Query OK, 0 rows affected (0.002 sec) MariaDB [(none)]> grant all privileges on *.* to 'root'@'%' identified by 'typora#2019' with grant option; Query OK, 0 rows affected (0.002 sec) MariaDB [(none)]> grant all privileges on *.* to 'mysql'@'%' identified by 'typora#2019' with grant option; Query OK, 0 rows affected (0.008 sec) MariaDB [(none)]> grant all privileges on *.* to 'root'@'localhost' identified by 'typora#2019' with grant option; Query OK, 0 rows affected (0.002 sec) MariaDB [(none)]> grant all privileges on *.* to 'mysql'@'localhost' identified by 'typora#2019' with grant option; Query OK, 0 rows affected (0.008 sec) MariaDB [(none)]> flush privileges; Query OK, 0 rows affected (0.001 sec) MariaDB [(none)]> exit Bye [root@cont01:/root]# mysql_secure_installation 注:Disallow root login remotely? [Y/n] n 密码:typora#2019 [root@cont01:/root]# mysql MariaDB [(none)]> SET password=PASSWORD('typora#2019'); //密码设置过的可以省略 Query OK, 0 rows affected (0.012 sec) MariaDB [(none)]> grant all privileges on *.* to 'root'@'%' identified by 'typora#2019' with grant option; Query OK, 0 rows affected (0.008 sec) MariaDB [(none)]> grant all privileges on *.* to 'mysql'@'%' identified by 'typora#2019' with grant option; Query OK, 0 rows affected (0.008 sec) MariaDB [(none)]> grant all privileges on *.* to 'root'@'localhost' identified by 'typora#2019' with grant option; Query OK, 0 rows affected (0.002 sec) MariaDB [(none)]> grant all privileges on *.* to 'mysql'@'localhost' identified by 'typora#2019' with grant option; Query OK, 0 rows affected (0.008 sec) MariaDB [(none)]> flush privileges; Query OK, 0 rows affected (0.001 sec) MariaDB [(none)]> exit Bye [root@cont03:/root]# mysql_secure_installation 注:Disallow root login remotely? 
[Y/n] n 密码:typora#2019 [root@cont03:/root]# mysql MariaDB [(none)]> SET password=PASSWORD('typora#2019'); //密码设置过的可以省略 Query OK, 0 rows affected (0.009 sec) MariaDB [(none)]> grant all privileges on *.* to 'root'@'localhost' identified by 'typora#2019' with grant option; Query OK, 0 rows affected (0.002 sec) MariaDB [(none)]> grant all privileges on *.* to 'mysql'@'localhost' identified by 'typora#2019' with grant option; Query OK, 0 rows affected (0.008 sec) MariaDB [(none)]> grant all privileges on *.* to 'root'@'%' identified by 'typora#2019' with grant option; Query OK, 0 rows affected (0.016 sec) MariaDB [(none)]> grant all privileges on *.* to 'mysql'@'%' identified by 'typora#2019' with grant option; Query OK, 0 rows affected (0.008 sec) MariaDB [(none)]> flush privileges; Query OK, 0 rows affected (0.001 sec) MariaDB [(none)]> exit Bye 2.1.4 配置MariaDB Galeria 集群 (控制节点全修改配置文件/etc/my.cnf.d/server.cnf) [root@cont02:/root]# vim /etc/my.cnf.d/server.cnf [server] # this is only for the mysqld standalone daemon [mysqld] # # * Galera-related settings # [galera] # Mandatory settings ##wsrep_on=ON #开启全同步复制模式 wsrep_on=ON wsrep_provider=/usr/lib64/galera-4/libgalera_smm.so wsrep_cluster_address="gcomm://192.168.10.22,192.168.10.23,192.168.10.21" binlog_format=row default_storage_engine=InnoDB innodb_autoinc_lock_mode=2 ##innodb_autoinc_lock_mode=2 #主键自增模式修改为交叉模式 ##wsrep_slave_threads=8 #开启并行复制线程,根据CPU核数设置 # # Allow server to accept connections on all interfaces. # bind-address=192.168.10.22 # # Optional setting #wsrep_slave_threads=1 #innodb_flush_log_at_trx_commit=0 #事务提交每隔1秒刷盘 ##wsrep_certify_nonPK=ON #为没有显式申明主键的表生成一个用于certificationtest的主键,默认为ON ##query_cache_size=0 #关闭查询缓存 ##wsrep_provider_options="gcache.size=4G" #同步复制缓冲池 wsrep_provider_options="gcache.size=1G" wsrep_cluster_name=MariaDB-Galera-Cluster wsrep_node_name=cont02 wsrep_node_address=192.168.10.22 wsrep_sst_method=rsync ##innodb_buffer_pool_size=2G # this is only for embedded server [embedded] # This group is only read by MariaDB servers, not by MySQL. # If you use the same .cnf file for MySQL and MariaDB, # you can put MariaDB-only options here [mariadb] # This group is only read by MariaDB-10.4 servers. # If you use the same .cnf file for MariaDB of different versions, # use this group for options that older servers don't understand [mariadb-10.4] [root@cont01:/root]# vim /etc/my.cnf.d/server.cnf [server] # this is only for the mysqld standalone daemon [mysqld] # # * Galera-related settings # [galera] # Mandatory settings wsrep_on=ON wsrep_provider=/usr/lib64/galera-4/libgalera_smm.so wsrep_cluster_address="gcomm://192.168.10.21,192.168.10.22,192.168.10.23" binlog_format=row default_storage_engine=InnoDB innodb_autoinc_lock_mode=2 # # Allow server to accept connections on all interfaces. # bind-address=192.168.10.21 # # Optional setting #wsrep_slave_threads=1 #innodb_flush_log_at_trx_commit=0 wsrep_provider_options="gcache.size=1G" wsrep_cluster_name=MariaDB-Galera-Cluster wsrep_node_name=cont01 wsrep_node_address=192.168.10.21 wsrep_sst_method=rsync # this is only for embedded server [embedded] # This group is only read by MariaDB servers, not by MySQL. # If you use the same .cnf file for MySQL and MariaDB, # you can put MariaDB-only options here [mariadb] # This group is only read by MariaDB-10.4 servers. 
# If you use the same .cnf file for MariaDB of different versions, # use this group for options that older servers don't understand [mariadb-10.4] [root@cont03:/root]# vim /etc/my.cnf.d/server.cnf [server] # this is only for the mysqld standalone daemon [mysqld] # # * Galera-related settings # [galera] # Mandatory settings wsrep_on=ON wsrep_provider=/usr/lib64/galera-4/libgalera_smm.so wsrep_cluster_address="gcomm://192.168.10.23,192.168.10.22,192.168.10.21" binlog_format=row default_storage_engine=InnoDB innodb_autoinc_lock_mode=2 # # Allow server to accept connections on all interfaces. # bind-address=192.168.10.23 # # Optional setting #wsrep_slave_threads=1 #innodb_flush_log_at_trx_commit=0 wsrep_provider_options="gcache.size=1G" wsrep_cluster_name=MariaDB-Galera-Cluster wsrep_node_name=cont03 wsrep_node_address=192.168.10.23 wsrep_sst_method=rsync # this is only for embedded server [embedded] # This group is only read by MariaDB servers, not by MySQL. # If you use the same .cnf file for MySQL and MariaDB, # you can put MariaDB-only options here [mariadb] # This group is only read by MariaDB-10.4 servers. # If you use the same .cnf file for MariaDB of different versions, # use this group for options that older servers don't understand [mariadb-10.4] 2.1.5 控制节点停止mariadb.service [root@cont01:/root]# systemctl daemon-reload [root@cont01:/root]# systemctl stop mariadb.service [root@cont02:/root]# systemctl daemon-reload [root@cont02:/root]# systemctl stop mariadb.service [root@cont03:/root]# systemctl daemon-reload [root@cont03:/root]# systemctl stop mariadb.service 2.1.6 启动初始化集群 [root@cont02:/root]# mysqld_safe --defaults-file=/etc/my.cnf.d/server.cnf --user=mysql --wsrep-new-cluster & [root@cont03:/root]# systemctl start mariadb [root@cont03:/root]# systemctl status mariadb [root@cont03:/root]# systemctl enable mariadb [root@cont01:/root]# systemctl start mariadb [root@cont01:/root]# systemctl status mariadb [root@cont01:/root]# systemctl enable mariadb 注:两个节点启动成功后,在重新启动第一个节点:(删除第一个节点数据) [root@cont02:/root]# pkill -9 mysql [1]+ Killed mysqld_safe --defaults-file=/etc/my.cnf.d/server.cnf --user=mysql --wsrep-new-cluster [root@cont02:/root]# rm -rf /var/lib/mysql/* [root@cont02:/root]# systemctl start mariadb.service [root@cont02:/root]# systemctl enable mariadb.service Created symlink from /etc/systemd/system/mysql.service to /usr/lib/systemd/system/mariadb.service. Created symlink from /etc/systemd/system/mysqld.service to /usr/lib/systemd/system/mariadb.service. Created symlink from /etc/systemd/system/multi-user.target.wants/mariadb.service to /usr/lib/systemd/system/mariadb.service. [root@cont02:/root]# systemctl status mariadb.service 注意:每次在第一台集群上,需要使用下面的命令来启动: galera_new_cluster 2.2 登录数据库,查看集群状态 [root@cont02:/root]# mysql -uroot -p"typora#2019" Welcome to the MariaDB monitor. Commands end with ; or \g. Your MariaDB connection id is 13 Server version: 10.4.11-MariaDB MariaDB Server Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. 
MariaDB [(none)]> show databases; +--------------------+ | Database | +--------------------+ | information_schema | | mysql | | performance_schema | +--------------------+ 3 rows in set (0.002 sec) MariaDB [(none)]> show status like 'wsrep_cluster_size'; +--------------------+-------+ | Variable_name | Value | +--------------------+-------+ | wsrep_cluster_size | 3 | +--------------------+-------+ 1 row in set (0.002 sec) MariaDB [(none)]> show status like 'wsrep%'; +-------------------------------+---------------------------------------------------- | Variable_name | Value +-------------------------------+---------------------------------------------------- | wsrep_local_state_uuid | a2b7231d-383a-11ea-b086-83d5778583b5 | wsrep_protocol_version | 10 | wsrep_last_committed | 5 | wsrep_replicated | 0 | wsrep_replicated_bytes | 0 | wsrep_repl_keys | 0 | wsrep_repl_keys_bytes | 0 | wsrep_repl_data_bytes | 0 | wsrep_repl_other_bytes | 0 | wsrep_received | 3 | wsrep_received_bytes | 256 | wsrep_local_commits | 0 | wsrep_local_cert_failures | 0 | wsrep_local_replays | 0 | wsrep_local_send_queue | 0 | wsrep_local_send_queue_max | 1 | wsrep_local_send_queue_min | 0 | wsrep_local_send_queue_avg | 0 | wsrep_local_recv_queue | 0 | wsrep_local_recv_queue_max | 1 | wsrep_local_recv_queue_min | 0 | wsrep_local_recv_queue_avg | 0 | wsrep_local_cached_downto | 5 | wsrep_flow_control_paused_ns | 0 | wsrep_flow_control_paused | 0 | wsrep_flow_control_sent | 0 | wsrep_flow_control_recv | 0 | wsrep_cert_deps_distance | 0 | wsrep_apply_oooe | 0 | wsrep_apply_oool | 0 | wsrep_apply_window | 0 | wsrep_commit_oooe | 0 | wsrep_commit_oool | 0 | wsrep_commit_window | 0 | wsrep_local_state | 4 | wsrep_local_state_comment | Synced | wsrep_cert_index_size | 0 | wsrep_causal_reads | 0 | wsrep_cert_interval | 0 | wsrep_open_transactions | 0 | wsrep_open_connections | 0 | wsrep_incoming_addresses | AUTO,AUTO,AUTO | wsrep_cluster_weight | 3 | wsrep_desync_count | 0 | wsrep_evs_delayed | | wsrep_evs_evict_list | | wsrep_evs_repl_latency | 0/0/0/0/0 | wsrep_evs_state | OPERATIONAL | wsrep_gcomm_uuid | d5a23bb7-383a-11ea-b735-86c7a20615b4 | wsrep_applier_thread_count | 1 | wsrep_cluster_capabilities | | wsrep_cluster_conf_id | 18446744073709551615 | wsrep_cluster_size | 3 | wsrep_cluster_state_uuid | a2b7231d-383a-11ea-b086-83d5778583b5 | wsrep_cluster_status | Primary | wsrep_connected | ON | wsrep_local_bf_aborts | 0 | wsrep_local_index | 0 | wsrep_provider_capabilities | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: | | wsrep_provider_name | Galera | wsrep_provider_vendor | Codership Oy | wsrep_provider_version | 26.4.3(r4535) | wsrep_ready | ON | wsrep_rollbacker_thread_count | 1 | wsrep_thread_count | 2 +-------------------------------+--------------------------------------------- 65 rows in set (0.002 sec) MariaDB [(none)]> exit Bye [root@cont01:/var/lib/mysql]# cat grastate.dat gvwstate.dat # GALERA saved state version: 2.1 uuid: a2b7231d-383a-11ea-b086-83d5778583b5 seqno: -1 safe_to_bootstrap: 0 my_uuid: b3a02e5e-383a-11ea-ab01-07face985b92 #vwbeg view_id: 3 ac080e34-383a-11ea-9d9d-22f77c5f5aa6 5 bootstrap: 0 member: ac080e34-383a-11ea-9d9d-22f77c5f5aa6 0 member: b3a02e5e-383a-11ea-ab01-07face985b92 0 member: d5a23bb7-383a-11ea-b735-86c7a20615b4 0 #vwend [root@cont02:/var/lib/mysql]# cat grastate.dat gvwstate.dat # GALERA saved state version: 2.1 uuid: a2b7231d-383a-11ea-b086-83d5778583b5 seqno: -1 safe_to_bootstrap: 0 
my_uuid: d5a23bb7-383a-11ea-b735-86c7a20615b4
#vwbeg
view_id: 3 ac080e34-383a-11ea-9d9d-22f77c5f5aa6 5
bootstrap: 0
member: ac080e34-383a-11ea-9d9d-22f77c5f5aa6 0
member: b3a02e5e-383a-11ea-ab01-07face985b92 0
member: d5a23bb7-383a-11ea-b735-86c7a20615b4 0
#vwend
[root@cont03:/var/lib/mysql]# cat grastate.dat gvwstate.dat
# GALERA saved state
version: 2.1
uuid: a2b7231d-383a-11ea-b086-83d5778583b5
seqno: -1
safe_to_bootstrap: 0
my_uuid: ac080e34-383a-11ea-9d9d-22f77c5f5aa6
#vwbeg
view_id: 3 ac080e34-383a-11ea-9d9d-22f77c5f5aa6 5
bootstrap: 0
member: ac080e34-383a-11ea-9d9d-22f77c5f5aa6 0
member: b3a02e5e-383a-11ea-ab01-07face985b92 0
member: d5a23bb7-383a-11ea-b735-86c7a20615b4 0
#vwend

2.3 Edit the clustercheck heartbeat-check script under /usr/bin on each controller node

```
[root@cont01:/root]# cd /usr/bin/
[root@cont01:/usr/bin]# wget https://raw.githubusercontent.com/olafz/percona-clustercheck/master/clustercheck
[root@cont01:/usr/bin]# chmod +x /usr/bin/clustercheck
[root@cont01:/usr/bin]# ls -ld /usr/bin/clustercheck
-rwxr-xr-x. 1 root root 3882 Jan 15 19:19 /usr/bin/clustercheck
[root@cont02:/root]# cd /usr/bin/
[root@cont02:/usr/bin]# wget https://raw.githubusercontent.com/olafz/percona-clustercheck/master/clustercheck
[root@cont02:/usr/bin]# chmod +x /usr/bin/clustercheck
[root@cont02:/usr/bin]# ls -ld /usr/bin/clustercheck
-rwxr-xr-x. 1 root root 3882 Jan 15 19:18 /usr/bin/clustercheck
[root@cont03:/root]# cd /usr/bin/
[root@cont03:/usr/bin]# wget https://raw.githubusercontent.com/olafz/percona-clustercheck/master/clustercheck
[root@cont03:/usr/bin]# chmod +x /usr/bin/clustercheck
[root@cont03:/usr/bin]# ls -ld /usr/bin/clustercheck
-rwxr-xr-x. 1 root root 3882 Jan 15 19:20 /usr/bin/clustercheck
```

3. Deploying the xinetd Service

3.1 Install xinetd on the controller nodes to monitor the Galera service (cont01, cont02, cont03)

```
yum install xinetd -y
```

3.2 Create the xinetd configuration file on the controller nodes

```
[root@cont01:/root]# cat <<END >/etc/xinetd.d/mysqlchk
# default: on
# description: mysqlchk
service mysqlchk
{
    disable = no
    flags = REUSE
    socket_type = stream
    port = 9200
    wait = no
    user = nobody
    server = /usr/bin/clustercheck
    log_on_failure += USERID
    only_from = 0.0.0.0/0
    per_source = UNLIMITED
}
END
[root@cont01:/root]# scp /etc/xinetd.d/mysqlchk cont02:/etc/xinetd.d/
[root@cont01:/root]# scp /etc/xinetd.d/mysqlchk cont03:/etc/xinetd.d/
```

3.3 Modify /etc/services on the controller nodes

echo 'mysqlchk 9200/tcp #mysqlchk' >> /etc/services

On each controller (cont01, cont02, cont03), edit /etc/services: go to around line 10101, comment out the existing entries that use port 9200, and add a line for mysqlchk/clustercheck.

```
[root@cont01:/root]# vim /etc/services
[root@cont02:/root]# vim /etc/services
[root@cont03:/root]# vim /etc/services

mysqlchk        9200/tcp                # mysqlchk
#wap-wsp        9200/tcp                # WAP connectionless session service
#wap-wsp        9200/udp                # WAP connectionless session service
```

3.4 Start the xinetd service on the controller nodes (cont01, cont02, cont03)

```
systemctl enable xinetd && systemctl start xinetd
systemctl status xinetd
```

3.5 Recovering from failure scenarios

① First check whether any node in the database cluster still has the service running; if so, simply start the service on the failed node. When only one node has gone down, restarting the service on that node is usually all that is needed.

② If the database service is down on every node, find the node with the most recent state and start it with the --wsrep-new-cluster option; once it is up, simply start the service on the other nodes.
To find the most recent node: first read the seqno value in each node's grastate.dat; the node with the largest value is the most recent one (see the helper sketch after this procedure). If seqno is -1 on every node, compare my_uuid and view_id in each node's gvwstate.dat; the node where they match becomes the first node to start, and after it is up the remaining nodes can be started normally. If no candidate can be identified this way, manual intervention is required.

// Start it with: mysqld_safe --defaults-file=/etc/my.cnf.d/server.cnf --user=mysql --wsrep-new-cluster &

③ If neither of the methods above works, try the following procedure (it has been tested successfully).

Step 1: if the database will not start, delete grastate.dat, gvwstate.dat, galera.cache, and mysql.sock under /var/lib/mysql, then restart.

Step 2: adjust wsrep_cluster_address in the configuration file. A bare "gcomm://" is a special address used only when the Galera cluster is first bootstrapped. Once the cluster is running, if the first node is shut down, "gcomm://" must be changed to the addresses of the other cluster members before that node is started again; only the initial bootstrap node leaves it empty, since it does not sync data from anyone.

Step 3: start the service on this node: systemctl start mariadb

Step 4: bootstrap the cluster: galera_new_cluster

Step 5: start mariadb on the remaining nodes: systemctl start mariadb

Step 6: change the address back to "gcomm://" and restart the database: systemctl restart mariadb
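A small helper for step ② above: gather the recovery position from every node so you can pick the one to bootstrap with --wsrep-new-cluster. This is a sketch that assumes the passwordless SSH set up in 1.3 and the default datadir /var/lib/mysql:

```bash
#!/bin/bash
# Print seqno and safe_to_bootstrap for each node; the node with the highest
# seqno (or safe_to_bootstrap: 1) is the one to bring up first.
for h in cont01 cont02 cont03; do
    printf '%s: ' "$h"
    ssh "$h" "grep -E 'seqno|safe_to_bootstrap' /var/lib/mysql/grastate.dat" | tr '\n' ' '
    echo
done
```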

Monitoring status reference

Cluster integrity checks:
wsrep_cluster_state_uuid: should be identical on all nodes in the cluster; a node with a different value has not joined the cluster.
wsrep_cluster_conf_id: normally the same on all nodes; if it differs, that node has been temporarily "partitioned" and should converge again once inter-node connectivity is restored.
wsrep_cluster_size: if this matches the expected number of nodes, all cluster nodes are connected.
wsrep_cluster_status: the state of the cluster component; any value other than "Primary" indicates a partition or split-brain situation.

Node status checks:
wsrep_ready: ON means the node can accept SQL load; if OFF, check wsrep_connected.
wsrep_connected: if this is OFF and wsrep_ready is also OFF, the node is not connected to the cluster (possibly caused by a wrong wsrep_cluster_address or wsrep_cluster_name; check the error log for the exact cause).
wsrep_local_state_comment: if wsrep_connected is ON but wsrep_ready is OFF, this variable shows why.

Replication health checks:
wsrep_flow_control_paused: how long replication has been paused, i.e. how much the cluster is slowed down by replica lag. The value ranges from 0 to 1; the closer to 0 the better, and 1 means replication has stopped completely. Tuning wsrep_slave_threads can improve it.
wsrep_cert_deps_distance: how many transactions can be applied in parallel; wsrep_slave_threads should not be set much higher than this value.
wsrep_flow_control_sent: how many times this node has paused replication.
wsrep_local_recv_queue_avg: the average length of the replica transaction queue; a rising value is an early warning of a replica bottleneck.

Current node database status:

```
MariaDB [(none)]> show status like 'wsrep_local_state_comment';
+---------------------------+--------+
| Variable_name             | Value  |
+---------------------------+--------+
| wsrep_local_state_comment | Synced |
+---------------------------+--------+
MariaDB [(none)]> show status like 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 3     |
+--------------------+-------+
1 row in set (0.001 sec)
MariaDB [(none)]> show status like 'wsrep_cluster_state_uuid';
MariaDB [(none)]> show status like 'wsrep_cluster_status';
```

Node state reference:
Open: the node started successfully and is trying to connect to the cluster.
Primary: the node is part of the cluster; this state also appears while a donor is being selected for a newly joining node.
Joiner: the node is waiting for, or currently receiving, the state transfer.
Joined: the node has completed the transfer but part of its data is not yet current; it is catching up with the cluster.
Synced: the node is serving normally and its data is consistent with the cluster.
Donor: the node has been chosen as the donor and is performing a full state transfer for a newly joined node; while doing so it does not serve clients.

4. Deploying the RabbitMQ Message Queue

4.1 Install the rabbitmq and erlang packages on all controller nodes (cont01, cont02, cont03, mon01)

```
//Node: cont01
[root@cont01:/root]# yum install epel-release -y
[root@cont01:/root]# yum install -y erlang rabbitmq-server.noarch
[root@cont01:/root]# systemctl enable rabbitmq-server && systemctl start rabbitmq-server
[root@cont01:/root]# systemctl status rabbitmq-server
[root@cont01:/root]# netstat -ntlpa | grep 5672
//Node: cont02
[root@cont02:/root]# yum install epel-release -y
[root@cont02:/root]# yum install -y erlang rabbitmq-server.noarch
[root@cont02:/root]# systemctl enable rabbitmq-server && systemctl start rabbitmq-server
[root@cont02:/root]# systemctl status rabbitmq-server
[root@cont02:/root]# netstat -ntlpa | grep 5672
//Node: cont03
[root@cont03:/root]# yum install epel-release -y
[root@cont03:/root]# yum install -y erlang rabbitmq-server.noarch
[root@cont03:/root]# systemctl enable rabbitmq-server && systemctl start rabbitmq-server
[root@cont03:/root]# systemctl status rabbitmq-server
[root@cont03:/root]# netstat -ntlpa | grep 5672
//Node: mon01
[root@mon01:/root]# yum install epel-release -y
[root@mon01:/root]# yum install -y erlang rabbitmq-server.noarch
[root@mon01:/root]# systemctl enable rabbitmq-server && systemctl start rabbitmq-server
[root@mon01:/root]# systemctl status rabbitmq-server
[root@mon01:/root]# netstat -ntlpa | grep 5672
```

4.2 Check the cluster status

(Hint: RabbitMQ's configuration file is /etc/rabbitmq/rabbitmq.config)

[root@cont01:/root]# rabbitmqctl cluster_status Cluster status of node rabbit@cont01 ... [{nodes,[{disc,[rabbit@cont01]}]}, {running_nodes,[rabbit@cont01]}, {cluster_name,<>}, {partitions,[]}] ...done. [root@cont02:/root]# rabbitmqctl cluster_status Cluster status of node rabbit@cont02 ... [{nodes,[{disc,[rabbit@cont02]}]}, {running_nodes,[rabbit@cont02]}, {cluster_name,<>}, {partitions,[]}] ...done. [root@cont03:/root]# rabbitmqctl cluster_status Cluster status of node rabbit@cont03 ... [{nodes,[{disc,[rabbit@cont03]}]}, {running_nodes,[rabbit@cont03]}, {cluster_name,<>}, {partitions,[]}] ...done. [root@mon01:/root]# rabbitmqctl cluster_status Cluster status of node rabbit@mon01 ... [{nodes,[{disc,[rabbit@mon01]}]}, {running_nodes,[rabbit@mon01]}, {cluster_name,<>}, {partitions,[]}] ...done. //*注:partitions里如果有节点一般非正常,可以通过重启rabbitmq-server服务来恢复*// 4.3 编辑rabbitmq变量文件 [root@cont01:/root]# vim /etc/rabbitmq/rabbitmq-env.conf ## RABBITMQ_NODE_PORT=5672 ulimit -S -n 4096 RABBITMQ_SERVER_ERL_ARGS="+K true +A30 +P 1048576 -kernel inet_default_connect_options [{nodelay,true},{raw,6,18,<>}] -kernel inet_default_listen_options [{raw,6,18,<>}]" RABBITMQ_NODE_IP_ADDRESS=192.168.10.21 [root@cont02:/root]# vim /etc/rabbitmq/rabbitmq-env.conf ## RABBITMQ_NODE_PORT=5672 ulimit -S -n 4096 RABBITMQ_SERVER_ERL_ARGS="+K true +A30 +P 1048576 -kernel inet_default_connect_options [{nodelay,true},{raw,6,18,<>}] -kernel inet_default_listen_options [{raw,6,18,<>}]" RABBITMQ_NODE_IP_ADDRESS=192.168.10.22 [root@cont03:/root]# vim /etc/rabbitmq/rabbitmq-env.conf ## RABBITMQ_NODE_PORT=5672 ulimit -S -n 4096 RABBITMQ_SERVER_ERL_ARGS="+K true +A30 +P 1048576 -kernel inet_default_connect_options [{nodelay,true},{raw,6,18,<>}] -kernel inet_default_listen_options [{raw,6,18,<>}]" RABBITMQ_NODE_IP_ADDRESS=192.168.10.23 [root@mon01:/root]# vim /etc/rabbitmq/rabbitmq-env.conf ## RABBITMQ_NODE_PORT=5672 ulimit -S -n 4096 RABBITMQ_SERVER_ERL_ARGS="+K true +A30 +P 1048576 -kernel inet_default_connect_options [{nodelay,true},{raw,6,18,<>}] -kernel inet_default_listen_options [{raw,6,18,<>}]" RABBITMQ_NODE_IP_ADDRESS=192.168.10.24 4.4 查看rabbitmq插件 [root@cont01:/root]# /usr/lib/rabbitmq/bin/rabbitmq-plugins list [ ] amqp_client 3.3.5 [ ] cowboy 0.5.0-rmq3.3.5-git4b93c2d [ ] eldap 3.3.5-gite309de4 [ ] mochiweb 2.7.0-rmq3.3.5-git680dba8 [ ] rabbitmq_amqp1_0 3.3.5 [ ] rabbitmq_auth_backend_ldap 3.3.5 [ ] rabbitmq_auth_mechanism_ssl 3.3.5 [ ] rabbitmq_consistent_hash_exchange 3.3.5 [ ] rabbitmq_federation 3.3.5 [ ] rabbitmq_federation_management 3.3.5 [ ] rabbitmq_management 3.3.5 [ ] rabbitmq_management_agent 3.3.5 [ ] rabbitmq_management_visualiser 3.3.5 [ ] rabbitmq_mqtt 3.3.5 [ ] rabbitmq_shovel 3.3.5 [ ] rabbitmq_shovel_management 3.3.5 [ ] rabbitmq_stomp 3.3.5 [ ] rabbitmq_test 3.3.5 [ ] rabbitmq_tracing 3.3.5 [ ] rabbitmq_web_dispatch 3.3.5 [ ] rabbitmq_web_stomp 3.3.5 [ ] rabbitmq_web_stomp_examples 3.3.5 [ ] sockjs 0.3.4-rmq3.3.5-git3132eb9 [ ] webmachine 1.10.3-rmq3.3.5-gite9359c7 4.5 所有node开启rabbitmq的web管理页面 rabbitmq-plugins enable rabbitmq_management systemctl restart rabbitmq-server systemctl status rabbitmq-server 4.6 查看rabbitmq插件是否加载成功(成功后前面有E) [root@cont01:/root]# /usr/lib/rabbitmq/bin/rabbitmq-plugins list [e] amqp_client 3.3.5 [ ] cowboy 0.5.0-rmq3.3.5-git4b93c2d [ ] eldap 3.3.5-gite309de4 [e] mochiweb 2.7.0-rmq3.3.5-git680dba8 [ ] rabbitmq_amqp1_0 3.3.5 [ ] rabbitmq_auth_backend_ldap 3.3.5 [ ] rabbitmq_auth_mechanism_ssl 3.3.5 [ ] rabbitmq_consistent_hash_exchange 3.3.5 [ ] rabbitmq_federation 3.3.5 [ ] 
rabbitmq_federation_management 3.3.5 [E] rabbitmq_management 3.3.5 [e] rabbitmq_management_agent 3.3.5 [ ] rabbitmq_management_visualiser 3.3.5 [ ] rabbitmq_mqtt 3.3.5 [ ] rabbitmq_shovel 3.3.5 [ ] rabbitmq_shovel_management 3.3.5 [ ] rabbitmq_stomp 3.3.5 [ ] rabbitmq_test 3.3.5 [ ] rabbitmq_tracing 3.3.5 [e] rabbitmq_web_dispatch 3.3.5 [ ] rabbitmq_web_stomp 3.3.5 [ ] rabbitmq_web_stomp_examples 3.3.5 [ ] sockjs 0.3.4-rmq3.3.5-git3132eb9 [e] webmachine 1.10.3-rmq3.3.5-gite9359c7

Note: the RabbitMQ web UI listens on port 15672 by default; the RabbitMQ server listens on port 5672 by default.

4.7 Access http://192.168.10.22:15672/

Note: on an Alibaba Cloud server, port 15672 must be opened in the security group (port 80 may also turn out to be occupied later).
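A quick reachability check against the management API (same port as the web UI); a sketch run while guest/guest is still valid, i.e. before the password change in 4.8. The commented firewall-cmd lines only apply if firewalld was left running, which is not the case here since it was disabled in 1.5.1:

```bash
curl -s -u guest:guest http://192.168.10.22:15672/api/overview | head -c 300; echo
# Only if firewalld is still enabled:
# firewall-cmd --permanent --add-port=5672/tcp --add-port=15672/tcp && firewall-cmd --reload
```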

(figure: RabbitMQ management console)

Default account: guest/guest

4.8 所有node修改guest密码为adminrabbitmq(默认用户为:guest 密码为:guest) [root@cont01:/root]# rabbitmqctl change_password guest adminrabbitmq Changing password for user "guest" ... ...done. [1]+ Done rabbitmqctl change_password guest admin [root@cont02:/root]# rabbitmqctl change_password guest adminrabbitmq Changing password for user "guest" ... ...done. [1]+ Done rabbitmqctl change_password guest admin [root@cont03:/root]# rabbitmqctl change_password guest adminrabbitmq Changing password for user "guest" ... ...done. [root@mon01:/root]# rabbitmqctl change_password guest adminrabbitmq Changing password for user "guest" ... ...done. [1]+ Done rabbitmqctl change_password guest admin 4.9 cont02添加一个openstack的用户,并设密码为adminopenstack。并设置权限成为管理员(成集群后,cont02和cont03上也可以登录) [root@cont02:/root]# rabbitmqctl add_user openstack adminopenstack Creating user "openstack" ... ...done. [root@cont02:/root]# rabbitmqctl set_permissions openstack ".*" ".*" ".*" Setting permissions for user "openstack" in vhost "/" ... ...done. [root@cont02:/root]# rabbitmqctl set_user_tags openstack administrator Setting tags for user "openstack" to [administrator] ... ...done. [root@cont02:/root]# rabbitmqctl list_users Listing users ... guest [administrator] openstack [administrator] ...done. 4.10 发送cont01上的erlang.cookie到其他节点配置集群 [root@cont02:/root]# rabbitmqctl status Status of node rabbit@cont02 ... [{pid,12512}, {running_applications, [{rabbitmq_management,"RabbitMQ Management Console","3.3.5"}, {rabbitmq_web_dispatch,"RabbitMQ Web Dispatcher","3.3.5"}, {webmachine,"webmachine","1.10.3-rmq3.3.5-gite9359c7"}, {mochiweb,"MochiMedia Web Server","2.7.0-rmq3.3.5-git680dba8"}, {rabbitmq_management_agent,"RabbitMQ Management Agent","3.3.5"}, {rabbit,"RabbitMQ","3.3.5"}, {os_mon,"CPO CXC 138 46","2.2.14"}, {inets,"INETS CXC 138 49","5.9.8"}, {mnesia,"MNESIA CXC 138 12","4.11"}, {amqp_client,"RabbitMQ AMQP Client","3.3.5"}, {xmerl,"XML parser","1.3.6"}, {sasl,"SASL CXC 138 11","2.3.4"}, {stdlib,"ERTS CXC 138 10","1.19.4"}, {kernel,"ERTS CXC 138 10","2.16.4"}]}, {os,{unix,linux}}, {erlang_version, "Erlang R16B03-1 (erts-5.10.4) [source] [64-bit] [smp:8:8] [async-threads:30] [hipe] [kernel-poll:true]\n"}, {memory, [{total,41896520}, {connection_procs,5600}, {queue_procs,5600}, {plugins,385080}, {other_proc,13866040}, {mnesia,60888}, {mgmt_db,50256}, {msg_index,34528}, {other_ets,1114200}, {binary,74848}, {code,20229052}, {atom,711569}, {other_system,5358859}]}, {alarms,[]}, {listeners,[{clustering,25672,"::"},{amqp,5672,"192.168.10.22"}]}, {vm_memory_high_watermark,0.4}, {vm_memory_limit,8331886592}, {disk_free_limit,50000000}, {disk_free,178624393216}, {file_descriptors, [{total_limit,3996}, {total_used,3}, {sockets_limit,3594}, {sockets_used,1}]}, {processes,[{limit,1048576},{used,190}]}, {run_queue,0}, {uptime,369}] ...done. [root@cont02:/root]# scp /var/lib/rabbitmq/.erlang.cookie cont01:/var/lib/rabbitmq/ [root@cont02:/root]# scp /var/lib/rabbitmq/.erlang.cookie cont03:/var/lib/rabbitmq/ [root@cont02:/root]# scp /var/lib/rabbitmq/.erlang.cookie mon01:/var/lib/rabbitmq/ 4.11 cont01、cont03、mon01停止应用,并ram的方式加入cont01节点,之后重启应用(先查看cont01集群名字) [root@cont02:/root]# rabbitmqctl cluster_status Cluster status of node rabbit@cont02 ... [{nodes,[{disc,[rabbit@cont02]}]}, {running_nodes,[rabbit@cont02]}, {cluster_name,<>}, {partitions,[]}] ...done. [root@cont01:/root]# systemctl restart rabbitmq-server [root@cont01:/root]# systemctl status rabbitmq-server [root@cont01:/root]# rabbitmqctl stop_app Stopping node rabbit@cont01 ... ...done. 
[root@cont01:/root]# rabbitmqctl join_cluster --ram rabbit@cont02 Clustering node rabbit@cont01 with rabbit@cont02 ... ...done. [root@cont01:/root]# rabbitmqctl start_app Starting node rabbit@cont01 ... ...done. [root@cont03:/root]# systemctl restart rabbitmq-server [root@cont03:/root]# systemctl status rabbitmq-server [root@cont03:/root]# rabbitmqctl stop_app [root@cont03:/root]# rabbitmqctl join_cluster --ram rabbit@cont02 Clustering node rabbit@cont03 with rabbit@cont02 ... ...done. [root@cont03:/root]# rabbitmqctl start_app Starting node rabbit@cont03 ... ...done. [root@mon01:/root]# systemctl restart rabbitmq-server [root@mon01:/root]# systemctl status rabbitmq-server [root@mon01:/root]# rabbitmqctl stop_app Stopping node rabbit@mon01 ... ...done. [root@mon01:/root]# rabbitmqctl join_cluster --ram rabbit@cont02 Clustering node rabbit@mon01 with rabbit@cont02 ... ...done. [root@mon01:/root]# rabbitmqctl start_app Starting node rabbit@mon01 ... ...done. 4.12 检测集群状态 [root@cont01:/root]# rabbitmqctl cluster_status Cluster status of node rabbit@cont01 ... [{nodes,[{disc,[rabbit@cont02]}, {ram,[rabbit@mon01,rabbit@cont03,rabbit@cont01]}]}, {running_nodes,[rabbit@mon01,rabbit@cont03,rabbit@cont02,rabbit@cont01]}, {cluster_name,<>}, {partitions,[]}] ...done. [root@cont02:/root]# rabbitmqctl cluster_status Cluster status of node rabbit@cont02 ... [{nodes,[{disc,[rabbit@cont02]}, {ram,[rabbit@mon01,rabbit@cont03,rabbit@cont01]}]}, {running_nodes,[rabbit@mon01,rabbit@cont03,rabbit@cont01,rabbit@cont02]}, {cluster_name,<>}, {partitions,[]}] ...done. [root@cont03:/root]# rabbitmqctl cluster_status Cluster status of node rabbit@cont03 ... [{nodes,[{disc,[rabbit@cont02]}, {ram,[rabbit@mon01,rabbit@cont03,rabbit@cont01,]}]}, {running_nodes,[rabbit@mon01,rabbit@cont01,rabbit@cont02,rabbit@cont03]}, {cluster_name,<>}, {partitions,[]}] ...done. [root@mon01:/root]# rabbitmqctl cluster_status Cluster status of node rabbit@mon01 ... [{nodes,[{disc,[rabbit@cont02]}, {ram,[rabbit@mon01,rabbit@cont03,rabbit@cont01]}]}, {running_nodes,[rabbit@cont01,rabbit@cont02,rabbit@cont03,rabbit@mon01]}, {cluster_name,<>}, {partitions,[]}] ...done. 4.13 为了保持rabbitmq的消息一致性,可以设置消息镜像模式策略 所有节点执行 rabbitmqctl set_policy ha-all "^" '{"ha-mode":"all"}'

Tuning RabbitMQ performance

In RabbitMQ, the default quotas for socket descriptors and file descriptors are fairly small.

Modify the configuration

Add the following parameter to the rabbitmq-server service file, adjusting the value to your environment:

vim /usr/lib/systemd/system/rabbitmq-server.service

[Service]

LimitNOFILE = 16384

Restart RabbitMQ:

systemctl daemon-reload

systemctl restart rabbitmq-server
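Editing the unit file under /usr/lib/systemd/system works, but that file is overwritten on package upgrades. An equivalent sketch using a systemd drop-in, with the same example value of 16384:

```bash
mkdir -p /etc/systemd/system/rabbitmq-server.service.d
cat > /etc/systemd/system/rabbitmq-server.service.d/limits.conf <<EOF
[Service]
LimitNOFILE=16384
EOF
systemctl daemon-reload
systemctl restart rabbitmq-server
# Confirm RabbitMQ sees the new limit:
rabbitmqctl status | grep -A 4 file_descriptors
```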

// The material below is quoted for reference and should be adjusted as needed:

When configuring OpenStack services to use the RabbitMQ message queue, the following form can be used:

transport_url = rabbit://openstack:adminopenstack@cont01,openstack:adminopenstack@cont02,openstack:adminopenstack@cont03

As you can see, this concatenates the RabbitMQ endpoints of all nodes: when the message queue service on cont01 becomes unavailable, requests are forwarded to cont02, and failing that, to cont03.

The following parameters should be set as well (see the example configuration after this list):

rabbit_retry_interval=1

rabbit_retry_backoff=2

rabbit_max_retries=0

rabbit_durable_queues=true

rabbit_ha_queues=true
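For reference, this is roughly how the settings above would sit inside a service's configuration file, e.g. /etc/nova/nova.conf; a sketch only, using the openstack user and password created in 4.9, with the retry options placed under [oslo_messaging_rabbit]:

```
[DEFAULT]
transport_url = rabbit://openstack:adminopenstack@cont01,openstack:adminopenstack@cont02,openstack:adminopenstack@cont03

[oslo_messaging_rabbit]
rabbit_retry_interval = 1
rabbit_retry_backoff = 2
rabbit_max_retries = 0
rabbit_durable_queues = true
rabbit_ha_queues = true
```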

4.14 RabbitMQ notes (failure-recovery tips):

① Keep at least one disc-type node in the cluster to guard against data loss; be especially careful about this when changing node types.

② If the whole cluster has been stopped, make sure the last node to go down is the first one to start; if that is not possible, use the forget_cluster_node command to remove it from the cluster (see the command sketch after this list).

③ If the cluster nodes all went down almost simultaneously in an uncontrolled way, use the force_boot command on one of the nodes to bring it back up first.

④ If rabbitmq-server fails to start after joining the cluster (for example after an unexpected shutdown), try the following: under /var/lib/rabbitmq/mnesia there are rabbit@localhost.pid, rabbit@localhost, and rabbit@localhost-plugins-expand; delete these three items, also delete .erlang.cookie and erl_crash.dump under /var/lib/rabbitmq/, and then start the service again with systemctl start rabbitmq-server.
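The commands referred to in ② and ③, shown concretely; rabbit@cont03 here is only an example of a node being dropped, not a recommendation:

```bash
# On a surviving member, remove a node that cannot be recovered:
rabbitmqctl forget_cluster_node rabbit@cont03
# On the node chosen to come up first after an uncontrolled full shutdown:
rabbitmqctl force_boot
systemctl start rabbitmq-server
```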

5. Deploying PCS

5.1 Install pcs + pacemaker + corosync (cont01, cont02, and cont03)

Install pcs, pacemaker, and corosync on all controller nodes; pacemaker is the resource manager, and corosync provides the heartbeat/messaging layer.
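pcsd serves its API and web interface on TCP 2224; once it has been enabled and started with the commands below, a quick per-node sanity check looks like this:

```bash
ss -tlnp | grep 2224        # pcsd should be listening
systemctl is-active pcsd
```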

```
[root@cont01:/root]# yum install -y lvm2 cifs-utils quota psmisc pcs pacemaker corosync fence-agents-all resource-agents crmsh
[root@cont02:/root]# yum install -y lvm2 cifs-utils quota psmisc pcs pacemaker corosync fence-agents-all resource-agents crmsh
[root@cont03:/root]# yum install -y lvm2 cifs-utils quota psmisc pcs pacemaker corosync fence-agents-all resource-agents crmsh
[root@cont01:/root]# systemctl enable pcsd corosync
[root@cont02:/root]# systemctl enable pcsd corosync
[root@cont03:/root]# systemctl enable pcsd corosync
[root@cont01:/root]# systemctl start pcsd && systemctl status pcsd
[root@cont02:/root]# systemctl start pcsd && systemctl status pcsd
[root@cont03:/root]# systemctl start pcsd && systemctl status pcsd
```

5.2 Set the cluster password; it must be identical on all three nodes: pcs#typora

```
[root@cont01:/root]# echo "pcs#typora" |passwd --stdin hacluster
[root@cont02:/root]# echo "pcs#typora" |passwd --stdin hacluster
[root@cont03:/root]# echo "pcs#typora" |passwd --stdin hacluster
```

5.3 Create the corosync.conf configuration file on a controller node

```
[root@cont02:/root]# cat <<EOF >/etc/corosync/corosync.conf
totem {
    version: 2
    secauth: off
    cluster_name: openstack-cluster
    transport: udpu
}
nodelist {
    node {
        ring0_addr: cont01
        nodeid: 1
    }
    node {
        ring0_addr: cont02
        nodeid: 2
    }
    node {
        ring0_addr: cont03
        nodeid: 3
    }
}
logging {
    to_logfile: yes
    logfile: /var/log/cluster/corosync.log
    to_syslog: yes
    debug: off
}
quorum {
    provider: corosync_votequorum
}
EOF
[root@cont02:/root]# scp /etc/corosync/corosync.conf cont01:/etc/corosync/
[root@cont02:/root]# scp /etc/corosync/corosync.conf cont03:/etc/corosync/
```

5.4 Configure the cluster and set up mutual authentication between nodes

```
ssh-keygen
ssh-copy-id cont01
ssh-copy-id cont02
ssh-copy-id cont03
```

5.5 Configure node authentication

```
[root@cont02:/root]# pcs cluster auth cont01 cont02 cont03 -u hacluster -p"pcs#typora"
cont02: Authorized
cont03: Authorized
cont01: Authorized
```

pcs cluster auth cont01 cont02 -u hacluster -p {password}

{password} is the password set in the previous step.

5.6 创建集群 [root@cont02:/root]# pcs cluster setup --force --name openstack-cluster cont01 cont02 cont03 Destroying cluster on nodes: cont01, cont02, cont03... cont02: Stopping Cluster (pacemaker)... cont03: Stopping Cluster (pacemaker)... cont01: Stopping Cluster (pacemaker)... cont01: Successfully destroyed cluster cont02: Successfully destroyed cluster cont03: Successfully destroyed cluster Sending 'pacemaker_remote authkey' to 'cont01', 'cont02', 'cont03' cont01: successful distribution of the file 'pacemaker_remote authkey' cont03: successful distribution of the file 'pacemaker_remote authkey' cont02: successful distribution of the file 'pacemaker_remote authkey' Sending cluster config files to the nodes... cont01: Succeeded cont02: Succeeded cont03: Succeeded Synchronizing pcsd certificates on nodes cont01, cont02, cont03... cont02: Success cont03: Success cont01: Success Restarting pcsd on the nodes in order to reload the certificates... cont02: Success cont03: Success cont01: Success 5.7 启动集群并查看集群状态 [root@cont02:/root]# pcs cluster enable --all cont01: Cluster Enabled cont02: Cluster Enabled cont03: Cluster Enabled [root@cont02:/root]# pcs cluster start --all cont01: Starting Cluster (corosync)... cont02: Starting Cluster (corosync)... cont03: Starting Cluster (corosync)... cont01: Starting Cluster (pacemaker)... cont03: Starting Cluster (pacemaker)... cont02: Starting Cluster (pacemaker)... [root@cont02:/root]# pcs cluster status Cluster Status: Stack: corosync Current DC: cont03 (version 1.1.20-5.el7_7.2-3c4c782f70) - partition with quorum Last updated: Wed Jan 15 21:44:10 2020 Last change: Wed Jan 15 21:43:53 2020 by hacluster via crmd on cont03 3 nodes configured 0 resources configured PCSD Status: cont02: Online cont03: Online cont01: Online [root@cont02:/root]# ps aux | grep pacemaker root 15586 0.0 0.0 132972 8700 ? Ss 21:43 0:00 /usr/sbin/pacemakerd -f haclust+ 15587 0.1 0.0 136244 14620 ? Ss 21:43 0:00 /usr/libexec/pacemaker/cib root 15588 0.0 0.0 136064 7664 ? Ss 21:43 0:00 /usr/libexec/pacemaker/stonithd root 15589 0.0 0.0 98836 4372 ? Ss 21:43 0:00 /usr/libexec/pacemaker/lrmd haclust+ 15590 0.0 0.0 128068 6620 ? Ss 21:43 0:00 /usr/libexec/pacemaker/attrd haclust+ 15591 0.0 0.0 80508 3500 ? Ss 21:43 0:00 /usr/libexec/pacemaker/pengine haclust+ 15592 0.0 0.0 140380 8260 ? Ss 21:43 0:00 /usr/libexec/pacemaker/crmd root 15632 0.0 0.0 112712 960 pts/0 S+ 21:44 0:00 grep --color=auto pacemaker 5.8 检验corosync状态

(Running crm_verify -L -V at this point will report errors.)

[root@cont02:/root]# corosync-cfgtool -s Printing ring status. Local node ID 2 RING ID 0 id = 192.168.10.22 status = ring 0 active with no faults [root@cont02:/root]# corosync-cmapctl | grep members runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0 runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(192.168.10.21) runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1 runtime.totem.pg.mrp.srp.members.1.status (str) = joined runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0 runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(192.168.10.22) runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1 runtime.totem.pg.mrp.srp.members.2.status (str) = joined runtime.totem.pg.mrp.srp.members.3.config_version (u64) = 0 runtime.totem.pg.mrp.srp.members.3.ip (str) = r(0) ip(192.168.10.23) runtime.totem.pg.mrp.srp.members.3.join_count (u32) = 1 runtime.totem.pg.mrp.srp.members.3.status (str) = joined [root@cont02:/root]# pcs status corosync Membership information ---------------------- Nodeid Votes Name 1 1 cont01 2 1 cont02 (local) 3 1 cont03 [root@cont02:/root]# crm_verify -L -V error: unpack_resources: Resource start-up disabled since no STONITH resources have been defined error: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option error: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity Errors found during check: config not valid [root@cont02:/root]# pcs property set stonith-enabled=false [root@cont02:/root]# pcs property set no-quorum-policy=ignore [root@cont02:/root]# crm_verify -L -V [root@cont02:/root]# pcs property set pe-warn-series-max=1000 pe-input-series-max=1000 pe-error-series-max=1000 [root@cont02:/root]# pcs property set cluster-recheck-interval=1000

Check the configuration (this will report errors at first):

crm_verify -L -V

Based on the check results:

pcs property set stonith-enabled=false # disable STONITH

pcs property set no-quorum-policy=ignore # ignore loss of quorum

crm_verify -L -V # the errors are gone

The crm interactive shell is also available and is quite convenient to use.

Set appropriate retention for processed transition inputs and for the errors and warnings generated by the policy engine:

pcs property set pe-warn-series-max=1000 pe-input-series-max=1000 pe-error-series-max=1000

Process cluster state on a time-driven (periodic) basis:

pcs property set cluster-recheck-interval=1000
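A quick confirmation that the properties were applied (pcs property list prints the currently set cluster properties):

```bash
pcs property list | grep -E 'stonith-enabled|no-quorum-policy|cluster-recheck-interval'
crm_verify -L -V    # should now exit without errors
```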

5.9 Create the cluster VIP (the VIP address is planned as 192.168.10.20)

```
[root@cont02:/root]# pcs resource create VirtualIP ocf:heartbeat:IPaddr2 ip=192.168.10.20 cidr_netmask=32 op monitor interval=30s
```

6. Deploying HAProxy

6.1 Install HAProxy on all controller nodes and keep the configuration identical; tune the configuration file to your actual environment.

6.1.1 Install and configure haproxy

```
[root@cont01:/root]# yum -y install haproxy
[root@cont02:/root]# yum -y install haproxy
[root@cont03:/root]# yum -y install haproxy
[root@cont02:/root]# vim /etc/rsyslog.d/haproxy.conf
## Configure HAProxy logging
$ModLoad imudp
$UDPServerRun 514
$template Haproxy,"%msg%n"
local3.info -/var/log/haproxy.log;Haproxy
local3.notice -/var/log/haproxy-status.log;Haproxy
local3.*~
[root@cont02:/root]# scp /etc/rsyslog.d/haproxy.conf cont01:/etc/rsyslog.d/
[root@cont02:/root]# scp /etc/rsyslog.d/haproxy.conf cont03:/etc/rsyslog.d/
[root@cont01:/root]# systemctl enable haproxy.service
[root@cont01:/root]# systemctl restart rsyslog.service
[root@cont01:/root]# syste
```



