This is a two-part article. In this first part I will share the steps to configure OpenStack High Availability (HA) between two controllers; in the second part I will cover configuring HAProxy and moving the Keystone service endpoints behind the load balancer. If you bring up controller and compute nodes using TripleO, the controllers are automatically configured as a Pacemaker cluster. But if you build your OpenStack setup manually, using Packstack or DevStack or by creating all the databases and services by hand, then you will have to configure the cluster between the controllers yourself to achieve OpenStack High Availability (HA).

Configure OpenStack High Availability (HA)
For the sake of this article I brought up two controller nodes using Packstack on two separate CentOS 7 virtual machines running on Oracle VirtualBox, installed on my Linux server. After Packstack completes successfully, you will find a keystonerc_admin file in the home folder of the root user.
Installing the Pacemaker resource manager
Since we will configure OpenStack High Availability using Pacemaker and Corosync, we first need to install the RPMs required for the cluster setup. Pacemaker will manage the VIPs that we will later use with HAProxy to make the web services highly available.
Install pacemaker on both controller nodes:
[root@controller2 ~]# yum install -y pcs fence-agents-all
[root@controller1 ~]# yum install -y pcs fence-agents-all
Verify that the software installed correctly by running the following command:
[root@controller1 ~]# rpm -q pcs
pcs-0.9.162-5.el7.centos.2.x86_64
[root@controller2 ~]# rpm -q pcs
pcs-0.9.162-5.el7.centos.2.x86_64
Next, add rules to the firewall to allow cluster traffic:
[root@controller1 ~]# firewall-cmd --permanent --add-service=high-availability
success
[root@controller1 ~]# firewall-cmd --reload
success
[root@controller2 ~]# firewall-cmd --permanent --add-service=high-availability
success
[root@controller2 ~]# firewall-cmd --reload
success
If you run into any problems during testing, you might want to disable the firewall and SELinux entirely until you have everything working. This may create significant security issues and should not be performed on machines that will be exposed to the outside world, but may be appropriate during development and testing on a protected host.
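If you do go down that route on a protected test host, the sketch below shows the relevant commands. It is dry-run by default (my own convention for this article, since these commands require root and weaken the host's security posture); set DRY_RUN=0 as root to actually apply them.

```shell
# Sketch only, for protected test hosts: temporarily relax firewalld and
# SELinux. With DRY_RUN=1 (the default here) the commands are printed
# instead of executed.
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" -eq 1 ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

run systemctl stop firewalld     # stop the firewall for this boot only
run setenforce 0                 # SELinux permissive until next reboot
```

Remember to re-enable both once your cluster is working.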
The installed packages will create a hacluster user with a disabled
password. While this is fine for running pcs commands locally, the
account needs a login password in order to perform such tasks as syncing
the corosync configuration, or starting and stopping the cluster on
other nodes.
Set the password for the Pacemaker cluster on each controller node using the following command:
[root@controller1 ~]# passwd hacluster
Changing password for user hacluster.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
[root@controller2 ~]# passwd hacluster
Changing password for user hacluster.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
Start and enable the pcsd daemon on each node:
[root@controller1 ~]# systemctl start pcsd.service
[root@controller1 ~]# systemctl enable pcsd.service
Created symlink from /etc/systemd/system/multi-user.target.wants/pcsd.service to /usr/lib/systemd/system/pcsd.service.
[root@controller2 ~]# systemctl start pcsd.service
[root@controller2 ~]# systemctl enable pcsd.service
Created symlink from /etc/systemd/system/multi-user.target.wants/pcsd.service to /usr/lib/systemd/system/pcsd.service.
Configure Corosync
To configure OpenStack High Availability we need to configure Corosync on both nodes. Use pcs cluster auth to authenticate as the hacluster user:
[root@controller1 ~]# pcs cluster auth controller1 controller2
Username: hacluster
Password:
controller2: Authorized
controller1: Authorized
[root@controller2 ~]# pcs cluster auth controller1 controller2
Username: hacluster
Password:
controller2: Authorized
controller1: Authorized
Finally, run the following command on the first node to create the cluster and start it. Here our cluster name will be openstack:
[root@controller1 ~]# pcs cluster setup --start --name openstack controller1 controller2
Destroying cluster on nodes: controller1, controller2...
controller1: Stopping Cluster (pacemaker)...
controller2: Stopping Cluster (pacemaker)...
controller1: Successfully destroyed cluster
controller2: Successfully destroyed cluster
Sending 'pacemaker_remote authkey' to 'controller1', 'controller2'
controller1: successful distribution of the file 'pacemaker_remote authkey'
controller2: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
controller1: Succeeded
controller2: Succeeded
Starting cluster on nodes: controller1, controller2...
controller1: Starting Cluster...
controller2: Starting Cluster...
Synchronizing pcsd certificates on nodes controller1, controller2...
controller2: Success
controller1: Success
Restarting pcsd on the nodes in order to reload the certificates...
controller2: Success
controller1: Success
Enable the pacemaker and corosync services on both controllers so they start automatically on boot:
[root@controller1 ~]# systemctl enable pacemaker
Created symlink from /etc/systemd/system/multi-user.target.wants/pacemaker.service to /usr/lib/systemd/system/pacemaker.service.
[root@controller1 ~]# systemctl enable corosync
Created symlink from /etc/systemd/system/multi-user.target.wants/corosync.service to /usr/lib/systemd/system/corosync.service.
[root@controller2 ~]# systemctl enable corosync
Created symlink from /etc/systemd/system/multi-user.target.wants/corosync.service to /usr/lib/systemd/system/corosync.service.
[root@controller2 ~]# systemctl enable pacemaker
Created symlink from /etc/systemd/system/multi-user.target.wants/pacemaker.service to /usr/lib/systemd/system/pacemaker.service.
Validate the cluster using pacemaker
Verify that the cluster started successfully using the following command on both the nodes:
[root@controller1 ~]# pcs status
Cluster name: openstack
Stack: corosync
Current DC: controller2 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Tue Oct 16 11:51:13 2018
Last change: Tue Oct 16 11:50:51 2018 by root via cibadmin on controller1
2 nodes configured
0 resources configured
Online: [ controller1 controller2 ]
No resources
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
[root@controller2 ~]# pcs status
Cluster name: openstack
WARNING: no stonith devices and stonith-enabled is not false
Stack: corosync
Current DC: controller2 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Mon Oct 15 17:04:29 2018
Last change: Mon Oct 15 16:49:09 2018 by hacluster via crmd on controller2
2 nodes configured
0 resources configured
Online: [ controller1 controller2 ]
No resources
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
How to start the Cluster
Now that corosync is configured, it is time to start the cluster. The
command below will start corosync and pacemaker on both nodes in the
cluster. If you are issuing the start command from a different node than
the one you ran the pcs cluster auth command on earlier, you must
authenticate on the current node you are logged into before you will be
allowed to start the cluster.
[root@controller1 ~]# pcs cluster start --all
An alternative to using the pcs cluster start --all command is to
issue either of the below command sequences on each node in the cluster
separately:
[root@controller1 ~]# pcs cluster start
Starting Cluster...
or
[root@controller1 ~]# systemctl start corosync.service
[root@controller1 ~]# systemctl start pacemaker.service
Verify Corosync Installation
First, use corosync-cfgtool to check whether cluster communication is
happy:
[root@controller2 ~]# corosync-cfgtool -s
Printing ring status.
Local node ID 2
RING ID 0
id = 192.168.122.22
status = ring 0 active with no faults
So all looks normal with our fixed IP address (not a 127.0.0.x loopback
address) listed as the id, and no faults for the status.
If you see something different, you might want to start by checking the
node’s network, firewall and SELinux configurations.
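This check is easy to script as well. Below is a minimal sketch; the helper name check_ring_status is my own, and it simply greps the corosync-cfgtool -s output for the fault-free status line shown above:

```shell
# Report whether every ring in `corosync-cfgtool -s` output is fault-free.
check_ring_status() {
  # $1 is the captured output of `corosync-cfgtool -s`
  if printf '%s\n' "$1" | grep -q "no faults"; then
    echo "OK: all rings healthy"
  else
    echo "WARNING: ring faults detected"
    return 1
  fi
}

# Usage (on a cluster node):
#   check_ring_status "$(corosync-cfgtool -s)"
```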
Next, check the membership and quorum APIs:
[root@controller2 ~]# corosync-cmapctl | grep members
runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(192.168.122.20)
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(192.168.122.22)
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.2.status (str) = joined
Check the status of the corosync service:
[root@controller2 ~]# pcs status corosync
Membership information
----------------------
Nodeid Votes Name
1 1 controller1
2 1 controller2 (local)
You should see both nodes have joined the cluster.
Repeat the same steps on the other controller to validate the corosync services.
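If you want to automate this membership check, a minimal sketch follows. The helper name expect_members is my own; it counts the numeric "Nodeid Votes Name" rows in the pcs status corosync output and compares them against the number of nodes you expect:

```shell
# Count joined members in `pcs status corosync` output and compare
# against the expected node count.
expect_members() {
  local expected="$1" output="$2" joined
  # Membership rows start with a numeric node ID followed by a vote count.
  joined=$(printf '%s\n' "$output" | grep -cE '^[[:space:]]*[0-9]+[[:space:]]+[0-9]+[[:space:]]+')
  if [ "$joined" -eq "$expected" ]; then
    echo "OK: $joined of $expected nodes joined"
  else
    echo "WARNING: only $joined of $expected nodes joined"
    return 1
  fi
}

# Usage (on a cluster node):
#   expect_members 2 "$(pcs status corosync)"
```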
Verify the cluster configuration
Before we make any changes, it’s a good idea to check the validity of the configuration.
[root@controller1 ~]# crm_verify -L -V
error: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
error: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
error: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid
As you can see, the tool has found some errors.
In order to guarantee the safety of your data, fencing (also called STONITH) is enabled by default. However, the cluster also knows when no STONITH configuration has been supplied and reports this as a problem, since the cluster will not be able to make progress if a situation requiring node fencing arises.
We will disable this feature for now and configure it later. To disable STONITH, set the stonith-enabled cluster option to false. Cluster properties are cluster-wide, so setting it from one node is enough; here I ran the commands on both controllers only to verify the configuration from each:
[root@controller1 ~]# pcs property set stonith-enabled=false
[root@controller1 ~]# crm_verify -L
[root@controller2 ~]# pcs property set stonith-enabled=false
[root@controller2 ~]# crm_verify -L
With the new cluster option set, the configuration is now valid.
stonith-enabled=false is completely inappropriate for a production cluster. It tells the cluster to simply pretend that failed nodes are safely powered off. Some vendors will refuse to support clusters that have STONITH disabled.
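If you script your cluster setup, it can be handy to assert that a property really has the value you set. A minimal sketch follows; the helper name assert_property is my own, and it parses the "name: value" lines printed by pcs property:

```shell
# Check that a cluster property has the expected value by parsing the
# "name: value" lines of `pcs property` output.
assert_property() {
  local name="$1" want="$2" output="$3" got
  got=$(printf '%s\n' "$output" | sed -n "s/^[[:space:]]*$name:[[:space:]]*//p")
  if [ "$got" = "$want" ]; then
    echo "OK: $name=$got"
  else
    echo "WARNING: $name is '$got', expected '$want'"
    return 1
  fi
}

# Usage (on a cluster node):
#   assert_property stonith-enabled false "$(pcs property)"
```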
I will continue configuring OpenStack High Availability in the second part of this article, where I will share the steps to configure HAProxy, manage it as a cluster resource, and move the OpenStack API endpoints behind the cluster load balancer.


