This PR adds a watchdog for the forum service. The watchdog restarts the service whenever it fails to recover from a MongoDB failover in a replica set deployment.
1. Check out this branch and deploy the forum playbook `forum.yml` to the devstack:
$ ansible-playbook -vvv -i hosts playbooks/forum.yml -e "disable_edx_services=true"
2. Check that the forum-watchdog files have been installed correctly:
3. Watch the forum-watchdog log file to make sure it has been started:
$ tail -f /edx/var/log/supervisor/forum-watchdog-stdout.log
4. Trigger the watchdog:
$ echo "Mongo::Error::OperationFailure - not master and slaveOk=false (13435)" >> /edx/var/log/supervisor/forum-stderr.log
5. Make sure the watchdog restarts the forum service. The watchdog log should show entries like these:
[28/08/2019 20:02:56] Forum WatchDog service started
[28/08/2019 20:35:19] Forum Failure detected - Restarting forum service...
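For reviewers who want a mental model of what steps 4 and 5 exercise, here is a minimal sketch of a log-following watchdog. It is not the script added by this PR. The function names (`is_failover_error`, `log_entry`, `watch_forum_log`) are hypothetical, the error pattern is taken from the trigger command in step 4, and the timestamp layout mirrors the sample log lines above.

```shell
#!/bin/sh
# Hypothetical sketch of a log-watching watchdog; NOT the script shipped in this PR.

# Error text copied from the trigger command in step 4.
FAILURE_PATTERN='not master and slaveOk=false'

# Return 0 when a log line contains the MongoDB failover error, 1 otherwise.
is_failover_error() {
    case "$1" in
        *"$FAILURE_PATTERN"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print a watchdog log entry using the [dd/mm/YYYY HH:MM:SS] layout shown above.
log_entry() {
    printf '[%s] %s\n' "$(date '+%d/%m/%Y %H:%M:%S')" "$1"
}

# Follow a log file and run the given restart command on every matching line.
# $1 is the log path; the remaining arguments are the restart command.
# Both are supplied by the caller and are assumptions in this sketch.
watch_forum_log() {
    log_file=$1
    shift
    log_entry "Forum WatchDog service started"
    # -n 0 skips existing content; -F keeps following across log rotation.
    tail -n 0 -F "$log_file" | while IFS= read -r line; do
        if is_failover_error "$line"; then
            log_entry "Forum Failure detected - Restarting forum service..."
            "$@"
        fi
    done
}
```

A call might look like `watch_forum_log /edx/var/log/supervisor/forum-stderr.log supervisorctl restart forum`. The restart command here is an assumption; use whatever command the deployed watchdog actually runs.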
Configuration Pull Request
Make sure that the following steps are done before merging:
[ ] A DevOps team member has approved the PR if it is code shared across multiple services and you don't own all of the services.
[ ] Are you adding any new default values that need to be overridden when this change goes live? If so:
[ ] Update the appropriate internal repo (be sure to update for all our environments)
[ ] If you are updating a secure value rather than an internal one, file a DEVOPS ticket with details.
[ ] Add an entry to the CHANGELOG.
[ ] If you are making a complicated change, have you performed the proper testing specified on the [Ops Ansible Testing Checklist](https://openedx.atlassian.net/wiki/display/EdxOps/Ops+Ansible+Testing+Checklist)? Adding a new variable does not require the full list (although testing on a sandbox is a great idea to ensure it links with your downstream code changes).
[ ] Think about how this change will affect Open edX operators. Have you updated the wiki page for the next Open edX release?