Fix failover issue in forum service

Description
=========

This PR adds a watchdog for the forum service. When the forum is backed by a MongoDB replica set and fails to recover from a failover, the watchdog detects the failure and restarts the service.
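
As a rough illustration of the approach described above, a log-watching watchdog could look something like the sketch below. This is not the `forum-watchdog.sh` shipped in this PR. The names `watch_lines`, `log`, `FORUM_LOG`, `RESTART_CMD` and `FAILURE_PATTERN` are hypothetical, the `supervisorctl` path is an assumption, and the failure pattern is simply taken from the test message used in the Testing section.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a log-watching watchdog; NOT the script from this PR.

# Assumed defaults; each can be overridden through the environment.
FORUM_LOG="${FORUM_LOG:-/edx/var/log/supervisor/forum-stderr.log}"
RESTART_CMD="${RESTART_CMD:-/edx/bin/supervisorctl restart forum}"
FAILURE_PATTERN="${FAILURE_PATTERN:-not master and slaveOk=false}"

log() {
    # Timestamp format mirrors the sample watchdog output in step 5.
    echo "[$(date '+%d/%m/%Y %H:%M:%S')] $*"
}

# Read log lines from stdin and run the restart command for each line
# that contains the failure pattern (literal substring match).
watch_lines() {
    local line
    while IFS= read -r line; do
        case "$line" in
            *"$FAILURE_PATTERN"*)
                log "Forum Failure detected - Restarting forum service..."
                # Intentionally unquoted so the command splits into words.
                $RESTART_CMD
                ;;
        esac
    done
}

# Follow the log only when explicitly asked to, so that sourcing this
# file (for example in a test) has no side effects.
if [ "${1:-}" = "--run" ]; then
    log "Forum WatchDog service started"
    # -n 0 skips existing lines; -F keeps following across log rotation.
    tail -n 0 -F "$FORUM_LOG" | watch_lines
fi
```

Reading from standard input keeps the matching logic separate from `tail`, so it can be exercised with `printf '...' | watch_lines` without touching the real log or restarting anything.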

Testing
----------

1. Check out this branch and deploy the forum playbook `forum.yml` to the devstack:
```
$ ansible-playbook -vvv -i hosts playbooks/forum.yml -e "disable_edx_services=true"
```

2. Check that the forum-watchdog files have been installed correctly:
```
/edx/app/forum/forum-watchdog.sh
/edx/app/supervisor/conf.available.d/forum-watchdog.conf
```

3. Watch the forum-watchdog log file to make sure it has been started:
```
$ tail -f /edx/var/log/supervisor/forum-watchdog-stdout.log
```

4. Trigger the watchdog by appending a simulated MongoDB failover error to the forum's stderr log:
```
$ echo "Mongo::Error::OperationFailure - not master and slaveOk=false (13435)" >> /edx/var/log/supervisor/forum-stderr.log
```

5. Make sure the watchdog restarts the forum service:
```
[28/08/2019 20:02:56] Forum WatchDog service started
[28/08/2019 20:35:19] Forum Failure detected - Restarting forum service...
forum: stopped
forum: started
```
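
Steps 4 and 5 can also be checked in one go. The helper below is only a sketch: `check_watchdog` and `count_restarts` are hypothetical names, and it assumes the watchdog writes its "Forum Failure detected" message to the log file from step 3.

```shell
#!/usr/bin/env bash
# Hypothetical helper automating steps 4 and 5; not part of this PR.

# Defaults are the paths used in the steps above; override via environment.
FORUM_ERR_LOG="${FORUM_ERR_LOG:-/edx/var/log/supervisor/forum-stderr.log}"
WATCHDOG_LOG="${WATCHDOG_LOG:-/edx/var/log/supervisor/forum-watchdog-stdout.log}"

# Print how many restart notices the watchdog log currently contains
# (0 if the log does not exist yet).
count_restarts() {
    local n
    n=$(grep -c "Forum Failure detected" "$WATCHDOG_LOG" 2>/dev/null || true)
    echo "${n:-0}"
}

# Append the simulated failure, then poll until a new restart notice
# appears or the timeout (in seconds, default 30) expires.
check_watchdog() {
    local timeout="${1:-30}" waited=0 before
    before=$(count_restarts)
    echo "Mongo::Error::OperationFailure - not master and slaveOk=false (13435)" >> "$FORUM_ERR_LOG"
    while [ "$waited" -lt "$timeout" ]; do
        if [ "$(count_restarts)" -gt "$before" ]; then
            echo "watchdog restarted the forum service"
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "no restart seen within ${timeout}s" >&2
    return 1
}
```

After sourcing the file on the devstack, `check_watchdog 60` returns 0 if a new restart notice shows up within a minute and 1 otherwise.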

Configuration Pull Request

Make sure that the following steps are done before merging:

  • [ ] A DevOps team member has approved the PR if it is code shared across multiple services and you don't own all of the services.

  • [ ] Are you adding any new default values that need to be overridden when this change goes live? If so:

      • [ ] Update the appropriate internal repo (be sure to update for all our environments)

      • [ ] If you are updating a secure value rather than an internal one, file a DEVOPS ticket with details.

  • [ ] Add an entry to the CHANGELOG.

  • [ ] If you are making a complicated change, have you performed the proper testing specified on the [Ops Ansible Testing Checklist](https://openedx.atlassian.net/wiki/display/EdxOps/Ops+Ansible+Testing+Checklist)? Adding a new variable does not require the full list (although testing on a sandbox is a great idea to ensure it links with your downstream code changes).

  • [ ] Think about how this change will affect Open edX operators. Have you updated the wiki page for the next Open edX release?

Status

Assignee: Unassigned
Reporter: Open Source Pull Request Bot
Labels:
Contributor Name: Paulo Viadanna
Repo: edx/configuration
Customer:
Epic Link: None
OSCM Assignee: None
Priority: Unset