Troubleshooting v23
Recreate the Python virtual environment
Occasionally the Python venv can get into an inconsistent state, in which case the easiest solution is to delete and recreate it. Symptoms of a broken venv can include errors during provisioning such as:
TASK [Write Vagrantfile and firstboot.sh] ****************************************************************************************************************************** failed: [localhost] (item=Vagrantfile) => {"changed": false, "checksum": "bf1403a17d897b68fa8137784d298d4da36fb7f9", "item": "Vagrantfile", "msg": "Aborting, target uses selinux but python bindings (libselinux-python) aren't installed!"}
To create a new virtual environment (assuming tpaexec was installed into the default location):
[tpa]$ sudo rm -rf /opt/EDB/TPA/tpa-venv
[tpa]$ sudo /opt/EDB/TPA/bin/tpaexec setup
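After recreating the venv, you can confirm that the installation is working by running the selftest described later in this document:
[tpa]$ tpaexec selftest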
Strange AWS errors regarding credentials
If the time and date on the TPA server are not correct, you can get AWS errors similar to this during provisioning:
TASK [Register key tpa_cluster in each region] ********************************************** An exception occurred during task execution. To see the full traceback, use -vvv. The error was: ClientError: An error occurred (AuthFailure) when calling the DescribeKeyPairs operation: AWS was not able to validate the provided access credentials failed: [localhost] (item=eu-central-1) => {"boto3_version": "1.8.8", "botocore_version": "1.11.8", "changed": false, "error": {"code": "AuthFailure", "message": "AWS was not able to validate the provided access credentials"}, "item": "eu-central-1", "msg": "error finding keypair: An error occurred (AuthFailure) when calling the DescribeKeyPairs operation: AWS was not able to validate the provided access credentials", "response_metadata": {"http_headers": {"date": "Thu, 27 Sep 2018 12:49:41 GMT", "server": "AmazonEC2", "transfer-encoding": "chunked"}, "http_status_code": 401, "request_id": "a0d905ba-188f-48fe-8e5a-c8d8799e3232", "retry_attempts": 0}}
Solution: set the time and date correctly.
[tpa]$ sudo ntpdate pool.ntp.org
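If ntpdate is not available, on systemd-based hosts you can usually enable time synchronisation with timedatectl instead (which service performs the synchronisation depends on your distribution):
[tpa]$ sudo timedatectl set-ntp true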
Logging
By default, all tpaexec logging is saved to the logfile <clusterdir>/ansible.log
To change the logfile location, set the environment variable ANSIBLE_LOG_PATH to the desired location, e.g.
export ANSIBLE_LOG_PATH=~/ansible.log
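The variable can also be set for a single invocation only, e.g. in bash or a similar shell:
[tpa]$ ANSIBLE_LOG_PATH=~/ansible.log tpaexec deploy <clustername>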
To increase the verbosity of logging, add -v, -vv, -vvv, -vvvv or -vvvvv to the tpaexec command line:
[tpa]$ tpaexec deploy <clustername> -v
-v shows the results of modules
-vv shows the files from which tasks come
-vvv shows what commands are being executed on the target machines
-vvvv enables connection debugging and shows which callbacks have been loaded
-vvvvv shows some additional ssh configuration and filepath information
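For example, to enable connection debugging while diagnosing ssh problems:
[tpa]$ tpaexec deploy <clustername> -vvvv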
Cluster test
An easy way to smoketest an existing cluster is to run:
[tpa]$ tpaexec test <clustername>
This will do a functional test of the cluster components, followed by a performance test of the cluster, using pgbench. As pgbench can take a while to complete, benchmarking can be omitted by running:
[tpa]$ tpaexec test <clustername> --excluded_tasks=pgbench
TPA server test
To check the installation of the TPA server itself, run:
[tpa]$ tpaexec selftest
Including or excluding specific tasks
When re-running tpaexec provision or deploy after a failure, or when running tests, it can sometimes be useful to skip certain tasks using TPA's task selection mechanism.
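For example, the pgbench exclusion shown earlier uses this mechanism, and the same option can be passed to other commands (the available task selectors depend on your TPA version and cluster configuration):
[tpa]$ tpaexec test <clustername> --excluded_tasks=pgbench
[tpa]$ tpaexec deploy <clustername> --excluded_tasks=<task-selector>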