Troubleshooting Oracle OCI Cluster Startup Failures
Diagnosing and Fixing Problems
The table that follows lists some common error messages that may be logged when a cluster fails to start, describes the underlying causes, and provides remedies:
| Error message text | Cause | What to do |
|---|---|---|
| Hadoop Bring up failed. File <filename> could only be replicated to 0 nodes... | Master daemon cannot talk to worker daemon, or worker is down or out of disk space. | Make sure you have configured the subnet so as to allow communication among all nodes: see Configuring Oracle OCI Resources. |
| The limit for this tenancy has been exceeded | Bringing up this cluster would exceed this tenancy’s limit for instances of this type. | Decrease the cluster size, or change the instance type, and try again. If that fails, ask Oracle support for a higher limit. |
| HEALTH-CHECK-FAILED. Reason: Failed to create socks proxy for cluster... | QDS cannot contact the cluster master node via SSH. | Make sure you have whitelisted port 22 for the QDS NAT (52.44.223.209); use the subnet’s security list to do this. |
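
When scanning a long cluster log, it can help to match the messages in the table above mechanically. The following is a minimal sketch, not a QDS feature: the `triage` function and the `KNOWN_FAILURES` list are hypothetical names, and it assumes the log contains the message fragments exactly as quoted in the table.

```python
# Hypothetical triage helper. The message fragments, causes, and remedies are
# taken from the table above; the function itself is illustrative only.
KNOWN_FAILURES = [
    (
        "could only be replicated to 0 nodes",
        "Master daemon cannot talk to worker daemon, or worker is down or out of disk space.",
        "Configure the subnet to allow communication among all nodes.",
    ),
    (
        "limit for this tenancy has been exceeded",
        "The cluster would exceed the tenancy's limit for instances of this type.",
        "Decrease the cluster size or change the instance type; otherwise ask Oracle support for a higher limit.",
    ),
    (
        "Failed to create socks proxy for cluster",
        "QDS cannot contact the cluster master node via SSH.",
        "Whitelist port 22 for the QDS NAT (52.44.223.209) in the subnet's security list.",
    ),
]


def triage(log_text):
    """Return a (cause, remedy) pair for every known message found in log_text."""
    # Collapse whitespace so a message wrapped across several lines still matches.
    normalized = " ".join(log_text.split()).lower()
    return [
        (cause, remedy)
        for fragment, cause, remedy in KNOWN_FAILURES
        if fragment.lower() in normalized
    ]


if __name__ == "__main__":
    sample = "HEALTH-CHECK-FAILED. Reason:\nFailed to create socks proxy for\ncluster"
    for cause, remedy in triage(sample):
        print(f"Cause: {cause}\nRemedy: {remedy}")
```

Substring matching is deliberately loose here; if the real log wording differs from the table, the fragments would need adjusting.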
Preventing Problems
Here are some guidelines to help you prevent similar problems in the future.
- Make sure you’ve read and understood the relevant Qubole and Cloud documentation, in particular Configuring Oracle OCI Resources.
- Make sure you have configured each subnet so as to allow communication among all nodes.
- Make sure you have whitelisted port 22 for the QDS NAT (52.44.223.209).
- Make sure that starting the cluster will not put you over the limit for your tenancy.
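
The last two guidelines can be checked before a cluster is started. The sketch below is an illustration under stated assumptions, not Oracle's or Qubole's tooling: ingress rules are represented as plain dictionaries with hypothetical keys (`source`, `protocol`, `port_min`, `port_max`), which is a simplification of a real security list, and the instance counts passed to `fits_within_limit` are values you would look up yourself.

```python
import ipaddress

QDS_NAT_IP = "52.44.223.209"  # QDS NAT address given on this page
SSH_PORT = 22


def allows_ssh_from(rules, source_ip=QDS_NAT_IP, port=SSH_PORT):
    """Return True if any ingress rule admits TCP traffic on `port` from `source_ip`.

    `rules` is a list of dicts in a simplified, hypothetical shape:
    {"source": "<CIDR>", "protocol": "tcp" | "all", "port_min": int, "port_max": int}
    """
    addr = ipaddress.ip_address(source_ip)
    for rule in rules:
        if rule["protocol"] not in ("tcp", "all"):
            continue  # e.g. UDP or ICMP rules cannot admit SSH
        if addr not in ipaddress.ip_network(rule["source"]):
            continue  # rule applies to a different source range
        # An "all protocols" rule has no port range to check.
        if rule["protocol"] == "all" or rule["port_min"] <= port <= rule["port_max"]:
            return True
    return False


def fits_within_limit(running, requested, limit):
    """Return True if starting `requested` more instances stays within `limit`."""
    return running + requested <= limit


if __name__ == "__main__":
    rules = [
        {"source": "10.0.0.0/16", "protocol": "all"},
        {"source": "52.44.223.209/32", "protocol": "tcp", "port_min": 22, "port_max": 22},
    ]
    print("SSH from QDS NAT allowed:", allows_ssh_from(rules))
    print("Within tenancy limit:", fits_within_limit(running=3, requested=4, limit=10))
```

A check like this only reflects the rules you feed it; the authoritative answer is still the subnet's security list and the tenancy's service limits as shown by Oracle.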