# Troubleshooting Guide
This guide helps you diagnose and resolve common issues with ChainLaunch.
## Common Issues
### Node Won't Start

**Symptoms:**

- Node status remains "Stopped" after clicking start
- Error message in logs

**Diagnosis:**

1. Check the node logs: go to the nodes list, open the failing node, and review the logs at the bottom of the page.

2. Check system resources:

   - Verify available CPU and memory
   - Check disk space:

     ```bash
     df -h
     ```

3. Check port availability:

   ```bash
   # Linux/macOS
   lsof -i :30303  # P2P port
   lsof -i :8545   # JSON-RPC port
   ```
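The port checks above can be scripted so they run the same way on every host. A minimal sketch using bash's `/dev/tcp` (which works even where `lsof` is unavailable); ports 30303 and 8545 are the defaults used throughout this guide:

```shell
#!/usr/bin/env bash
# Returns 0 if something is listening on localhost:$1, non-zero otherwise.
# Runs in a subshell so the file descriptor is closed automatically.
port_in_use() {
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

for port in 30303 8545; do
  if port_in_use "$port"; then
    echo "port $port: in use (stop the conflicting process or pick another port)"
  else
    echo "port $port: free"
  fi
done
```

Run this before starting a node to catch "Port already in use" errors up front.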
**Solutions:**

| Error Message | Solution |
|---|---|
| Port already in use | Change the node port in configuration, or kill the process using the port |
| Insufficient disk space | Free up disk space or mount a larger volume |
| Permission denied | Check file permissions on the node data directory |
| Connection refused | Check firewall rules and network connectivity |
| Out of memory | Increase memory allocation or reduce node count |
### Nodes Not Discovering Each Other

**Symptoms:**

- Block height not increasing
- Peer count = 0
- Consensus not starting (Fabric/Besu)

**Diagnosis:**

1. Check peer count:

   ```bash
   # For Besu via RPC
   curl -X POST http://localhost:8545 \
     -H "Content-Type: application/json" \
     -d '{
       "jsonrpc": "2.0",
       "method": "net_peerCount",
       "params": [],
       "id": 1
     }'
   ```

2. Check network connectivity between nodes:

   ```bash
   # Test if node A can reach node B
   ping node-b-ip
   telnet node-b-ip 30303
   ```

3. Check enode URLs (Besu):

   ```bash
   # Get the node's enode
   curl -X POST http://localhost:8545 \
     -H "Content-Type: application/json" \
     -d '{
       "jsonrpc": "2.0",
       "method": "admin_nodeInfo",
       "params": [],
       "id": 1
     }'
   ```
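If discovery never kicks in, Besu peers can be connected manually with the `admin_addPeer` RPC method (the `ADMIN` namespace must be enabled on the node, e.g. via `--rpc-http-api`). A sketch; the enode value below is a placeholder you would replace with the `admin_nodeInfo` output from the other node:

```shell
# Placeholder enode; substitute the real value from admin_nodeInfo on the peer.
ENODE="enode://0123abcd...deadbeef@10.0.0.2:30303"

# Build the JSON-RPC payload for admin_addPeer.
PAYLOAD=$(printf '{"jsonrpc":"2.0","method":"admin_addPeer","params":["%s"],"id":1}' "$ENODE")
echo "$PAYLOAD"

# Then POST it to the node that should dial out:
# curl -X POST http://localhost:8545 -H "Content-Type: application/json" -d "$PAYLOAD"
```

A manually added peer bypasses discovery entirely, which makes it a useful isolation test: if the static connection also fails, the problem is network reachability, not discovery.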
**Solutions:**
| Issue | Solution |
|---|---|
| Firewall blocking ports | Open P2P port (30303) and RPC port (8545) in firewall |
| Nodes on different networks | Verify genesis block hash matches across nodes |
| Bootnode not running | Start bootnode and configure its address in other nodes |
| DNS not resolving | Use IP addresses instead of hostnames |
| Network policy restricting traffic | Review Kubernetes network policies or security groups |
## Backup Issues

**Symptoms:**

- Backup job fails with "connection refused" or "access denied"
- Scheduled backups not running
- Restore operation fails

**Diagnosis:**

1. Check backup target configuration:

   ```bash
   curl http://localhost:8100/api/v1/backups/targets
   ```

2. Verify S3 connectivity:

   ```bash
   # Test S3 access with the AWS CLI
   aws s3 ls s3://your-backup-bucket/
   ```

3. Check backup status:

   ```bash
   curl http://localhost:8100/api/v1/backups
   ```
**Solutions:**

| Issue | Solution |
|---|---|
| Access Denied on S3 | Verify IAM credentials and bucket policy allow read/write access |
| NoSuchBucket | Ensure the S3 bucket exists and the region is correct |
| RequestTimeTooSkewed | Sync the system clock (ntpdate or timedatectl) |
| Scheduled backup not triggered | Verify the backup schedule is active and the cron expression is valid |
| Restore fails with checksum error | The backup may be corrupted; try restoring from a different snapshot |
| credential not found | Re-configure the backup target with valid credentials (static, instance role, or named profile) |
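On `RequestTimeTooSkewed` specifically: S3 rejects requests whose timestamp differs from the server clock by more than 15 minutes. A quick way to quantify the skew (GNU `date` syntax; the two timestamps below are illustrative):

```shell
# Absolute difference in seconds between two timestamps (GNU date).
skew_seconds() {
  local a b
  a=$(date -u -d "$1" +%s)
  b=$(date -u -d "$2" +%s)
  echo $(( a > b ? a - b : b - a ))
}

# Example: a 5-minute skew (300s) is fine; anything over 900s trips RequestTimeTooSkewed.
skew_seconds "2024-01-01T00:00:00Z" "2024-01-01T00:05:00Z"   # prints 300
```

In practice, compare `date -u` on the host against the `Date` header in S3's response; on systemd hosts, `timedatectl set-ntp true` re-enables NTP synchronization.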
## Monitoring Issues

**Symptoms:**

- Prometheus metrics endpoint not responding
- Grafana dashboards show "No data"
- Node metrics not updating

**Diagnosis:**

1. Check if monitoring is enabled:

   ```bash
   curl http://localhost:8100/api/v1/settings
   ```

2. Test the Prometheus metrics endpoint directly:

   ```bash
   curl http://localhost:9090/metrics
   ```

3. Verify node metrics are exposed:

   ```bash
   # For Besu nodes
   curl http://localhost:9545/metrics

   # For Fabric peers
   curl http://localhost:9443/metrics
   ```
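Once an endpoint responds, confirm that the series you expect is actually in the scrape. Prometheus text exposition is line-oriented (`name{labels} value`), so `awk` is enough. The sample scrape below is illustrative, though `process_cpu_seconds_total` is a standard client-library metric:

```shell
# Print the value of an unlabeled metric from a Prometheus text-format scrape.
metric_value() {
  awk -v m="$1" '$1 == m { print $2 }'
}

# Illustrative scrape; in practice pipe in: curl -s http://localhost:9545/metrics
SCRAPE='# HELP process_cpu_seconds_total Total user and system CPU time.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 12.5'

echo "$SCRAPE" | metric_value process_cpu_seconds_total   # prints 12.5
```

Sampling the same counter twice a few seconds apart tells you whether metrics are actually updating, which distinguishes a stale exporter from a Grafana query problem.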
**Solutions:**
| Issue | Solution |
|---|---|
| Prometheus not starting | Check that port 9090 is not in use; verify Prometheus configuration file is valid |
| Metrics endpoint returns 404 | Ensure monitoring was enabled when creating the node |
| Dashboards show "No data" | Verify Prometheus is scraping the correct targets; check the time range in Grafana |
| High cardinality warnings | Reduce the number of custom labels or increase Prometheus memory |
| Metrics stop updating | Restart the node; check if the process is still running |
## Authentication Issues

**Symptoms:**

- Login returns 401 Unauthorized
- Session expires unexpectedly
- API key not accepted

**Diagnosis:**

1. Test authentication:

   ```bash
   # Basic auth
   curl -u admin:password http://localhost:8100/api/v1/nodes

   # API key
   curl -H "X-API-Key: clpro_..." http://localhost:8100/api/v1/nodes
   ```

2. Check server logs for auth errors: look for `authentication failed` or `token expired` messages in the ChainLaunch server logs.
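The log check above is a single `grep`. The sample log lines here are hypothetical, written only to demonstrate the search; the two phrases match the ones named above:

```shell
# Hypothetical sample log to demonstrate the search pattern.
cat > /tmp/chainlaunch-sample.log <<'EOF'
2024-05-01T10:00:01Z INFO  server started on :8100
2024-05-01T10:02:14Z WARN  authentication failed for user admin (bad password)
2024-05-01T10:15:30Z WARN  token expired for session 9f2c
EOF

# -E enables alternation; -n prints line numbers for quick navigation.
grep -En 'authentication failed|token expired' /tmp/chainlaunch-sample.log
```

The same pattern works against the live server log; which of the two phrases appears tells you whether the problem is bad credentials or session expiry.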
**Solutions:**

| Issue | Solution |
|---|---|
| 401 Unauthorized with correct credentials | Ensure you are using the correct authentication method (Basic Auth vs API key) |
| Session expires too quickly | Check the session timeout configuration in settings |
| API key rejected | Verify the key has not been revoked; regenerate if necessary |
| OIDC login fails | Verify the OIDC provider configuration (issuer URL, client ID, client secret) |
| 403 Forbidden | The authenticated user lacks the required RBAC role (ADMIN, OPERATOR, or VIEWER) for the requested resource |
## Besu Node Issues

**Symptoms:**

- Besu validator not producing blocks
- Consensus stalled across the network
- Peers not connecting to the network

**Diagnosis:**

1. Check peer count:

   ```bash
   curl -X POST http://localhost:8545 \
     -H "Content-Type: application/json" \
     -d '{
       "jsonrpc": "2.0",
       "method": "net_peerCount",
       "params": [],
       "id": 1
     }'
   ```

2. Check sync status:

   ```bash
   curl -X POST http://localhost:8545 \
     -H "Content-Type: application/json" \
     -d '{
       "jsonrpc": "2.0",
       "method": "eth_syncing",
       "params": [],
       "id": 1
     }'
   ```

3. Check the latest block number:

   ```bash
   curl -X POST http://localhost:8545 \
     -H "Content-Type: application/json" \
     -d '{
       "jsonrpc": "2.0",
       "method": "eth_blockNumber",
       "params": [],
       "id": 1
     }'
   ```

4. Check node status via the ChainLaunch API:

   ```bash
   curl http://localhost:8100/api/v1/nodes
   ```
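`eth_blockNumber` (like `net_peerCount`) returns a hex-encoded quantity, so comparing two polls requires a conversion first. `printf` handles it; the two sample values below are illustrative:

```shell
# Convert a JSON-RPC hex quantity (e.g. "0x4b7") to decimal.
hex_to_dec() {
  printf '%d\n' "$1"
}

# A healthy node's height should grow between two polls a few seconds apart.
h1=$(hex_to_dec 0x4b7)   # first sample:  1207
h2=$(hex_to_dec 0x4b9)   # second sample: 1209
if [ "$h2" -gt "$h1" ]; then
  echo "chain advancing"
else
  echo "chain stalled"
fi
```

A stalled height with a nonzero peer count points at consensus (see the table below for validator-set issues); a stalled height with zero peers points back at the discovery section above.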
**Solutions:**

| Issue | Solution |
|---|---|
| Validator not producing blocks | Ensure the validator key is correctly configured and the node is part of the validator set |
| Consensus stalled | Check that a majority of validators are online; IBFT 2.0 / QBFT requires 2/3+1 validators |
| Genesis block mismatch | All nodes must use the same genesis file; re-initialize nodes with the correct genesis |
| Peer discovery not working | Verify the bootnode enode URL is correct and reachable; check that the P2P port (30303) is open |
| Invalid block errors | Check that all validators are running the same Besu version |
| High memory usage | Increase the JVM heap size (-Xmx) or enable pruning to reduce state storage |
| RPC endpoint not responding | Verify RPC is enabled (--rpc-http-enabled) and the host/port settings are correct |
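The "Consensus stalled" row has simple arithmetic behind it: a BFT validator set of `n` nodes tolerates `f = floor((n - 1) / 3)` faulty or offline validators, and consensus halts as soon as more than `f` are down. A quick calculator:

```shell
# Maximum number of offline/faulty validators a BFT validator set can tolerate.
max_faulty() {
  echo $(( ($1 - 1) / 3 ))
}

for n in 4 5 7 10; do
  echo "validators=$n tolerates=$(max_faulty "$n") faulty"
done
```

This is why 4 validators is the practical minimum (it tolerates exactly one failure), and why adding a 5th or 6th validator does not increase fault tolerance; only the jump to 7 does.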
## Fabric-X Issues

Fabric-X has a few platform-specific failure modes that don't apply to classic Fabric. The Fabric-X Quickstart troubleshooting table covers the most common ones; here's the summary:

| Symptom | Likely cause | Fix |
|---|---|---|
| Quickstart phase 5 (join) times out on the first node | Cold Docker Desktop bind-mount cache | Retry the failing node individually — subsequent ones will be fast once the cache is warm. The default per-node timeout is 240s. |
| `dial ... context deadline exceeded` on namespace create | Fabric-X local-dev mode not enabled (macOS / Windows Docker Desktop) | Set `CHAINLAUNCH_FABRICX_LOCAL_DEV=true` on the server process and recreate the network — addresses get rewritten to `host.docker.internal` so containers can dial the host. |
| `invalid mount config ... bind source path does not exist` | Cold Docker Desktop bind-mount cache | Same as the join timeout — retry. |
| Port already in use on join | Another Fabric-X network or service on the same host | Run with a different `--base-port` band, or `--clean` to wipe the prior bundle. |
| Network status stuck at `genesis_block_created` | Normal — the network row's `status` field doesn't transition to `ACTIVE`. Container readiness is reported on each node row instead. | No action needed. Check Nodes filtered by platform `FABRICX` for per-node status. |
| Stale TLS certs after re-running `--clean` | Bind-mount data outlives the container | Re-run with `--data-path /path/to/server/data` so `--clean` also purges the `fabricx-orderers/` and `fabricx-committers/` directories. |
For deeper Fabric-X diagnostics see Fabric-X Architecture (port layout and component data flow) and Fabric-X Monitoring (Prometheus /metrics endpoints per role).