Troubleshooting

This document provides solutions to common issues you may encounter when using Overlock.

Common Issues

Docker Desktop on Linux: daemon not found

Symptoms:

  • overlock env create fails immediately with a Docker connection error
  • docker ps works fine in the same shell
  • The active Docker context is desktop-linux

Cause: Overlock uses the Docker Go SDK, which does not read Docker CLI contexts. When Docker Desktop is installed on Linux, the daemon socket lives under ~/.docker/desktop/ rather than /var/run/docker.sock, so the SDK cannot find it without help.

Solution: Export DOCKER_HOST to point at the Docker Desktop socket:

export DOCKER_HOST=unix://$HOME/.docker/desktop/docker.raw.sock

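Add the line to your shell profile (~/.bashrc or ~/.zshrc) to make it permanent. If the socket is not at that path on your system, docker context inspect desktop-linux prints the endpoint the Docker CLI is using.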
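
A profile entry that exports DOCKER_HOST unconditionally can break Docker on machines where the socket is absent. The sketch below guards the export with a socket check. The function name docker_host_for_socket is illustrative and is not part of Overlock or Docker; the path is the one from the solution above.

```shell
# Print a DOCKER_HOST value for the given socket path, or fail if the
# path is not a Unix socket. The function name is hypothetical.
docker_host_for_socket() {
    sock="$1"
    if [ -S "$sock" ]; then
        printf 'unix://%s\n' "$sock"
    else
        return 1
    fi
}

# Export DOCKER_HOST only when the Docker Desktop socket is present.
if host=$(docker_host_for_socket "$HOME/.docker/desktop/docker.raw.sock"); then
    export DOCKER_HOST="$host"
fi
```

With this in place, a shell on a machine without Docker Desktop leaves DOCKER_HOST untouched and the SDK falls back to its default socket.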

Environment Creation Fails

Symptoms:

  • Command fails with cluster creation errors
  • Timeout during environment setup
  • Docker-related errors

Solutions:

  1. Ensure Docker is running:

    docker ps

    If this fails, start the Docker daemon.

  2. Check Kubernetes engine installation:

    • For KinD: kind version
    • For K3s: k3s --version
    • For K3d: k3d version
  3. Verify system resources:

    • Check available memory: free -h
    • Check available disk space: df -h
    • Ensure at least 4 GB of RAM and 10 GB of disk space are available
  4. Check for port conflicts:

    # Check if required ports are in use
    sudo lsof -i :6443 # Kubernetes API
    sudo lsof -i :5000 # Local registry
  5. Clean up existing environments:

    overlock environment list
    overlock environment delete <old-env-name>
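
The resource check in step 3 can be scripted so it gives a yes/no answer instead of output to read by eye. This is a minimal sketch for Linux hosts; the function names are hypothetical, and the thresholds are the 4 GB and 10 GB minimums from the list above.

```shell
# Print available memory in MB, read from a meminfo-style file
# (defaults to /proc/meminfo on Linux).
mem_available_mb() {
    awk '/^MemAvailable:/ { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}

# Print free disk space in GB for the filesystem holding the given path
# (defaults to the current directory).
disk_available_gb() {
    df -Pk "${1:-.}" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Succeed only if memory (MB) and disk (GB) meet the stated minimums.
meets_minimums() {
    [ "$1" -ge 4096 ] && [ "$2" -ge 10 ]
}
```

A typical call is meets_minimums "$(mem_available_mb)" "$(disk_available_gb /)" || echo "below the recommended minimums". Pass the path of the Docker data directory to disk_available_gb if it lives on a separate filesystem.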

Package Installation Fails

Symptoms:

  • Configuration, provider, or function fails to install
  • Timeout errors
  • Authentication errors

Solutions:

  1. Check internet connectivity:

    curl -I https://xpkg.upbound.io
  2. Verify package URL is correct:

    • Check for typos in package name
    • Verify version exists
    • Try accessing URL in browser
  3. Use debug mode for details:

    overlock --debug provider install <url>
  4. Check authentication for private registries:

    • Verify registry credentials
    • Ensure you're logged into the registry
  5. Verify Crossplane is ready:

    kubectl get pods -n crossplane-system

    All pods should be in the Running state.
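
For a quick pass/fail on step 5, the pod listing can be filtered down to the pods that are not running. The helper below is a sketch; pods_not_running is a hypothetical name, and it assumes the default kubectl column order, where STATUS is the third field.

```shell
# Read `kubectl get pods --no-headers` output on stdin and print the name
# and status of every pod whose STATUS column (field 3) is not "Running".
pods_not_running() {
    awk '$3 != "Running" { print $1, $3 }'
}
```

Use it as kubectl get pods -n crossplane-system --no-headers | pods_not_running. Empty output means every pod reports Running. To block until the pods become ready instead, kubectl wait --for=condition=Ready pods --all -n crossplane-system --timeout=120s does the same job without a helper.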

Provider Not Working

Symptoms:

  • Provider installed but resources not working
  • Authentication errors in provider logs
  • Resources stuck in non-ready state

Solutions:

  1. Verify provider is installed and healthy:

    overlock provider list
    kubectl get providers
  2. Check provider logs:

    kubectl logs -n crossplane-system deployment/<provider-name>
  3. Verify provider configuration:

    • For GCP: Check service account key configuration
    • For AWS: Verify AWS credentials
    • For Azure: Check Azure credentials
  4. Check Crossplane version compatibility:

    • Some providers require specific Crossplane versions
    • Check provider documentation for compatibility matrix
  5. Verify ProviderConfig exists:

    kubectl get providerconfigs
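
Step 1 can be narrowed to the providers that need attention. The sketch below assumes that kubectl get providers prints the columns NAME, INSTALLED, HEALTHY, PACKAGE, and AGE in that order, which may differ between Crossplane versions; unhealthy_providers is a hypothetical helper name.

```shell
# Read `kubectl get providers --no-headers` output on stdin and print the
# name of each provider whose INSTALLED (field 2) or HEALTHY (field 3)
# column is not "True". The column order is an assumption.
unhealthy_providers() {
    awk '$2 != "True" || $3 != "True" { print $1 }'
}
```

Use it as kubectl get providers --no-headers | unhealthy_providers, then run kubectl describe on any provider it prints to see the conditions and events behind the status.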

Freezing During Environment Creation

Symptom

When you create multiple environments with the Overlock CLI, the process freezes for a few minutes at the "Joining worker nodes" step and then fails with the following error:

ERROR: failed to create cluster: failed to join node with kubeadm: command "docker exec --privileged dest-worker kubeadm join --config /kind/kubeadm.conf --skip-phases=preflight --v=6" failed with error: exit status 1

Cause

When the Overlock CLI creates an environment, it also installs resources into it, and Kubernetes and its components open inotify instances to watch the file system for changes. Added to the instances already held by earlier Overlock environments, this can exceed the default per-user limit. When that happens, kubelet.service on the newly created worker node fails to start with the error: Failed to allocate directory watch: Too many open files.

Steps to Resolve

  1. Raise the fs.inotify.max_user_instances limit on your host. The command needs root, and the new value lasts only until the next reboot:

    sudo sysctl -w fs.inotify.max_user_instances=512
  2. Retry the environment creation command:

    overlock env create <name>

Explanation

Why did the error occur?

The error indicates that kubelet.service failed to start because the user had reached the kernel's limit on inotify instances, which processes use to watch the file system for changes.

How did adjusting fs.inotify.max_user_instances solve the error?

Increasing the fs.inotify.max_user_instances setting allows more inotify instances to be allocated per user, resolving the resource limitation that caused the kubelet.service to fail.
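
To see how close a host is to the limit before raising it, you can count the open inotify instances. The sketch below relies on the Linux /proc layout, where each inotify instance appears as a file descriptor linked to anon_inode:inotify; the function names are hypothetical.

```shell
# Count open inotify instances by looking for file descriptors that point
# at anon_inode:inotify. Without root, only your own processes are visible.
inotify_instances_in_use() {
    find /proc/[0-9]*/fd -lname 'anon_inode:inotify' 2>/dev/null | awk 'END { print NR }'
}

# Print the current per-user limit, or "unknown" where /proc is unavailable.
inotify_instance_limit() {
    cat /proc/sys/fs/inotify/max_user_instances 2>/dev/null || echo unknown
}

echo "inotify instances in use: $(inotify_instances_in_use) (limit: $(inotify_instance_limit))"
```

The limit applies per user, while this count covers every process the caller can inspect. Save the snippet as a script and run it with sudo to include processes owned by root, which is typically where container workloads run.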

Node Create Hangs on "Waiting for node to appear"

Symptom

overlock env node create (k3s-docker engine) prints the node container start log, then loops forever on:

DEBUG   Waiting for node with label overlock.io/node=<name> to appear...

docker ps -a shows the agent container as exited (exit=1), and docker logs <agent-container> includes errors such as:

inotify_init: too many open files
error initializing watcher: too many open files
Failed to start cAdvisor: inotify_init: too many open files

Cause

The Linux kernel enforces fs.inotify.max_user_instances per UID. The K3s server container already consumes a large share of that budget; when the agent container starts, its kubelet, cAdvisor, and dynamic plugin watchers all call inotify_init() and the kernel returns EMFILE. The agent process exits, the Kubernetes node is never registered, and the wait loop never resolves.

The default on many distributions is 128, which is too low for running a K3s server plus one or more agent containers on the same host.

Steps to Resolve

  1. Raise the per-user inotify limits. Raise max_user_watches at the same time, because kubelet's directory watches can exhaust it in the same way:

    sudo sysctl -w fs.inotify.max_user_instances=8192
    sudo sysctl -w fs.inotify.max_user_watches=524288
  2. Make the change persistent across reboots:

    sudo tee /etc/sysctl.d/99-overlock.conf <<'EOF'
    fs.inotify.max_user_instances = 8192
    fs.inotify.max_user_watches = 524288
    EOF
    sudo sysctl --system
  3. Remove the failed agent container so the next attempt starts clean, then retry:

    docker rm -f k3s-docker-<environment>-<node>
    overlock env node create <node> --environment <environment> --engine k3s-docker

Verification

Check that the new limits are applied:

cat /proc/sys/fs/inotify/max_user_instances
cat /proc/sys/fs/inotify/max_user_watches
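
The same check can be scripted to compare each value against the number configured in the steps above. This is a sketch; inotify_limit_at_least is a hypothetical helper, and its optional third argument exists only so the function can be pointed at a different directory for testing.

```shell
# Succeed if the inotify setting NAME under DIR is at least MIN.
# DIR defaults to the kernel's /proc/sys/fs/inotify directory.
inotify_limit_at_least() {
    name="$1"; min="$2"; dir="${3:-/proc/sys/fs/inotify}"
    current=$(cat "$dir/$name" 2>/dev/null) || return 1
    [ "$current" -ge "$min" ]
}

if inotify_limit_at_least max_user_instances 8192 &&
   inotify_limit_at_least max_user_watches 524288; then
    echo "inotify limits OK"
else
    echo "inotify limits are below the values set in the steps above"
fi
```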

Firewall Configuration for Remote Nodes

When you use the k3s-docker engine with remote nodes over SSH, the firewall on the server host must allow K3s traffic. If the host uses firewalld:

Open required ports

sudo firewall-cmd --zone=public --add-port=6443/tcp --permanent # K3s API server
sudo firewall-cmd --zone=public --add-port=6444/tcp --permanent # K3s supervisor
sudo firewall-cmd --zone=public --add-port=8472/udp --permanent # Flannel VXLAN overlay
sudo firewall-cmd --zone=public --add-port=10250/tcp --permanent # Kubelet

Trust K3s interfaces

sudo firewall-cmd --zone=trusted --add-interface=cni0 --permanent
sudo firewall-cmd --zone=trusted --add-interface=flannel.1 --permanent

Apply changes

sudo firewall-cmd --reload
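
After the reload, you can confirm that the four ports are open by filtering the output of firewall-cmd --list-ports. The helper below is a sketch with a hypothetical name. It only recognizes ports opened individually, so a port allowed through a firewalld service definition would still be reported as missing.

```shell
# Read `firewall-cmd --list-ports` output on stdin and print each required
# K3s port that is absent from it.
missing_k3s_ports() {
    open=$(tr '\n' ' ')
    for port in 6443/tcp 6444/tcp 8472/udp 10250/tcp; do
        case " $open " in
            *" $port "*) ;;
            *) echo "$port" ;;
        esac
    done
}
```

Use it as sudo firewall-cmd --zone=public --list-ports | missing_k3s_ports. Empty output means all four ports are open in the public zone. The trusted interfaces can be listed with sudo firewall-cmd --zone=trusted --list-interfaces.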

Getting Help

Command Help

Use the --help flag to get detailed information about any command:

# General help
overlock --help

# Command-specific help
overlock environment --help
overlock configuration --help
overlock provider --help

Debug Mode

Enable debug mode to see detailed output:

overlock --debug <command>

This will show:

  • API calls being made
  • Detailed error messages
  • Internal operation logs
  • Kubernetes resource operations

Community Support

If you're still experiencing issues:

  1. Check existing issues: Search GitHub Issues
  2. Join Discord: Get help from the community on Discord
  3. Create an issue: Report bugs or request features on GitHub

Providing Debug Information

When reporting issues, include:

  1. Overlock version:

    overlock --version
  2. Debug output:

    overlock --debug <failing-command> 2>&1 | tee debug.log
  3. System information:

    • Operating system and version
    • Docker version: docker version
    • Kubernetes engine and version
    • Available resources (memory, disk)
  4. Kubernetes state:

    kubectl get pods -A
    kubectl get providers
    kubectl get configurations
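
The items above can be gathered into a single file to attach to an issue. The sketch below runs the commands from this list and records a note when a tool is missing or fails, so one broken command does not stop the report. The function names and the default file name overlock-report.txt are hypothetical; --request-timeout is a standard kubectl flag, added so an unreachable cluster does not stall the script.

```shell
# Append a titled section to the report: the command line, then its output.
# A missing or failing command is recorded rather than aborting the report.
report_section() {
    out="$1"; shift
    {
        printf '\n== %s ==\n' "$*"
        if command -v "$1" >/dev/null 2>&1; then
            "$@" 2>&1 || echo "(command exited with an error)"
        else
            echo "(command not found: $1)"
        fi
    } >> "$out"
}

# Write a fresh report containing the version and cluster state commands.
collect_debug_info() {
    report_file="${1:-overlock-report.txt}"
    : > "$report_file"
    report_section "$report_file" uname -a
    report_section "$report_file" overlock --version
    report_section "$report_file" docker version
    report_section "$report_file" kubectl get pods -A --request-timeout=10s
    report_section "$report_file" kubectl get providers --request-timeout=10s
    report_section "$report_file" kubectl get configurations --request-timeout=10s
}
```

Run collect_debug_info and review the resulting file for credentials or internal hostnames before sharing it.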

Additional Resources