How to create a GPU-enabled development environment with Coder using Docker provider · coder/coder · Discussion #18722 · GitHub
How to create a GPU-enabled development environment with Coder using Docker provider
Description
I'm trying to create a Coder template that provides a GPU-accelerated development environment for machine learning work. I want to build a custom Docker image with CUDA support and have it accessible through Coder's web interface.
Environment
OS: Ubuntu 20.04
GPU: NVIDIA RTX 4090
Docker: GPU support verified working
Coder: 2.24
Terraform: 1.12.2
GPU Environment Verification
I've confirmed that my Docker + GPU setup is working correctly:
docker run --gpus all --rm pytorch/manylinux-cuda118:latest nvidia-smi
This command successfully shows GPU information, confirming that:
NVIDIA Docker runtime is properly configured
GPU passthrough to containers works
CUDA drivers are accessible from within containers
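Beyond `nvidia-smi`, GPU visibility can also be probed from Python. The following is a hypothetical helper (not from this discussion) that degrades gracefully on hosts without a GPU, mirroring the checks done later in the startup script:

```python
import shutil
import subprocess


def gpu_status() -> str:
    """Report GPU visibility, falling back cleanly when no GPU is present."""
    if shutil.which("nvidia-smi") is None:
        return "no GPU (nvidia-smi not found)"
    try:
        # "-L" lists each visible GPU on its own line
        out = subprocess.run(["nvidia-smi", "-L"], capture_output=True,
                             text=True, timeout=10)
        return out.stdout.strip() or "nvidia-smi present but returned no devices"
    except (subprocess.SubprocessError, OSError) as exc:
        return f"nvidia-smi failed: {exc}"


print(gpu_status())
```

On a host with working passthrough this prints a line per GPU (e.g. the RTX 4090); on a CPU-only host it reports the absence instead of raising.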
Goal
Create a Coder template that builds a custom Docker image with:
NVIDIA CUDA 11.8 support
Python development environment with PyTorch, Jupyter Lab, etc.
GPU monitoring and resource tracking
VS Code and JetBrains IDE integration
The workspace should:
Have GPU access (nvidia-smi should work inside the workspace)
Provide web access to VS Code
Current Issues
I'm encountering the following issue:
When I create a workspace from this template, the container has no GPU: nvidia-smi does not work inside the workspace, even though docker run --gpus all works on the host.
Are there any existing examples or community templates for GPU-enabled Coder workspaces that I could reference?
Any help, examples, or guidance would be greatly appreciated! My current template is below:
terraform {
required_providers {
coder = {
source = "coder/coder"
}
docker = {
source = "kreuzwerker/docker"
}
}
}
locals {
username = data.coder_workspace_owner.me.name
}
variable "docker_socket" {
default = ""
description = "(Optional) Docker socket URI"
type = string
}
variable "gpu_enabled" {
default = true
description = "Enable GPU support for the workspace"
type = bool
}
variable "gpu_count" {
default = "all"
description = "Number of GPUs to allocate (use 'all' for all GPUs, or specify device IDs like '0,1')"
type = string
}
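For illustration, here is how a gpu_count value such as "all" or "0,1" is meant to be interpreted, sketched in Python (parse_gpu_count is a hypothetical helper, not part of the template):

```python
def parse_gpu_count(value: str):
    """Interpret a gpu_count-style string: "all" means every GPU
    (no explicit device list); anything else is a comma-separated
    list of device IDs."""
    if value.strip() == "all":
        return None  # no restriction: request all GPUs
    return [d.strip() for d in value.split(",") if d.strip()]


print(parse_gpu_count("all"))   # → None
print(parse_gpu_count("0,1"))   # → ['0', '1']
```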
provider "docker" {
# Defaulting to null if the variable is an empty string lets us have an optional variable without having to set our own default
host = var.docker_socket != "" ? var.docker_socket : null
}
data "coder_provisioner" "me" {}
data "coder_workspace" "me" {}
data "coder_workspace_owner" "me" {}
resource "coder_agent" "main" {
arch = data.coder_provisioner.me.arch
os = "linux"
startup_script = <<-EOT
set -e
# Create the coder user if it doesn't exist. Note: the manylinux base image
# is CentOS-based, so there is no "sudo" group; grant sudo rights through a
# sudoers drop-in instead.
if ! id "coder" &>/dev/null; then
useradd --create-home --shell=/bin/bash coder
mkdir -p /etc/sudoers.d
echo "coder ALL=(ALL) NOPASSWD:ALL" > /etc/sudoers.d/90-coder
chmod 440 /etc/sudoers.d/90-coder
fi
# Ensure coder user owns the home directory
chown -R coder:coder /home/coder
# Run the rest of the setup as the coder user (su is used here because sudo
# may not be installed in the base image yet)
su coder -s /bin/bash << 'EOF'
# Prepare user home with default files on first start.
if [ ! -f ~/.init_done ]; then
# Create basic shell configuration
echo 'export PATH=$PATH:/usr/local/bin' >> ~/.bashrc
echo 'alias ll="ls -la"' >> ~/.bashrc
# Check GPU availability
if command -v nvidia-smi &> /dev/null; then
echo "GPU detected:"
nvidia-smi
echo 'export CUDA_VISIBLE_DEVICES=all' >> ~/.bashrc
else
echo "No GPU detected or nvidia-smi not available"
fi
# Install basic Python packages
if command -v pip &> /dev/null; then
pip install --user jupyter notebook ipython
fi
touch ~/.init_done
fi
EOF
# Install basic development tools. The manylinux image is CentOS-based, so
# try yum first and fall back to apt-get for Debian/Ubuntu-based images.
if command -v yum &> /dev/null; then
yum install -y curl wget git vim nano htop tree sudo
elif command -v apt-get &> /dev/null; then
apt-get update
apt-get install -y curl wget git vim nano htop tree sudo
fi
echo "Workspace setup completed!"
EOT
# These environment variables allow you to make Git commits right away after creating a
# workspace. Note that they take precedence over configuration defined in ~/.gitconfig!
env = {
GIT_AUTHOR_NAME = coalesce(data.coder_workspace_owner.me.full_name, data.coder_workspace_owner.me.name)
GIT_AUTHOR_EMAIL = "${data.coder_workspace_owner.me.email}"
GIT_COMMITTER_NAME = coalesce(data.coder_workspace_owner.me.full_name, data.coder_workspace_owner.me.name)
GIT_COMMITTER_EMAIL = "${data.coder_workspace_owner.me.email}"
# GPU-related environment variables
NVIDIA_VISIBLE_DEVICES = var.gpu_enabled ? "all" : ""
CUDA_VISIBLE_DEVICES = var.gpu_enabled ? "all" : ""
}
# The following metadata blocks are optional. They are used to display
# information about your workspace in the dashboard.
metadata {
display_name = "CPU Usage"
key = "0_cpu_usage"
script = "coder stat cpu"
interval = 10
timeout = 1
}
metadata {
display_name = "RAM Usage"
key = "1_ram_usage"
script = "coder stat mem"
interval = 10
timeout = 1
}
metadata {
display_name = "GPU Usage"
key = "2_gpu_usage"
script = <<EOT
if command -v nvidia-smi &> /dev/null; then
nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits | head -1 | xargs printf "%s%%"
else
echo "No GPU"
fi
EOT
interval = 10
timeout = 1
}
metadata {
display_name = "GPU Memory"
key = "3_gpu_memory"
script = <<EOT
if command -v nvidia-smi &> /dev/null; then
nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits | head -1 | awk '{printf "%.1f/%.1f GB", $1/1024, $2/1024}'
else
echo "No GPU"
fi
EOT
interval = 10
timeout = 1
}
metadata {
display_name = "Home Disk"
key = "4_home_disk"
script = "coder stat disk --path $${HOME}"
interval = 60
timeout = 1
}
metadata {
display_name = "CPU Usage (Host)"
key = "5_cpu_usage_host"
script = "coder stat cpu --host"
interval = 10
timeout = 1
}
metadata {
display_name = "Memory Usage (Host)"
key = "6_mem_usage_host"
script = "coder stat mem --host"
interval = 10
timeout = 1
}
metadata {
display_name = "Load Average (Host)"
key = "7_load_host"
script = <<EOT
echo "$(awk '{ print $1 }' /proc/loadavg) $(nproc)" | awk '{ printf "%0.2f", $1/$2 }'
EOT
interval = 60
timeout = 1
}
metadata {
display_name = "Swap Usage (Host)"
key = "8_swap_host"
script = <<EOT
free -b | awk '/^Swap/ { printf("%.1f/%.1f", $3/1024.0/1024.0/1024.0, $2/1024.0/1024.0/1024.0) }'
EOT
interval = 10
timeout = 1
}
}
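The awk one-liner in the GPU Memory script above converts the MiB values emitted by nvidia-smi into GB. The same conversion in Python, for clarity (the sample CSV line is made up):

```python
def format_gpu_memory(csv_line: str) -> str:
    """Convert a 'used, total' MiB pair, as produced by
    nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits,
    into a human-readable 'used/total GB' string."""
    used_mib, total_mib = (float(x) for x in csv_line.split(","))
    return f"{used_mib / 1024:.1f}/{total_mib / 1024:.1f} GB"


print(format_gpu_memory("2048, 24564"))  # → 2.0/24.0 GB
```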
# See https://registry.coder.com/modules/coder/code-server
module "code-server" {
count = data.coder_workspace.me.start_count
source = "registry.coder.com/modules/code-server/coder"
version = "~> 1.0"
agent_id = coder_agent.main.id
order = 1
}
# See https://registry.coder.com/modules/coder/jetbrains-gateway
module "jetbrains_gateway" {
count = data.coder_workspace.me.start_count
source = "registry.coder.com/modules/jetbrains-gateway/coder"
version = "~> 1.0"
# JetBrains IDEs to make available for the user to select
jetbrains_ides = ["IU", "PS", "WS", "PY", "CL", "GO", "RM", "RD", "RR"]
default = "PY" # Default to PyCharm Professional, well suited for GPU development
# Default folder to open when starting a JetBrains IDE
folder = "/home/coder"
agent_id = coder_agent.main.id
agent_name = "main"
order = 2
}
resource "docker_volume" "home_volume" {
name = "coder-${data.coder_workspace.me.id}-home"
# Protect the volume from being deleted due to changes in attributes.
lifecycle {
ignore_changes = all
}
# Add labels in Docker to keep track of orphan resources.
labels {
label = "coder.owner"
value = data.coder_workspace_owner.me.name
}
labels {
label = "coder.owner_id"
value = data.coder_workspace_owner.me.id
}
labels {
label = "coder.workspace_id"
value = data.coder_workspace.me.id
}
labels {
label = "coder.workspace_name_at_creation"
value = data.coder_workspace.me.name
}
}
resource "docker_container" "workspace" {
count = data.coder_workspace.me.start_count
# Use a GPU-capable PyTorch build image
image = "pytorch/manylinux-cuda118:latest"
# Uses lower() to avoid Docker restriction on container names.
name = "coder-${data.coder_workspace_owner.me.name}-${lower(data.coder_workspace.me.name)}"
# Hostname makes the shell more user friendly: coder@my-workspace:~$
hostname = data.coder_workspace.me.name
# Use the docker gateway if the access URL is 127.0.0.1
entrypoint = ["sh", "-c", replace(coder_agent.main.init_script, "/localhost|127\\.0\\.0\\.1/", "host.docker.internal")]
env = [
"CODER_AGENT_TOKEN=${coder_agent.main.token}",
"NVIDIA_VISIBLE_DEVICES=${var.gpu_enabled ? var.gpu_count : ""}",
"CUDA_VISIBLE_DEVICES=${var.gpu_enabled ? var.gpu_count : ""}",
"NVIDIA_DRIVER_CAPABILITIES=compute,utility"
]
# GPU configuration: requires the NVIDIA container runtime to be registered
# with the Docker daemon (e.g. via nvidia-container-toolkit)
runtime = var.gpu_enabled ? "nvidia" : null
# Request GPU devices. The kreuzwerker/docker provider exposes GPU access
# through the "gpus" attribute (per its docs, only the value "all" is
# currently supported); it does not document a "device_requests" block,
# which may explain why the workspace started without a GPU.
gpus = var.gpu_enabled ? "all" : null
host {
host = "host.docker.internal"
ip = "host-gateway"
}
volumes {
container_path = "/home/coder"
volume_name = docker_volume.home_volume.name
read_only = false
}
# Shared memory size in MB; deep-learning data loaders need more than Docker's 64 MB default
shm_size = 2048
# Add labels in Docker to keep track of orphan resources.
labels {
label = "coder.owner"
value = data.coder_workspace_owner.me.name
}
labels {
label = "coder.owner_id"
value = data.coder_workspace_owner.me.id
}
labels {
label = "coder.workspace_id"
value = data.coder_workspace.me.id
}
labels {
label = "coder.workspace_name"
value = data.coder_workspace.me.name
}
}
# Surface GPU status in the workspace dashboard
resource "coder_metadata" "workspace_info" {
count = data.coder_workspace.me.start_count
resource_id = docker_container.workspace[0].id
item {
key = "image"
value = docker_container.workspace[0].image
}
item {
key = "gpu_enabled"
value = var.gpu_enabled
}
item {
key = "gpu_config"
value = var.gpu_enabled ? var.gpu_count : "disabled"
}
}
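Once a workspace starts, GPU access can be verified from inside it. A hedged end-to-end check (check_gpu_stack is a hypothetical helper; it uses only the standard library and treats torch as optional):

```python
import importlib.util
import shutil


def check_gpu_stack() -> dict:
    """Report which pieces of the GPU stack are visible from this process:
    the driver tooling (nvidia-smi), the framework (torch), and whether
    torch can actually see a CUDA device."""
    report = {
        "nvidia_smi": shutil.which("nvidia-smi") is not None,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        import torch
        report["cuda_available"] = torch.cuda.is_available()
    return report


print(check_gpu_stack())
```

If "nvidia_smi" is False inside the workspace but True on the host, the GPU request is not reaching the container, pointing at the docker_container GPU settings rather than the driver setup.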