---
name: container-infrastructure-ops
description: Maintains, troubleshoots, and optimizes containerized infrastructure using Docker, Docker Compose, Kubernetes, Helm, and CI/CD pipelines. Enables system stability, security, reproducibility, and clear technical execution. Use for deployment operations, container management, networking, storage, secrets management, monitoring, and infrastructure troubleshooting.
---

# Container Infrastructure Operations

Comprehensive skill for managing and maintaining software stacks hosted on containerized infrastructure.

## Core Capabilities

- **Docker & Docker Compose**: Service orchestration, container lifecycle, volume management, networking
- **Kubernetes & Helm**: Cluster operations, deployment manifests, package management, upgrades
- **CI/CD Pipelines**: GitLab CI, GitHub Actions, and runner configuration
- **Networking & Routing**: Reverse proxies (Traefik, nginx), TLS/HTTPS, service discovery
- **Storage & Data**: Volume mounting, backup/restore, database operations, data persistence
- **Security**: Secrets management, access control, network policies, RBAC
- **Monitoring & Logging**: Health checks, log aggregation, observability
- **Troubleshooting**: Container debugging, resource issues, log analysis, dependency resolution

## Operational Workflows

### 1. Service Startup & Deployment

**Docker Compose:**

```bash
# Start all services
docker compose up -d

# Start specific service
docker compose up -d [service_name]

# Build and start
docker compose up --build -d

# With environment file
docker compose --env-file .env up -d
```

**Kubernetes:**

```bash
# Apply manifest
kubectl apply -f deployment.yaml

# Rolling update
kubectl set image deployment/[name] [container]=[image]:[tag]

# Check rollout status
kubectl rollout status deployment/[name]
```

**Helm:**

```bash
# Install release
helm install [release-name] [chart] -f values.yaml

# Upgrade existing release
helm upgrade [release-name] [chart] -f values.yaml

# Rollback to previous version
helm rollback [release-name] [revision]
```

### 2. Service Inspection & Monitoring

**Docker Compose:**

```bash
# View running services
docker compose ps

# View logs (follow)
docker compose logs -f [service_name]

# View logs with time range
docker compose logs --since 10m [service_name]

# Inspect container stats
docker stats [container_id]
```

**Kubernetes:**

```bash
# List resources
kubectl get pods -n [namespace]
kubectl get svc -n [namespace]

# Describe resource (detailed info)
kubectl describe pod [pod_name] -n [namespace]

# View logs
kubectl logs [pod_name] -n [namespace]
kubectl logs -f [pod_name] -n [namespace]  # Follow

# Watch resources in real-time
kubectl get pods -w -n [namespace]
```

### 3. Environment & Configuration Management

**Load environment variables:**

```bash
# From .env file (set -a exports every variable the file defines)
set -a
source .env
set +a

# Apply to specific command (simple KEY=VALUE lines only; values containing spaces will break)
env $(grep -v '^#' .env | xargs) docker compose up -d
```

**Manage secrets:**

```bash
# Docker Swarm (requires swarm mode; plain Compose declares file-based secrets under the top-level `secrets:` key)
docker secret create [name] /path/to/secret

# Kubernetes
kubectl create secret generic [name] --from-file=key=/path/to/secret
kubectl create secret docker-registry [name] --docker-server=[url] --docker-username=[user] --docker-password=[password]
```

### 4. Troubleshooting Workflow

**Container health check:**

1. Verify container is running: `docker compose ps` or `kubectl get pods`
2. Check logs: `docker compose logs [service]` or `kubectl logs [pod]`
3. Inspect configuration: Check environment variables, mounted volumes, network connectivity
4. Test connectivity: `docker exec [container] curl [service]` or `kubectl exec [pod] -- curl [service]`
5. Resource analysis: `docker stats` or `kubectl top pods` (the latter requires metrics-server in the cluster)

**Network troubleshooting:**

```bash
# Docker Compose
docker network ls
docker network inspect [network_name]

# Kubernetes
kubectl get networkpolicies -n [namespace]
kubectl describe networkpolicy [name] -n [namespace]
```

**Volume & storage issues:**

```bash
# Docker Compose
docker volume ls
docker volume inspect [volume_name]

# Kubernetes
kubectl get pv
kubectl get pvc -n [namespace]
kubectl describe pvc [name] -n [namespace]
```

### 5. Backup & Restore Operations

**Docker Compose volumes:**

```bash
# Backup volume
docker run --rm -v [volume]:/data -v "$(pwd)":/backup busybox tar czf /backup/backup.tar.gz -C /data .

# Restore volume
docker run --rm -v [volume]:/data -v "$(pwd)":/backup busybox tar xzf /backup/backup.tar.gz -C /data
```

**Database backup within containers:**

```bash
# PostgreSQL (-T disables TTY allocation so the redirected dump stays clean)
docker compose exec -T [postgres_service] pg_dump -U [user] [db] > backup.sql

# MySQL/MariaDB (assumes MYSQL_PASSWORD is set inside the container; a bare -p prompt would end up in the dump)
docker compose exec -T [mysql_service] sh -c 'mysqldump -u [user] -p"$MYSQL_PASSWORD" [db]' > backup.sql
```

### 6. Security & Access Control

**Docker security best practices:**

- Use read-only root filesystem: `read_only: true`
- Drop unnecessary capabilities: `cap_drop: [ALL]`
- Run as non-root user: `user: "1000:1000"`
- Use secrets for sensitive data (not environment variables)

**Kubernetes RBAC:**

```bash
# Create service account
kubectl create serviceaccount [name] -n [namespace]

# Bind role to account (a RoleBinding is namespaced, so pass the namespace explicitly)
kubectl create rolebinding [binding-name] --clusterrole=[role] --serviceaccount=[namespace]:[account] -n [namespace]
```

## Debugging Strategies

**Container execution:**

```bash
# Docker Compose
docker compose exec [service] /bin/bash  # Interactive shell (use /bin/sh if the image has no bash)
docker compose exec [service] ps aux     # List processes
docker compose exec [service] env        # View environment

# Kubernetes
kubectl exec -it [pod] -- /bin/bash
kubectl exec [pod] -- ps aux
```

**Log analysis:**

- Check application logs: `docker logs` or `kubectl logs`
- Check container startup logs: Look for early exit, missing dependencies, config errors
- Cross-reference timestamps to correlate events across services

**Resource constraints:**

```bash
# Docker
docker inspect [container] | grep -A 10 Memory

# Kubernetes
kubectl top nodes
kubectl top pods -n [namespace]
```

## Configuration Best Practices

- **Immutable infrastructure**: Rebuild containers rather than modifying running instances
- **Health checks**: Define liveness and readiness probes
- **Resource limits**: Set CPU/memory requests and limits to prevent resource contention
- **Rolling updates**: Use rolling deployment strategies to maintain availability
- **Secrets separation**: Store secrets outside version control (a git-ignored `.env`, K8s secrets, or a secret manager)
- **Logging**: Aggregate logs centrally; avoid storing logs in containers

## Common Error Patterns

| Issue | Symptom | Troubleshooting |
|-------|---------|-----------------|
| Port conflict | `bind: address already in use` | Check existing process: `lsof -i :[port]`; Kill if needed |
| Missing dependency | Service fails to start | Check logs for missing service/network; Verify service startup order |
| Resource exhaustion | Slow/hanging containers | Check CPU/memory usage; Increase limits; Reduce replica count |
| Networking | Services can't communicate | Verify network name; Check firewall rules; Test DNS resolution |
| Volume mount | Permission denied in container | Verify mount path exists; Check file permissions; Confirm user ID |
| Config error | Parse/validation error at startup | Validate YAML syntax; Check environment variable substitution |

## File Structure Reference

**Docker Compose project:**

```
project/
├── docker-compose.yaml    # Main orchestration
├── .env                   # Environment variables (secrets; keep out of git)
├── .env.example           # Template (tracked in git)
├── config/                # Configuration files
│   ├── traefik.yml
│   └── app.config
└── data/                  # Persistent volumes
    ├── db/
    └── uploads/
```

**Kubernetes project:**

```
k8s/
├── manifests/             # YAML definitions
│   ├── deployment.yaml
│   ├── service.yaml
│   └── configmap.yaml
├── helm/                  # Helm charts
│   └── [chart-name]/
├── kustomization.yaml     # Kustomize overlays
└── secrets/               # Sealed/encrypted secrets
```

## Context-Specific Workflows

### Working with JMP Server

For the jmp-server Docker Compose stack:

```bash
# View all services
docker compose ps

# Start specific service
docker compose up -d gitea  # Or: bookstack, traefik, etc.

# View logs for troubleshooting
docker compose logs -f traefik
docker compose logs -f gitea

# Backup database (-T keeps the redirected dump free of TTY artifacts)
docker compose exec -T gitea-db pg_dump -U gitea gitea > gitea-backup.sql

# Restart service cleanly
docker compose restart gitea
```

## Actionable Execution

When troubleshooting or deploying:

1. State the objective clearly
2. Run targeted diagnostic commands
3. Report findings with specific evidence (logs, output, metrics)
4. Execute corrective actions with clear before/after confirmation
5. Document any configuration changes for reproducibility
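
The execution steps above can be made repeatable with a small logging wrapper. The following is a minimal sketch, not part of the original skill: the function names `log_objective` and `run_step`, the `REPORT` variable, and the default file name `ops-report.log` are all hypothetical and chosen only for illustration. It records the stated objective, then each command with its output and exit status, so findings and before/after states end up in one file.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (names are assumptions, not from the skill itself).
# They append the objective, each command, its output, and its exit status
# to a report file so a session can be reviewed or reproduced later.

REPORT="${REPORT:-ops-report.log}"   # assumed default path; override via env

# log_objective TEXT: write the objective as a header in the report.
log_objective() {
  printf '== Objective: %s ==\n' "$1" >> "$REPORT"
}

# run_step LABEL COMMAND [ARGS...]: run the command, capture stdout and
# stderr in the report, record the exit status, and return that status.
run_step() {
  local label="$1"
  shift
  local rc=0
  printf '[%s] $ %s\n' "$label" "$*" >> "$REPORT"
  "$@" >> "$REPORT" 2>&1 || rc=$?
  printf '[%s] exit status: %d\n' "$label" "$rc" >> "$REPORT"
  return "$rc"
}
```

After sourcing the helpers, a session might call `log_objective "gitea unreachable behind traefik"`, then `run_step before docker compose ps`, `run_step fix docker compose restart gitea`, and `run_step after docker compose ps`, leaving a before/after record in the report. Because `run_step` returns the wrapped command's exit status, it can be chained with `&&` to stop at the first failing step.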