First commit
.agents/skills/container-infrastructure-ops/SKILL.md
@@ -0,0 +1,298 @@
---
name: container-infrastructure-ops
description: Maintains, troubleshoots, and optimizes containerized infrastructure using Docker, Docker Compose, Kubernetes, Helm, and CI/CD pipelines. Enables system stability, security, reproducibility, and clear technical execution. Use for deployment operations, container management, networking, storage, secrets management, monitoring, and infrastructure troubleshooting.
---

# Container Infrastructure Operations

Comprehensive skill for managing and maintaining software stacks hosted on containerized infrastructure.

## Core Capabilities

- **Docker & Docker Compose**: Service orchestration, container lifecycle, volume management, networking
- **Kubernetes & Helm**: Cluster operations, deployment manifests, package management, upgrades
- **CI/CD Pipelines**: GitLab CI, GitHub Actions, and runner configuration
- **Networking & Routing**: Reverse proxies (Traefik, nginx), TLS/HTTPS, service discovery
- **Storage & Data**: Volume mounting, backup/restore, database operations, data persistence
- **Security**: Secrets management, access control, network policies, RBAC
- **Monitoring & Logging**: Health checks, log aggregation, observability
- **Troubleshooting**: Container debugging, resource issues, log analysis, dependency resolution

## Operational Workflows

### 1. Service Startup & Deployment

**Docker Compose:**
```bash
# Start all services
docker compose up -d

# Start specific service
docker compose up -d [service_name]

# Build and start
docker compose up --build -d

# With environment file
docker compose --env-file .env up -d
```

**Kubernetes:**
```bash
# Apply manifest
kubectl apply -f deployment.yaml

# Rolling update
kubectl set image deployment/[name] [container]=[image]:[tag]

# Check rollout status
kubectl rollout status deployment/[name]
```
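
The rolling update and status check above can be chained so that a failed rollout is reverted automatically. The following is a minimal sketch, not part of the original skill: the function name `deploy_image`, the 120-second timeout, and the `default` namespace fallback are all assumptions chosen for illustration.

```bash
# Hypothetical helper: update one container image and undo the rollout
# if the deployment does not become ready within the timeout.
deploy_image() {
  local deployment="$1"
  local container="$2"
  local image="$3"
  local namespace="${4:-default}"   # assumed fallback namespace

  kubectl set image "deployment/$deployment" "$container=$image" -n "$namespace" || return 1

  if kubectl rollout status "deployment/$deployment" -n "$namespace" --timeout=120s; then
    echo "rollout of $deployment succeeded"
  else
    echo "rollout of $deployment failed; reverting" >&2
    kubectl rollout undo "deployment/$deployment" -n "$namespace"
    return 1
  fi
}
```

A call might look like `deploy_image web app registry.example.com/app:1.2.3 staging`, where every argument is a placeholder.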

**Helm:**
```bash
# Install release
helm install [release-name] [chart] -f values.yaml

# Upgrade existing release
helm upgrade [release-name] [chart] -f values.yaml

# Roll back to a given revision (omit [revision] to return to the previous release)
helm rollback [release-name] [revision]
```

### 2. Service Inspection & Monitoring

**Docker Compose:**
```bash
# View running services
docker compose ps

# View logs (follow)
docker compose logs -f [service_name]

# View logs with time range
docker compose logs --since 10m [service_name]

# Inspect container stats
docker stats [container_id]
```

**Kubernetes:**
```bash
# List resources
kubectl get pods -n [namespace]
kubectl get svc -n [namespace]

# Describe resource (detailed info)
kubectl describe pod [pod_name] -n [namespace]

# View logs
kubectl logs [pod_name] -n [namespace]
kubectl logs -f [pod_name] -n [namespace]  # Follow

# Watch resources in real-time
kubectl get pods -w -n [namespace]
```

### 3. Environment & Configuration Management

**Load environment variables:**
```bash
# From .env file
set -a
source .env
set +a

# Apply to specific command (plain KEY=VALUE lines only; comment lines are filtered out)
env $(grep -v '^#' .env | xargs) docker compose up -d
```
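
Sourcing `.env` executes it as shell code, and the `xargs` form splits values that contain spaces. As a hedged alternative, the sketch below exports only well-formed `KEY=VALUE` lines without evaluating them. The function name `load_env_file` is hypothetical, and the parsing rules (skip comments, strip one pair of double quotes) are simplifying assumptions rather than a full dotenv parser.

```bash
# Hypothetical helper: export KEY=VALUE pairs from a dotenv-style file
# without executing the file as shell code.
load_env_file() {
  local file="${1:-.env}"
  local line key value
  [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }

  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      ''|'#'*) continue ;;   # blank line or comment
      *=*) ;;                # looks like KEY=VALUE
      *) continue ;;         # ignore anything else
    esac
    key=${line%%=*}
    value=${line#*=}
    # Skip keys that are not valid shell identifiers.
    case "$key" in
      ''|[0-9]*|*[!A-Za-z0-9_]*) continue ;;
    esac
    # Strip one pair of surrounding double quotes, if present.
    case "$value" in
      '"'*'"') value=${value#\"}; value=${value%\"} ;;
    esac
    export "$key=$value"
  done < "$file"
}
```

Typical use would be `load_env_file .env && docker compose up -d`. Compose also reads a `.env` file in the project directory on its own for variable interpolation, so this helper matters mainly when other tools in the same shell need the values.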

**Manage secrets:**
```bash
# Docker Swarm (from file); Compose declares file-based secrets under the top-level `secrets:` key instead
docker secret create [name] /path/to/secret

# Kubernetes
kubectl create secret generic [name] --from-file=key=/path/to/secret
kubectl create secret docker-registry [name] --docker-server=[url] --docker-username=[user] --docker-password=[password]
```
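
`kubectl create secret` fails if the secret already exists, which makes it awkward in repeatable deployments. One common workaround, sketched here under assumptions, is to render the secret with a client-side dry run and pipe it to `kubectl apply`. The function name `apply_secret_from_file` and the `default` namespace fallback are hypothetical.

```bash
# Hypothetical helper: create or update a generic secret from a file.
apply_secret_from_file() {
  local name="$1"
  local key="$2"
  local path="$3"
  local namespace="${4:-default}"   # assumed fallback namespace

  [ -r "$path" ] || { echo "cannot read $path" >&2; return 1; }

  # Render the manifest locally, then let `apply` create or update it.
  kubectl create secret generic "$name" \
    --from-file="$key=$path" \
    -n "$namespace" \
    --dry-run=client -o yaml |
    kubectl apply -f -
}
```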

### 4. Troubleshooting Workflow

**Container health check:**
1. Verify container is running: `docker compose ps` or `kubectl get pods`
2. Check logs: `docker compose logs [service]` or `kubectl logs [pod]`
3. Inspect configuration: Check environment variables, mounted volumes, network connectivity
4. Test connectivity: `docker exec [container] curl [service]` or `kubectl exec [pod] -- curl [service]`
5. Resource analysis: `docker stats` or `kubectl top pods`
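
For a Compose stack, steps 1, 2, and 5 above can be bundled into one pass so the evidence is collected in a consistent order. This is only a sketch: the function name `check_service` and the default of 50 log lines are assumptions, and steps 3 and 4 are left manual because they depend on the service.

```bash
# Hypothetical helper: collect status, recent logs, and resource usage
# for one Compose service.
check_service() {
  local service="$1"
  local lines="${2:-50}"   # assumed default number of log lines

  echo "== status =="
  docker compose ps "$service" || return 1

  echo "== last $lines log lines =="
  docker compose logs --tail "$lines" "$service"

  echo "== resource usage =="
  docker stats --no-stream   # single snapshot instead of a live view
}
```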

**Network troubleshooting:**
```bash
# Docker Compose
docker network ls
docker network inspect [network_name]

# Kubernetes
kubectl get networkpolicies -n [namespace]
kubectl describe networkpolicy [name] -n [namespace]
```

**Volume & storage issues:**
```bash
# Docker Compose
docker volume ls
docker volume inspect [volume_name]

# Kubernetes
kubectl get pv
kubectl get pvc -n [namespace]
kubectl describe pvc [name] -n [namespace]
```

### 5. Backup & Restore Operations

**Docker Compose volumes:**
```bash
# Backup volume (writes backup.tar.gz to the current directory)
docker run --rm -v [volume]:/data -v "$(pwd)":/backup busybox tar czf /backup/backup.tar.gz -C /data .

# Restore volume (overwrites files of the same name in the volume)
docker run --rm -v [volume]:/data -v "$(pwd)":/backup busybox tar xzf /backup/backup.tar.gz -C /data
```
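
A fixed `backup.tar.gz` name is overwritten on every run, and a corrupt archive is only discovered at restore time. The sketch below adds a timestamped file name and a read-back check. The function name `backup_volume` and the naming scheme are assumptions, and it expects `tar` to be available on the host as well as in the `busybox` image.

```bash
# Hypothetical helper: archive a named volume to a timestamped tarball
# and confirm the archive can be listed before reporting success.
backup_volume() {
  local volume="$1"
  local dest="${2:-$PWD}"
  local archive

  archive="${volume}-$(date +%Y%m%d-%H%M%S).tar.gz"

  # Mount the volume read-only so the backup cannot modify it.
  docker run --rm \
    -v "$volume":/data:ro \
    -v "$dest":/backup \
    busybox tar czf "/backup/$archive" -C /data . || return 1

  # Listing the archive verifies that it is readable.
  tar tzf "$dest/$archive" > /dev/null || return 1
  echo "$dest/$archive"
}
```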

**Database backup within containers:**
```bash
# PostgreSQL (-T disables TTY allocation so the redirected dump stays clean)
docker compose exec -T [postgres_service] pg_dump -U [user] [db] > backup.sql

# MySQL/MariaDB (assumes MYSQL_PASSWORD is set inside the container; an interactive -p prompt would end up in the dump file)
docker compose exec -T [mysql_service] sh -c 'exec mysqldump -u [user] -p"$MYSQL_PASSWORD" [db]' > backup.sql
```

### 6. Security & Access Control

**Docker security best practices:**
- Use read-only root filesystem: `read_only: true`
- Drop unnecessary capabilities: `cap_drop: [ALL]`
- Run as non-root user: `user: "1000:1000"`
- Use secrets for sensitive data (not environment variables)
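
The first three settings above can live in a separate Compose file that is layered over the main one. The sketch below writes such a file. The function name `write_hardening_override`, the default file name `hardening.yaml`, and the `/tmp` tmpfs mount are assumptions; whether a given image tolerates a read-only root filesystem or UID 1000 has to be checked per service.

```bash
# Hypothetical helper: write a Compose override that applies the
# hardening options to one service.
write_hardening_override() {
  local service="$1"
  local out="${2:-hardening.yaml}"   # assumed file name

  cat > "$out" <<EOF
services:
  $service:
    read_only: true
    cap_drop:
      - ALL
    user: "1000:1000"
    tmpfs:
      - /tmp    # writable scratch space, since the root filesystem is read-only
EOF
}
```

The merged result can be validated before use with `docker compose -f docker-compose.yaml -f hardening.yaml config`.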

**Kubernetes RBAC:**
```bash
# Create service account
kubectl create serviceaccount [name] -n [namespace]

# Bind a ClusterRole to the service account, scoped to the namespace
kubectl create rolebinding [binding-name] --clusterrole=[role] --serviceaccount=[namespace]:[account] -n [namespace]
```
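
After creating a binding it is worth confirming that the service account can do what was intended, and nothing more. The sketch below wraps `kubectl auth can-i` with impersonation. The function name `can_service_account` is hypothetical, and the caller needs permission to impersonate service accounts for the check to work.

```bash
# Hypothetical helper: ask the API server whether a service account
# may perform a verb on a resource in its namespace.
can_service_account() {
  local namespace="$1"
  local account="$2"
  local verb="$3"
  local resource="$4"

  kubectl auth can-i "$verb" "$resource" \
    --as="system:serviceaccount:$namespace:$account" \
    -n "$namespace"
}
```

For example, `can_service_account staging deployer get pods` prints `yes` or `no` and sets the exit status accordingly; the names are placeholders.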

## Debugging Strategies

**Container execution:**
```bash
# Docker Compose
docker compose exec [service] /bin/bash  # Interactive shell (use /bin/sh if bash is not installed)
docker compose exec [service] ps aux      # List processes
docker compose exec [service] env         # View environment

# Kubernetes
kubectl exec -it [pod] -- /bin/bash
kubectl exec [pod] -- ps aux
```

**Log analysis:**
- Check application logs: `docker logs` or `kubectl logs`
- Check container startup logs: Look for early exit, missing dependencies, config errors
- Cross-reference timestamps to correlate events across services
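
To correlate events across services, the error lines from all Compose services can be pulled into one list ordered by timestamp. This is a rough sketch: the function name `recent_errors` and the keyword list are assumptions, and the sort relies on Compose printing lines as `service | timestamp message`, which should be confirmed against the local Compose version.

```bash
# Hypothetical helper: show recent error-like log lines from all
# services, ordered by timestamp.
recent_errors() {
  local since="${1:-10m}"   # assumed default window

  docker compose logs --timestamps --no-color --since "$since" 2>&1 |
    grep -iE 'error|fatal|panic' |
    sort -b -k3,3   # third field is the timestamp in the assumed layout
}
```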

**Resource constraints:**
```bash
# Docker (memory limit in bytes; 0 means no limit is set)
docker inspect --format '{{.HostConfig.Memory}}' [container]

# Kubernetes
kubectl top nodes
kubectl top pods -n [namespace]
```

## Configuration Best Practices

- **Immutable infrastructure**: Rebuild containers rather than modifying running instances
- **Health checks**: Define liveness and readiness probes
- **Resource limits**: Set CPU/memory requests and limits to prevent resource contention
- **Rolling updates**: Use rolling deployment strategies to maintain availability
- **Secrets separation**: Store secrets outside version control (use `.env`, K8s secrets, or secret managers)
- **Logging**: Aggregate logs centrally; avoid storing logs in containers
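
On the Docker side, a defined health check only helps if deployment scripts wait for it. The sketch below polls a container's health status until it reports `healthy`. The function name `wait_healthy` and the default of 30 checks at 2-second intervals are assumptions. It also assumes the container defines a `HEALTHCHECK`; without one the status stays empty and the function times out.

```bash
# Hypothetical helper: block until a container reports "healthy",
# or give up after a fixed number of checks.
wait_healthy() {
  local container="$1"
  local attempts="${2:-30}"
  local delay="${3:-2}"
  local status=""
  local i=0

  while [ "$i" -lt "$attempts" ]; do
    status=$(docker inspect --format '{{.State.Health.Status}}' "$container" 2>/dev/null)
    [ "$status" = "healthy" ] && return 0
    i=$((i + 1))
    sleep "$delay"
  done

  echo "$container not healthy after $attempts checks (last status: ${status:-unknown})" >&2
  return 1
}
```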

## Common Error Patterns

| Issue | Symptom | Troubleshooting |
|-------|---------|-----------------|
| Port conflict | `bind: address already in use` | Check existing process: `lsof -i :[port]`; kill it if needed |
| Missing dependency | Service fails to start | Check logs for missing service/network; verify service startup order |
| Resource exhaustion | Slow/hanging containers | Check CPU/memory usage; increase limits; reduce replica count |
| Networking | Services can't communicate | Verify network name; check firewall rules; test DNS resolution |
| Volume mount | Permission denied in container | Verify mount path exists; check file permissions; confirm user ID |
| Config error | Parse/validation error at startup | Validate YAML syntax; check environment variable substitution |
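
Two rows of the table, config errors and port conflicts, can be caught before anything is started. The following is a sketch under assumptions: the function name `preflight` is hypothetical, it expects `lsof` on the host, and it only checks TCP listeners.

```bash
# Hypothetical helper: validate the Compose file and make sure the
# given host ports are free before starting the stack.
preflight() {
  local port

  # --quiet validates the configuration without printing it.
  docker compose config --quiet || { echo "compose file is invalid" >&2; return 1; }

  for port in "$@"; do
    # lsof exits 0 when it finds a process listening on the port.
    if lsof -nP -iTCP:"$port" -sTCP:LISTEN > /dev/null 2>&1; then
      echo "port $port is already in use" >&2
      return 1
    fi
  done
}
```

A run such as `preflight 80 443 && docker compose up -d` would then refuse to start when either check fails; the port numbers are placeholders.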

## File Structure Reference

**Docker Compose project:**
```
project/
├── docker-compose.yaml    # Main orchestration
├── .env                   # Environment variables (secrets)
├── .env.example           # Template (tracked in git)
├── config/                # Configuration files
│   ├── traefik.yml
│   └── app.config
└── data/                  # Persistent volumes
    ├── db/
    └── uploads/
```

**Kubernetes project:**
```
k8s/
├── manifests/             # YAML definitions
│   ├── deployment.yaml
│   ├── service.yaml
│   └── configmap.yaml
├── helm/                  # Helm charts
│   └── [chart-name]/
├── kustomization.yaml     # Kustomize overlays
└── secrets/               # Sealed/encrypted secrets
```

## Context-Specific Workflows

### Working with JMP Server

For the jmp-server Docker Compose stack:
```bash
# View all services
docker compose ps

# Start specific service
docker compose up -d gitea  # Or: bookstack, traefik, etc.

# View logs for troubleshooting
docker compose logs -f traefik
docker compose logs -f gitea

# Backup database (-T disables TTY allocation so the redirected dump stays clean)
docker compose exec -T gitea-db pg_dump -U gitea gitea > gitea-backup.sql

# Restart service cleanly
docker compose restart gitea
```

## Actionable Execution

When troubleshooting or deploying:
1. State the objective clearly
2. Run targeted diagnostic commands
3. Report findings with specific evidence (logs, output, metrics)
4. Execute corrective actions with clear before/after confirmation
5. Document any configuration changes for reproducibility
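
Step 4 can be made mechanical by capturing the same state command before and after the corrective action and printing the difference. This is only a sketch: the function name `run_with_evidence` is hypothetical, and the state command is passed as a single string that is run through `sh -c`.

```bash
# Hypothetical helper: run an action and print how the output of a
# state command changed across it.
run_with_evidence() {
  local state_cmd="$1"
  local before after rc
  shift

  before=$(mktemp) || return 1
  after=$(mktemp) || return 1

  sh -c "$state_cmd" > "$before" 2>&1
  "$@"
  rc=$?
  sh -c "$state_cmd" > "$after" 2>&1

  echo "--- state change (action exit code $rc) ---"
  # diff exits 0 only when both snapshots are identical.
  diff "$before" "$after" && echo "(no visible change)"

  rm -f "$before" "$after"
  return "$rc"
}
```

For instance, `run_with_evidence "docker compose ps" docker compose restart gitea` would show the service list before and after the restart.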