---
name: container-infrastructure-ops
description: Maintains, troubleshoots, and optimizes containerized infrastructure using Docker, Docker Compose, Kubernetes, Helm, and CI/CD pipelines. Enables system stability, security, reproducibility, and clear technical execution. Use for deployment operations, container management, networking, storage, secrets management, monitoring, and infrastructure troubleshooting.
---

# Container Infrastructure Operations

Comprehensive skill for managing and maintaining software stacks hosted on containerized infrastructure.

## Core Capabilities

- **Docker & Docker Compose**: Service orchestration, container lifecycle, volume management, networking
- **Kubernetes & Helm**: Cluster operations, deployment manifests, package management, upgrades
- **CI/CD Pipelines**: GitLab CI, GitHub Actions, and runner configuration
- **Networking & Routing**: Reverse proxies (Traefik, nginx), TLS/HTTPS, service discovery
- **Storage & Data**: Volume mounting, backup/restore, database operations, data persistence
- **Security**: Secrets management, access control, network policies, RBAC
- **Monitoring & Logging**: Health checks, log aggregation, observability
- **Troubleshooting**: Container debugging, resource issues, log analysis, dependency resolution

## Operational Workflows

### 1. Service Startup & Deployment

**Docker Compose:**
```bash
# Start all services
docker compose up -d

# Start specific service
docker compose up -d [service_name]

# Build and start
docker compose up --build -d

# With environment file
docker compose --env-file .env up -d
```

**Kubernetes:**
```bash
# Apply manifest
kubectl apply -f deployment.yaml

# Rolling update
kubectl set image deployment/[name] [container]=[image]:[tag]

# Check rollout status
kubectl rollout status deployment/[name]
```

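The Kubernetes commands above can be chained so that a failed rollout is undone automatically. The following is a minimal sketch, not a definitive implementation: `deploy_image`, its argument order, and the 120-second timeout are illustrative assumptions, while `kubectl rollout status --timeout` and `kubectl rollout undo` are standard kubectl subcommands.

```bash
# Sketch: update an image, wait for the rollout, and undo it on failure.
# deploy_image is a hypothetical helper name, not part of kubectl.
deploy_image() {
  local deployment="$1" container="$2" image="$3" namespace="${4:-default}"

  kubectl set image "deployment/${deployment}" "${container}=${image}" -n "${namespace}" || return 1

  # `rollout status` exits non-zero if the rollout fails or the timeout expires.
  if kubectl rollout status "deployment/${deployment}" -n "${namespace}" --timeout=120s; then
    echo "rollout of ${deployment} succeeded"
  else
    echo "rollout of ${deployment} failed; rolling back" >&2
    kubectl rollout undo "deployment/${deployment}" -n "${namespace}"
    return 1
  fi
}

# Example call (placeholders as in the commands above):
# deploy_image [name] [container] [image]:[tag] [namespace]
```
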
**Helm:**
```bash
# Install release
helm install [release-name] [chart] -f values.yaml

# Upgrade existing release
helm upgrade [release-name] [chart] -f values.yaml

# Roll back to a specific revision (omit [revision] to return to the previous release)
helm rollback [release-name] [revision]
```

### 2. Service Inspection & Monitoring

**Docker Compose:**
```bash
# View running services
docker compose ps

# View logs (follow)
docker compose logs -f [service_name]

# View logs with time range
docker compose logs --since 10m [service_name]

# Inspect container stats
docker stats [container_id]
```

**Kubernetes:**
```bash
# List resources
kubectl get pods -n [namespace]
kubectl get svc -n [namespace]

# Describe resource (detailed info)
kubectl describe pod [pod_name] -n [namespace]

# View logs
kubectl logs [pod_name] -n [namespace]
kubectl logs -f [pod_name] -n [namespace]  # Follow

# Watch resources in real-time
kubectl get pods -w -n [namespace]
```

### 3. Environment & Configuration Management

**Load environment variables:**
```bash
# From .env file
set -a
source .env
set +a

# Apply to specific command (simple KEY=VALUE lines only; comment lines or values with spaces break this)
env $(cat .env | xargs) docker compose up -d
```

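The `env $(cat .env | xargs)` idiom relies on shell word splitting, so a comment line or a value containing spaces is passed to `env` as a stray argument. A more tolerant loader might look like the sketch below. `load_env` is a hypothetical helper, and it deliberately handles only plain `KEY=VALUE` lines: no `export` prefix, no multi-line values, no variable interpolation.

```bash
# Sketch: export KEY=VALUE pairs from an env file.
# load_env is a hypothetical helper name; comment lines must start with '#'.
load_env() {
  local file="${1:-.env}" line key value
  while IFS= read -r line || [ -n "$line" ]; do
    # Skip blank lines, comments, and anything without an '=' sign.
    case "$line" in
      ''|'#'*) continue ;;
      *=*) ;;
      *) continue ;;
    esac
    key="${line%%=*}"     # text before the first '='
    value="${line#*=}"    # text after the first '='
    # Strip one pair of matching surrounding quotes, if present.
    case "$value" in
      \"*\") value="${value#\"}"; value="${value%\"}" ;;
      \'*\') value="${value#\'}"; value="${value%\'}" ;;
    esac
    export "$key=$value"
  done < "$file"
}

# Example call:
# load_env .env && docker compose up -d
```

Because the function exports into the current shell, running it inside a subshell, for example `( load_env .env && docker compose up -d )`, keeps the variables scoped to that one command.
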
**Manage secrets:**
```bash
# Docker Swarm (`docker secret` requires swarm mode; plain Compose declares secrets in the Compose file)
docker secret create [name] /path/to/secret

# Kubernetes
kubectl create secret generic [name] --from-file=key=/path/to/secret
kubectl create secret docker-registry [name] --docker-server=[url] --docker-username=[user] --docker-password=[password]
```

### 4. Troubleshooting Workflow

**Container health check:**
1. Verify container is running: `docker compose ps` or `kubectl get pods`
2. Check logs: `docker compose logs [service]` or `kubectl logs [pod]`
3. Inspect configuration: Check environment variables, mounted volumes, network connectivity
4. Test connectivity: `docker exec [container] curl [service]` or `kubectl exec [pod] -- curl [service]`
5. Resource analysis: `docker stats` or `kubectl top pods`

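For a Compose service, the five checks above can be run in one pass. This is a rough sketch under stated assumptions: `diagnose_service` is a hypothetical helper name, step 3 only prints the mounted volumes, and step 4 assumes `curl` is installed in the container image.

```bash
# Sketch: run the health-check steps in order for one Compose service.
# Usage: diagnose_service [service] [optional URL to test from inside the container]
diagnose_service() {
  local service="$1" url="${2:-}" cid

  echo "== 1. container status =="
  docker compose ps "$service"

  echo "== 2. last 50 log lines =="
  docker compose logs --tail 50 "$service"

  echo "== 3. mounted volumes =="
  cid="$(docker compose ps -q "$service")"
  if [ -n "$cid" ]; then
    docker inspect --format '{{json .Mounts}}' "$cid"
  else
    echo "no container found for ${service}" >&2
  fi

  # Step 4 is skipped unless a URL is given; it needs curl inside the container.
  if [ -n "$url" ]; then
    echo "== 4. connectivity to ${url} =="
    if docker compose exec -T "$service" curl -fsS -o /dev/null "$url"; then
      echo "reachable"
    else
      echo "unreachable"
    fi
  fi

  echo "== 5. resource usage =="
  if [ -n "$cid" ]; then
    docker stats --no-stream "$cid"
  fi
}
```
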
**Network troubleshooting:**
```bash
# Docker Compose
docker network ls
docker network inspect [network_name]

# Kubernetes
kubectl get networkpolicies -n [namespace]
kubectl describe networkpolicy [name] -n [namespace]
```

**Volume & storage issues:**
```bash
# Docker Compose
docker volume ls
docker volume inspect [volume_name]

# Kubernetes
kubectl get pv
kubectl get pvc -n [namespace]
kubectl describe pvc [name] -n [namespace]
```

### 5. Backup & Restore Operations

**Docker Compose volumes:**
```bash
# Backup volume (quote the path so directories containing spaces work)
docker run --rm -v [volume]:/data -v "$(pwd)":/backup busybox tar czf /backup/backup.tar.gz -C /data .

# Restore volume
docker run --rm -v [volume]:/data -v "$(pwd)":/backup busybox tar xzf /backup/backup.tar.gz -C /data
```

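A backup is only useful if the archive can be read back. The sketch below wraps the volume backup command with a timestamped file name and an integrity check. `backup_volume` and `verify_archive` are hypothetical helper names, and the read-only (`:ro`) mount of the source volume is an added precaution rather than something the original command requires.

```bash
# Sketch: timestamped volume backup followed by an integrity check.
# dest must be an absolute path because it is used as a bind mount.
backup_volume() {
  local volume="$1" dest="${2:-$PWD}" archive
  archive="${volume}-$(date +%Y%m%d-%H%M%S).tar.gz"

  docker run --rm -v "${volume}:/data:ro" -v "${dest}:/backup" busybox \
    tar czf "/backup/${archive}" -C /data . || return 1

  # Print the archive path only if the archive is readable.
  verify_archive "${dest}/${archive}" && echo "${dest}/${archive}"
}

# Listing the archive makes tar read it end to end,
# so a truncated or corrupt file fails here.
verify_archive() {
  tar tzf "$1" > /dev/null
}

# Example call:
# backup_volume [volume] /srv/backups
```
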
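**Database backup within containers:**
```bash
# -T disables TTY allocation so the redirected dump is not altered by terminal handling

# PostgreSQL
docker compose exec -T [postgres_service] pg_dump -U [user] [db] > backup.sql

# MySQL/MariaDB (-p prompts for the password; supply credentials non-interactively when scripting)
docker compose exec -T [mysql_service] mysqldump -u [user] -p [db] > backup.sql
```
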
### 6. Security & Access Control

**Docker security best practices:**
- Use read-only root filesystem: `read_only: true`
- Drop unnecessary capabilities: `cap_drop: [ALL]`
- Run as non-root user: `user: "1000:1000"`
- Use secrets for sensitive data (not environment variables)

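To show where these settings sit in a Compose file, the sketch below writes an override file that applies them to one service. Everything named here is a placeholder: the function `write_hardening_override`, the service name `app`, the file `docker-compose.hardened.yaml`, and the secret `app_secret` with its source path. The `tmpfs` entry is an assumption that the application needs a writable `/tmp` once the root filesystem is read-only.

```bash
# Sketch: generate a Compose override with the hardening options listed above.
write_hardening_override() {
  local service="${1:-app}" out="${2:-docker-compose.hardened.yaml}"
  cat > "$out" <<EOF
services:
  ${service}:
    read_only: true          # root filesystem mounted read-only
    tmpfs:
      - /tmp                 # writable scratch space kept in memory
    cap_drop:
      - ALL                  # add single capabilities back with cap_add if required
    user: "1000:1000"        # non-root UID:GID; must match ownership of mounted volumes
    secrets:
      - app_secret           # mounted at /run/secrets/app_secret
secrets:
  app_secret:
    file: ./secrets/app_secret.txt
EOF
}

# Example call, layering the override on top of the main file:
# write_hardening_override app
# docker compose -f docker-compose.yaml -f docker-compose.hardened.yaml up -d
```
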
**Kubernetes RBAC:**
```bash
# Create service account
kubectl create serviceaccount [name] -n [namespace]

# Bind role to account (the binding is created in, and applies to, the -n namespace)
kubectl create rolebinding [binding-name] --clusterrole=[role] --serviceaccount=[namespace]:[account] -n [namespace]
```

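After creating a binding, it helps to confirm that the service account received the intended permissions. One way to do this is `kubectl auth can-i` with impersonation, sketched below. `sa_subject` and `can_sa` are hypothetical helper names, and the caller needs permission to impersonate service accounts for `--as` to work.

```bash
# Sketch: check whether a service account may perform an action.
# Service accounts are addressed as system:serviceaccount:<namespace>:<name>.
sa_subject() {
  printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}

# Usage: can_sa [namespace] [account] [verb] [resource]
can_sa() {
  local namespace="$1" account="$2" verb="$3" resource="$4"
  kubectl auth can-i "$verb" "$resource" -n "$namespace" \
    "--as=$(sa_subject "$namespace" "$account")"
}

# Example call:
# can_sa [namespace] [account] get pods
```
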
## Debugging Strategies

**Container execution:**
```bash
# Docker Compose
docker compose exec [service] /bin/bash     # Interactive shell
docker compose exec [service] ps aux        # List processes
docker compose exec [service] env           # View environment

# Kubernetes
kubectl exec -it [pod] -- /bin/bash
kubectl exec [pod] -- ps aux
```

**Log analysis:**
- Check application logs: `docker logs` or `kubectl logs`
- Check container startup logs: Look for early exit, missing dependencies, config errors
- Cross-reference with timestamps to correlate events across services

**Resource constraints:**
```bash
# Docker: configured memory limit in bytes (0 means no limit)
docker inspect --format '{{.HostConfig.Memory}}' [container]

# Kubernetes
kubectl top nodes
kubectl top pods -n [namespace]
```

## Configuration Best Practices

- **Immutable infrastructure**: Rebuild containers rather than modifying running instances
- **Health checks**: Define liveness and readiness probes
- **Resource limits**: Set CPU/memory requests and limits to prevent resource contention
- **Rolling updates**: Use rolling deployment strategies to maintain availability
- **Secrets separation**: Store secrets outside version control (use `.env`, K8s secrets, or secret managers)
- **Logging**: Aggregate logs centrally; avoid storing logs in containers

## Common Error Patterns

| Issue | Symptom | Troubleshooting |
|-------|---------|-----------------|
| Port conflict | `bind: address already in use` | Check existing process: `lsof -i :[port]`; Kill if needed |
| Missing dependency | Service fails to start | Check logs for missing service/network; Verify service startup order |
| Resource exhaustion | Slow/hanging containers | Check CPU/memory usage; Increase limits; Reduce replica count |
| Networking | Services can't communicate | Verify network name; Check firewall rules; Test DNS resolution |
| Volume mount | Permission denied in container | Verify mount path exists; Check file permissions; Confirm user ID |
| Config error | Parse/validation error at startup | Validate YAML syntax; Check environment variable substitution |

## File Structure Reference

**Docker Compose project:**
```
project/
├── docker-compose.yaml      # Main orchestration
├── .env                     # Environment variables (secrets)
├── .env.example             # Template (tracked in git)
├── config/                  # Configuration files
│   ├── traefik.yml
│   └── app.config
└── data/                    # Persistent volumes
    ├── db/
    └── uploads/
```

**Kubernetes project:**
```
k8s/
├── manifests/               # YAML definitions
│   ├── deployment.yaml
│   ├── service.yaml
│   └── configmap.yaml
├── helm/                    # Helm charts
│   └── [chart-name]/
├── kustomization.yaml       # Kustomize overlays
└── secrets/                 # Sealed/encrypted secrets
```

## Context-Specific Workflows

### Working with JMP Server

For the jmp-server Docker Compose stack:
```bash
# View all services
docker compose ps

# Start specific service
docker compose up -d gitea  # Or: bookstack, traefik, etc.

# View logs for troubleshooting
docker compose logs -f traefik
docker compose logs -f gitea

# Backup database (-T disables TTY allocation so the redirected dump stays clean)
docker compose exec -T gitea-db pg_dump -U gitea gitea > gitea-backup.sql

# Restart service cleanly
docker compose restart gitea
```

## Actionable Execution

When troubleshooting or deploying:
1. State the objective clearly
2. Run targeted diagnostic commands
3. Report findings with specific evidence (logs, output, metrics)
4. Execute corrective actions with clear before/after confirmation
5. Document any configuration changes for reproducibility