First commit

2026-01-10 23:31:29 +01:00
commit dbe5c461b5
61 changed files with 10695 additions and 0 deletions

@@ -0,0 +1,298 @@
---
name: container-infrastructure-ops
description: Maintains, troubleshoots, and optimizes containerized infrastructure using Docker, Docker Compose, Kubernetes, Helm, and CI/CD pipelines. Enables system stability, security, reproducibility, and clear technical execution. Use for deployment operations, container management, networking, storage, secrets management, monitoring, and infrastructure troubleshooting.
---
# Container Infrastructure Operations
Comprehensive skill for managing and maintaining software stacks hosted on containerized infrastructure.
## Core Capabilities
- **Docker & Docker Compose**: Service orchestration, container lifecycle, volume management, networking
- **Kubernetes & Helm**: Cluster operations, deployment manifests, package management, upgrades
- **CI/CD Pipelines**: GitLab CI, GitHub Actions, and runner configuration
- **Networking & Routing**: Reverse proxies (Traefik, nginx), TLS/HTTPS, service discovery
- **Storage & Data**: Volume mounting, backup/restore, database operations, data persistence
- **Security**: Secrets management, access control, network policies, RBAC
- **Monitoring & Logging**: Health checks, log aggregation, observability
- **Troubleshooting**: Container debugging, resource issues, log analysis, dependency resolution
## Operational Workflows
### 1. Service Startup & Deployment
**Docker Compose:**
```bash
# Start all services
docker compose up -d
# Start specific service
docker compose up -d [service_name]
# Build and start
docker compose up --build -d
# With environment file
docker compose --env-file .env up -d
```
**Kubernetes:**
```bash
# Apply manifest
kubectl apply -f deployment.yaml
# Rolling update
kubectl set image deployment/[name] [container]=[image]:[tag]
# Check rollout status
kubectl rollout status deployment/[name]
```
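The three Kubernetes commands above can be chained so that a failed rollout is reverted automatically. The following is a minimal sketch, not a definitive procedure: `deploy_with_rollback` is a hypothetical helper name, the `default` namespace fallback is an assumption, and the 120-second timeout is an arbitrary example value.

```bash
# deploy_with_rollback is a hypothetical helper, not part of kubectl.
# Usage: deploy_with_rollback <deployment> <container> <image:tag> [namespace]
deploy_with_rollback() {
  local name="$1" container="$2" image="$3" ns="${4:-default}"
  # Trigger the rolling update.
  kubectl set image "deployment/$name" "$container=$image" -n "$ns" || return 1
  # Wait for the rollout to finish; the timeout is an example value.
  if kubectl rollout status "deployment/$name" -n "$ns" --timeout=120s; then
    echo "rollout of $name succeeded"
  else
    echo "rollout of $name failed, reverting" >&2
    # Return to the previous revision of the deployment.
    kubectl rollout undo "deployment/$name" -n "$ns"
    return 1
  fi
}
```

The function returns non-zero whenever the new image did not become ready, so it can gate later pipeline steps.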
**Helm:**
```bash
# Install release
helm install [release-name] [chart] -f values.yaml
# Upgrade existing release
helm upgrade [release-name] [chart] -f values.yaml
# Rollback to previous version
helm rollback [release-name] [revision]
```
### 2. Service Inspection & Monitoring
**Docker Compose:**
```bash
# View running services
docker compose ps
# View logs (follow)
docker compose logs -f [service_name]
# View logs from the last 10 minutes
docker compose logs --since 10m [service_name]
# Inspect container stats
docker stats [container_id]
```
**Kubernetes:**
```bash
# List resources
kubectl get pods -n [namespace]
kubectl get svc -n [namespace]
# Describe resource (detailed info)
kubectl describe pod [pod_name] -n [namespace]
# View logs
kubectl logs [pod_name] -n [namespace]
kubectl logs -f [pod_name] -n [namespace] # Follow
# Watch resources in real-time
kubectl get pods -w -n [namespace]
```
### 3. Environment & Configuration Management
**Load environment variables:**
```bash
# From .env file
set -a
source .env
set +a
# Apply to a single command (skips comment lines; values containing spaces will still break)
env $(grep -v '^#' .env | xargs) docker compose up -d
```
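The `xargs` one-liner above splits on whitespace, so it is fragile for values that contain spaces. A stricter loader might look like the sketch below. It assumes plain `KEY=value` lines with optional double quotes around the value; `load_env_file` is a hypothetical helper name, and the parsing rules are a simplification rather than the exact rules Docker Compose applies to `.env` files.

```bash
# load_env_file is a hypothetical helper, not part of Docker or Compose.
# It exports KEY=value pairs from a file, skipping blanks and comments.
load_env_file() {
  local file="$1" line key value
  [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
  while IFS= read -r line || [ -n "$line" ]; do
    # Skip empty lines and comment lines.
    case "$line" in
      ''|'#'*) continue ;;
    esac
    key="${line%%=*}"
    value="${line#*=}"
    # Skip lines that contain no '=' at all.
    [ "$key" != "$line" ] || continue
    # Skip keys that are not valid shell variable names.
    case "$key" in
      ''|[!A-Za-z_]*|*[!A-Za-z0-9_]*) continue ;;
    esac
    # Strip one pair of surrounding double quotes, if present.
    case "$value" in
      \"*\") value="${value#\"}"; value="${value%\"}" ;;
    esac
    export "$key=$value"
  done < "$file"
}
```

A typical call would be `load_env_file .env && docker compose up -d`.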
**Manage secrets:**
```bash
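# Docker Swarm (from file; `docker secret` requires swarm mode)
docker secret create [name] /path/to/secret
# Docker Compose: declare the file under the top-level `secrets:` key in the compose file
# Kubernetes
kubectl create secret generic [name] --from-file=key=/path/to/secret
kubectl create secret docker-registry [name] --docker-server=[url] --docker-username=[user] --docker-password=[password]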
```
### 4. Troubleshooting Workflow
**Container health check:**
1. Verify container is running: `docker compose ps` or `kubectl get pods`
2. Check logs: `docker compose logs [service]` or `kubectl logs [pod]`
3. Inspect configuration: Check environment variables, mounted volumes, network connectivity
4. Test connectivity: `docker exec [container] curl [service]` or `kubectl exec [pod] -- curl [service]`
5. Resource analysis: `docker stats` or `kubectl top pods`
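Steps 1, 2, and 5 of the checklist above lend themselves to a small script for a Compose service. This is a rough sketch under stated assumptions: `triage_service` is a hypothetical helper name, the service is assumed to run a single container, and the 50-line log tail is an arbitrary choice.

```bash
# triage_service is a hypothetical helper, not part of Docker.
# Usage: triage_service <compose_service>
triage_service() {
  local service="$1" cid
  # Step 1: find the container and confirm it is running.
  cid="$(docker compose ps -q "$service")"
  if [ -z "$cid" ]; then
    echo "no container found for $service" >&2
    return 1
  fi
  if [ "$(docker inspect --format '{{.State.Running}}' "$cid")" != "true" ]; then
    echo "$service is not running; recent logs:" >&2
    # Step 2: show the most recent log lines for context.
    docker compose logs --tail 50 "$service" >&2
    return 1
  fi
  # Step 5: print a one-shot resource snapshot instead of a live stream.
  docker stats --no-stream "$cid"
}
```

Configuration and connectivity checks (steps 3 and 4) depend on the application, so they are left out of the sketch.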
**Network troubleshooting:**
```bash
# Docker Compose
docker network ls
docker network inspect [network_name]
# Kubernetes
kubectl get networkpolicies -n [namespace]
kubectl describe networkpolicy [name] -n [namespace]
```
**Volume & storage issues:**
```bash
# Docker Compose
docker volume ls
docker volume inspect [volume_name]
# Kubernetes
kubectl get pv
kubectl get pvc -n [namespace]
kubectl describe pvc [name] -n [namespace]
```
### 5. Backup & Restore Operations
**Docker Compose volumes:**
```bash
# Backup volume (quote the working directory so paths with spaces work)
docker run --rm -v [volume]:/data -v "$(pwd)":/backup busybox tar czf /backup/backup.tar.gz -C /data .
# Restore volume
docker run --rm -v [volume]:/data -v "$(pwd)":/backup busybox tar xzf /backup/backup.tar.gz -C /data
```
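A fixed `backup.tar.gz` name overwrites the previous archive on every run. One possible variation, sketched below, timestamps the archive and checks that it is readable before reporting success. `backup_volume` is a hypothetical helper name; mounting the volume read-only (`:ro`) and the file naming scheme are choices made for this example, not something the original commands prescribe.

```bash
# backup_volume is a hypothetical helper, not part of Docker.
# Usage: backup_volume <volume> [backup_dir]
backup_volume() {
  local volume="$1" dir="${2:-$PWD}" archive
  archive="${volume}-$(date +%Y%m%d-%H%M%S).tar.gz"
  # Mount the volume read-only so the backup job cannot modify the data.
  docker run --rm -v "$volume":/data:ro -v "$dir":/backup busybox \
    tar czf "/backup/$archive" -C /data . || return 1
  # List the archive contents to confirm it can be read back.
  tar tzf "$dir/$archive" > /dev/null || return 1
  # Print the archive path so callers can capture it.
  echo "$dir/$archive"
}
```

For example, `backup_volume [volume] /srv/backups` would print the path of the new archive, where `/srv/backups` is a placeholder directory.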
**Database backup within containers:**
```bash
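# PostgreSQL (-T disables TTY allocation so the redirected dump is not corrupted)
docker compose exec -T [postgres_service] pg_dump -U [user] [db] > backup.sql
# MySQL/MariaDB (assumes MYSQL_PASSWORD is set inside the container; -p alone would prompt interactively)
docker compose exec -T [mysql_service] sh -c 'exec mysqldump -u [user] -p"$MYSQL_PASSWORD" [db]' > backup.sql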
```
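The commands above cover the dump direction only. A PostgreSQL restore could be sketched as follows, assuming the dump is plain SQL produced by `pg_dump` and that the target database already exists. `restore_postgres` is a hypothetical helper name.

```bash
# restore_postgres is a hypothetical helper, not part of Docker or PostgreSQL.
# Usage: restore_postgres <compose_service> <user> <database> <dump.sql>
restore_postgres() {
  local service="$1" user="$2" db="$3" dump="$4"
  # Refuse to run with a missing or empty dump file.
  [ -s "$dump" ] || { echo "dump file $dump is missing or empty" >&2; return 1; }
  # -T disables TTY allocation so the dump can be fed through stdin.
  # ON_ERROR_STOP makes psql abort on the first failing statement.
  docker compose exec -T "$service" \
    psql -U "$user" -d "$db" -v ON_ERROR_STOP=1 < "$dump"
}
```

Restoring into a scratch database first is a cheap way to confirm that a backup is actually usable.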
### 6. Security & Access Control
**Docker security best practices:**
- Use read-only root filesystem: `read_only: true`
- Drop unnecessary capabilities: `cap_drop: [ALL]`
- Run as non-root user: `user: "1000:1000"`
- Use secrets for sensitive data (not environment variables)
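The first three settings above are Compose keys; for a one-off container the same restrictions can be passed as `docker run` flags. The sketch below assumes the image tolerates a read-only filesystem and the `1000:1000` user from the list. `run_hardened` is a hypothetical helper name.

```bash
# run_hardened is a hypothetical helper, not part of Docker.
# Usage: run_hardened <image> [command...]
run_hardened() {
  local image="$1"
  shift
  # --read-only     : read-only root filesystem (read_only: true)
  # --cap-drop ALL  : drop all Linux capabilities (cap_drop: [ALL])
  # --user 1000:1000: run as a non-root UID:GID (user: "1000:1000")
  docker run --rm \
    --read-only \
    --cap-drop ALL \
    --user 1000:1000 \
    "$image" "$@"
}
```

Images that write to paths such as `/tmp` at runtime would additionally need a writable mount for those paths.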
**Kubernetes RBAC:**
```bash
# Create service account
kubectl create serviceaccount [name] -n [namespace]
# Bind role to account
kubectl create rolebinding [binding-name] --clusterrole=[role] --serviceaccount=[namespace]:[account] -n [namespace]
```
## Debugging Strategies
**Container execution:**
```bash
# Docker Compose
docker compose exec [service] /bin/bash # Interactive shell
docker compose exec [service] ps aux # List processes
docker compose exec [service] env # View environment
# Kubernetes
kubectl exec -it [pod] -- /bin/bash
kubectl exec [pod] -- ps aux
```
**Log analysis:**
- Check application logs: `docker logs` or `kubectl logs`
- Check container startup logs: Look for early exit, missing dependencies, config errors
- Cross-reference with timestamps to correlate events across services
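Correlating events across services is easier when the combined log is in strict time order. A minimal sketch of that idea follows, assuming each line looks like `<service> | <timestamp> <message>` (the shape `docker compose logs --timestamps` produces) and that timestamps are fixed-width UTC values, so lexical order matches chronological order. `sort_by_timestamp` is a hypothetical helper name.

```bash
# sort_by_timestamp is a hypothetical helper, not part of Docker.
# Reads "<service> | <timestamp> <message>" lines on stdin and prints
# them ordered by timestamp.
sort_by_timestamp() {
  # Prefix each line with its timestamp and a tab, sort on that prefix,
  # then cut the prefix off again.
  awk -F' [|] ' '{ split($2, parts, " "); print parts[1] "\t" $0 }' |
    LC_ALL=C sort -k1,1 |
    cut -f2-
}

# Example pipeline (requires a running Compose project):
#   docker compose logs --no-color --timestamps gitea traefik | sort_by_timestamp
```

The service names in the example pipeline are taken from the jmp-server stack described later in this document.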
**Resource constraints:**
```bash
# Docker
docker inspect --format '{{.HostConfig.Memory}}' [container] # Memory limit in bytes (0 = no limit)
# Kubernetes
kubectl top nodes
kubectl top pods -n [namespace]
```
## Configuration Best Practices
- **Immutable infrastructure**: Rebuild containers rather than modifying running instances
- **Health checks**: Define liveness and readiness probes
- **Resource limits**: Set CPU/memory requests and limits to prevent resource contention
- **Rolling updates**: Use rolling deployment strategies to maintain availability
- **Secrets separation**: Store secrets outside version control (use a git-ignored `.env`, K8s secrets, or a secret manager)
- **Logging**: Aggregate logs centrally; avoid storing logs in containers
## Common Error Patterns
| Issue | Symptom | Troubleshooting |
|-------|---------|-----------------|
| Port conflict | `bind: address already in use` | Check existing process: `lsof -i :[port]`; Kill if needed |
| Missing dependency | Service fails to start | Check logs for missing service/network; Verify service startup order |
| Resource exhaustion | Slow/hanging containers | Check CPU/memory usage; Increase limits; Reduce replica count |
| Networking | Services can't communicate | Verify network name; Check firewall rules; Test DNS resolution |
| Volume mount | Permission denied in container | Verify mount path exists; Check file permissions; Confirm user ID |
| Config error | Parse/validation error at startup | Validate YAML syntax; Check environment variable substitution |
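For the config-error row, Compose can validate a file without starting anything. The sketch below wraps `docker compose config --quiet`, which parses the file and resolves variable substitution. `validate_compose` is a hypothetical helper name, and the default file name is taken from the file structure reference in this document.

```bash
# validate_compose is a hypothetical helper, not part of Docker.
# Usage: validate_compose [compose_file]
validate_compose() {
  local file="${1:-docker-compose.yaml}"
  # --quiet validates the configuration without printing it.
  if docker compose -f "$file" config --quiet; then
    echo "ok: $file"
  else
    echo "invalid: $file" >&2
    return 1
  fi
}
```

Running it before `docker compose up -d` surfaces YAML and interpolation errors before any container is touched.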
## File Structure Reference
**Docker Compose project:**
```
project/
├── docker-compose.yaml    # Main orchestration
├── .env                   # Environment variables (secrets)
├── .env.example           # Template (tracked in git)
├── config/                # Configuration files
│   ├── traefik.yml
│   └── app.config
└── data/                  # Persistent volumes
    ├── db/
    └── uploads/
```
**Kubernetes project:**
```
k8s/
├── manifests/             # YAML definitions
│   ├── deployment.yaml
│   ├── service.yaml
│   └── configmap.yaml
├── helm/                  # Helm charts
│   └── [chart-name]/
├── kustomization.yaml     # Kustomize overlays
└── secrets/               # Sealed/encrypted secrets
```
## Context-Specific Workflows
### Working with JMP Server
For the jmp-server Docker Compose stack:
```bash
# View all services
docker compose ps
# Start specific service
docker compose up -d gitea # Or: bookstack, traefik, etc.
# View logs for troubleshooting
docker compose logs -f traefik
docker compose logs -f gitea
# Backup database
docker compose exec -T gitea-db pg_dump -U gitea gitea > gitea-backup.sql
# Restart service cleanly
docker compose restart gitea
```
## Actionable Execution
When troubleshooting or deploying:
1. State the objective clearly
2. Run targeted diagnostic commands
3. Report findings with specific evidence (logs, output, metrics)
4. Execute corrective actions with clear before/after confirmation
5. Document any configuration changes for reproducibility
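Step 4 asks for before/after confirmation around a corrective action. One way to capture that evidence is sketched below: a state command is run before and after the action and both outputs are printed. `run_with_evidence` is a hypothetical helper name, and passing the state command as a single string is a simplification made for this example.

```bash
# run_with_evidence is a hypothetical helper, not part of any tool.
# Usage: run_with_evidence "<state command>" <action> [args...]
run_with_evidence() {
  local state_cmd="$1" before after
  shift
  # Record the state before the corrective action.
  before="$(sh -c "$state_cmd" 2>&1)"
  # Run the action; stop and report if it fails.
  "$@" || { echo "action failed: $*" >&2; return 1; }
  # Record the state again for comparison.
  after="$(sh -c "$state_cmd" 2>&1)"
  printf '== before ==\n%s\n== after ==\n%s\n' "$before" "$after"
}
```

For the jmp-server stack this could be called as `run_with_evidence "docker compose ps gitea" docker compose restart gitea`, and the printed output can be pasted into the change record for step 5.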