
Michele Rossi dbe5c461b5 First commit

2026-01-10 23:34:39 +01:00

8.7 KiB



name	description
container-infrastructure-ops	Maintains, troubleshoots, and optimizes containerized infrastructure using Docker, Docker Compose, Kubernetes, Helm, and CI/CD pipelines. Prioritizes system stability, security, reproducibility, and clear technical execution. Use for deployment operations, container management, networking, storage, secrets management, monitoring, and infrastructure troubleshooting.

Container Infrastructure Operations

Comprehensive skill for managing and maintaining software stacks hosted on containerized infrastructure.

Core Capabilities

Docker & Docker Compose: Service orchestration, container lifecycle, volume management, networking
Kubernetes & Helm: Cluster operations, deployment manifests, package management, upgrades
CI/CD Pipelines: GitLab CI, GitHub Actions, and runner configuration
Networking & Routing: Reverse proxies (Traefik, nginx), TLS/HTTPS, service discovery
Storage & Data: Volume mounting, backup/restore, database operations, data persistence
Security: Secrets management, access control, network policies, RBAC
Monitoring & Logging: Health checks, log aggregation, observability
Troubleshooting: Container debugging, resource issues, log analysis, dependency resolution

Operational Workflows

1. Service Startup & Deployment

Docker Compose:

# Start all services
docker compose up -d

# Start specific service
docker compose up -d [service_name]

# Build and start
docker compose up --build -d

# With environment file
docker compose --env-file .env up -d

Kubernetes:

# Apply manifest
kubectl apply -f deployment.yaml

# Rolling update
kubectl set image deployment/[name] [container]=[image]:[tag]

# Check rollout status
kubectl rollout status deployment/[name]

Helm:

# Install release
helm install [release-name] [chart] -f values.yaml

# Upgrade existing release
helm upgrade [release-name] [chart] -f values.yaml

# Roll back (omit [revision] to return to the previous release; list revisions with helm history [release-name])
helm rollback [release-name] [revision]

2. Service Inspection & Monitoring

Docker Compose:

# View running services
docker compose ps

# View logs (follow)
docker compose logs -f [service_name]

# View logs with time range
docker compose logs --since 10m [service_name]

# Inspect container stats
docker stats [container_id]

Kubernetes:

# List resources
kubectl get pods -n [namespace]
kubectl get svc -n [namespace]

# Describe resource (detailed info)
kubectl describe pod [pod_name] -n [namespace]

# View logs
kubectl logs [pod_name] -n [namespace]
kubectl logs -f [pod_name] -n [namespace]  # Follow

# Watch resources in real-time
kubectl get pods -w -n [namespace]

3. Environment & Configuration Management

Load environment variables:

# From .env file
set -a
source .env
set +a

# Apply to a single command (values must not contain spaces; Compose already reads ./.env on its own)
env $(grep -v '^#' .env | xargs) docker compose up -d

Manage secrets:

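# Docker Swarm only (Compose declares file-based secrets under the top-level secrets: key)
docker secret create [name] /path/to/secret

# Kubernetes
kubectl create secret generic [name] --from-file=key=/path/to/secret -n [namespace]
kubectl create secret docker-registry [name] --docker-server=[url] --docker-username=[user] --docker-password=[password] -n [namespace]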
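With plain Docker Compose, a secret is just a file: it is declared under the top-level `secrets:` key (for example `file: ./secrets/db_password.txt`), listed under a service's `secrets:`, and mounted read-only at `/run/secrets/<name>`. The helper below is a minimal bash sketch for creating such a file with owner-only permissions; the function name and the `./secrets/` path are hypothetical.

```shell
# Hypothetical helper: write a random secret to a file readable only by its owner.
# Usage: make_secret_file ./secrets/db_password.txt
make_secret_file() {
  local path="$1"
  mkdir -p "$(dirname "$path")"
  (
    # umask only inside this subshell: new files get mode 0600
    umask 077
    # 32 random bytes, base64-encoded (44 characters), no trailing newline
    head -c 32 /dev/urandom | base64 | tr -d '\n' > "$path"
  )
}
```

Keep the `secrets/` directory out of version control, the same way as `.env`.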

4. Troubleshooting Workflow

Container health check:

Verify container is running: docker compose ps or kubectl get pods
Check logs: docker compose logs [service] or kubectl logs [pod]
Inspect configuration: Check environment variables, mounted volumes, network connectivity
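Test connectivity: docker compose exec [service] curl http://[other_service]:[port] or kubectl exec [pod] -- curl http://[service]:[port] (requires curl in the image)
Resource analysis: docker stats or kubectl top pods (kubectl top requires metrics-server)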
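The five checks can be run in one pass for a Compose service. This is a minimal bash sketch, not part of the original skill: `diagnose_service` is a hypothetical name, and the connectivity step assumes `curl` exists inside the image.

```shell
# Hypothetical helper: walk the health-check steps for one Compose service.
# Usage: diagnose_service <service> [url_to_probe_from_inside]
diagnose_service() {
  local svc="$1" url="${2:-}" ids
  echo "== 1. status";  docker compose ps "$svc"
  echo "== 2. logs";    docker compose logs --tail 50 "$svc"
  echo "== 3. config";  docker compose exec -T "$svc" env   # may print secrets
  if [ -n "$url" ]; then
    echo "== 4. connectivity"
    # Assumes curl is installed in the image; minimal images often lack it.
    docker compose exec -T "$svc" curl -fsS -o /dev/null -w '%{http_code}\n' "$url"
  fi
  echo "== 5. resources"
  ids=$(docker compose ps -q "$svc")
  if [ -n "$ids" ]; then
    docker stats --no-stream $ids   # unquoted on purpose: one ID per replica
  else
    echo "no running container for $svc"
  fi
}
```

For example, `diagnose_service gitea http://traefik:8080` prints each step under its own header, so the output can be pasted as evidence when reporting findings.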

Network troubleshooting:

# Docker Compose
docker network ls
docker network inspect [network_name]

# Kubernetes
kubectl get networkpolicies -n [namespace]
kubectl describe networkpolicy [name] -n [namespace]

Volume & storage issues:

# Docker Compose
docker volume ls
docker volume inspect [volume_name]

# Kubernetes
kubectl get pv
kubectl get pvc -n [namespace]
kubectl describe pvc [name] -n [namespace]

5. Backup & Restore Operations

Docker Compose volumes:

# Backup volume
docker run --rm -v [volume]:/data -v "$(pwd)":/backup busybox tar czf /backup/backup.tar.gz -C /data .

# Restore volume (stop services that use it first)
docker run --rm -v [volume]:/data -v "$(pwd)":/backup busybox tar xzf /backup/backup.tar.gz -C /data

Database backup within containers:

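# PostgreSQL (-T disables pseudo-TTY allocation, recommended when redirecting output)
docker compose exec -T [postgres_service] pg_dump -U [user] [db] > backup.sql

# MySQL/MariaDB (avoids the interactive -p prompt; assumes the container defines MYSQL_PASSWORD)
docker compose exec -T [mysql_service] sh -c 'mysqldump -u [user] -p"$MYSQL_PASSWORD" [db]' > backup.sql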
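A plain `> backup.sql` redirect reports success even when `pg_dump` fails, leaving an empty or truncated file. The sketch below, assuming bash and a PostgreSQL service, enables `pipefail` so a failed dump is detected and the partial file is removed; `pg_backup` and the file-naming scheme are hypothetical.

```shell
# Minimal sketch: timestamped, compressed PostgreSQL dump that fails loudly.
# Usage: pg_backup <compose_service> <db_user> <db_name> [output_dir]
pg_backup() {
  local service="$1" user="$2" db="$3" outdir="${4:-.}"
  local out="$outdir/${db}-$(date +%Y%m%d-%H%M%S).sql.gz"
  if (
    # pipefail: the pipeline fails if pg_dump fails, not only if gzip does
    set -o pipefail
    docker compose exec -T "$service" pg_dump -U "$user" "$db" | gzip > "$out"
  ); then
    echo "$out"
  else
    rm -f "$out"   # never leave a truncated dump that looks valid
    echo "backup of $db failed" >&2
    return 1
  fi
}
```

Restore with `gunzip -c [file] | docker compose exec -T [postgres_service] psql -U [user] [db]`.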

6. Security & Access Control

Docker security best practices:

Use read-only root filesystem: read_only: true
Drop unnecessary capabilities: cap_drop: [ALL]
Run as non-root user: user: "1000:1000"
Use secrets for sensitive data (not environment variables)
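The three Compose keys above map directly onto `docker run` flags, which is handy for testing a hardened configuration outside Compose. A minimal bash sketch; `run_hardened` is a hypothetical name, and `--tmpfs /tmp` is an added assumption because many applications need a writable `/tmp` on a read-only root filesystem.

```shell
# Compose key           -> docker run flag
#   read_only: true     -> --read-only
#   cap_drop: [ALL]     -> --cap-drop ALL
#   user: "1000:1000"   -> --user 1000:1000
# Hypothetical helper: remaining docker run options, the image, and its command follow.
run_hardened() {
  docker run --rm \
    --read-only \
    --cap-drop ALL \
    --user 1000:1000 \
    --tmpfs /tmp \
    "$@"
}
```

For example, `run_hardened alpine:3 id -u` should print `1000`, confirming the container does not run as root.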

Kubernetes RBAC:

# Create service account
kubectl create serviceaccount [name] -n [namespace]

# Bind a ClusterRole to the account (use --role=[role] for a namespaced Role)
kubectl create rolebinding [binding-name] --clusterrole=[role] --serviceaccount=[namespace]:[account] -n [namespace]
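After binding a role, the result can be checked by impersonating the service account with `kubectl auth can-i`, which answers `yes` or `no`. A minimal bash sketch; `sa_can` is a hypothetical wrapper, and the caller needs permission to impersonate service accounts.

```shell
# Hypothetical wrapper: ask the API server what a service account may do.
# Usage: sa_can <namespace> <account> <verb> <resource>
sa_can() {
  local ns="$1" account="$2" verb="$3" resource="$4"
  # Service-account usernames have the form system:serviceaccount:<namespace>:<name>
  kubectl auth can-i "$verb" "$resource" -n "$ns" \
    --as="system:serviceaccount:${ns}:${account}"
}
```

For example, `sa_can apps deployer list pods` should print `yes` once the binding above is in place.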

Debugging Strategies

Container execution:

# Docker Compose
docker compose exec [service] /bin/sh        # Interactive shell (or /bin/bash where installed)
docker compose exec [service] ps aux         # List processes
docker compose exec [service] env            # View environment

# Kubernetes
kubectl exec -it [pod] -- /bin/sh
kubectl exec [pod] -- ps aux

Log analysis:

Check application logs: docker logs or kubectl logs
Check container startup logs: Look for early exit, missing dependencies, config errors
Cross-reference with timestamps to correlate events across services
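Correlating events is easier with all services on one timeline. This bash sketch assumes Compose prefixes each line as `<service> | <timestamp> ...` and that `--timestamps` emits fixed-width RFC 3339 timestamps, so a byte-wise sort on the text after `|` orders lines chronologically; `merged_logs` is a hypothetical name. On Kubernetes, `kubectl logs --timestamps` adds the same kind of prefix per pod.

```shell
# Hypothetical helper: merge recent logs from every Compose service by timestamp.
# Usage: merged_logs [since]   (default 15m)
merged_logs() {
  local since="${1:-15m}"
  docker compose logs --no-color --timestamps --since "$since" \
    | LC_ALL=C sort -t '|' -k 2   # sort on everything after the first '|'
}
```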

Resource constraints:

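# Docker (Memory in bytes, NanoCpus in billionths of a CPU; 0 means no limit)
docker inspect --format 'Memory={{.HostConfig.Memory}} NanoCpus={{.HostConfig.NanoCpus}}' [container]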

# Kubernetes (requires metrics-server)
kubectl top nodes
kubectl top pods -n [namespace]

Configuration Best Practices

Immutable infrastructure: Rebuild containers rather than modifying running instances
Health checks: Define a `healthcheck` in Compose and liveness/readiness probes in Kubernetes
Resource limits: Set CPU/memory requests and limits to prevent resource contention
Rolling updates: Use rolling deployment strategies to maintain availability
Secrets separation: Store secrets outside version control (use .env, K8s secrets, or secret managers)
Logging: Write logs to stdout/stderr and aggregate them centrally; avoid log files inside the container filesystem
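Two of these practices can be enforced from the command line. The bash sketch below polls Docker's health status for a Compose service (it only works when the service defines a `healthcheck`) and sets requests and limits on a Kubernetes deployment. Both function names are hypothetical, and the CPU/memory values are placeholders to be sized for the actual workload.

```shell
# Hypothetical helper: wait until a Compose service reports "healthy".
# Usage: wait_healthy <service> [timeout_seconds]
wait_healthy() {
  local svc="$1" timeout="${2:-60}" waited=0 cid status
  cid=$(docker compose ps -q "$svc" | head -n 1)
  [ -n "$cid" ] || { echo "$svc is not running" >&2; return 1; }
  while [ "$waited" -lt "$timeout" ]; do
    status=$(docker inspect --format '{{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}' "$cid")
    case "$status" in
      healthy) echo "$svc healthy after ${waited}s"; return 0 ;;
      none)    echo "$svc has no healthcheck defined" >&2; return 1 ;;
    esac
    sleep 2; waited=$((waited + 2))
  done
  echo "$svc not healthy after ${timeout}s (last status: $status)" >&2
  return 1
}

# Hypothetical helper: set requests/limits on a deployment (placeholder values).
# Usage: set_limits <deployment> <namespace>
set_limits() {
  kubectl set resources "deployment/$1" -n "$2" \
    --requests=cpu=250m,memory=256Mi --limits=cpu=500m,memory=512Mi
}
```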

Common Error Patterns

Issue	Symptom	Troubleshooting
Port conflict	`bind: address already in use`	Check existing process: `lsof -i :[port]`; Kill if needed
Missing dependency	Service fails to start	Check logs for missing service/network; Verify startup order (`depends_on` with `condition: service_healthy`)
Resource exhaustion	Slow/hanging containers	Check CPU/memory usage; Increase limits; Reduce replica count
Networking	Services can't communicate	Verify network name; Check firewall rules; Test DNS resolution
Volume mount	Permission denied in container	Verify mount path exists; Check file permissions; Confirm user ID
Config error	Parse/validation error at startup	Validate YAML syntax; Check environment variable substitution
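For the port-conflict row, a pre-flight check can run before `docker compose up`. This bash sketch uses `ss` from iproute2 (Linux), where `lsof -i :[port]` from the table is not available; `port_in_use` is a hypothetical name, and showing the owning process of another user's socket needs root.

```shell
# Hypothetical helper: succeed (and print the listener) if a TCP port is already bound.
# Usage: port_in_use <port>
port_in_use() {
  local out
  # -H no header, -l listening sockets, -t TCP, -n numeric, -p owning process
  out=$(ss -Hltnp "sport = :$1")
  [ -n "$out" ] && printf '%s\n' "$out"
}
```

For example, `port_in_use 443 && echo "port 443 is taken"` before starting Traefik.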

File Structure Reference

Docker Compose project:

project/
├── docker-compose.yaml      # Main orchestration
├── .env                     # Environment variables and secrets (untracked, in .gitignore)
├── .env.example             # Template (tracked in git)
├── config/                  # Configuration files
│   ├── traefik.yml
│   └── app.config
└── data/                    # Persistent volumes
    ├── db/
    └── uploads/
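The tracked `.env.example` can be derived from the untracked `.env` so the two never drift apart. A minimal bash sketch using `sed`; `make_env_example` is a hypothetical name. Values are blanked, while keys, comments, and blank lines are kept.

```shell
# Hypothetical helper: copy .env to .env.example with every value removed.
# Usage: make_env_example [src] [dst]   (defaults: .env .env.example)
make_env_example() {
  local src="${1:-.env}" dst="${2:-.env.example}"
  # KEY=value or export KEY=value  ->  keep everything up to and including '='
  sed -E 's/^([[:space:]]*(export[[:space:]]+)?[A-Za-z_][A-Za-z0-9_]*)=.*/\1=/' "$src" > "$dst"
}
```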

Kubernetes project:

k8s/
├── manifests/               # YAML definitions
│   ├── deployment.yaml
│   ├── service.yaml
│   └── configmap.yaml
├── helm/                    # Helm charts
│   └── [chart-name]/
├── kustomization.yaml       # Kustomize overlays
└── secrets/                 # Sealed/encrypted secrets

Context-Specific Workflows

Working with JMP Server

For the jmp-server Docker Compose stack:

# View all services
docker compose ps

# Start specific service
docker compose up -d gitea      # Or: bookstack, traefik, etc.

# View logs for troubleshooting
docker compose logs -f traefik
docker compose logs -f gitea

# Backup database
docker compose exec -T gitea-db pg_dump -U gitea gitea > gitea-backup.sql

# Restart service (keeps the existing container; use docker compose up -d gitea to apply compose-file or .env changes)
docker compose restart gitea

Actionable Execution

When troubleshooting or deploying:

State the objective clearly
Run targeted diagnostic commands
Report findings with specific evidence (logs, output, metrics)
Execute corrective actions with clear before/after confirmation
Document any configuration changes for reproducibility
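The before/after confirmation and the change record can be produced by one wrapper around any corrective command. A minimal bash sketch; `with_snapshot` is a hypothetical name and `OPS_LOG` (default `./ops-changes.log`) is an assumed log location.

```shell
# Hypothetical wrapper: show service state before and after an action, then
# append a timestamped record of the action and its exit code to $OPS_LOG.
# Usage: with_snapshot <service> <command...>
with_snapshot() {
  local svc="$1"; shift
  local log="${OPS_LOG:-./ops-changes.log}" rc
  echo "== before"; docker compose ps "$svc"
  "$@"; rc=$?
  echo "== after";  docker compose ps "$svc"
  printf '%s\t%s\texit=%s\t%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$svc" "$rc" "$*" >> "$log"
  return "$rc"
}
```

For example, `with_snapshot gitea docker compose up -d gitea` prints both snapshots and leaves one tab-separated line in the log.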
