---
name: container-infrastructure-ops
description: Maintains, troubleshoots, and optimizes containerized infrastructure using Docker, Docker Compose, Kubernetes, Helm, and CI/CD pipelines. Enables system stability, security, reproducibility, and clear technical execution. Use for deployment operations, container management, networking, storage, secrets management, monitoring, and infrastructure troubleshooting.
---

# Container Infrastructure Operations

Comprehensive skill for managing and maintaining software stacks hosted on containerized infrastructure.

## Core Capabilities

- **Docker & Docker Compose**: Service orchestration, container lifecycle, volume management, networking
- **Kubernetes & Helm**: Cluster operations, deployment manifests, package management, upgrades
- **CI/CD Pipelines**: GitLab CI, GitHub Actions, and runner configuration
- **Networking & Routing**: Reverse proxies (Traefik, nginx), TLS/HTTPS, service discovery
- **Storage & Data**: Volume mounting, backup/restore, database operations, data persistence
- **Security**: Secrets management, access control, network policies, RBAC
- **Monitoring & Logging**: Health checks, log aggregation, observability
- **Troubleshooting**: Container debugging, resource issues, log analysis, dependency resolution

## Operational Workflows

### 1. Service Startup & Deployment

**Docker Compose:**
```bash
# Start all services
docker compose up -d

# Start specific service
docker compose up -d [service_name]

# Build and start
docker compose up --build -d

# With environment file
docker compose --env-file .env up -d
```

**Kubernetes:**
```bash
# Apply manifest
kubectl apply -f deployment.yaml

# Rolling update
kubectl set image deployment/[name] [container]=[image]:[tag]

# Check rollout status
kubectl rollout status deployment/[name]
```
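
The apply and rollout-status commands above can be chained so that a failed rollout is reverted automatically. The following is a minimal sketch under stated assumptions, not a definitive implementation: `deploy_with_rollback` is a hypothetical helper name, and the `default` namespace and 120-second timeout are assumed defaults.

```bash
# deploy_with_rollback is a hypothetical helper, not part of kubectl.
# Usage: deploy_with_rollback MANIFEST DEPLOYMENT [NAMESPACE] [TIMEOUT]
deploy_with_rollback() {
  local manifest="$1" deployment="$2" namespace="${3:-default}" timeout="${4:-120s}"

  # Stop early if the manifest itself is rejected.
  kubectl apply -f "$manifest" -n "$namespace" || return 1

  # Wait for the rollout; revert to the previous revision if it does not finish in time.
  if kubectl rollout status "deployment/$deployment" -n "$namespace" --timeout="$timeout"; then
    echo "rollout of $deployment succeeded"
  else
    echo "rollout of $deployment failed; reverting to previous revision" >&2
    kubectl rollout undo "deployment/$deployment" -n "$namespace"
    return 1
  fi
}
```

On a first deployment there is no earlier revision to return to, so the undo step would likely report an error of its own.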

**Helm:**
```bash
# Install release
helm install [release-name] [chart] -f values.yaml

# Upgrade existing release
helm upgrade [release-name] [chart] -f values.yaml

# Roll back to a specific revision (omit [revision] to return to the previous release)
helm rollback [release-name] [revision]
```

### 2. Service Inspection & Monitoring

**Docker Compose:**
```bash
# View running services
docker compose ps

# View logs (follow)
docker compose logs -f [service_name]

# View logs from the last 10 minutes
docker compose logs --since 10m [service_name]

# Inspect container stats
docker stats [container_id]
```

**Kubernetes:**
```bash
# List resources
kubectl get pods -n [namespace]
kubectl get svc -n [namespace]

# Describe resource (detailed info)
kubectl describe pod [pod_name] -n [namespace]

# View logs
kubectl logs [pod_name] -n [namespace]
kubectl logs -f [pod_name] -n [namespace]  # Follow

# Watch resources in real-time
kubectl get pods -w -n [namespace]
```

### 3. Environment & Configuration Management

**Load environment variables:**
```bash
# From .env file
set -a
source .env
set +a

# Apply to a single command (skips comment lines; values must not contain spaces)
env $(grep -v '^#' .env | xargs) docker compose up -d
```
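
Loading a `.env` file silently succeeds even when a variable the stack depends on is missing. A small guard can catch that before any container starts. This is a sketch only: `load_env_file` and `require_env` are hypothetical helper names, and the variable names in the usage comment are placeholders.

```bash
# load_env_file and require_env are hypothetical helpers.

# Export every variable defined in an env file (defaults to ./.env).
load_env_file() {
  local file="${1:-.env}"
  # Prefix bare filenames with ./ so the shell does not search PATH for them.
  case "$file" in
    */*) ;;
    *) file="./$file" ;;
  esac
  if [ ! -f "$file" ]; then
    echo "env file not found: $file" >&2
    return 1
  fi
  set -a
  . "$file"
  set +a
}

# Fail if any named variable is missing or empty in the exported environment,
# which is what child processes such as `docker compose` actually see.
require_env() {
  local name missing=0
  for name in "$@"; do
    if [ -z "$(printenv "$name")" ]; then
      echo "missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example (variable names are placeholders):
#   load_env_file .env && require_env DB_USER DB_PASSWORD && docker compose up -d
```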

**Manage secrets:**
```bash
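# Docker Swarm (Compose reads file-based secrets from its top-level `secrets:` key instead)
docker secret create [name] /path/to/secret

# Kubernetes
kubectl create secret generic [name] --from-file=key=/path/to/secret
kubectl create secret docker-registry [name] --docker-server=[url] --docker-username=[user] --docker-password=[password]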
```

### 4. Troubleshooting Workflow

**Container health check:**
1. Verify container is running: `docker compose ps` or `kubectl get pods`
2. Check logs: `docker compose logs [service]` or `kubectl logs [pod]`
3. Inspect configuration: Check environment variables, mounted volumes, network connectivity
4. Test connectivity: `docker exec [container] curl [service]` or `kubectl exec [pod] -- curl [service]`
5. Resource analysis: `docker stats` or `kubectl top pods`
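
Steps 1, 2, and 5 of the checklist above are mechanical enough to run in one pass for a Compose service; steps 3 and 4 depend on the application and are left to the operator. The sketch below assumes a hypothetical helper name, `triage_service`, with an assumed 10-minute log window and 50-line tail.

```bash
# triage_service is a hypothetical helper.
# Usage: triage_service SERVICE [SINCE]
triage_service() {
  local service="$1" since="${2:-10m}" cid

  echo "== status: $service =="
  docker compose ps "$service"

  echo "== logs since $since =="
  docker compose logs --since "$since" --tail 50 "$service"

  # Resolve container IDs so docker stats targets this service only.
  cid="$(docker compose ps -q "$service")"
  if [ -n "$cid" ]; then
    echo "== resource usage =="
    # $cid is left unquoted on purpose: a scaled service yields one ID per line.
    docker stats --no-stream $cid
  else
    echo "== no running container for $service; skipping stats =="
  fi
}
```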

**Network troubleshooting:**
```bash
# Docker Compose
docker network ls
docker network inspect [network_name]

# Kubernetes
kubectl get networkpolicies -n [namespace]
kubectl describe networkpolicy [name] -n [namespace]
```

**Volume & storage issues:**
```bash
# Docker Compose
docker volume ls
docker volume inspect [volume_name]

# Kubernetes
kubectl get pv
kubectl get pvc -n [namespace]
kubectl describe pvc [name] -n [namespace]
```

### 5. Backup & Restore Operations

**Docker Compose volumes:**
```bash
# Back up a volume (quote the working directory so paths with spaces survive)
docker run --rm -v [volume]:/data -v "$(pwd)":/backup busybox tar czf /backup/backup.tar.gz -C /data .

# Restore a volume
docker run --rm -v [volume]:/data -v "$(pwd)":/backup busybox tar xzf /backup/backup.tar.gz -C /data
```
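
A fixed `backup.tar.gz` name overwrites the previous backup on every run. One way to avoid that is to timestamp each archive and confirm it is readable before trusting it. This is a sketch under assumptions: `backup_volume` is a hypothetical helper name, and the destination directory must be an absolute path because it is bind-mounted.

```bash
# backup_volume is a hypothetical helper.
# Usage: backup_volume VOLUME [DEST_DIR]; prints the archive path on success.
backup_volume() {
  local volume="$1" dest_dir="${2:-$(pwd)}" archive
  archive="${volume}-$(date +%Y%m%d-%H%M%S).tar.gz"

  # Mount the volume read-only so the backup cannot modify live data.
  docker run --rm \
    -v "$volume":/data:ro \
    -v "$dest_dir":/backup \
    busybox tar czf "/backup/$archive" -C /data . || return 1

  # Listing the archive confirms it is a readable gzip tarball.
  if ! tar tzf "$dest_dir/$archive" > /dev/null; then
    echo "archive failed verification: $dest_dir/$archive" >&2
    return 1
  fi
  echo "$dest_dir/$archive"
}
```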

**Database backup within containers:**
```bash
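# PostgreSQL (-T disables TTY allocation so the redirected dump is not corrupted)
docker compose exec -T [postgres_service] pg_dump -U [user] [db] > backup.sql

# MySQL/MariaDB (assumes MYSQL_PASSWORD is set in the container; a bare -p prompt cannot be answered without a TTY)
docker compose exec -T [mysql_service] sh -c 'exec mysqldump -u [user] -p"$MYSQL_PASSWORD" [db]' > backup.sql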
```

### 6. Security & Access Control

**Docker security best practices:**
- Use read-only root filesystem: `read_only: true`
- Drop unnecessary capabilities: `cap_drop: [ALL]`
- Run as non-root user: `user: "1000:1000"`
- Use secrets for sensitive data (not environment variables)
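
The first three settings above can be kept in a separate Compose override file so the base file stays unchanged. The sketch below writes such a file from the shell. `write_hardening_override` and the default file name are hypothetical, and the `tmpfs` entry is an added assumption (many applications need a writable scratch directory once the root filesystem is read-only).

```bash
# write_hardening_override is a hypothetical helper.
# Usage: write_hardening_override SERVICE [OUTPUT_FILE]
write_hardening_override() {
  local service="$1" out="${2:-docker-compose.hardening.yaml}"
  cat > "$out" <<EOF
services:
  ${service}:
    read_only: true        # root filesystem is mounted read-only
    cap_drop:
      - ALL                # drop every Linux capability
    user: "1000:1000"      # non-root UID:GID; assumes the image works as this user
    tmpfs:
      - /tmp               # assumption: the app needs a writable scratch directory
EOF
}

# Example: validate the merged result before starting anything.
#   write_hardening_override gitea
#   docker compose -f docker-compose.yaml -f docker-compose.hardening.yaml config
```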

**Kubernetes RBAC:**
```bash
# Create service account
kubectl create serviceaccount [name] -n [namespace]

# Bind a ClusterRole to the account, scoped to one namespace
kubectl create rolebinding [binding-name] --clusterrole=[role] --serviceaccount=[namespace]:[account] -n [namespace]
```
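
After creating a binding it helps to confirm that the service account actually received the intended permission. A minimal sketch using `kubectl auth can-i` with impersonation follows. `sa_can` is a hypothetical helper name, and the caller is assumed to hold impersonation rights in the cluster.

```bash
# sa_can is a hypothetical helper.
# Usage: sa_can NAMESPACE ACCOUNT VERB RESOURCE
# kubectl prints "yes" or "no" and exits non-zero when the action is denied.
sa_can() {
  local namespace="$1" account="$2" verb="$3" resource="$4"
  kubectl auth can-i "$verb" "$resource" -n "$namespace" \
    --as="system:serviceaccount:${namespace}:${account}"
}

# Example:
#   sa_can [namespace] [account] get pods
```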

## Debugging Strategies

**Container execution:**
```bash
# Docker Compose
docker compose exec [service] /bin/bash      # Interactive shell (use /bin/sh if the image has no bash)
docker compose exec [service] ps aux         # List processes
docker compose exec [service] env            # View environment

# Kubernetes
kubectl exec -it [pod] -- /bin/bash
kubectl exec [pod] -- ps aux
```

**Log analysis:**
- Check application logs: `docker logs` or `kubectl logs`
- Check container startup logs: Look for early exit, missing dependencies, config errors
- Cross-reference with timestamps to correlate events across services
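
Timestamp correlation across services can be approximated with two small filters: one that orders timestamped log lines chronologically and one that keeps only likely failure lines. This is a sketch, not a definitive implementation. `merge_by_timestamp` and `find_errors` are hypothetical names, the keyword list is an illustrative assumption, and the sort relies on lines having the `service | timestamp message` shape produced by `docker compose logs -t`.

```bash
# merge_by_timestamp and find_errors are hypothetical helpers.

# Order "service | timestamp message" lines read from stdin by their timestamp.
# Sorting is lexical, which is chronological only for uniformly formatted timestamps.
merge_by_timestamp() {
  LC_ALL=C sort -s -t '|' -k 2
}

# Keep lines that contain common failure keywords (illustrative list, case-insensitive).
find_errors() {
  grep -Ei 'error|fatal|panic|exception|refused' || true
}

# Example:
#   docker compose logs -t --no-color --since 15m | merge_by_timestamp | find_errors
```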

**Resource constraints:**
```bash
# Docker (configured memory limit in bytes; 0 means unlimited)
docker inspect --format '{{.HostConfig.Memory}}' [container]

# Kubernetes
kubectl top nodes
kubectl top pods -n [namespace]
```

## Configuration Best Practices

- **Immutable infrastructure**: Rebuild containers rather than modifying running instances
- **Health checks**: Define liveness and readiness probes in Kubernetes and a `healthcheck` for each Compose service
- **Resource limits**: Set CPU/memory requests and limits to prevent resource contention
- **Rolling updates**: Use rolling deployment strategies to maintain availability
- **Secrets separation**: Keep secrets out of version control (use a git-ignored `.env`, K8s secrets, or a secret manager)
- **Logging**: Aggregate logs centrally; avoid storing logs in containers
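
For Kubernetes, the health-check, resource-limit, and rolling-update practices above can all be expressed in one Deployment manifest. The sketch below renders such a manifest from the shell. `render_deployment` is a hypothetical helper name, and the `/healthz` path, replica count, probe timings, and CPU/memory figures are placeholder assumptions to be replaced with values measured for the real workload.

```bash
# render_deployment is a hypothetical helper.
# Usage: render_deployment NAME IMAGE PORT
render_deployment() {
  local name="$1" image="$2" port="$3"
  cat <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ${name}
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0    # keep full capacity while pods are replaced
      maxSurge: 1
  selector:
    matchLabels:
      app: ${name}
  template:
    metadata:
      labels:
        app: ${name}
    spec:
      containers:
        - name: ${name}
          image: ${image}
          ports:
            - containerPort: ${port}
          readinessProbe:          # gates traffic until the app responds
            httpGet:
              path: /healthz       # assumed health endpoint
              port: ${port}
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:           # restarts the container if it stops responding
            httpGet:
              path: /healthz       # assumed health endpoint
              port: ${port}
            initialDelaySeconds: 15
            periodSeconds: 20
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 256Mi
EOF
}

# Example: check the manifest locally before applying it.
#   render_deployment [name] [image]:[tag] [port] | kubectl apply --dry-run=client -f -
```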

## Common Error Patterns

| Issue | Symptom | Troubleshooting |
|-------|---------|-----------------|
| Port conflict | `bind: address already in use` | Find the process holding the port: `lsof -i :[port]`; stop it or remap the published port |
| Missing dependency | Service fails to start | Check logs for missing service/network; Verify service startup order |
| Resource exhaustion | Slow/hanging containers | Check CPU/memory usage; Increase limits; Reduce replica count |
| Networking | Services can't communicate | Verify network name; Check firewall rules; Test DNS resolution |
| Volume mount | Permission denied in container | Verify mount path exists; Check file permissions; Confirm user ID |
| Config error | Parse/validation error at startup | Validate YAML syntax; Check environment variable substitution |

## File Structure Reference

**Docker Compose project:**
```
project/
├── docker-compose.yaml      # Main orchestration
├── .env                      # Environment variables (secrets)
├── .env.example             # Template (tracked in git)
├── config/                  # Configuration files
│   ├── traefik.yml
│   └── app.config
└── data/                    # Persistent volumes
    ├── db/
    └── uploads/
```

**Kubernetes project:**
```
k8s/
├── manifests/               # YAML definitions
│   ├── deployment.yaml
│   ├── service.yaml
│   └── configmap.yaml
├── helm/                    # Helm charts
│   └── [chart-name]/
├── kustomization.yaml       # Kustomize entry point
└── secrets/                 # Sealed/encrypted secrets
```

## Context-Specific Workflows

### Working with JMP Server

For the jmp-server Docker Compose stack:
```bash
# View all services
docker compose ps

# Start specific service
docker compose up -d gitea      # Or: bookstack, traefik, etc.

# View logs for troubleshooting
docker compose logs -f traefik
docker compose logs -f gitea

# Backup database
docker compose exec gitea-db pg_dump -U gitea gitea > gitea-backup.sql

# Restart the service (does not apply Compose file changes; use `up -d` for those)
docker compose restart gitea
```

## Actionable Execution

When troubleshooting or deploying:
1. State the objective clearly
2. Run targeted diagnostic commands
3. Report findings with specific evidence (logs, output, metrics)
4. Execute corrective actions with clear before/after confirmation
5. Document any configuration changes for reproducibility
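
Steps 3 to 5 are easier to follow when every command and its output land in one timestamped record. A minimal sketch of such a wrapper follows. `run_logged` is a hypothetical helper name and `OPS_LOG` is an assumed environment variable for the log path. Output is captured before it is printed, so long-running commands do not stream.

```bash
# run_logged is a hypothetical helper; OPS_LOG is an assumed variable (default: ops.log).
# Usage: run_logged COMMAND [ARGS...]
run_logged() {
  local log="${OPS_LOG:-ops.log}" output status

  # Record when the command ran and exactly what was executed.
  printf '[%s] $ %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$*" >> "$log"

  # Capture stdout and stderr together so the record matches what the operator saw.
  if output="$("$@" 2>&1)"; then status=0; else status=$?; fi

  printf '%s\n' "$output" | tee -a "$log"
  printf '[exit=%s]\n' "$status" >> "$log"
  return "$status"
}

# Example: before/after confirmation around a corrective action.
#   run_logged docker compose ps
#   run_logged docker compose restart gitea
#   run_logged docker compose ps
```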