
Pod

General

kubectl get pods                #check status
kubectl get pods -o wide        #show more info, such as the node
kubectl describe pod <pod-name> #check events
kubectl get event --field-selector involvedObject.name=<pod-name>
kubectl logs <pod-name>         #check logs
kubectl exec -it <pod-name> -n <namespace> -- bash      #connect to a pod that has one container (use sh if the image has no bash)
kubectl exec -it <pod-name> -c <container-name> -- bash #connect to a specific container in a multi-container pod

pending pods

kubectl get pods --field-selector=status.phase=Pending

delete failed pods

kubectl delete pods -A  --field-selector='status.phase=Failed'

force delete

kubectl delete pod <pod-name> --grace-period=0 --force
kubectl delete pod <pod-name> --grace-period=0 --force -n <namespace>

pod stuck in Terminating

  • https://stackoverflow.com/questions/35453792/pods-stuck-in-terminating-status

  • https://github.com/kubernetes/kubernetes/issues/51835

    kubectl delete pod <pod-name> -n <namespace> --grace-period=0 --force  # if this does not work, clear the finalizers:
    kubectl patch pod <pod-name> -n <namespace> -p '{"metadata":{"finalizers":null}}'
    

port forward

https://kubernetes.io/docs/tasks/access-application-cluster/port-forward-access-application-cluster/

Use Port Forwarding to Access Applications in a Cluster.

kubectl port-forward pod/<pod-name> <local-port>:<pod-port>

pod cpu and memory usage

run inside the pod (cgroup v1 paths)

cat /sys/fs/cgroup/cpu/cpuacct.usage  #cumulative CPU time in nanoseconds
cat /sys/fs/cgroup/memory/memory.usage_in_bytes | awk '{ mem = $1 / 1024 / 1024 / 1024 ; print mem "GiB" }'
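The paths above are cgroup v1 paths. On nodes that use cgroup v2, the same data lives in /sys/fs/cgroup/memory.current (bytes) and in the usage_usec line of /sys/fs/cgroup/cpu.stat (microseconds). Below is a minimal sketch that handles both layouts, assuming the cgroup filesystem is mounted at the usual /sys/fs/cgroup inside the container; mem_usage_gib and cpu_usage_seconds are hypothetical helper names, not standard tools.

```shell
# Sketch: report memory and CPU usage from inside a container.
# Assumption: the cgroup filesystem is mounted at /sys/fs/cgroup (the usual
# default); the optional first argument overrides the root, e.g. for testing.

# Memory in use, in GiB (1 GiB = 1024^3 bytes).
mem_usage_gib() (
  root="${1:-/sys/fs/cgroup}"
  if [ -f "$root/memory.current" ]; then
    file="$root/memory.current"                 # cgroup v2
  else
    file="$root/memory/memory.usage_in_bytes"   # cgroup v1
  fi
  awk '{ printf "%.2fGiB\n", $1 / 1024 / 1024 / 1024 }' "$file"
)

# Cumulative CPU time consumed, in seconds.
cpu_usage_seconds() (
  root="${1:-/sys/fs/cgroup}"
  if [ -f "$root/cpu.stat" ] && grep -q '^usage_usec ' "$root/cpu.stat"; then
    # cgroup v2: usage_usec in cpu.stat is in microseconds
    awk '$1 == "usage_usec" { printf "%.2f\n", $2 / 1000000 }' "$root/cpu.stat"
  else
    # cgroup v1: cpuacct.usage is in nanoseconds
    awk '{ printf "%.2f\n", $1 / 1000000000 }' "$root/cpu/cpuacct.usage"
  fi
)
```

CPU time is cumulative, so sampling cpu_usage_seconds twice and dividing the difference by the elapsed seconds gives the average number of cores the container used over that interval.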

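check pod throttling rate (nr_throttled / nr_periods)

  • nr_periods: number of enforcement intervals (CFS periods) that have elapsed

  • nr_throttled: number of intervals in which the container was throttled

  • throttled_time: total time the container was throttled, in nanoseconds (cgroup v2 reports throttled_usec, in microseconds, instead)

    #run inside the pod
    cat /sys/fs/cgroup/cpu/cpu.stat   #cgroup v1; on cgroup v2 use /sys/fs/cgroup/cpu.stat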
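The throttling rate is the share of elapsed periods in which the container was throttled, nr_throttled / nr_periods. Below is a minimal sketch that computes it as a percentage; throttle_rate is a hypothetical helper name, and the auto-detected paths assume the default /sys/fs/cgroup mount.

```shell
# Sketch: CPU throttling rate = nr_throttled / nr_periods, as a percentage.
# Both cgroup v1 (/sys/fs/cgroup/cpu/cpu.stat) and cgroup v2
# (/sys/fs/cgroup/cpu.stat) report nr_periods and nr_throttled.
throttle_rate() (
  file="$1"
  if [ -z "$file" ]; then
    if [ -f /sys/fs/cgroup/cpu.stat ]; then
      file=/sys/fs/cgroup/cpu.stat        # cgroup v2
    else
      file=/sys/fs/cgroup/cpu/cpu.stat    # cgroup v1
    fi
  fi
  awk '
    $1 == "nr_periods"   { periods = $2 }
    $1 == "nr_throttled" { throttled = $2 }
    END {
      # no periods usually means no CPU limit is enforced (or nothing ran yet)
      if (periods == 0) { print "0.00%" } else { printf "%.2f%%\n", 100 * throttled / periods }
    }' "$file"
)
```

A persistently high rate usually means the container's CPU limit is low for its workload.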
    

Kubectl run command

https://kubernetes.io/docs/tasks/inject-data-application/define-command-argument-container/#notes

When you override the default Entrypoint and Cmd for a Container (command overrides the image's ENTRYPOINT, args overrides its CMD), these rules apply:

  • If you do not supply command or args, the defaults defined in the Docker image are used.

  • If you supply a command but no args, only the supplied command is used. The default Entrypoint and the default Cmd defined in the Docker image are ignored.

  • If you supply only args, the default Entrypoint defined in the Docker image is run with the args that you supplied.

  • If you supply a command and args, the default Entrypoint and the default Cmd defined in the Docker image are ignored. Your command is run with your args.
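The four rules can be modelled as a small decision table. The sketch below is a toy model written for illustration only (effective_cmd is hypothetical, not Kubernetes code); each argument is a plain string, and an empty string means the field is not set.

```shell
# Toy model of how a container's command/args combine with the image's
# ENTRYPOINT/CMD, following the four rules above.
# Usage: effective_cmd IMAGE_ENTRYPOINT IMAGE_CMD POD_COMMAND POD_ARGS
effective_cmd() (
  image_entrypoint="$1" image_cmd="$2" pod_command="$3" pod_args="$4"
  if [ -n "$pod_command" ]; then
    set -- "$pod_command" "$pod_args"      # command given: ENTRYPOINT and CMD ignored
  elif [ -n "$pod_args" ]; then
    set -- "$image_entrypoint" "$pod_args" # only args: image ENTRYPOINT runs with your args
  else
    set -- "$image_entrypoint" "$image_cmd" # neither: image defaults are used
  fi
  # join the non-empty parts with single spaces
  out=""
  for part in "$@"; do
    if [ -n "$part" ]; then out="${out:+$out }$part"; fi
  done
  printf '%s\n' "$out"
)
```

With kubectl run, arguments after -- populate args by default (the third rule); adding --command puts them in command instead (the second rule).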

create a pod by passing env vars

kubectl run <pod-name> -n <namespace> \
  --image=<acr-name>.azurecr.io/dev/app:latest --env="PREFIX_UPPER_CASE_PARAM=xyz"

override the entrypoint and keep the pod running so you can exec into the container

https://stackoverflow.com/questions/59248318/kubectl-run-command-vs-arguments

kubectl run <pod-name> -n <namespace> --image=<image-path> \
  --restart=Never --command -- /bin/sh -c "echo hello; sleep 3600"

To only print the generated manifest without creating the pod, add --dry-run=client -o yaml.

sleep pod

apiVersion: v1
kind: Pod
metadata:
  name: ubuntu
  labels:
    app: ubuntu
spec:
  containers:
  - image: ubuntu
    command:
      - "sleep"
      - "604800"   # 7 days, keeps the container running
    imagePullPolicy: IfNotPresent
    name: ubuntu
  restartPolicy: Always

temporarily create a pod and delete it when it exits

option 1

kubectl run -it --rm <pod-name> --namespace=<namespace> --image=alpine -- sh   #alpine ships sh, not bash

option 2: using a yaml file

kubectl apply -f <debug-pod>.yaml
kubectl exec -it <pod-name> --namespace=<namespace> -- sh
kubectl delete pod <pod-name> --namespace=<namespace>

yaml file

apiVersion: v1
kind: Pod
metadata:
  name: debug-pod
  namespace: default
spec:
  containers:
  - name: debug-container
    image: alpine
    command: ["/bin/sh", "-c", "tail -f /dev/null"]
    volumeMounts:
    - name: my-volume
      mountPath: /path/in/container
  volumes:
  - name: my-volume
    hostPath:
      path: /path/on/host

The tail -f /dev/null command never exits, which keeps the container running.

Lifecycle

https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#example-states

  • Pending

  • Running

  • Succeeded

  • Failed

  • Unknown

Status

  • Completed: the container's process exited with code 0.

  • Error: the container's process exited with a non-zero code; the status is usually shown as Error.
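The Completed/Error distinction comes down to the container process's exit code. The hypothetical run_and_report function below runs any command locally and labels the result the same way. It also decodes codes above 128, which by shell convention mean the process was killed by signal (code - 128), e.g. 137 for SIGKILL. This is a simplification: Kubernetes takes the reason (for example OOMKilled) from the container runtime, not from the exit code alone.

```shell
# Sketch: run a command and label its exit the way a pod's STATUS column
# typically would. Hypothetical helper, for local experimentation only.
run_and_report() (
  if "$@"; then code=0; else code=$?; fi
  if [ "$code" -eq 0 ]; then
    echo "Completed (exit code 0)"
  elif [ "$code" -gt 128 ]; then
    # shell convention: 128 + N means the process was killed by signal N
    echo "Error (exit code $code, signal $((code - 128)))"
  else
    echo "Error (exit code $code)"
  fi
)
```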

Pod and container failures in a Job

Container

A container in a Pod may fail for a number of reasons, such as

  • the process in it exited with a non-zero exit code, or

  • the container was killed for exceeding a memory limit, etc.

If this happens and the Job's .spec.template.spec.restartPolicy = "OnFailure", then the Pod stays on the node, but the container is re-run.

Therefore, your program needs to handle the case when it is restarted locally, or else specify .spec.template.spec.restartPolicy = "Never".

Pod

An entire Pod can also fail, for a number of reasons, such as

  • the Pod is evicted from the node (the node is upgraded, rebooted, deleted, etc.), or

  • a container of the Pod fails and .spec.template.spec.restartPolicy = "Never".

When a Pod fails, the Job controller starts a new Pod.

This means that your application needs to handle the case when it is restarted in a new pod. In particular, it needs to handle temporary files, locks, incomplete output and the like caused by previous runs.
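One common way to make a step safe to re-run is to skip work that already finished and to write results to a temporary file that is renamed into place only on success, so a crash never leaves a partial file under the final name. Below is a minimal sketch; produce_output and the output paths are hypothetical, and it assumes the output directory is on storage that survives the restart (an emptyDir volume survives container restarts, but not a replacement Pod).

```shell
# Sketch: make a batch step safe to re-run after a container or Pod restart.
# Usage: produce_output OUTPUT_FILE COMMAND [ARGS...]
produce_output() (
  out="$1"; shift
  if [ -f "$out" ]; then
    echo "skip: $out already exists"   # a previous run already finished
    exit 0
  fi
  tmp="$out.tmp.$$"
  trap 'rm -f "$tmp"' EXIT            # remove a partial file if we fail
  "$@" > "$tmp" || exit 1
  mv "$tmp" "$out"                    # rename is atomic within one filesystem
  echo "done: $out"
)
```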

Note that even if you specify .spec.parallelism = 1 and .spec.completions = 1 and .spec.template.spec.restartPolicy = "Never", the same program may sometimes be started twice.

If you do specify .spec.parallelism and .spec.completions both greater than 1, then there may be multiple pods running at once. Therefore, your pods must also be tolerant of concurrency.
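One way to make concurrent Pods safe is to have each Pod claim a work item before processing it. mkdir either creates a directory or fails if it already exists, so only one caller can claim a given name. Below is a minimal sketch, assuming all Pods mount the same shared claims directory on a filesystem where mkdir is atomic; claim and process_items are hypothetical helpers.

```shell
# Sketch: claim work items with mkdir so no item is processed twice.
# Assumption: item names are valid directory names and the claims
# directory is shared by all Pods.
claim() (
  claims_dir="$1" item="$2"
  mkdir "$claims_dir/$item" 2>/dev/null   # succeeds for exactly one caller
)

process_items() (
  claims_dir="$1"; shift
  for item in "$@"; do
    if claim "$claims_dir" "$item"; then
      echo "processing $item"
    else
      echo "skipping $item (claimed elsewhere)"
    fi
  done
)
```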