Why my k8s job never finished and how I fixed it

Kubernetes Native Jobs

I recently bumped into an issue while transitioning from Istio sidecar mode to Ambient Mode. I have a simple Job that runs a script, writes its output to a log file, and ships the logs with Fluent Bit.

The Job spec

apiVersion: batch/v1
kind: Job
metadata:
  name: log-collector-job
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: logging
          image: fluent/fluent-bit:latest
          # ...args...
          volumeMounts:
            - name: logs
              mountPath: /var/log
        - name: script
          image: script-image
          command: [sh, -c]
          args:
            - |
              set -o pipefail
              echo "Executing script"
              /script | tee -a /var/log/logfile.log
              echo "Done"
              # Graceful shutdown of Istio sidecar
              curl -fsI -X POST http://localhost:15020/quitquitquit
          volumeMounts:
            - name: logs
              mountPath: /var/log
      volumes:
        - name: logs
          emptyDir: {}

This script has been working for ages. As seen above, I would typically use a curl command to gracefully shut down the Istio sidecar.

Then I migrated the namespace to Istio Ambient. “No sidecar now, right? Don’t need the curl.” I deleted the line.

From that moment every Job became… a zombie. The script would finish, CPU would nosedive, the logs were all there—and yet the Pod just sat in Running like time had frozen.

Without the explicit shutdown and without a sidecar to kill, the other always-on container (Fluent Bit) just kept running. A Job's Pod only reaches Succeeded once every regular container in it has terminated, so Kubernetes won't mark the Job complete while one of them is still running.

Fluent Bit is a daemon; it had no reason to stop, and with no mechanism to end it, Kubernetes waits forever. I had built an accidental zombie factory.
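
For reference, this is roughly what the stuck Pod's status looks like when printed with kubectl get pod -o yaml. It is an illustrative excerpt trimmed to the relevant fields, not captured output, and timestamps and IDs are omitted.

```yaml
# Illustrative excerpt of a stuck Pod's status (not captured output).
status:
  phase: Running              # stays Running while any container is running
  containerStatuses:
    - name: logging
      ready: true
      state:
        running: {}           # Fluent Bit never exits on its own
    - name: script
      ready: false
      state:
        terminated:
          exitCode: 0
          reason: Completed   # the script itself finished fine
```

The script container is already terminated with exit code 0, but the Pod phase cannot move to Succeeded while the logging container is still in the running state.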

Enter Native Sidecars

Native sidecars, introduced in Kubernetes v1.28 as an alpha feature behind the SidecarContainers feature gate (and enabled by default since v1.29), formalize lifecycle intent for helper containers. They start before regular workload containers and, crucially, once all ordinary containers have completed, the kubelet terminates them so the Pod can finish.

Declaring Fluent Bit this way tells Kubernetes “this container supports the workload but shouldn’t keep the Pod alive once the work is done.”

The implementation is a little bit weird: a native sidecar is specified inside initContainers, but with restartPolicy: Always. That special combination promotes it from a one‑shot init container to a managed sidecar that stays running during the main phase and is then shut down automatically after the workload containers exit.

Updated Spec

apiVersion: batch/v1
kind: Job
metadata:
  name: log-collector-job
spec:
  template:
    spec:
      restartPolicy: Never # applies to ordinary containers
      initContainers:
        - name: logging
          image: fluent/fluent-bit:latest
          restartPolicy: Always # THIS marks it as a native sidecar
          volumeMounts:
            - name: logs
              mountPath: /var/log
          # ...args / env...
      containers:
        - name: script
          image: script-image
          command: [sh, -c]
          args:
            - |
              set -o pipefail
              echo "Executing script"
              /script | tee -a /var/log/logfile.log
              echo "Done"
          volumeMounts:
            - name: logs
              mountPath: /var/log
      volumes:
        - name: logs
          emptyDir: {}

Why not just kill the Fluent Bit container?

Sure, I could have set shareProcessNamespace: true, jammed a pkill -f fluent-bit into the script, and called it a day. But that option felt like a regression in observability: killing the log shipper by hand risks cutting it off before the last lines are shipped.
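
For comparison, here is a minimal sketch of that rejected alternative. It rests on assumptions not stated above: that pkill is installed in script-image, and that the script container is allowed to signal the Fluent Bit process (same UID, or root). The sketch uses pkill -x rather than -f, because -f matches against full command lines, and the sh -c wrapper's own command line contains the text "fluent-bit", so the script could end up signalling itself.

```yaml
# Sketch of the rejected alternative, NOT the approach used in this post.
apiVersion: batch/v1
kind: Job
metadata:
  name: log-collector-job
spec:
  template:
    spec:
      restartPolicy: Never
      shareProcessNamespace: true   # containers can see each other's processes
      containers:
        - name: logging
          image: fluent/fluent-bit:latest
          # ...args...
          volumeMounts:
            - name: logs
              mountPath: /var/log
        - name: script
          image: script-image
          command: [sh, -c]
          args:
            - |
              set -o pipefail
              /script | tee -a /var/log/logfile.log
              status=$?             # remember the script's own result
              # Assumption: pkill exists in script-image and this container
              # may signal the Fluent Bit process (same UID or root).
              pkill -x fluent-bit
              exit "$status"        # the Job result reflects the script, not pkill
          volumeMounts:
            - name: logs
              mountPath: /var/log
      volumes:
        - name: logs
          emptyDir: {}
```

Even in this form the script has to know which helper processes exist and how to stop them, which is exactly the coupling the native sidecar declaration removes.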

Native sidecars solve the lifecycle problem once, declaratively, and the solution works the same whether I’m using classic sidecar injection or Ambient Mode.

By adopting native sidecar semantics I gave Kubernetes enough information to end supporting processes once the script was done. The Pod now finishes quickly and cleanly, with no lingering zombies.

More materials

  1. Introducing native sidecar containers.
  2. Kubernetes Native Sidecars in Istio.
Why my k8s job never finished and how I fixed it - Nicholas