Why my k8s job never finished and how I fixed it

I recently bumped into an issue while transitioning from Istio sidecar mode to Ambient Mode. I have a simple Job that runs a script, writes its output to a log file, and ships the logs with a Fluent Bit container.
The Job spec
apiVersion: batch/v1
kind: Job
metadata:
  name: log-collector-job
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: logging
          image: fluent/fluent-bit:latest
          # ...args...
          volumeMounts:
            - name: logs
              mountPath: /var/log
        - name: script
          image: script-image
          command: [sh, -c]
          args:
            - |
              set -o pipefail
              echo "Executing script"
              /script | tee -a /var/log/logfile.log
              echo "Done"
              # Graceful shutdown of Istio sidecar
              curl -fsI -X POST http://localhost:15020/quitquitquit
          volumeMounts:
            - name: logs
              mountPath: /var/log
      volumes:
        - name: logs
          emptyDir: {}
This setup had been working for ages. As seen above, I would use a curl call at the end of the script to gracefully shut down the Istio sidecar.
Then I migrated the namespace to Istio Ambient. “No sidecar now, right? Don’t need the curl.” I deleted the line.
From that moment, every Job became… a zombie. The script would finish, CPU would nosedive, the logs were all there, and yet the Pod just sat in Running as if time had frozen.
Without the explicit shutdown, and without a sidecar to kill, the other always-on container (Fluent Bit) just kept running. Kubernetes won't mark a Job's Pod complete until every container in it has exited, and Fluent Bit is a daemon; it had no reason to stop. I had built an accidental zombie factory: with no mechanism to end Fluent Bit, Kubernetes waits forever.
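You can see the culprit straight from kubectl; something like the following (relying on the standard job-name label that Jobs put on their Pods) shows the script container as Terminated while Fluent Bit is still Running:

# The Job still reports an active Pod even though the script has finished
kubectl get job log-collector-job

# Per-container state: 'script' shows Terminated/Completed,
# while 'logging' (Fluent Bit) is still Running
kubectl describe pod -l job-name=log-collector-job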
Enter Native Sidecars
Native Sidecars, introduced in Kubernetes v1.28 behind the SidecarContainers feature gate, formalize lifecycle intent for helper containers. They start before the regular workload containers, and, crucially, once all ordinary containers have completed, the kubelet terminates them so the Pod can finish.
Declaring Fluent Bit this way tells Kubernetes "this container supports the workload but shouldn't keep the Pod alive once the work is done."
The implementation is a little bit weird: a native sidecar is specified inside initContainers, but with restartPolicy: Always.
That special combination promotes it from a one-shot init container to a managed sidecar that stays running during the main phase and is then shut down automatically after the workload containers exit.
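One caveat before relying on it: the cluster has to be new enough. As a rough check, confirm both the control plane and kubelet versions (the SidecarContainers gate must be enabled explicitly on 1.28; it is on by default from 1.29):

# Control plane version
kubectl version

# Kubelet version per node (native sidecars need new-enough kubelets too)
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.kubeletVersion}{"\n"}{end}'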
Updated Spec
apiVersion: batch/v1
kind: Job
metadata:
  name: log-collector-job
spec:
  template:
    spec:
      restartPolicy: Never # applies to ordinary containers
      initContainers:
        - name: logging
          image: fluent/fluent-bit:latest
          restartPolicy: Always # THIS marks it as a native sidecar
          volumeMounts:
            - name: logs
              mountPath: /var/log
          # ...args / env...
      containers:
        - name: script
          image: script-image
          command: [sh, -c]
          args:
            - |
              set -o pipefail
              echo "Executing script"
              /script | tee -a /var/log/logfile.log
              echo "Done"
          volumeMounts:
            - name: logs
              mountPath: /var/log
      volumes:
        - name: logs
          emptyDir: {}
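With this spec the Job completes on its own. A quick way to confirm (the timeout here is just an illustrative value):

# Wait for the Job to finish; the kubelet stops the native sidecar automatically
kubectl wait --for=condition=complete job/log-collector-job --timeout=120s
kubectl get job log-collector-job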
Why not just kill the Fluent Bit container?
Sure, I could have jammed a pkill -f fluent-bit into the script, set shareProcessNamespace: true on the Pod, and called it a day (sketched below), but that option felt like a regression in observability.
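For reference, that rejected approach would have looked roughly like this; it assumes pkill exists in script-image and that both containers run as users allowed to signal each other:

spec:
  template:
    spec:
      shareProcessNamespace: true # all containers share one PID namespace
      restartPolicy: Never
      containers:
        - name: script
          image: script-image
          command: [sh, -c]
          args:
            - |
              /script | tee -a /var/log/logfile.log
              pkill -f fluent-bit # reach across and stop the logging container's process
        - name: logging
          image: fluent/fluent-bit:latest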
Native sidecars solve the lifecycle problem once, declaratively, and the solution works the same whether I’m using classic sidecar injection or Ambient Mode.
By adopting native sidecar semantics I gave Kubernetes enough information to end the supporting process once the script was done. The Pod now finishes quickly and cleanly, with no zombies left lingering.