k8s Jobs, Helper Containers, and Native Sidecars
k8s Jobs that share a Pod with a long-running helper container have a lifecycle problem that's easy to miss until things stop working.
A common setup: a script container that does the actual work, and a logging agent such as Fluent Bit that tails a shared volume and ships the logs somewhere useful. Both run as ordinary containers in the same Pod.
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: log-collector-job
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: logging
          image: fluent/fluent-bit:latest
          # ...args...
          volumeMounts:
            - name: logs
              mountPath: /var/log
        - name: script
          image: script-image
          command: [sh, -c]
          args:
            - |
              set -o pipefail
              echo "Executing script"
              /script | tee -a /var/log/logfile.log
              echo "Done"
              # Graceful shutdown of Istio sidecar
              curl -fsI -X POST http://localhost:15020/quitquitquit
          volumeMounts:
            - name: logs
              mountPath: /var/log
      volumes:
        - name: logs
          emptyDir: {}
```

This works, but there's a dependency hiding in the script container that's easy to overlook: that curl at the end.
k8s won't mark a Job complete until every container in the Pod has exited. Fluent Bit is a daemon; it has no reason to stop on its own. So the script was responsible for telling the Istio sidecar to shut down, and the sidecar stopping was what eventually caused Fluent Bit to stop too. Migrate to Istio Ambient Mode, where there's no injected sidecar, delete the now-pointless curl, and every Job hangs forever. The script finishes, the logs are all there, and the Pod just sits in Running doing nothing.
k8s has no concept of "helper" containers. It sees two containers and waits for both to exit.
The obvious fix, and why it's wrong
The most direct solution is to kill Fluent Bit from the script:
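```sh
/script | tee -a /var/log/logfile.log
# -x matches the process name exactly; -f would also match this shell,
# whose own command line contains "fluent-bit"
pkill -x fluent-bit
```

With shareProcessNamespace: true on the Pod, this works. But it puts the responsibility in the wrong place.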
The script now needs to know what other containers exist in the Pod, what they're called, and how to stop them. Rename the logging container, swap out the logging agent, or deploy to an environment with a different setup and the script breaks for reasons that have nothing to do with the script.
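This sketch shows what the workaround needs in the Pod template, mostly to make visible how much Pod-specific knowledge leaks into the script. shareProcessNamespace is a Pod-level field, so every container sees every other container's processes. The sketch assumes script-image ships pkill (procps and BusyBox both provide one). It also assumes the script is allowed to signal the Fluent Bit process, which usually means running as the same user or holding the CAP_KILL capability. Neither is guaranteed by the images in this example.

```yaml
# Sketch only: the pkill workaround this section argues against.
# Assumes script-image provides pkill and may signal Fluent Bit's process.
spec:
  template:
    spec:
      restartPolicy: Never
      shareProcessNamespace: true # one PID namespace for every container in the Pod
      containers:
        - name: logging
          image: fluent/fluent-bit:latest
          # ...args, volumeMounts...
        - name: script
          image: script-image
          command: [sh, -c]
          args:
            - |
              /script | tee -a /var/log/logfile.log
              # Only works because the PID namespace is shared;
              # -x avoids also matching this shell's own command line
              pkill -x fluent-bit
          # ...volumeMounts...
```

Every line of this that mentions Fluent Bit is something the script shouldn't have to know.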
The problem isn't that Fluent Bit kept running. The problem is that k8s had no way to know it shouldn't.
Native sidecars
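k8s v1.28 introduced a way to express exactly this intent: native sidecar containers. The feature arrived as alpha behind the SidecarContainers feature gate, became beta and enabled by default in v1.29, and went stable in v1.33.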
The idea is that some containers exist to support the workload, not to be the workload. They should start before the main containers, stay running while the work happens, and exit automatically once the workload containers finish, without any coordination from the script.
The syntax looks a little unexpected at first. A native sidecar is declared inside initContainers, but with restartPolicy: Always. That specific combination tells the kubelet this isn't a one-shot init container: start it before the ordinary containers, keep it running through the main phase, and terminate it cleanly once the ordinary containers are done.
```yaml
initContainers:
  - name: logging
    image: fluent/fluent-bit:latest
    restartPolicy: Always # this is what makes it a native sidecar
```

Here's the full updated spec:
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: log-collector-job
spec:
  template:
    spec:
      restartPolicy: Never # applies to ordinary containers
      initContainers:
        - name: logging
          image: fluent/fluent-bit:latest
          restartPolicy: Always # THIS marks it as a native sidecar
          volumeMounts:
            - name: logs
              mountPath: /var/log
          # ...args / env...
      containers:
        - name: script
          image: script-image
          command: [sh, -c]
          args:
            - |
              set -e -o pipefail # exit non-zero if /script fails, so the Job fails too
              echo "Executing script"
              /script | tee -a /var/log/logfile.log
              echo "Done"
          volumeMounts:
            - name: logs
              mountPath: /var/log
      volumes:
        - name: logs
          emptyDir: {}
```

Once the script exits successfully, the kubelet sees no ordinary containers left running, terminates Fluent Bit, and the Job is marked complete. Nothing in the script needs to know about Fluent Bit. No pkill. No curl.
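One refinement: "start before the main containers" only means the sidecar's process has been started. If the script shouldn't begin until Fluent Bit is actually up, give the sidecar a startupProbe. The kubelet doesn't start the ordinary containers until a native sidecar's startup probe has succeeded. Without a probe, it moves on as soon as the sidecar container is running. This is a sketch only. It assumes Fluent Bit's built-in HTTP server is enabled in your Fluent Bit configuration and listening on its default port, 2020; the probe timings are illustrative.

```yaml
initContainers:
  - name: logging
    image: fluent/fluent-bit:latest
    restartPolicy: Always # native sidecar
    # Assumption: Fluent Bit's HTTP server is enabled on port 2020;
    # its root path "/" returns build information once it is up.
    startupProbe:
      httpGet:
        path: /
        port: 2020
      periodSeconds: 1
      failureThreshold: 30 # give Fluent Bit up to ~30s to come up
    volumeMounts:
      - name: logs
        mountPath: /var/log
    # ...args / env...
```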
You're not working around the problem; you're giving k8s enough information to handle the lifecycle correctly.
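The Fluent Bit side works the same whether you're running classic Istio sidecar injection, Ambient Mode, or no service mesh at all. With classic injection, the injected istio-proxy is still an ordinary container unless Istio is configured to inject it as a native sidecar too, so it still needs its own way to exit. The spec expresses what you actually mean, and k8s does the rest.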