Kubernetes CronJobs — Part 2: Parallelism

Alibaba Cloud · 18 min read · Jul 17, 2019

By Alwyn Botha, Alibaba Cloud Tech Share Author. Tech Share is Alibaba Cloud’s incentive program to encourage the sharing of technical knowledge and best practices within the cloud community.

1) suspend: true

You can suspend the overall running of a cron job by using the suspend field.

Below is the spec for a normal running cron job. (We need a basic running cron job to suspend.)

nano myCronJob.yaml

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: mycronjob
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: mycron-container
            image: alpine
            imagePullPolicy: IfNotPresent

            command: ['sh', '-c', 'echo Job Pod is Running ; sleep 5']

          restartPolicy: OnFailure
          terminationGracePeriodSeconds: 0

  concurrencyPolicy: Replace

Start the cron job.

kubectl create -f myCronJob.yaml 

cronjob.batch/mycronjob created

Check the get cronjob output:

kubectl get cronjob
NAME        SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
mycronjob   */1 * * * *   False     0        47s             5m7s

Now SUSPEND is relevant.

Throughout this tutorial it has always been False: the job is not suspended.

Even right now we have a running cron job.

Add only the suspend: true line to myCronJob.yaml as shown below.

concurrencyPolicy is shown only so that your indentation is correct.

  concurrencyPolicy: Replace
  suspend: true

You use kubectl replace to replace a running Kubernetes object with a new one.

In this specific case we replace our running cron job with one that specifies suspend: true.

kubectl replace -f myCronJob.yaml

cronjob.batch/mycronjob replaced
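As an aside (not used in the rest of this tutorial), you can toggle the same flag without editing the spec file by using kubectl patch:

kubectl patch cronjob mycronjob -p '{"spec":{"suspend":true}}'

kubectl replace is used here because we keep myCronJob.yaml as the single source of truth.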

Investigate get cronjob output again:

kubectl get cronjob
NAME        SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
mycronjob   */1 * * * *   True      0        62s             5m22s

This time it shows SUSPEND True.

Six minutes later, LAST SCHEDULE will show that the last job ran 6 minutes ago.

This is as expected, since the cron job is now suspended.

kubectl get cronjob
NAME        SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
mycronjob   */1 * * * *   True      0        6m23s           10m

To get the cron job running again we remove the suspend: true line from the spec file.

concurrencyPolicy should again be the last line.

  concurrencyPolicy: Replace

Replace the running cron job with the latest spec from myCronJob.yaml:

kubectl replace -f myCronJob.yaml
cronjob.batch/mycronjob replaced

Verify SUSPEND is lifted.

kubectl get cronjob
NAME        SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
mycronjob   */1 * * * *   False     0        6m23s           10m

Yes, SUSPEND is False.

47 seconds later, LAST SCHEDULE shows that jobs are being scheduled again.

kubectl get cronjob
NAME        SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
mycronjob   */1 * * * *   False     0        47s             11m

Investigate the detail of kubectl describe cronjob/mycronjob:

kubectl describe cronjob/mycronjob
Events:
  Type    Reason            Age    From                Message
  ----    ------            ----   ----                -------
  Normal  SuccessfulCreate  11m    cronjob-controller  Created job mycronjob-1548668040
  Normal  SawCompletedJob   11m    cronjob-controller  Saw completed job: mycronjob-1548668040
  Normal  SuccessfulCreate  10m    cronjob-controller  Created job mycronjob-1548668100
  Normal  SawCompletedJob   10m    cronjob-controller  Saw completed job: mycronjob-1548668100
  Normal  SuccessfulCreate  9m13s  cronjob-controller  Created job mycronjob-1548668160
  Normal  SawCompletedJob   9m3s   cronjob-controller  Saw completed job: mycronjob-1548668160
  Normal  SuccessfulCreate  8m13s  cronjob-controller  Created job mycronjob-1548668220
  Normal  SawCompletedJob   8m3s   cronjob-controller  Saw completed job: mycronjob-1548668220
  Normal  SuccessfulDelete  8m3s   cronjob-controller  Deleted job mycronjob-1548668040
  Normal  SuccessfulCreate  7m13s  cronjob-controller  Created job mycronjob-1548668280
  Normal  SawCompletedJob   7m3s   cronjob-controller  Saw completed job: mycronjob-1548668280
  Normal  SuccessfulDelete  7m2s   cronjob-controller  Deleted job mycronjob-1548668100

  Normal  SuccessfulCreate  52s    cronjob-controller  Created job mycronjob-1548668640
  Normal  SawCompletedJob   42s    cronjob-controller  Saw completed job: mycronjob-1548668640
  Normal  SuccessfulDelete  42s    cronjob-controller  Deleted job mycronjob-1548668160
  Normal  SuccessfulCreate  21s    cronjob-controller  Created job mycronjob-1548668700
  Normal  SawCompletedJob   11s    cronjob-controller  Saw completed job: mycronjob-1548668700
  Normal  SuccessfulDelete  11s    cronjob-controller  Deleted job mycronjob-1548668220

We see a clear gap during which no jobs for this cron job ran.

Unfortunately Kubernetes does not add an event line stating:

… cron job suspended here

and

… cron job unsuspended here.

We have to surmise that. The gap could equally have been caused by Kubernetes being halted, a server reboot, or something else. An event stating that a user-invoked suspend happened here would have made it clear what happened.
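There is no suspend event to look for, but you can at least query the current value of the flag directly, using standard kubectl jsonpath output:

kubectl get cronjob mycronjob -o jsonpath='{.spec.suspend}'
true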

Demo done, delete.

kubectl delete -f myCronJob.yaml 

cronjob.batch "mycronjob" deleted

2) startingDeadlineSeconds

Detail Reference — https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/#starting-deadline

If this field is not specified, the jobs have no deadline.

Summary: if a job misses its scheduled time by more than startingDeadlineSeconds, it gets skipped.

At the next scheduled time it will attempt to run again.

Below we have a cron job that should run every minute.

The work of this cron job is to sleep for 80 seconds.

We have concurrencyPolicy: Forbid specified. Two or more jobs may not run simultaneously.

startingDeadlineSeconds: 10 means each run must start within 10 seconds of its scheduled minute.

Because the Pod sleeps for 80 seconds, it will still be running a minute later. One minute later the next job cannot start (concurrencyPolicy: Forbid) because the previous job still has 20 seconds of running time left. That second job will be skipped. This is what we attempt to observe below; the timeline sketch that follows lays it out.
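A rough expected timeline (approximate; the controller's exact timing may differ by a few seconds):

# t=0:00  job A created; its Pod starts and sleeps 80 seconds
# t=1:00  schedule fires; job A is still running, so Forbid blocks the new job
# t=1:10  startingDeadlineSeconds (10) expires; the 1:00 run is now skipped
# t=1:20  job A's Pod completes; too late for the 1:00 run
# t=2:00  schedule fires again; nothing is running, so job B starts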

nano myCronJob.yaml

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: mycronjob
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: mycron-container
            image: alpine
            imagePullPolicy: IfNotPresent

            command: ['sh', '-c', 'echo Job Pod is Running ; sleep 80']

          restartPolicy: OnFailure
          terminationGracePeriodSeconds: 0

  concurrencyPolicy: Forbid
  startingDeadlineSeconds: 10

Create cron job.

kubectl create -f myCronJob.yaml 

cronjob.batch/mycronjob created

Investigate status a minute later:

kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548669600   0/1           57s        57s

kubectl get pods
NAME                         READY   STATUS    RESTARTS   AGE
mycronjob-1548669600-mzp2x   1/1     Running   0          58s

The previous job continues to run. (No new job started: by this output it should already have been running for 15 seconds.) The new job was skipped as expected.

kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548669600   0/1           65s        65s

kubectl get pods
NAME                         READY   STATUS    RESTARTS   AGE
mycronjob-1548669600-mzp2x   1/1     Running   0          65s

80 seconds later the first job has Completed.

startingDeadlineSeconds: 10 prevents the second job from starting now: it is 20 seconds past its planned start time, so it gets skipped.

kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548669600   1/1           82s        84s

kubectl get pods
NAME                         READY   STATUS      RESTARTS   AGE
mycronjob-1548669600-mzp2x   0/1     Completed   0          84s

kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548669600   1/1           82s        97s

kubectl get pods
NAME                         READY   STATUS      RESTARTS   AGE
mycronjob-1548669600-mzp2x   0/1     Completed   0          97s

Two minutes in, a new job starts, exactly on the minute switchover.

kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548669600   1/1           82s        118s

kubectl get pods
NAME                         READY   STATUS      RESTARTS   AGE
mycronjob-1548669600-mzp2x   0/1     Completed   0          119s

kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548669600   1/1           82s        2m5s
mycronjob-1548669720   0/1           5s         5s

kubectl get pods
NAME                         READY   STATUS      RESTARTS   AGE
mycronjob-1548669600-mzp2x   0/1     Completed   0          2m6s
mycronjob-1548669720-6dmrh   1/1     Running     0          6s

The output below is as expected:

  • A new job starts every 2 minutes, since every other minute the previous job is still busy completing its last 20 seconds of sleep time.
  • SawCompletedJob appears each time a sleep 80 completes.

kubectl describe cronjob/mycronjob
Name:                       mycronjob
Schedule:                   */1 * * * *
Concurrency Policy:         Forbid
Starting Deadline Seconds:  10s
Pod Template:
    echo Job Pod is Running ; sleep 80
Events:
  Type    Reason            Age    From                Message
  ----    ------            ----   ----                -------
  Normal  SuccessfulCreate  2m16s  cronjob-controller  Created job mycronjob-1548669600
  Normal  SawCompletedJob   46s    cronjob-controller  Saw completed job: mycronjob-1548669600
  Normal  SuccessfulCreate  16s    cronjob-controller  Created job mycronjob-1548669720

Delete …

kubectl delete -f myCronJob.yaml 

cronjob.batch "mycronjob" deleted

3) Parallelism: 3 … concurrencyPolicy: Allow … sleep 5

Kubernetes is able to handle 3 main types of parallel jobs.

For more information, see https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/#parallel-jobs

This section deals with a cron job that periodically runs a job with a parallelism of three.

A basic job that runs with a parallelism of three has one job running, with three Pods running.

A cron job that runs with a parallelism of three also has one job with three Pods running, with the major difference that it runs periodically.

You have to define such cron jobs carefully so that too many jobs do not unintentionally run in parallel too frequently; otherwise you end up with a CPU-overloading mass of jobs. (One way to bound the damage is sketched right below.)
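A minimal sketch of that bounding idea: give the job's container explicit resource requests and limits so that even a pile-up of Pods cannot consume the whole node. This snippet slots into the container definition used throughout this tutorial; the values are illustrative, not recommendations:

            resources:
              requests:
                cpu: 100m
                memory: 32Mi
              limits:
                cpu: 250m
                memory: 64Mi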

This first example only runs 3 Pods in parallel, each running 5 seconds (no problems).

nano myCronJob.yaml

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: mycronjob
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: mycron-container
            image: alpine
            imagePullPolicy: IfNotPresent

            command: ['sh', '-c', 'echo Job Pod is Running ; sleep 5']

          restartPolicy: OnFailure
          terminationGracePeriodSeconds: 0
      parallelism: 3

  concurrencyPolicy: Allow

Create job.

kubectl create -f myCronJob.yaml 

cronjob.batch/mycronjob created

Check progress:

3 pods running in parallel.

kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548745200   0/1 of 3      3s         3s

kubectl get pods
NAME                         READY   STATUS    RESTARTS   AGE
mycronjob-1548745200-2gg4s   1/1     Running   0          3s
mycronjob-1548745200-nslj8   1/1     Running   0          3s
mycronjob-1548745200-rhcnf   1/1     Running   0          3s

11 seconds later. Each slept for 5 seconds in parallel. All now completed.

kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548745200   3/1 of 3      7s         11s

kubectl get pods
NAME                         READY   STATUS      RESTARTS   AGE
mycronjob-1548745200-2gg4s   0/1     Completed   0          11s
mycronjob-1548745200-nslj8   0/1     Completed   0          11s
mycronjob-1548745200-rhcnf   0/1     Completed   0          11s

A minute later. Second set of 3 Pods completed.

kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548745200   3/1 of 3      7s         67s
mycronjob-1548745260   3/1 of 3      7s         7s

kubectl get pods
NAME                         READY   STATUS      RESTARTS   AGE
mycronjob-1548745200-2gg4s   0/1     Completed   0          67s
mycronjob-1548745200-nslj8   0/1     Completed   0          67s
mycronjob-1548745200-rhcnf   0/1     Completed   0          67s
mycronjob-1548745260-bk84s   0/1     Completed   0          7s
mycronjob-1548745260-rpv7h   0/1     Completed   0          7s
mycronjob-1548745260-z87mk   0/1     Completed   0          7s

Two minutes later. Third set of 3 Pods running.

kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548745200   3/1 of 3      7s         2m5s
mycronjob-1548745260   3/1 of 3      7s         65s
mycronjob-1548745320   0/1 of 3      4s         4s

kubectl get pods
NAME                         READY   STATUS      RESTARTS   AGE
mycronjob-1548745200-2gg4s   0/1     Completed   0          2m5s
mycronjob-1548745200-nslj8   0/1     Completed   0          2m5s
mycronjob-1548745200-rhcnf   0/1     Completed   0          2m5s
mycronjob-1548745260-bk84s   0/1     Completed   0          65s
mycronjob-1548745260-rpv7h   0/1     Completed   0          65s
mycronjob-1548745260-z87mk   0/1     Completed   0          65s
mycronjob-1548745320-bk2mg   1/1     Running     0          4s
mycronjob-1548745320-fbg9v   1/1     Running     0          4s
mycronjob-1548745320-wpblf   1/1     Running     0          4s

No overlapping running jobs: simple.

kubectl delete -f myCronJob.yaml 

cronjob.batch "mycronjob" deleted

4) Parallelism: 3 … completions: 6 … concurrencyPolicy: Allow … sleep 5

This time 6 Pods need to complete for the job to be considered complete: completions: 6. With parallelism: 3, that means two waves of 3 Pods each.

nano myCronJob.yaml

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: mycronjob
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: mycron-container
            image: alpine
            imagePullPolicy: IfNotPresent

            command: ['sh', '-c', 'echo Job Pod is Running ; sleep 5']

          restartPolicy: OnFailure
          terminationGracePeriodSeconds: 0
      parallelism: 3
      completions: 6

  concurrencyPolicy: Allow

Create:

kubectl create -f myCronJob.yaml 

cronjob.batch/mycronjob created

Monitor: with parallelism: 3, 3 Pods start simultaneously.

kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548745740   0/6           1s         1s

kubectl get pods
NAME                         READY   STATUS              RESTARTS   AGE
mycronjob-1548745740-9jb8w   0/1     ContainerCreating   0          1s
mycronjob-1548745740-q6jwn   0/1     ContainerCreating   0          1s
mycronjob-1548745740-w6tmg   0/1     ContainerCreating   0          1s

Seconds later, first set of 3 completed, second set of 3 running.

kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548745740   3/6           8s         8s

kubectl get pods
NAME                         READY   STATUS      RESTARTS   AGE
mycronjob-1548745740-4lkg9   1/1     Running     0          2s
mycronjob-1548745740-9jb8w   0/1     Completed   0          8s
mycronjob-1548745740-f5qzk   1/1     Running     0          2s
mycronjob-1548745740-pkfn5   1/1     Running     0          2s
mycronjob-1548745740-q6jwn   0/1     Completed   0          8s
mycronjob-1548745740-w6tmg   0/1     Completed   0          8s

Seconds later the second set of 3 has completed: completions: 6 reached.

kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548745740   6/6           12s        17s

kubectl get pods
NAME                         READY   STATUS      RESTARTS   AGE
mycronjob-1548745740-4lkg9   0/1     Completed   0          12s
mycronjob-1548745740-9jb8w   0/1     Completed   0          18s
mycronjob-1548745740-f5qzk   0/1     Completed   0          12s
mycronjob-1548745740-pkfn5   0/1     Completed   0          12s
mycronjob-1548745740-q6jwn   0/1     Completed   0          18s
mycronjob-1548745740-w6tmg   0/1     Completed   0          18s

One minute later this cycle repeats.

kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548745740   6/6           12s        63s
mycronjob-1548745800   0/6           3s         3s

kubectl get pods
NAME                         READY   STATUS      RESTARTS   AGE
mycronjob-1548745740-4lkg9   0/1     Completed   0          57s
mycronjob-1548745740-9jb8w   0/1     Completed   0          63s
mycronjob-1548745740-f5qzk   0/1     Completed   0          57s
mycronjob-1548745740-pkfn5   0/1     Completed   0          57s
mycronjob-1548745740-q6jwn   0/1     Completed   0          63s
mycronjob-1548745740-w6tmg   0/1     Completed   0          63s
mycronjob-1548745800-4bvgz   1/1     Running     0          3s
mycronjob-1548745800-csfr5   1/1     Running     0          3s
mycronjob-1548745800-qddtw   1/1     Running     0          3s

kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548745740   6/6           12s        67s
mycronjob-1548745800   3/6           7s         7s

kubectl get pods
NAME                         READY   STATUS              RESTARTS   AGE
mycronjob-1548745740-4lkg9   0/1     Completed           0          61s
mycronjob-1548745740-9jb8w   0/1     Completed           0          67s
mycronjob-1548745740-f5qzk   0/1     Completed           0          61s
mycronjob-1548745740-pkfn5   0/1     Completed           0          61s
mycronjob-1548745740-q6jwn   0/1     Completed           0          67s
mycronjob-1548745740-w6tmg   0/1     Completed           0          67s
mycronjob-1548745800-4bvgz   0/1     Completed           0          7s
mycronjob-1548745800-4mg4b   1/1     Running             0          1s
mycronjob-1548745800-csfr5   0/1     Completed           0          7s
mycronjob-1548745800-kl295   0/1     ContainerCreating   0          1s
mycronjob-1548745800-mw6d7   0/1     ContainerCreating   0          1s
mycronjob-1548745800-qddtw   0/1     Completed           0          7s

kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548745740   6/6           12s        75s
mycronjob-1548745800   6/6           12s        15s

kubectl get pods
NAME                         READY   STATUS      RESTARTS   AGE
mycronjob-1548745740-4lkg9   0/1     Completed   0          69s
mycronjob-1548745740-9jb8w   0/1     Completed   0          75s
mycronjob-1548745740-f5qzk   0/1     Completed   0          69s
mycronjob-1548745740-pkfn5   0/1     Completed   0          69s
mycronjob-1548745740-q6jwn   0/1     Completed   0          75s
mycronjob-1548745740-w6tmg   0/1     Completed   0          75s
mycronjob-1548745800-4bvgz   0/1     Completed   0          15s
mycronjob-1548745800-4mg4b   0/1     Completed   0          9s
mycronjob-1548745800-csfr5   0/1     Completed   0          15s
mycronjob-1548745800-kl295   0/1     Completed   0          9s
mycronjob-1548745800-mw6d7   0/1     Completed   0          9s
mycronjob-1548745800-qddtw   0/1     Completed   0          15s

Demo complete, delete …

kubectl delete -f myCronJob.yaml 

cronjob.batch "mycronjob" deleted

5) Parallelism: 2 … completions: 4 … concurrencyPolicy: Allow … sleep 120

This time 2 parallel Pods, 4 completions. Each Pod sleeps 120 seconds.

Jobs will overlap … concurrencyPolicy: Allow

nano myCronJob.yaml

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: mycronjob
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: mycron-container
            image: alpine
            imagePullPolicy: IfNotPresent

            command: ['sh', '-c', 'echo Job Pod is Running ; sleep 120']

          restartPolicy: OnFailure
          terminationGracePeriodSeconds: 0
      parallelism: 2
      completions: 4

  concurrencyPolicy: Allow

Create:

kubectl create -f myCronJob.yaml 

cronjob.batch/mycronjob created

2 Pods start simultaneously.

kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548747000   0/4           3s         3s

kubectl get pods
NAME                         READY   STATUS    RESTARTS   AGE
mycronjob-1548747000-8kddx   1/1     Running   0          3s
mycronjob-1548747000-pv5f7   1/1     Running   0          3s

After 1 minute, a second set of 2 Pods is starting.

kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548747000   0/4           65s        65s
mycronjob-1548747060   0/4           5s         5s

kubectl get pods
NAME                         READY   STATUS    RESTARTS   AGE
mycronjob-1548747000-8kddx   1/1     Running   0          65s
mycronjob-1548747000-pv5f7   1/1     Running   0          65s
mycronjob-1548747060-98gfj   1/1     Running   0          5s
mycronjob-1548747060-ltlp4   1/1     Running   0          5s

Another minute later, a third job has started, with 2 Pods in ContainerCreating.

kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548747000   0/4           2m         2m
mycronjob-1548747060   0/4           60s        60s
.... missed by milliseconds: mycronjob-1548747120 should have been listed here ....
.... its 2 Pods are listed below ...

kubectl get pods
NAME                         READY   STATUS              RESTARTS   AGE
mycronjob-1548747000-8kddx   1/1     Running             0          2m
mycronjob-1548747000-pv5f7   1/1     Running             0          2m
mycronjob-1548747060-98gfj   1/1     Running             0          60s
mycronjob-1548747060-ltlp4   1/1     Running             0          60s
mycronjob-1548747120-876jx   0/1     ContainerCreating   0          0s
mycronjob-1548747120-vpv8p   0/1     ContainerCreating   0          0s

Several minutes later we have the mess below.

You now have Pods from 4 jobs, created in 4 different minutes, all running simultaneously.

So instead of 2 parallel Pods, you now have 8 Pods running.

kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548747000   2/4           3m45s      3m45s
mycronjob-1548747060   2/4           2m45s      2m45s
mycronjob-1548747120   0/4           105s       105s
mycronjob-1548747180   0/4           45s        45s

kubectl get pods
NAME                         READY   STATUS      RESTARTS   AGE
mycronjob-1548747000-8kddx   0/1     Completed   0          3m45s
mycronjob-1548747000-cpt97   1/1     Running     0          104s
mycronjob-1548747000-dc8z5   1/1     Running     0          104s
mycronjob-1548747000-pv5f7   0/1     Completed   0          3m45s
mycronjob-1548747060-98gfj   0/1     Completed   0          2m45s
mycronjob-1548747060-jmkld   1/1     Running     0          44s
mycronjob-1548747060-khnng   1/1     Running     0          44s
mycronjob-1548747060-ltlp4   0/1     Completed   0          2m45s
mycronjob-1548747120-876jx   1/1     Running     0          105s
mycronjob-1548747120-vpv8p   1/1     Running     0          105s
mycronjob-1548747180-2kbpf   1/1     Running     0          45s
mycronjob-1548747180-rxgl8   1/1     Running     0          45s

TIP: Do not schedule long-running cron jobs too close together in time. (Two possible mitigations are sketched below.)
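Two hedged mitigations for the overlap above, with illustrative values: widen the schedule so runs are spaced well beyond the job's expected duration, and/or cap each job's lifetime with activeDeadlineSeconds, a standard Job spec field that terminates the job's Pods once the deadline passes:

spec:
  schedule: "*/10 * * * *"          # spaced far beyond the ~4 minutes of work per job
  jobTemplate:
    spec:
      activeDeadlineSeconds: 300    # kill the whole job if it is still running after 5 minutes
      parallelism: 2
      completions: 4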

Delete.

kubectl delete -f myCronJob.yaml 

cronjob.batch "mycronjob" deleted

6) Parallelism: 2 … completions: 4 … concurrencyPolicy: Forbid … sleep 120

concurrencyPolicy: Forbid skips new jobs that try to run while the previous job is still busy.

This fixes the problem we had in the previous section.

nano myCronJob.yaml

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: mycronjob
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: mycron-container
            image: alpine
            imagePullPolicy: IfNotPresent

            command: ['sh', '-c', 'echo Job Pod is Running ; sleep 120']

          restartPolicy: OnFailure
          terminationGracePeriodSeconds: 0
      parallelism: 2
      completions: 4

  concurrencyPolicy: Forbid

Create.

kubectl create -f myCronJob.yaml 

cronjob.batch/mycronjob created

Monitor: 2 Pods started.

kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548747900   0/4           5s         5s

kubectl get pods
NAME                         READY   STATUS    RESTARTS   AGE
mycronjob-1548747900-bs69b   1/1     Running   0          5s
mycronjob-1548747900-dmstt   1/1     Running   0          5s

A minute later the original Pods are still running; new jobs and new Pods DO NOT start.

kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548747900   0/4           67s        67s

kubectl get pods
NAME                         READY   STATUS    RESTARTS   AGE
mycronjob-1548747900-bs69b   1/1     Running   0          67s
mycronjob-1548747900-dmstt   1/1     Running   0          67s

3 minutes later: the first pair of Pods has Completed, and the same job is now running its second pair of Pods on its way to 4 completions (2/4 so far).

kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548747900   2/4           3m15s      3m15s

kubectl get pods
NAME                         READY   STATUS      RESTARTS   AGE
mycronjob-1548747900-bs69b   0/1     Completed   0          3m15s
mycronjob-1548747900-dmstt   0/1     Completed   0          3m15s
mycronjob-1548747900-mg5g2   1/1     Running     0          73s
mycronjob-1548747900-ztlgc   1/1     Running     0          73s

Another minute later the 1548747900 series is fully Completed (4/4) and a new job series is running.

kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548747900   4/4           4m3s       4m14s
mycronjob-1548748140   0/4           4s         4s

kubectl get pods
NAME                         READY   STATUS      RESTARTS   AGE
mycronjob-1548747900-bs69b   0/1     Completed   0          4m15s
mycronjob-1548747900-dmstt   0/1     Completed   0          4m15s
mycronjob-1548747900-mg5g2   0/1     Completed   0          2m13s
mycronjob-1548747900-ztlgc   0/1     Completed   0          2m13s
mycronjob-1548748140-49hdp   1/1     Running     0          5s
mycronjob-1548748140-rw56f   1/1     Running     0          5s

concurrencyPolicy: Forbid may be the solution if you unintentionally have too many Pods running simultaneously.

Delete.

kubectl delete -f myCronJob.yaml 

cronjob.batch "mycronjob" deleted

7) Parallelism: 2 … completions: 4 … concurrencyPolicy: Replace … sleep 120

concurrencyPolicy: Replace

From https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/#concurrency-policy

> If it is time for a new job run and the previous job run hasn’t finished yet, the cron job replaces the currently running job run with a new job run.

nano myCronJob.yaml

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: mycronjob
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: mycron-container
            image: alpine
            imagePullPolicy: IfNotPresent

            command: ['sh', '-c', 'echo Job Pod is Running ; sleep 120']

          restartPolicy: OnFailure
          terminationGracePeriodSeconds: 0
      parallelism: 2
      completions: 4

  concurrencyPolicy: Replace

Create cron job.

kubectl create -f myCronJob.yaml 

cronjob.batch/mycronjob created

Investigate what happened after 12 minutes.

kubectl describe cronjob/mycronjob
Name:                mycronjob
Schedule:            */1 * * * *
Concurrency Policy:  Replace
Parallelism:         2
Completions:         4
Pod Template:
  Command:
    echo Job Pod is Running ; sleep 120
Events:
  Type    Reason            Age    From                Message
  ----    ------            ----   ----                -------
  Normal  SuccessfulCreate  12m    cronjob-controller  Created job mycronjob-1548748620
  Normal  SuccessfulDelete  11m    cronjob-controller  Deleted job mycronjob-1548748620

  Normal  SuccessfulCreate  11m    cronjob-controller  Created job mycronjob-1548748680
  Normal  SuccessfulDelete  10m    cronjob-controller  Deleted job mycronjob-1548748680

  Normal  SuccessfulCreate  10m    cronjob-controller  Created job mycronjob-1548748740
  Normal  SuccessfulDelete  9m27s  cronjob-controller  Deleted job mycronjob-1548748740

  Normal  SuccessfulCreate  9m27s  cronjob-controller  Created job mycronjob-1548748800
  Normal  SuccessfulDelete  8m27s  cronjob-controller  Deleted job mycronjob-1548748800
  Normal  SuccessfulCreate  8m27s  cronjob-controller  Created job mycronjob-1548748860
  Normal  SuccessfulDelete  7m27s  cronjob-controller  Deleted job mycronjob-1548748860
  Normal  SuccessfulCreate  7m27s  cronjob-controller  Created job mycronjob-1548748920
  Normal  SuccessfulDelete  6m26s  cronjob-controller  Deleted job mycronjob-1548748920
  Normal  SuccessfulCreate  6m26s  cronjob-controller  Created job mycronjob-1548748980
  Normal  SuccessfulDelete  5m26s  cronjob-controller  Deleted job mycronjob-1548748980
  Normal  SuccessfulCreate  5m26s  cronjob-controller  Created job mycronjob-1548749040
  Normal  SuccessfulDelete  4m26s  cronjob-controller  Deleted job mycronjob-1548749040
  Normal  SuccessfulCreate  4m26s  cronjob-controller  Created job mycronjob-1548749100
  Normal  SuccessfulDelete  3m26s  cronjob-controller  Deleted job mycronjob-1548749100
  Normal  SuccessfulCreate  25s (x4 over 3m26s)  cronjob-controller  (combined from similar events): Created job mycronjob-1548749340
  Normal  SuccessfulDelete  25s (x3 over 2m26s)  cronjob-controller  (combined from similar events): Deleted job mycronjob-1548749280

One new job is started every minute.

The previously running job is deleted every time a new job starts. This is concurrencyPolicy: Replace in action.

IMPORTANT: NO job completes. NO job ever runs for its full 120 sleep seconds.

Use concurrencyPolicy: Replace if you understand how it works and you need its feature: if it is time for a new job run and the previous job run hasn't finished yet, the cron job replaces the currently running job run with a new job run. (Replace deletes the previous job, its Pods, and their logs.)

IMPORTANT: In this tutorial, for this specific case, NO job ever completes successfully. It ALWAYS gets replaced. However, if you use Replace in your environment, it will only replace jobs when a previous one is still running. That will probably be nearly never in your production environment. Understand Replace and its implications; it may be wrong or perfectly right for your case.

These commands show only the latest running job and its Pods.

kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548749400   0/4           3s         3s

kubectl get pods
NAME                         READY   STATUS    RESTARTS   AGE
mycronjob-1548749400-fcrrt   1/1     Running   0          3s
mycronjob-1548749400-n2wbs   1/1     Running   0          3s

kubectl get cronjobs
NAME        SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
mycronjob   */1 * * * *   False     1        22s             13m

Delete cron job.

kubectl delete -f myCronJob.yaml

cronjob.batch "mycronjob" deleted

8) backoffLimit and Kubernetes Cron Jobs

I am impressed with Kubernetes job and cron job functionality in general.

However, when a cron job has problems that trigger the backoffLimit functionality, it leaves no trace evidence.

A long-running job may sometimes experience intermittent and changing problems.

backoffLimit handling deletes crashed jobs, their Pods, and their logs. You are left with only the currently running job and Pod. kubectl get cronjob does not even hint that this cron job has problems.

kubectl describe shows misleading event messages: all seems OK, because only success events are shown.

Let’s investigate.

From https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/#pod-backoff-failure-policy

> Pod Backoff failure policy
> There are situations where you want to fail a Job after some amount of retries due to a logical error in configuration etc. To do so, set .spec.backoffLimit to specify the number of retries before considering a Job as failed. The back-off limit is set by default to 6.
> Failed Pods associated with the Job are recreated by the Job controller with an exponential back-off delay (10s, 20s, 40s …) capped at six minutes. The back-off count is reset if no new failed Pods appear before the Job's next status check.

The cron job below should run every minute.

It exits immediately upon start with error exit code 1.

It has a backoffLimit of 2: in case of problems it retries twice before the job is considered failed.

nano myCronJob.yaml

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: mycronjob
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: mycron-container
            image: alpine
            imagePullPolicy: IfNotPresent

            command: ['sh', '-c', 'echo Job Pod is Running ; exit 1']

          restartPolicy: OnFailure
          terminationGracePeriodSeconds: 0
      backoffLimit: 2

Create the cron job.

kubectl create -f myCronJob.yaml 

cronjob.batch/mycronjob created

13 seconds later the first Pod has tried to start and got an exit 1 error.

kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548752820   0/1           13s        13s

kubectl get pods
NAME                         READY   STATUS             RESTARTS   AGE
mycronjob-1548752820-7czts   0/1     CrashLoopBackOff   1          13s

However, look at the kubectl describe events below.

The event Saw completed job: makes it sound as if the job completed successfully, but it did not.

There is no indication of the exit 1 condition, and no indication of the CrashLoopBackOff status.

kubectl describe cronjob/mycronjob
Name:         mycronjob
Schedule:     */1 * * * *
Pod Template:
  Command:
    echo Job Pod is Running ; exit 1
Active Jobs:  <none>
Events:
  Type    Reason            Age  From                Message
  ----    ------            ---- ----                -------
  Normal  SuccessfulCreate  23s  cronjob-controller  Created job mycronjob-1548752820
  Normal  SawCompletedJob   3s   cronjob-controller  Saw completed job: mycronjob-1548752820
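One place that does record the failure, while the job object still exists, is the Job itself rather than the CronJob. In my experience a failed job carries a BackoffLimitExceeded event; something along these lines (exact wording varies by Kubernetes version):

kubectl describe job mycronjob-1548752820
...
Warning  BackoffLimitExceeded  job-controller  Job has reached the specified backoff limit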

A minute later. Nothing in output below hints at a problem with the cron job.

kubectl get cronjobs
NAME        SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
mycronjob   */1 * * * *   False     0        55s             68s

The output below also does not hint at a problem.

It seems the first job is just running slow: 77 seconds and still busy, with zero COMPLETIONS.

However, the Pod of the first job crashed within its first second.

kubectl get job
NAME                   COMPLETIONS   DURATION   AGE
mycronjob-1548752820   0/1           77s        77s
mycronjob-1548752880   0/1           17s        17s

If we look at kubectl describe we see:

kubectl describe cronjob/mycronjob
Name:  mycronjob
Events:
  Type    Reason            Age    From                Message
  ----    ------            ----   ----                -------
  Normal  SuccessfulCreate  2m26s  cronjob-controller  Created job mycronjob-1548752820
  Normal  SawCompletedJob   2m6s   cronjob-controller  Saw completed job: mycronjob-1548752820
  Normal  SuccessfulCreate  86s    cronjob-controller  Created job mycronjob-1548752880
  Normal  SawCompletedJob   66s    cronjob-controller  Saw completed job: mycronjob-1548752880
  Normal  SuccessfulDelete  66s    cronjob-controller  Deleted job mycronjob-1548752820
  Normal  SuccessfulCreate  26s    cronjob-controller  Created job mycronjob-1548752940
  Normal  SawCompletedJob   6s     cronjob-controller  Saw completed job: mycronjob-1548752940
  Normal  SuccessfulDelete  6s     cronjob-controller  Deleted job mycronjob-1548752880

Last line above: the first failed job is now deleted; its logs are gone.

The describe output above SEEMS to show that all is well, but it is not: there is no indication of the CrashLoopBackOff status.

If you have an hourly cron job with such problems, you are left with little historic paper-trail evidence of what happened. (One partial mitigation is sketched below.)
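One partial mitigation, using standard CronJob spec fields, is to raise failedJobsHistoryLimit so failed jobs (and their Pods' logs) are retained rather than cleaned up almost immediately; the deletions seen above are most likely that cleanup at its default of 1. The values here are illustrative:

spec:
  failedJobsHistoryLimit: 10       # keep the last 10 failed jobs (default is 1)
  successfulJobsHistoryLimit: 3    # keep the last 3 successful jobs (the default)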

I do not enjoy troubleshooting cron jobs that use backoffLimit and behave this way.

Tip: write such cron jobs' log information to a persistent volume and use that as your primary research source. Everything is in one place and it is persistent. Plus, you are in control of what information gets written to the logs. (A rough sketch follows.)
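A rough sketch of that idea, assuming a PersistentVolumeClaim named cron-logs already exists (the claim name, mount path, and log line are hypothetical). This is the Pod template part of the cron job spec:

      template:
        spec:
          containers:
          - name: mycron-container
            image: alpine
            command: ['sh', '-c', 'echo "$(date) Job Pod is Running" >> /logs/mycronjob.log']
            volumeMounts:
            - name: logdir
              mountPath: /logs
          volumes:
          - name: logdir
            persistentVolumeClaim:
              claimName: cron-logs
          restartPolicy: OnFailure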

kubectl delete -f myCronJob.yaml 

cronjob.batch "mycronjob" deleted

9) Your Turn

These spec fields enable useful functionality:

  • schedule frequency
  • startingDeadlineSeconds
  • concurrencyPolicy
  • parallelism
  • completions
  • restartPolicy

Standalone, each field is easy to understand. Combining all these YAML spec fields leads to complex interactions and reactions, especially when combined with unexpectedly long-running cron jobs.

Design your own simple tests to learn these features. Then design some complex tests where you test and learn their interactions.

The moment you are running 2 or more different cron jobs simultaneously in 2 or more terminal windows, you will know you are becoming an expert.

Original Source

https://www.alibabacloud.com/blog/kubernetes-cronjobs---part-2-parallelism_595022?spm=a2c41.13112074.0.0
