Kubernetes CronJobs — Part 2: Parallelism

By Alwyn Botha, Alibaba Cloud Tech Share Author. Tech Share is Alibaba Cloud’s incentive program to encourage the sharing of technical knowledge and best practices within the cloud community.

1) Suspend: True

You can suspend the overall running of a cron job by using the suspend field.

Below is the spec for a normally running cron job. (We need a basic running cron job to suspend.)
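The spec listing itself did not survive in this copy of the article; a minimal sketch of what it could look like follows. The apiVersion, image, and container command are assumptions for illustration; only the name mycronjob and the filename myCronJob.yaml come from the text.

```yaml
# myCronJob.yaml -- a basic, normally running cron job (illustrative sketch)
apiVersion: batch/v1beta1   # use batch/v1 on Kubernetes 1.21 and later
kind: CronJob
metadata:
  name: mycronjob
spec:
  schedule: "*/1 * * * *"   # run every minute
  concurrencyPolicy: Allow
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: mycron-container
            image: alpine
            command: ['sh', '-c', 'echo Job Pod is running; sleep 5']
          restartPolicy: OnFailure
```

Create it with kubectl create -f myCronJob.yaml.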

Start the cron job.

Check the kubectl get cronjob output:

Now SUSPEND is relevant.

All along this tutorial this has always been False: the job is not suspended.

Even right now we have a running cron job.

Add only the suspend: true line to myCronJob.yaml as shown below.

concurrencyPolicy is shown ONLY so that you can get your indentation correct.
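A hedged sketch of how the top of the edited spec would look, with suspend: true at the same indentation level as concurrencyPolicy:

```yaml
spec:
  schedule: "*/1 * * * *"
  suspend: true             # newly added: stop scheduling new jobs
  concurrencyPolicy: Allow  # unchanged; shown only to anchor the indentation
```

Apply it to the running cron job with kubectl replace -f myCronJob.yaml.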

You use kubectl replace to replace a running Kubernetes object with a new one.

In this specific case we replace our running cron job with one that specifies suspend: true.

Investigate the kubectl get cronjob output again:

This time it shows SUSPEND as True.

Six minutes later, LAST SCHEDULE will show that the last job ran 6 minutes ago.

This is as expected, since the cron job is now SUSPENDED.

To get the cron job running again we remove the suspend: true line from the spec file.

concurrencyPolicy should be the last line.

Replace the running cron job with the latest spec from myCronJob.yaml.

Verify SUSPEND is lifted.

Yes, SUSPEND is False.

47 seconds later … we see LAST SCHEDULE showing jobs get scheduled again.

Investigate the detail of kubectl describe cronjob/mycronjob:

We see a clear gap during which no jobs for this cron job ran.

Unfortunately Kubernetes does not add event lines specifying:

… cron job suspended here

and

… cron job UNsuspended here.

We have to surmise that ourselves. It could equally have been that Kubernetes was halted, the server rebooted, or something else. An event stating that a user-invoked suspend happened here would have made it clear what happened.

Demo done, delete.

2) startingDeadlineSeconds

Detail Reference — https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/#starting-deadline

If this field is not specified, the jobs have no deadline.

Summary: if a job misses its scheduled time by more than startingDeadlineSeconds seconds, it gets skipped.

At the next scheduled time it will attempt to run again.

Below we have a cron job that should run every minute.

The work of this cron job is to sleep for 80 seconds.

We have concurrencyPolicy: Forbid specified. Two or more jobs may not run simultaneously.

startingDeadlineSeconds: 10 means each job must start within 10 seconds of its scheduled time.

The Pod sleeps for 80 seconds, which means it will still be running a minute later. One minute later the next job cannot start (concurrencyPolicy: Forbid) because the previous job still has 20 seconds of running time left. This second job will be skipped. This is what we attempt to observe below.
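The spec listing is missing from this copy; assembling the fields just described, it would look roughly like this (the apiVersion and image are assumptions):

```yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: mycronjob
spec:
  schedule: "*/1 * * * *"        # attempt to run every minute
  concurrencyPolicy: Forbid      # no overlapping jobs
  startingDeadlineSeconds: 10    # skip a run that misses its slot by more than 10s
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: mycron-container
            image: alpine
            command: ['sh', '-c', 'sleep 80']   # deliberately outlives the 1-minute schedule
          restartPolicy: OnFailure
```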

Create cron job.

Investigate status a minute later:

The previous job continues to run. (No new job has started: one should have been running for 15 seconds in this output.) The new job was skipped as expected.

80 seconds later the first job shows Completed.

startingDeadlineSeconds: 10 prevents the second job from starting now … it is 20 seconds past its planned start time … so it gets skipped.

Two minutes later new job starts — exactly on the minute time switchover.

Output below as expected:

  • a new job starts every 2 minutes, since every other minute the previous job is still busy completing its last 20 seconds of sleep time.

A SawCompletedJob event appears every 80 seconds as each sleep 80 completes.

Delete …

3) Parallelism: 3 … concurrencyPolicy: Allow … sleep 5

Kubernetes is able to handle 3 main types of parallel jobs.

For more information https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/#parallel-jobs

This section deals with a cron job that periodically runs a job with a parallelism of three.

A basic JOB that runs with a parallelism of three will have 1 job running — with three Pods running.

A CRON JOB that runs with a parallelism of three will have 1 job running — with three Pods running … with the major difference : it runs periodically.

You have to define such cron jobs carefully so that too many jobs do not unintentionally run in parallel too frequently; otherwise you end up with a CPU-overloading mass of jobs.

This first example only runs 3 Pods in parallel, each running for 5 seconds (no problems).
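A sketch of such a spec, with parallelism set on the job template (image and command assumed):

```yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: mycronjob
spec:
  schedule: "*/1 * * * *"
  concurrencyPolicy: Allow
  jobTemplate:
    spec:
      parallelism: 3              # 3 Pods run at the same time
      template:
        spec:
          containers:
          - name: mycron-container
            image: alpine
            command: ['sh', '-c', 'sleep 5']
          restartPolicy: OnFailure
```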

Create job.

Check progress:

3 pods running in parallel.

11 seconds later. Each slept for 5 seconds in parallel. All now completed.

A minute later. Second set of 3 Pods completed.

Two minutes later. Third set of 3 Pods running.

No overlapped running jobs: simple.

4) Parallelism: 3 … completions: 6 … concurrencyPolicy: Allow … sleep 5

This time 6 Pods need to complete for the job to be considered complete: completions: 6.
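Relative to the previous spec, only the jobTemplate fields change; a sketch:

```yaml
  jobTemplate:
    spec:
      parallelism: 3    # 3 Pods run at a time ...
      completions: 6    # ... until 6 Pods have completed successfully
```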

Create:

Monitor: parallelism: 3 … 3 Pods start simultaneously.

Seconds later, first set of 3 completed, second set of 3 running.

Seconds later … second set of 3 completed. completions: 6

One minute later this cycle repeats.

Demo complete, delete …

5) Parallelism: 2 … completions: 4 … concurrencyPolicy: Allow … sleep 120

This time 2 parallel Pods, 4 completions. Each Pod sleeps 120 seconds.

Jobs will overlap … concurrencyPolicy: Allow
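A sketch matching the parameters described (2 parallel Pods, 4 completions, 120-second sleeps); the image is an assumption:

```yaml
spec:
  schedule: "*/1 * * * *"
  concurrencyPolicy: Allow        # overlapping jobs are permitted
  jobTemplate:
    spec:
      parallelism: 2
      completions: 4
      template:
        spec:
          containers:
          - name: mycron-container
            image: alpine
            command: ['sh', '-c', 'sleep 120']   # each job takes ~4 minutes total
          restartPolicy: OnFailure
```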

Create:

2 Pods start simultaneously.

After 1 minute a second set of 2 Pods is starting.

Another minute later … a third job has started, with 2 ContainerCreating Pods.

Several minutes later we get the mess below.

See below … you have sets of parallel Pods that started in different minutes, all running simultaneously now.

So instead of 2 Pods running in parallel, you now have 8.

TIP: Do not schedule long-running cron jobs too close together in time.

Delete.

6) Parallelism: 2 … completions: 4 … concurrencyPolicy: Forbid … sleep 120

concurrencyPolicy: Forbid skips new jobs that try to run while the previous job is still busy.

This fixes the problem we had in the previous section.
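Only one line changes versus the previous section's spec; a sketch:

```yaml
spec:
  schedule: "*/1 * * * *"
  concurrencyPolicy: Forbid   # changed from Allow: skip runs while a job is still busy
```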

Create.

Monitor: 2 Pods started.

A minute later the original Pods are still running; new jobs and new Pods DO NOT start.

3 minutes later the previous job is complete and a new job can run (no running Pods in the way).

Another minute later the 1548747900 series Pods show Completed, and a new series is running.

concurrencyPolicy: Forbid may be the solution if you unintentionally have too many Pods running simultaneously.

Delete.

7) Parallelism: 2 … completions: 4 … concurrencyPolicy: Replace … sleep 120

concurrencyPolicy: Replace

From https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/#concurrency-policy

If it is time for a new job run and the previous job run hasn’t finished yet, the cron job replaces the currently running job run with a new job run
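Again only the policy line changes in the spec; a sketch:

```yaml
spec:
  schedule: "*/1 * * * *"
  concurrencyPolicy: Replace  # changed: delete the running job, start a new one in its place
```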

Create cron job.

Investigate what happened after 12 minutes.

One new job started every minute.

Previous running job deleted every time a new job starts. This is concurrencyPolicy: Replace in action.

IMPORTANT: NO job completes. NO job ever runs for its full 120 seconds of sleep.

Use concurrencyPolicy: Replace only if you understand how it works and you need its feature: if it is time for a new job run and the previous job run has not finished yet, the cron job replaces the currently running job run with a new one. (Replace deletes the previous job, its Pod, and its logs.)

IMPORTANT: In this tutorial, for this specific case, NO job ever completes successfully; it ALWAYS gets replaced. However, if you use Replace in your environment it will only replace jobs when a previous one is still running, which will probably be nearly never in your production environment. Understand Replace and its implications: it may be wrong, or perfectly right, for your case.

Note that kubectl get only shows the latest running job and its Pods.

Delete cron job.

8) backoffLimit and Kubernetes Cron Jobs

I am impressed with Kubernetes job and cron job functionality in general.

However, when a cron job has problems that trigger the backoffLimit functionality, it leaves no trace evidence.

A long running job may sometimes experience intermittent and changing problems.

backoffLimit deletes crashed jobs, their Pods, and their logs. You are left with only the currently running job and Pod. kubectl get cronjob does not even hint that this cron job has problems.

kubectl describe shows misleading event messages: all seems OK, because only success events are shown.

Let’s investigate.

The cron job below should run every minute.

It exits immediately upon start with error exit code 1.

It has a backoffLimit of 2: Kubernetes will retry the job up to two times if it fails.
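Gathering the fields just described into a sketch (the apiVersion and image are assumptions):

```yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: mycronjob
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      backoffLimit: 2             # retry a failed job up to 2 times
      template:
        spec:
          containers:
          - name: mycron-container
            image: alpine
            command: ['sh', '-c', 'exit 1']   # fail immediately with exit code 1
          restartPolicy: Never
```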

Create the cron job.

13 seconds later the first Pod tried to start and got an exit 1 error.

However look at kubectl describe events below.

The Saw completed job: event makes it sound as if the job completed successfully, but it did not.

There is no indication of the exit 1 condition, and no indication of the CrashLoopBackOff status.

A minute later. Nothing in output below hints at a problem with the cron job.

The output below also does not hint at a problem.

It seems the first job is running slowly: 77 seconds and still busy … zero COMPLETIONS.

However, the Pod of the first job crashed within its first second.

If we look at kubectl describe we see:

The last line above shows that the first failed cron job is now deleted … its logs are gone.

The describe output above SEEMS to show all is well, but it is not: there is no indication of the CrashLoopBackOff status.

If you have an hourly cron job with such problems, you are left with little historic paper-trail evidence of what happened.

I do not enjoy troubleshooting cron jobs that use backoffLimit, because of this behavior.

Tip: write such cron jobs' log information to a persistent volume and use that as your primary research source. Everything is in one place and it is persistent. Plus, you are in control of what information is written to the logs.
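One way to follow this tip is to mount a PersistentVolumeClaim into the job Pod and append log lines to it. The PVC name cronjob-logs and the mount path /logs below are hypothetical examples, not from the original article; any pre-created PVC would do.

```yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: mycronjob
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: mycron-container
            image: alpine
            # append a timestamped line to a log file that outlives the Pod
            command: ['sh', '-c', 'echo "$(date) job ran" >> /logs/mycronjob.log']
            volumeMounts:
            - name: logs
              mountPath: /logs
          restartPolicy: OnFailure
          volumes:
          - name: logs
            persistentVolumeClaim:
              claimName: cronjob-logs   # hypothetical pre-created PVC
```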

9) Your Turn

These spec feature fields enable useful functionality configurations:

  • schedule frequency
  • startingDeadlineSeconds
  • concurrencyPolicy
  • parallelism
  • completions
  • restartPolicy

Standalone, each field is easy to understand. Combining these YAML spec fields leads to complex interactions, especially when combined with unexpectedly long-running cron jobs.

Design your own simple tests to learn these features. Then design some complex tests where you test and learn their interactions.

The moment you are running 2 or more different cron jobs simultaneously in 2 or more terminal windows, you will know you are becoming an expert.

Original Source

https://www.alibabacloud.com/blog/kubernetes-cronjobs---part-2-parallelism_595022?spm=a2c41.13112074.0.0
