Bash Job Control
Job Control is one of the more advanced features of Bash, and one that, until recently, I hadn’t taken the time to learn properly. My general philosophy with scripting has been Python when I can, Bash when I must, to the point where for years I never wrote any Bash.
Taking the time to learn Bash better has always given rise to mixed feelings in me — if something is complicated enough that it can’t be done in rudimentary Bash, then it probably shouldn’t be done in Bash to begin with. What’s the point of investing time in learning the advanced features of something that’s best avoided altogether?
However, Bash is unsettlingly pragmatic. More often than not, I’ve found myself in situations where I’ve realized that it’d just be easier and faster to do something in Bash than in Python. So I decided to become conversant with the parts of Bash I’m not terribly familiar with — job control being one of them.
Foreground and Background Processes
I expect most folks to be aware of foreground and background processes, but it doesn’t hurt to revisit the topic.
In Bash, a pipeline is a sequence of one or more commands separated by one of the control operators | or |&. Each command in a pipeline is executed in its own subshell, which is a separate process from the shell process. These processes are, by default, started in the foreground, meaning that once they begin execution, the user can’t interact with the shell until the process completes or changes state.
It’s also possible to run a process in the background. Background processes don’t block the shell: they return control to it immediately upon start. Any command can be started in the background by appending an & to it.
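A minimal sketch of a script that backgrounds a function might look like the following. The body of foo is a hypothetical stand-in (a short counter), and the foo_pid variable is introduced here only for illustration.

```bash
#!/usr/bin/env bash

# Hypothetical stand-in for foo: prints three lines, pausing between them.
foo() {
  local i
  for i in 1 2 3; do
    echo "foo: $i"
    sleep 0.2
  done
}

foo &          # the trailing & runs foo in the background
foo_pid=$!     # $! expands to the PID of the most recent background command

# Control is back with the script right away, while foo is still running.
echo "main script: started foo with PID $foo_pid, moving on"
```

Because nothing waits for foo, the script reaches its last line while foo is still printing.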
In the example above, the function foo is started in the background. The script exits immediately, while the function executes in the background.
Jobs vs Processes?
Builtins like kill, disown and wait operate on both processes and jobs. However, a job isn’t quite the same as a process.
A job is something that’s tracked by the shell. The shell maintains a table of currently executing background processes and processes that have been suspended.
If I suspend a running emacs process with C-z, and then type jobs -l on the terminal (the -l option to the jobs builtin prints the pid of the job), I will see:
~/copyconstruct@bailey: jobs -l
[4]+ 38992 Suspended: 18 emacs .
However, if I open a new shell and type jobs -l, I wouldn't see emacs listed as a suspended job. This is because the new shell isn’t tracking the suspended emacs process. Process 38992 is still visible from the new shell (through ps, for instance), since a process is tracked by the operating system and not by the shell from which it was launched.
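The distinction can be sketched in a script, using sleep as a stand-in for emacs and a child bash as the "new shell". The variable names are assumptions made for this illustration.

```bash
#!/usr/bin/env bash

sleep 5 &                              # a background job in this shell
pid=$!

own_jobs=$(jobs -l)                    # this shell's job table includes it
other_jobs=$(bash -c 'jobs -l')        # a fresh shell has an empty job table

# The process itself is still visible to the fresh shell through its PID:
# kill -0 sends no signal, it only checks that the process exists.
if bash -c "kill -0 $pid"; then
  visible=yes
else
  visible=no
fi

echo "this shell:  ${own_jobs:-<no jobs>}"
echo "fresh shell: ${other_jobs:-<no jobs>}"
echo "PID $pid visible from fresh shell: $visible"

kill "$pid"                            # clean up the sleeper
wait "$pid" 2>/dev/null || true
```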
Job Spec
A jobspec can be thought of as a job identifier or job number.
As mentioned previously, a job is purely a shell-level construct. The shell tracks all suspended and background processes. The jobspec is simply an identifier used by the shell to track the suspended or backgrounded process.
In the above example:
~/copyconstruct@bailey: jobs -l
[4]+ 38992 Suspended: 18 emacs .
[4] shows the job number, 4. The job control builtins use the jobspec to operate on jobs. To refer to a job in the shell, its number needs to be prefixed with a %, so the jobspec here is %4.
Enabling Job Control
Job control can be enabled using the set builtin, with either set -m or set -o monitor. It is on by default in interactive shells and off by default in scripts.
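As a small sketch, a script can check and toggle the option like this. The before and after variables are only there to make the change visible.

```bash
#!/usr/bin/env bash

# [[ -o monitor ]] is true when the monitor (job control) option is set.
if [[ -o monitor ]]; then before=on; else before=off; fi

set -m          # same effect as: set -o monitor
if [[ -o monitor ]]; then after=on; else after=off; fi

echo "job control before: $before, after: $after"

set +m          # turn job control back off
```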
Job Control Builtins
fg, bg and jobs are three job control commands that work purely on jobs.
However, a jobspec can also be passed to process control commands like kill, wait and disown. The suspend builtin, also covered below, takes no jobspec, since it acts on the shell itself.
bg
Used to resume a suspended job in the background.
Usage: bg [jobspec]
If no jobspec is provided, the current job is used. Trying to use this with an invalid jobspec results in an error.
~/copyconstruct@bailey bg %5
-bash: bg: %5: no such job
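bg is normally typed at a prompt, but its effect can be sketched in a script once job control is switched on. This is an illustration under assumptions: sleep stands in for a real job, and kill -s STOP plays the role of C-z (which actually sends SIGTSTP).

```bash
#!/usr/bin/env bash
set -m                         # bg only works when job control is enabled

sleep 5 &                      # job 1, running in the background
kill -s STOP %1                # suspend it (C-z would send SIGTSTP instead)
sleep 0.3                      # give the shell a moment to notice the stop

stopped=$(jobs -s)             # lists job 1 as stopped
bg %1                          # resume job 1 in the background
running=$(jobs -r)             # lists job 1 as running again

echo "before bg: $stopped"
echo "after bg:  $running"

kill %1                        # clean up
```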
fg
Used to resume a job in the foreground, making it the current job.
Usage: fg [jobspec]
jobs
The jobs command lists all the jobs tracked by the current shell.
Usage:
jobs [-lnprs] [jobspec]
jobs -x command [arguments]
Per the manual:
The first form lists the active jobs. The options have the following meanings:
-l: List process IDs in addition to the normal information.
-n: Display information only about jobs that have changed status since the user was last notified of their status.
-p: List only the process ID of the job’s process group leader.
-r: Display only running jobs.
-s: Display only stopped jobs.
If jobspec is given, output is restricted to information about that job. If jobspec is not supplied, the status of all jobs is listed.
If the -x option is supplied, jobs replaces any jobspec found in command or arguments with the corresponding process group ID, and executes command, passing it arguments, returning its exit status.
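A script along these lines shows jobs reporting on a backgrounded function. It is a sketch rather than the original listing: the body of bar (a slow counter) and the job_listing variable are assumptions.

```bash
#!/usr/bin/env bash

# Hypothetical stand-in for bar: a slow counter.
bar() {
  local i
  for i in 1 2 3 4 5; do
    echo "$i"
    sleep 0.2
  done
}

bar &                       # start bar as a background job
sleep 0.3                   # let it print a couple of numbers first

job_listing=$(jobs)         # snapshot of this shell's job table
echo "Background jobs: $job_listing"

wait                        # let bar finish before the script ends
```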
Were you to run the script above, you’d see the background jobs listed amidst the output of function bar.
...
14
Background jobs: [1]+ Running bar &
15
16
...
disown
disown works on both processes and jobs. When job control is enabled, the disown command can be used to remove jobs from the job table of the shell.
Usage: disown [-ar] [-h] [jobspec … | pid … ]
~/copyconstruct@bailey disown %4
-bash: warning: deleting stopped job 4 with process group 38992
The -h option is used when we don’t want the job removed from the shell’s table, but do want to stop the shell from passing SIGHUP on to the job when the shell itself receives one.
The -a option without a jobspec will remove all the jobs from the table, whereas the -r option will only remove currently running jobs.
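The effect on the job table can be sketched as follows, with sleep as a stand-in job and disown given a PID rather than a jobspec. The variable names are assumptions for this illustration.

```bash
#!/usr/bin/env bash

sleep 5 &
pid=$!

before=$(jobs -p)        # PID of the job's process group leader
disown "$pid"            # drop the job from this shell's job table
after=$(jobs -p)         # empty: the shell no longer tracks it

# The process itself is unaffected by disown.
if kill -0 "$pid" 2>/dev/null; then alive=yes; else alive=no; fi

echo "job table before: ${before:-<empty>}"
echo "job table after:  ${after:-<empty>}"
echo "process $pid still running: $alive"

kill "$pid"              # clean up
```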
suspend
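Used to suspend the shell itself. The shell stays suspended until it receives a SIGCONT signal, for instance from its parent process.
Usage: suspend [-f]
A login shell cannot be suspended unless the -f option is given.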
wait
wait, like disown, works on both processes and jobs. Job control mode needs to be enabled for wait to work with jobs.
Usage: wait [-fn] [jobspec or pid]
wait tells the shell to wait until the subprocess specified by the pid or the jobspec exits. The return code is that of the last command the shell waited for. When a jobspec is provided, the shell will wait until all the processes in the job exit.
The above script, unlike its predecessor, waits for the function bar to complete before it exits.
Invoking wait without any arguments causes the shell to wait for all currently active child processes. The following are the options wait accepts:
-n: wait waits for a single job to terminate and returns its exit status.
-f: In job control mode, wait normally returns as soon as the job changes state, for instance when it is stopped. The -f option forces wait to wait for each pid or jobspec to terminate before returning.
If neither jobspec nor pid specifies an active child process of the shell, the return status is 127.
copyconstruct@bailey: wait %3
-bash: wait: %3: no such job
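The -n option and the 127 return status can both be seen in a short sketch. It assumes Bash 4.3 or newer (where wait -n is available); the two subshells and their exit codes are made up for the illustration.

```bash
#!/usr/bin/env bash

( sleep 0.6; exit 1 ) &      # slower job, exits with status 1
( sleep 0.1; exit 2 ) &      # quicker job, exits with status 2

first_status=0
wait -n || first_status=$?   # returns as soon as either job finishes

wait                         # now wait for whatever is still running

# $$ is this shell's own PID, which is never one of its children.
unknown_status=0
wait "$$" 2>/dev/null || unknown_status=$?

echo "first job to finish exited with $first_status"
echo "waiting on a non-child returned $unknown_status"
```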
kill
kill, like disown, works on both processes and jobs. The job control mode needs to be enabled for kill to work with jobs.
Usage:
kill [-s sigspec] [-n signum] [-sigspec] jobspec or pid
kill -l|-L [exit_status]
The kill builtin sends a signal to the process specified by the jobspec or pid. kill works with the following options:
-s: sigspec is either a case-insensitive signal name such as SIGINT (with or without the SIG prefix) OR a signal number.
-n: signum is a signal number (kill -n 2 %1 will send an INT to the job with the jobspec %1).
If sigspec and signum are not present, SIGTERM is used.
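A sketch of killing two background jobs with different signals follows. It is not the original listing, so any line numbers mentioned in the text do not map onto it. The sleep commands and variable names are assumptions, and set -m is there because background commands started with job control off ignore SIGINT.

```bash
#!/usr/bin/env bash
set -m                       # with job control off, background jobs ignore SIGINT

sleep 10 & first_pid=$!      # job 1
sleep 10 & second_pid=$!     # job 2
sleep 0.2                    # let both jobs get going

kill -s INT %1               # send SIGINT to job 1
int_status=0
wait "$first_pid" || int_status=$?

kill %2                      # no signal given, so SIGTERM is sent to job 2
term_status=0
wait "$second_pid" 2>/dev/null || term_status=$?

# A process killed by signal N reports exit status 128 + N,
# and kill -l maps such a status back to the signal name.
int_name=$(kill -l "$int_status")
term_name=$(kill -l "$term_status")

echo "job 1: status $int_status ($int_name)"
echo "job 2: status $term_status ($term_name)"
```

Waiting on the saved PIDs rather than on the jobspecs is a deliberate choice here: the shell remembers the exit status of a background PID even after the finished job has left the job table.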
In the above example, we start two background jobs. We then proceed to kill one with an INT (line 16), and another with a TERM (line 18).
Running set -b (equivalently, set -o notify) causes the status of terminated background jobs to be reported immediately, rather than before printing the next primary prompt.
checkjobs
When one tries to exit a shell that has suspended jobs, the shell prints a warning message instead of exiting. If a second exit is attempted right away, the shell exits without further ado. If the checkjobs option is enabled, the shell also lists each job and its status the first time one tries to exit the shell.
checkjobs can be enabled with the shopt builtin.
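As a sketch, both options can be switched on as shown below. In practice these lines would more likely live in a startup file such as ~/.bashrc, since they matter mostly for interactive shells; the state variables exist only to confirm the settings.

```bash
#!/usr/bin/env bash

shopt -s checkjobs                    # list jobs when an exit is attempted
if shopt -q checkjobs; then checkjobs_state=on; else checkjobs_state=off; fi

set -b                                # same effect as: set -o notify
if [[ -o notify ]]; then notify_state=on; else notify_state=off; fi

echo "checkjobs: $checkjobs_state, notify: $notify_state"
```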
Conventions
So far we’ve only referred to jobs with %n, where n is the job number. There exist other ways to refer to jobs:
%%: the “current” job (last foreground job stopped or last background job started)
%+: the “current” job, same as %%
%: the current job
%-: the previous job
A job can also be referred to using a prefix of the name used to start it, or using a substring that appears in its command line. If the prefix or substring matches more than one job, Bash reports an error.
%foo: refers to the job whose command begins with the string foo
%?foo: refers to the job whose command line contains the string foo
For example, C-z can be used to suspend emacs. To bring the suspended emacs process back to the foreground, typing %emacs (shorthand for fg %emacs) will do the trick.
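The different ways of naming a job can be tried out in a script. This is a sketch under assumptions: sleep and tail stand in for real jobs, set -m mirrors what an interactive shell does, and the variable names are made up.

```bash
#!/usr/bin/env bash
set -m                               # mirror an interactive shell

sleep 11 &                           # job 1
tail -f /dev/null &                  # job 2, started last, so it is current

current=$(jobs %+)                   # same job as %% or a lone %
previous=$(jobs %-)                  # job 1
by_prefix=$(jobs %sleep)             # command begins with "sleep"
by_substring=$(jobs '%?null')        # command line contains "null"

echo "current:      $current"
echo "previous:     $previous"
echo "by prefix:    $by_prefix"
echo "by substring: $by_substring"

kill %1 %2                           # clean up both jobs
```

The %?null jobspec is quoted so that the shell does not treat the ? as a glob character.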
Conclusion
In long running scripts, it’s useful to be able to start jobs in the background and be able to control when and how they terminate.
Whether a script of even this modest level of complexity should be written in Bash as opposed to a real programming language like Go or Python is a matter of opinion. However, it’s also odds on that such a script might be a few lines of Bash as opposed to tens of lines in Go or Python. Furthermore, if this happens to be a script that needs to run in an environment that’s not one’s laptop, shipping a Go binary or setting up a Python environment along with all the dependencies might be non-trivial. Bash is worth learning, not least since it’s more ubiquitous than any other language.