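How can a bash script wait for several subprocesses spawned from that script to finish, and return an exit code != 0 when any of the subprocesses ends with a code != 0?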


#!/bin/bash

for i in `seq 0 9`; do
  doCalculations $i &
done

wait

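The script above waits for all 10 spawned subprocesses, but it always gives exit status 0 (see help wait). How can I modify this script so that it discovers the exit statuses of the spawned subprocesses and returns exit code 1 when any of them ends with a code != 0?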



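Here is what I have come up with so far. I would like to see how to interrupt the sleep command when a child terminates, so that one does not have to tune WAITALL_DELAY to one's usage.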

waitall() { # PID...
  ## Wait for children to exit and indicate whether all exited with 0 status.
  local errors=0
  while :; do
    debug "Processes remaining: $*"
    for pid in "$@"; do
      shift
      if kill -0 "$pid" 2>/dev/null; then
        debug "$pid is still alive."
        set -- "$@" "$pid"
      elif wait "$pid"; then
        debug "$pid exited with zero exit status."
      else
        debug "$pid exited with non-zero exit status."
        ((++errors))
      fi
    done
    (("$#" > 0)) || break
    # TODO: how to interrupt this sleep when a child terminates?
    sleep ${WAITALL_DELAY:-1}
  done
  ((errors == 0))
}

debug() { echo "DEBUG: $*" >&2; }

pids=""
for t in 3 5 4; do
  sleep "$t" &
  pids="$pids $!"
done
waitall $pids
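
One possible way to address the TODO above, offered as a sketch rather than a drop-in replacement: bash 4.3 and later provide `wait -n`, which blocks until any one background job exits and returns that job's status, so no sleep interval is needed. The name `waitall_n` and its job-count argument are assumptions made for illustration, and the sketch assumes the counted jobs are the shell's only background jobs.

```shell
#!/usr/bin/env bash
# Sketch only: requires bash >= 4.3 for `wait -n`.
# waitall_n is a hypothetical name. It takes the number of background jobs
# that were started and succeeds only if every one of them exited with 0.
waitall_n() {
  local remaining=$1 errors=0
  while (( remaining > 0 )); do
    # Blocks until any background job terminates, then yields its status.
    wait -n || errors=$((errors + 1))
    remaining=$((remaining - 1))
  done
  (( errors == 0 ))
}

# Usage: three short jobs, one of which fails.
( sleep 0.1; exit 0 ) &
( sleep 0.2; exit 3 ) &
( sleep 0.3; exit 0 ) &

if waitall_n 3; then
  waitall_result=ok
else
  waitall_result=failed
fi
echo "result: $waitall_result"
```

Counting jobs instead of tracking PIDs keeps the loop short, at the cost of not knowing which job failed.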



for i in $(whatever_list) ; do
   do_something $i
done


for i in $(whatever_list) ; do echo $i ; done | ## execute in parallel...
   (
   export -f do_something ## export functions (if needed)
   export PATH ## export any variables that are required
   xargs -I{} --max-procs 0 bash -c ' ## process in batches...
      {
      echo "processing {}" ## optional
      do_something {}
      }'
   )
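  • If an error occurs in one process, it does not interrupt the other processes, but it does result in a non-zero exit code for the sequence as a whole.
  • Exporting functions and variables may or may not be necessary in any particular case.
  • You can set --max-procs according to how much parallelism you want (0 means "all at once").
  • GNU Parallel offers some additional features when used in place of xargs, but it is not always installed by default.
  • The for loop is not strictly necessary in this example, since echo $i basically just regenerates the output of $(whatever_list). I just think the for keyword makes it a little easier to see what is going on.
  • Bash string handling can be confusing. I have found that single quotes work best for wrapping non-trivial scripts.
  • You can easily interrupt the entire operation (using ^C or similar), unlike the more direct approach to Bash parallelism.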


for i in {0..5} ; do echo $i ; done | xargs -I{} --max-procs 2 bash -c '
   {
   echo sleep {}
   sleep 2s
   }'
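
To make the exit-code behaviour of the xargs approach concrete, here is a small sketch. GNU xargs documents an exit status of 123 when any invocation exits with a status between 1 and 125; other implementations may report a different non-zero value. The exit codes fed in below are made up for the demonstration, and `-P` is the short form of `--max-procs`.

```shell
#!/usr/bin/env bash
# Each input line becomes the exit status of one parallel worker.
# The values 0, 1 and 0 are illustrative only.
xargs_status=0
printf '%s\n' 0 1 0 |
  xargs -I{} -P 3 sh -c 'echo "worker exiting with {}"; exit {}' ||
  xargs_status=$?

# All three workers ran; xargs reports that at least one of them failed.
echo "xargs exit status: $xargs_status"
```

The failing worker does not stop the others, but the pipeline as a whole ends with a non-zero status that the calling script can test.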


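If you have bash 4.2 or later, the following might be useful to you. It uses associative arrays to store task names and their "code", as well as task names and their PIDs. I have also built in a simple rate-limiting method, which may come in handy if your tasks consume a lot of CPU or I/O time and you want to limit the number of concurrent tasks.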


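This is a bit over the top for simple cases, but it allows for some pretty neat things. For example, one can store the error messages of each task in another associative array and print them once everything has settled down.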

#! /bin/bash

main () {
    local -A pids=()
    local -A tasks=([task1]="echo 1"
                    [task2]="echo 2"
                    [task3]="echo 3"
                    [task4]="false"
                    [task5]="echo 5"
                    [task6]="false")
    local max_concurrent_tasks=2

    for key in "${!tasks[@]}"; do
        while [ $(jobs 2>&1 | grep -c Running) -ge "$max_concurrent_tasks" ]; do
            sleep 1 # gnu sleep allows floating point here...
        done
        ${tasks[$key]} &
        pids+=(["$key"]="$!")
    done

    errors=0
    for key in "${!tasks[@]}"; do
        pid=${pids[$key]}
        local cur_ret=0
        if [ -z "$pid" ]; then
            echo "No Job ID known for the $key process" # should never happen
            cur_ret=1
        else
            wait $pid
            cur_ret=$?
        fi
        if [ "$cur_ret" -ne 0 ]; then
            errors=$(($errors + 1))
            echo "$key (${tasks[$key]}) failed."
        fi
    done

    return $errors
}

main
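
The idea of keeping per-task results in a second associative array could look roughly like this. It is a sketch, not part of the script above: the task names, the `run_tasks` function and the `statuses` array are invented for illustration, and it records exit codes rather than captured error text.

```shell
#!/usr/bin/env bash
# Sketch only: needs a bash version with associative arrays.
declare -A statuses=()   # task name -> exit status (hypothetical helper array)

run_tasks() {
  local -A tasks=(
    [ok_task]="true"
    [bad_task]="false"
  )
  local -A pids=()
  local key

  # Start every task in the background and remember its PID by name.
  for key in "${!tasks[@]}"; do
    ${tasks[$key]} &
    pids[$key]=$!
  done

  # Collect each exit status once the corresponding task has finished.
  for key in "${!pids[@]}"; do
    if wait "${pids[$key]}"; then
      statuses[$key]=0
    else
      statuses[$key]=$?
    fi
  done
}

run_tasks

# Print a summary after everything has settled down.
for key in "${!statuses[@]}"; do
  echo "$key exited with status ${statuses[$key]}"
done
```

Because `statuses` is declared outside the function, the results remain available to the rest of the script after `run_tasks` returns.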



# activate child monitoring
set -o monitor

# locking subprocess
(while true; do sleep 0.001; done) &
pid=$!

# count, and kill when all done
c=0
function kill_on_count() {
    # you could kill on whatever criterion you wish for
    # I just counted to simulate bash's wait with no args
    [ $c -eq 9 ] && kill $pid
    c=$((c+1))
    echo -n '.' # async feedback (but you don't know which one)
}
trap "kill_on_count" CHLD

function save_status() {
    local i=$1;
    local rc=$2;
    # do whatever, and here you know which one stopped
    # but remember, you're called from a subshell
    # so vars have their values at fork time
    : # placeholder so that the function body is not empty
}

# care must be taken not to spawn more than one child per loop
# e.g don't use `seq 0 9` here!
for i in {0..9}; do
    (doCalculations $i; save_status $i $?) &
done

# wait for locking subprocess to be killed
wait $pid

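From there you can easily extrapolate: add a trigger (touch a file, send a signal) and change the counting criterion (count the touched files, or whatever) to respond to that trigger. Or, if you just want "any" non-zero rc, simply kill the lock from save_status.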


I needed this, but the target process was not a child of the current shell, in which case wait $PID does not work. I did find the following alternative instead:

while [ -e /proc/$PID ]; do sleep 0.1 ; done

This relies on the presence of procfs, which may not be available (Mac does not provide it, for example). So, for portability, you could use this instead:

while ps -p $PID >/dev/null ; do sleep 0.1 ; done
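
The polling loop can also be wrapped in a helper with an upper bound on the wait, so that the script does not spin forever if the process never exits. This is a sketch: `wait_for_pid` and its arguments are hypothetical names, and it uses `kill -0` as the liveness check, which assumes the caller is allowed to signal the target process (otherwise the `ps -p` test is the safer choice).

```shell
#!/usr/bin/env bash
# wait_for_pid PID [MAX_POLLS]  (hypothetical helper)
# Polls every 0.1 s until PID is gone. Returns 0 once the process has
# disappeared, or 1 if it is still present after MAX_POLLS checks.
wait_for_pid() {
  local pid=$1 max_polls=${2:-100} polls=0
  while kill -0 "$pid" 2>/dev/null; do
    (( polls >= max_polls )) && return 1
    polls=$((polls + 1))
    sleep 0.1
  done
  return 0
}

# Usage: a short-lived process stands in for the external one.
sleep 0.3 &
target=$!
if wait_for_pid "$target" 50; then
  poll_result=gone
else
  poll_result=timeout
fi
echo "process $target: $poll_result"
```

Note that polling only tells you the process has gone away; unlike wait, it cannot report the exit status.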


set -m
for i in `seq 0 9`; do
  doCalculations $i &
done
while fg; do true; done
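  • set -m allows you to use fg and bg in a script.
  • fg, in addition to putting the last process in the foreground, has the same exit status as the process it foregrounds.
  • while fg stops looping as soon as any fg exits with a non-zero exit status.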

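Unfortunately, this does not handle the case where a process in the background exits with a non-zero exit status. (The loop does not terminate immediately; it waits for the previous processes to complete.)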


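I have had a go at this and combined all the best parts of the other examples here. This script runs the checkpids function whenever any background process exits, and prints the exit status without resorting to polling.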

#!/bin/bash

set -o monitor

sleep 2 &
sleep 4 && exit 1 &
sleep 6 &

pids=`jobs -p`

checkpids() {
    for pid in $pids; do
        if kill -0 $pid 2>/dev/null; then
            echo $pid is still alive.
        elif wait $pid; then
            echo $pid exited with zero exit status.
        else
            echo $pid exited with non-zero exit status.
        fi
    done
    echo
}

trap checkpids CHLD

wait



#!/bin/bash

pids=""

for i in `seq 0 9`; do
   doCalculations $i &
   pids="$pids $!"
done

wait $pids

...code continued here ...


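As several commenters have pointed out, the code above waits for all processes to complete before continuing, but it does not exit and fail if one of them fails. It can be made to do so with the following modification, suggested by @Bryan, @SamBrightman, and others: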

RESULT=0

for i in `seq 0 9`; do
   doCalculations $i &
   pids="$pids $!"
done

for pid in $pids; do
    wait $pid || let "RESULT=1"
done

if [ "$RESULT" == "1" ]; then
    exit 1
fi

...code continued here ...



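I did some experimenting (on Solaris, with both bash and ksh) and found that wait prints the exit status if it is non-zero, or a list of the jobs that returned a non-zero exit status when no PID argument is given. For example, in bash: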


$ sleep 20 && exit 1 &
$ sleep 10 && exit 2 &
$ wait
[1]-  Exit 2                  sleep 20 && exit 2
[2]+  Exit 1                  sleep 10 && exit 1

And in ksh:


$ sleep 20 && exit 1 &
$ sleep 10 && exit 2 &
$ wait
[1]+  Done(2)                  sleep 20 && exit 2
[2]+  Done(1)                  sleep 10 && exit 1


#!/bin/bash

trap "rm -f /tmp/x.$$" EXIT

for i in `seq 0 9`; do
  doCalculations $i &
done

wait 2> /tmp/x.$$
# read the file via stdin so that wc prints only the number
if [ `wc -l < /tmp/x.$$` -gt 0 ] ; then
  exit 1
fi


wait 2> >(wc -l)

will also return a count, but without the tmp file. It could also be used this way, for example:

wait 2> >(if [ `wc -l` -gt 0 ] ; then echo "ERROR"; fi)

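But this is not much more useful than the tmp file, IMO. I could not find a useful way to avoid the tmp file while also avoiding running wait in a subshell, which would not work at all.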



#!/bin/bash

trap 'rm -f $tmpfile' EXIT

tmpfile=$(mktemp)

doCalculations() {
    echo start job $i...
    sleep $((RANDOM % 5))
    echo ...end job $i
    exit $((RANDOM % 10))
}

number_of_jobs=10

for i in $( seq 1 $number_of_jobs )
do
    ( trap "echo job$i : exit value : \$? >> $tmpfile" EXIT; doCalculations ) &
done

wait

i=0
while read res; do
    echo "$res"
    let i++
done < "$tmpfile"

echo $i jobs done !!!


Trap is your friend. You can trap on ERR on many systems. You can trap EXIT, or trap on DEBUG to run a piece of code before every command.




#! /bin/bash

items="1 2 3 4 5 6"
pids=""

for item in $items; do
    sleep $item &
    pids+="$! "
done

for pid in $pids; do
    wait $pid
    status=$?   # save it: the test below overwrites $?
    if [ $status -eq 0 ]; then
        echo "SUCCESS - Job $pid exited with a status of $status"
    else
        echo "FAILED - Job $pid exited with a status of $status"
    fi
done

I use something very similar to start and stop servers/services in parallel and check each exit status. It works great for me. Hope this helps someone!


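wait also (optionally) takes the PID of the process to wait for, and with $! you get the PID of the last command launched in the background. Modify the loop to store the PID of each spawned subprocess in an array, and then loop again, waiting on each PID.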

# run processes and store pids in array
for i in $n_procs; do
    ./procs[${i}] &
    pids[${i}]=$!
done

# wait for all pids
for pid in ${pids[*]}; do
    wait $pid
done




set -o monitor        # enable script job control
trap 'echo "child died"' CHLD


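In the lower-level POSIX APIs, getting the child's status is usually the job of the wait family of functions. Unfortunately, Bash's support for that is limited: you can wait for one specific child process (and get its exit status), or you can wait for all of them and always get a 0 result.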

What appears to be impossible is the equivalent of waitpid(-1), which blocks until any child process returns.


I am thinking of running doCalculations; echo "$?" >>/tmp/acc in a subshell that is sent to the background, followed by wait; /tmp/acc would then contain the exit statuses, one per line. I do not know about any consequences of multiple processes appending to the accumulator file, though.



#!/bin/sh

random -e 20
sleep $?
random -e 10


#!/bin/sh

rm -f /tmp/acc

for i in $( seq 0 20 )
do
    ( ./doCalculations "$i"; echo "$?" >>/tmp/acc ) &
done

wait

cat /tmp/acc | fmt
rm /tmp/acc


5 1 9 6 8 1 2 0 9 6 5 9 6 0 0 4 9 5 5 9 8
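
If concurrent appends to one shared file are a concern, one option is to give each job its own status file inside a private temporary directory and read the files back after wait. The following is a sketch under assumptions: `do_calculation` is a stand-in for the real doCalculations script (it simply fails for job 2), and the directory layout is invented for the example.

```shell
#!/usr/bin/env bash
# Stand-in for the real workload; fails only for job number 2.
do_calculation() {
  [ "$1" -ne 2 ]
}

status_dir=$(mktemp -d)

for i in 0 1 2 3; do
  # Each job writes its exit status to its own file, so no two
  # processes ever write to the same file.
  ( rc=0; do_calculation "$i" || rc=$?; echo "$rc" > "$status_dir/$i" ) &
done
wait

failed_jobs=0
for f in "$status_dir"/*; do
  if [ "$(cat "$f")" -ne 0 ]; then
    echo "job ${f##*/} failed"
    failed_jobs=$((failed_jobs + 1))
  fi
done
rm -rf "$status_dir"
echo "failed jobs: $failed_jobs"
```

Naming each file after the job index also makes it possible to tell which job produced which status, something the single accumulator file does not record.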




$ sleep 10 &
$ sleep 10 &
$ sleep 20 &
$ sleep 20 &


$ wait < <(jobs -p)

Or just wait (without arguments).



See help wait and help jobs for the syntax.



$ sleep 20 && true || tee fail &
$ sleep 20 && false || tee fail &
$ wait < <(jobs -p)
$ test -f fail && echo Calculation failed.


set -e
fail () {
    touch .failure
}
expect () {
    wait
    if [ -f .failure ]; then
        rm -f .failure
        exit 1
    fi
}

sleep 2 || fail &
sleep 2 && false || fail &
sleep 2 || fail

expect

The set -e at the top makes your script stop on failure.

expect exits with status 1 if any subjob failed.



function WaitForTaskCompletion {
    local pids="${1}" # pids to wait for, separated by semi-colon
    local soft_max_time="${2}" # If execution takes longer than $soft_max_time seconds, will log a warning, unless $soft_max_time equals 0.
    local hard_max_time="${3}" # If execution takes longer than $hard_max_time seconds, will stop execution, unless $hard_max_time equals 0.
    local caller_name="${4}" # Who called this function
    local exit_on_error="${5:-false}" # Should the function exit program on subprocess errors

    Logger "${FUNCNAME[0]} called by [$caller_name]."

    local soft_alert=0 # Does a soft alert need to be triggered, if yes, send an alert once
    local log_ttime=0 # local time instance for comparison

    local seconds_begin=$SECONDS # Seconds since the beginning of the script
    local exec_time=0 # Seconds since the beginning of this function

    local retval=0 # return value of monitored pid process
    local errorcount=0 # Number of pids that finished with errors

    local pidCount # number of given pids

    IFS=';' read -a pidsArray <<< "$pids"
    pidCount=${#pidsArray[@]}

    while [ ${#pidsArray[@]} -gt 0 ]; do
        newPidsArray=()
        for pid in "${pidsArray[@]}"; do
            if kill -0 $pid > /dev/null 2>&1; then
                newPidsArray+=($pid)
            else
                wait $pid
                result=$?
                if [ $result -ne 0 ]; then
                    errorcount=$((errorcount+1))
                    Logger "${FUNCNAME[0]} called by [$caller_name] finished monitoring [$pid] with exitcode [$result]."
                fi
            fi
        done

        ## Log a standby message every hour
        exec_time=$(($SECONDS - $seconds_begin))
        if [ $((($exec_time + 1) % 3600)) -eq 0 ]; then
            if [ $log_ttime -ne $exec_time ]; then
                log_ttime=$exec_time
                Logger "Current tasks still running with pids [${pidsArray[@]}]."
            fi
        fi

        if [ $exec_time -gt $soft_max_time ]; then
            if [ $soft_alert -eq 0 ] && [ $soft_max_time -ne 0 ]; then
                Logger "Max soft execution time exceeded for task [$caller_name] with pids [${pidsArray[@]}]."
                soft_alert=1
                SendAlert
            fi
            if [ $exec_time -gt $hard_max_time ] && [ $hard_max_time -ne 0 ]; then
                Logger "Max hard execution time exceeded for task [$caller_name] with pids [${pidsArray[@]}]. Stopping task execution."
                kill -SIGTERM $pid
                if [ $? == 0 ]; then
                    Logger "Task stopped successfully"
                else
                    errorcount=$((errorcount+1))
                fi
            fi
        fi

        pidsArray=("${newPidsArray[@]}")
        sleep 1
    done

    Logger "${FUNCNAME[0]} ended for [$caller_name] using [$pidCount] subprocesses with [$errorcount] errors."
    if [ $exit_on_error == true ] && [ $errorcount -gt 0 ]; then
        Logger "Stopping execution."
        exit 1337
    else
        return $errorcount
    fi
}

# Just a plain stupid logging function to replace with yours
function Logger {
    local value="${1}"

    echo $value
}

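For example, to wait for all three processes to finish, log a warning if execution takes longer than 5 seconds, and stop all processes if execution takes longer than 120 seconds, without exiting the program on failure: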

function something {
    sleep 10 &
    pids="$!"
    sleep 12 &
    pids="$pids;$!"
    sleep 9 &
    pids="$pids;$!"

    WaitForTaskCompletion $pids 5 120 ${FUNCNAME[0]} false
}

# Launch the function
something



n=10 # run 10 jobs
c=0
PIDS=()

while true

    my_function_or_command &
    PID=$!
    echo "Launched job as PID=$PID"
    PIDS+=($PID)

    (( c+=1 ))

    # required to prevent any exit due to error
    # caused by additional commands run which you
    # may add when modifying this example
    true

do

    if (( c < n ))
    then
        continue
    else
        break
    fi
done

# collect launched jobs
for pid in "${PIDS[@]}"
do
    wait $pid || echo "failed job PID=$pid"
done



#!/usr/bin/env bash

set -m # allow for job control
EXIT_CODE=0;  # exit code of overall script

function foo() {
    echo "CHLD exit code is $1"
    echo "CHLD pid is $2"
    echo $(jobs -l)

    for job in `jobs -p`; do
        echo "PID => ${job}"
        # group both commands so EXIT_CODE is set only when wait fails
        wait ${job} || { echo "At least one test failed with exit code => $?"; EXIT_CODE=1; }
    done
}

trap 'foo $? $$' CHLD

DIRN=$(dirname "$0");

commands=(
    '{ echo "foo" && exit 4; }'
    '{ echo "bar" && exit 3; }'
    '{ echo "baz" && exit 5; }'
)

clen=`expr "${#commands[@]}" - 1` # get length of commands - 1

for i in `seq 0 "$clen"`; do
    (echo "${commands[$i]}" | bash) &   # run the command via bash in subshell
    echo "$i ith command has been issued as a background job"
done

# wait for all to finish
wait;

echo "EXIT_CODE => $EXIT_CODE"
exit "$EXIT_CODE"

# end




http://jeremy.zawodny.com/blog/archives/010717.html :

#!/bin/bash

FAIL=0

echo "starting"

./sleeper 2 0 &
./sleeper 2 1 &
./sleeper 3 0 &
./sleeper 2 0 &

for job in `jobs -p`
do
    echo $job
    wait $job || let "FAIL+=1"
done

echo $FAIL

if [ "$FAIL" == "0" ];
then
    echo "YAY!"
else
    echo "FAIL! ($FAIL)"
fi


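In some cases the process may complete before we start waiting for it. If we call wait on a process that has already finished, it triggers an error such as "pid is not a child of this shell". To avoid this, the following loop can be used to find out whether the process has finished: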

while [ -e /proc/$PID ]
do
    echo "Process: $PID is still running"
    sleep 5
done
echo "Process $PID has finished"


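I think the most straightforward way to run jobs in parallel and check their status is to use temporary files. There are already a couple of similar answers (e.g. Nietzche-jou and mug896).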

rm -f fail
for i in `seq 0 9`; do
  doCalculations $i || touch fail &
done
wait
! [ -f fail ]

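The code above is not thread safe. If you are concerned that it might run at the same time as another copy of itself, it is better to use a more unique file name, such as fail.$$. The last line fulfils the requirement "return exit code 1 when any of the subprocesses ends with code != 0". I threw in an extra requirement to clean up. It might have been clearer to write it like this: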

trap 'rm -f fail.$$' EXIT
for i in `seq 0 9`; do
  doCalculations $i || touch fail.$$ &
done
wait
! [ -f fail.$$ ]

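Here is a similar snippet for gathering the results of multiple jobs: it creates a temporary directory, stores the output of each subtask in a separate file, and then dumps them for review. This does not exactly match the question; I am including it as a bonus: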

trap 'rm -fr $WORK' EXIT

WORK=/tmp/$$.work
mkdir -p $WORK
cd $WORK

for i in `seq 0 9`; do
  doCalculations $i >$i.result &
done
wait
grep $ *  # display the results with filenames and contents


One solution for waiting on several subprocesses, and exiting when any one of them exits with a non-zero status code, is to use "wait -n".

#!/bin/bash

wait_for_pids()
{
    for (( i = 1; i <= $#; i++ )); do
        wait -n "$@"
        status=$?
        echo "received status: "$status
        if [ $status -ne 0 ] && [ $status -ne 127 ]; then
            exit 1
        fi
    done
}

sleep_for_10()
{
    sleep 10
    exit 10
}

sleep_for_20()
{
    sleep 20
}

sleep_for_10 &
pid1=$!

sleep_for_20 &
pid2=$!

wait_for_pids $pid2 $pid1

Status code 127 is for a non-existent process, which means the child might already have exited.
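
The wait -n approach stops at the first failure but leaves the remaining children running. If they should be stopped as well, a variation along these lines may help. This is a sketch, assuming bash 4.3 or later and that the listed PIDs are the shell's only background jobs; `wait_fail_fast` is a hypothetical name.

```shell
#!/usr/bin/env bash
# wait_fail_fast PID...  (hypothetical helper, needs bash >= 4.3)
# Returns 0 if every job succeeds. On the first failure it terminates
# the jobs that are still running and returns the failing status.
wait_fail_fast() {
  local remaining=$# status
  while (( remaining > 0 )); do
    status=0
    wait -n || status=$?
    if (( status != 0 )); then
      kill "$@" 2>/dev/null || true   # PIDs that already finished are ignored
      wait "$@" 2>/dev/null || true   # reap the jobs that were just terminated
      return "$status"
    fi
    remaining=$((remaining - 1))
  done
  return 0
}

# Usage: the first job fails quickly, the second would run for a minute.
( sleep 0.2; exit 7 ) &
quick=$!
sleep 60 &
slow=$!

fail_fast_status=0
wait_fail_fast "$quick" "$slow" || fail_fast_status=$?
echo "first failure status: $fail_fast_status"
```

Returning the failing status instead of calling exit leaves the decision about what to do next to the caller.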



#wait for jobs
for job in `jobs -p`; do wait ${job}; done



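Please note: :for not only retains and returns the exit code of the failing function, but also terminates all instances running in parallel, which might not be needed in this case.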

#!/usr/bin/env bash

# Wait for pids to terminate. If one pid exits with
# a non zero exit code, send the TERM signal to all
# processes and retain that exit code
#
# usage:
# :wait 123 32
function :wait(){
  local pids=("$@")
  [ ${#pids[@]} -eq 0 ] && return $?

  trap 'kill -INT "${pids[@]}" &>/dev/null || true; trap - INT' INT
  trap 'kill -TERM "${pids[@]}" &>/dev/null || true; trap - RETURN TERM' RETURN TERM

  for pid in "${pids[@]}"; do
    wait "${pid}" || return $?
  done

  trap - INT RETURN TERM
}

# Run a function in parallel for each argument.
# Stop all instances if one exits with a non zero
# exit code
#
# usage:
# :for func 1 2 3
#
# env:
# FOR_PARALLEL: Max functions running in parallel
function :for(){
  local f="${1}" && shift

  local i=0
  local pids=()
  for arg in "$@"; do
    ( ${f} "${arg}" ) &
    pids+=("$!")
    if [ ! -z ${FOR_PARALLEL+x} ]; then
      (( i=(i+1)%${FOR_PARALLEL} ))
      if (( i==0 )) ;then
        :wait "${pids[@]}" || return $?
        pids=()
      fi
    fi
  done && [ ${#pids[@]} -eq 0 ] || :wait "${pids[@]}" || return $?
}



#!/usr/bin/env bash
set -e

# import :for from gist: https://gist.github.com/Enteee/c8c11d46a95568be4d331ba58a702b62#file-for
# if you don't like curl imports, source the actual file here.
source <(curl -Ls https://gist.githubusercontent.com/Enteee/c8c11d46a95568be4d331ba58a702b62/raw/)

msg="You should see this three times"

:(){
  i="${1}" && shift

  echo "${msg}"

  sleep 1
  if   [ "$i" == "1" ]; then sleep 1
  elif [ "$i" == "2" ]; then false
  elif [ "$i" == "3" ]; then
    sleep 3
    echo "You should never see this"
  fi
} && :for : 1 2 3 || exit $?

echo "You should never see this"

$ ./for.sh; echo $?
You should see this three times
You should see this three times
You should see this three times





for i in $(seq 0 9); do
   (doCalculations $i >&2 & wait %1; echo $?) &
done | grep -qvx 0 && exit 1   # -x: match whole lines, so a status such as 10 is not mistaken for 0


If you have GNU Parallel installed, you can do:

# If doCalculations is a function
export -f doCalculations
seq 0 9 | parallel doCalculations {}

GNU Parallel will give you the following exit codes:

  • 0 - All jobs ran without error.

  • 1-253 - Some of the jobs failed. The exit status gives the number of failed jobs.

  • 254 - More than 253 jobs failed.

  • 255 - Other error.

Watch the intro videos to learn more: http://pi.dk/1



tmp=/tmp/results

: > $tmp  #clean the file

for i in `seq 0 9`; do
  (doCalculations $i; echo $i:$?>>$tmp)&
done      #iterate

wait      #wait until all ready

sort $tmp | grep -v ':0'  #... handle as required

