Sam Varshavchik
2003-01-13 23:45:49 UTC
It's been brought to my attention that OpenBSD 3.2 can assign the same pid
to different processes in the same chronological second (that is, one
process terminates, and its pid is immediately assigned to a new process
that's created soon thereafter).
Since filenames for messages in maildirs are generated from the
combination of the pid and the current time in seconds, there is now a
race condition that will result in loss or corruption of mail. This breaks
Courier and Qmail. Example: process A delivers a message to a maildir just
before process B reads Maildir/new, and A's message is moved to Maildir/cur
a short time after it's been delivered, and A exits. Meanwhile process C
starts, is given A's recycled pid, and delivers mail to the same maildir
within the same second. Its generated filename will be the same as A's, so
now you will have two messages with the same base filename in the maildir.
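To make the collision concrete, here is a minimal sketch of a filename built only from the time in whole seconds, the pid, and the hostname. The "seconds.pid.host" layout and the helper names `old_maildir_name` and `names_collide` are hypothetical illustrations, not Courier's or Qmail's actual code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical helper (not Courier's or Qmail's code): builds a base
 * filename in a "seconds.pid.host" layout. Two deliveries in the same
 * second by processes that got the same (recycled) pid yield the same name. */
void old_maildir_name(char *buf, size_t len,
                      time_t now, pid_t pid, const char *host)
{
    int n = snprintf(buf, len, "%lld.%ld.%s",
                     (long long)now, (long)pid, host);
    assert(n > 0 && (size_t)n < len);   /* name must fit in buf */
}

/* Nonzero when two generated base names are identical, i.e. the
 * second delivery clashes with the first. */
int names_collide(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

Process A and process C from the example would both call `old_maildir_name` with the same second and the same pid, so `names_collide` reports a clash; only a change in the second (or in the pid) separates them.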
This is not theoretical. This has been brought to my attention after someone
managed to successfully deliver two messages with the same filename, on a
piddly Athlon (as part of a Courier "stress test"). Let's have a round of
applause for such a noteworthy accomplishment!
Now, my opinion is that whoever did this in OpenBSD was insane. Hey folks, let's
all 'rm -f /bin/ps', because ps(1)'s output is meaningless now. Whatever it
spits out, by the time you get around to typing 'kill', you may end up
signaling a completely different process. Oh, and setting 'sig' to 0 in
kill(2) no longer means anything either. I mean, it's just plain stupid
to use kill(pid, 0) to validate a pid, do a few miscellaneous things, then try
again with a real signal, right? So if you now end up bitch-slapping a
completely unrelated process, it's going to be all your fault.
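For reference, a minimal sketch of the probe-then-signal idiom mentioned above. The helper names `pid_exists` and `probe_then_signal` are hypothetical. `kill(pid, 0)` performs the existence and permission checks without delivering a signal; the sketch assumes that an `EPERM` failure means "exists, but not ours".

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: probe a pid with signal 0. No signal is sent;
 * only the existence and permission checks are performed. */
bool pid_exists(pid_t pid)
{
    /* pid <= 0 would address a process group or every process,
     * so only plain pids are accepted here. */
    assert(pid > 0);

    if (kill(pid, 0) == 0)
        return true;            /* exists and we may signal it */
    return errno == EPERM;      /* exists, but belongs to someone else */
}

/* The pattern described above: probe, do other work, then signal.
 * Between the two kill() calls the original process may exit and its
 * pid may be handed to an unrelated process, which then gets the signal. */
int probe_then_signal(pid_t pid, int sig)
{
    if (!pid_exists(pid))
        return -1;
    /* ... miscellaneous work happens here ... */
    return kill(pid, sig);
}
```

The probe's answer can go stale the moment it returns; rapid pid recycling shrinks the time it takes for a stale answer to point at a different process.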
But, what do I know... In any event, here's a patch that will apply to the
current release of Courier, Courier-IMAP, SqWebMail, and maildrop.
http://www.courier-mta.org/beta/patches/pid-fix/ - it adds microseconds
to the generated maildir filenames. This'll tide things over, until OpenBSD
begins recycling pids in the same microsecond.
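A minimal sketch of the microsecond idea follows. The "seconds.M&lt;usec&gt;P&lt;pid&gt;.host" layout and the helper names are assumptions for illustration; the exact format produced by the pid-fix patch may differ. The point is only that two deliveries in the same second with the same pid now differ unless they also land in the same microsecond.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: base filename in an assumed
 * "seconds.M<usec>P<pid>.host" layout (not necessarily the patch's). */
void usec_maildir_name(char *buf, size_t len, const struct timeval *tv,
                       pid_t pid, const char *host)
{
    int n = snprintf(buf, len, "%lld.M%ldP%ld.%s",
                     (long long)tv->tv_sec, (long)tv->tv_usec,
                     (long)pid, host);
    assert(n > 0 && (size_t)n < len);   /* name must fit in buf */
}

/* Generate a name for the calling process at the current instant. */
void usec_maildir_name_now(char *buf, size_t len, const char *host)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    usec_maildir_name(buf, len, &tv, getpid(), host);
}

/* Nonzero when two base names differ, i.e. no clash. */
int names_differ(const char *a, const char *b)
{
    return strcmp(a, b) != 0;
}
```

With the pid and second held fixed, the microsecond field alone keeps the names apart, which is why this only defers the problem until pids are recycled within the same microsecond.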