I was just tracking down why some mails seem to have garbled Subjects.
It looked like this in Alpine:
Subject: [BOFH commits] r2040: fix directory struct =?UTF-8?Q?ure=E2=86=B5=20unix?=/mirror/=?UTF-8?Q?=20=E2=86=92=20mirror?=. bonn
The raw header sight was this:
Subject: [BOFH commits] =?utf-8?q?r2040=3A__fix_directory_struct__=3D=3FUT?=
=?utf-8?b?Ri04P1E/dXJlPUUyPTg2PUI1PTIwdW5peD89L21pcnJvci89P1VURi04P1E/?=
=?utf-8?b?PTIwPUUyPTg2PTkyPTIwbWlycm9yPz0uIGJvbm4=?=
Ein Schelm, wer Böses dabei denkt… First suspect was, of course, Mailman – after decoding, this showed the classical signs of a double-encode. (After ruling out general header brokenness, but no, 76 chars is ok.) I thus hand-crafted an eMail with the correct header line and sent that out:
Subject: [BOFH commits] =?utf-8?q?r2040=3A_fix_directory_structure?=
=?utf-8?b?4oa1IHVuaXgvbWlycm9yLyDihpIgbWlycm9yLmJvbm4=?=
Huh? Mailman and Python weren’t the culprit, thus – this is correct mangling. Okay, let’s dive into the Perl code that actually sends out the eMails. To make a long story short, have a look at this, then RFC 2047:
tglase@tglase:~ $ perl -MEncode -e '$subject = "r2040: fix directory structure↵ unix/mirror/ → mirror.bonn"; Encode::from_to($subject, "UTF-8", "MIME-Q"); print "{Subject: ".$subject."}\n";'
{Subject: r2040:=?UTF-8?Q?=20fix=20directory=20struct?=
=?UTF-8?Q?ure=E2=86=B5=20unix?=/mirror/=?UTF-8?Q?=20=E2=86=92=20mirror?=.
bonn}
Amazingly enough, PHP’s mb_encode_mimeheader, despite being talked to trash in the comments on its online documentation, does manage to get it right:
tglase@tglase:~ $ php -r 'mb_internal_encoding("UTF-8");echo "{".mb_encode_mimeheader("Subject: r2040: fix directory structure↵ unix/mirror/ → mirror.bonn", "UTF-8", "Q")."}\n";'
{Subject: r2040: fix directory =?UTF-8?Q?structure=E2=86=B5=20unix/mirror/?=
=?UTF-8?Q?=20=E2=86=92=20mirror=2Ebonn?=}
Wow. Now, the Perl guys I know told me to use Perl’s Mail tools… which are much too high-level though, for all I have and want is the subject string and an RFC 822 header line. I told them I’m not above doing this, and so I did. The 3P languages can really be annoying.
Why’s Perl’s output wrong anyway? I don’t know for sure, but I think the atoms must be separated, so unquoting /mirror/
in the middle, with no spaces around it, are wrong. (Besides, Encode::from_to can’t do the job right anyway, as it misses the name of the header, which is included in the 76 chars allowed for the first line. BAD • Broken As Desdigned.)
Disclaimer: I don’t really know any Perl, I fight my way through PHP and, barely, Python. (But I can code.)