Procmail: block foreign character sets

Head Surfer
October 10th, 2008

Place the following in your procmailrc file if you want to block foreign character sets:

CHARSET_JP="WINDOWS-932|EUC-JP|(cs-?)?ISO-?2022-?JP(-[12])?|ISO-2022-D|SHIFT[-_]JIS|JIS[-_]?X[-_]?02(08|01|12|13)|sjis|jis7|ms-kanji|(x-)?mac(-)?japanese|x-EBCDIC-Japanese(Katakana|AndUSCanada|AndJapaneseLatin|AndKana)"
CHARSET_CN="WINDOWS-(936|950)|EUC-CN|(hz-|x-euc-tw)?GB[-_]?2312|(cn-)?(BIG5|gb)|ISO-2022-([EGHIJKLM]|cn|cn-ext)|ISO-IR-165|GB8565.2(-1988)?|x-euc-tw|hz|iso-ir-58|gbk|big5-hkscs|gb18030|(x-)?mac(-)?chinese(trad|simp)|x-EBCDIC-(Traditional|Simplified)Chinese|x-Chinese-(CNS|eten)"
# Non-standards-compliant variations of Chinese
CHARSET_CN_BOGUS="CHINESEBIG5|BIG-5"
CHARSET_KR="WINDOWS-949|EUC-KR|KS[-_ ]?C[-_ ]?5601([-_ ]?1987)?|ISO-2022-(C|kr)|KS[-_]?X[-_]?1001|ksc5636|iso-646-kr|uhc|johab|(x-)?mac(-)?korean|iso-ir-149|x-EBCDIC-(KoreanAnd)?KoreanExtended"
# Some mailers actually set this
CHARSET_BOGUS="X-UNKNOWN|USER-DEFINED"
# Not recommended to block these: they're all rather encompassing
CHARSET_UNICODE="UTF(-)?(7|8|16)|UCS(-)?(2|4)|UNICODE-1-1-UTF-7|ISO-10646-UCS-2|UNICODE-(16|32)((LITTLE|BIG)-ENDIAN)?|unicodeFFFE|JAVA|x-EBCDIC-International(-euro)?"
# If you read English, you probably don't want to block this one either.
CHARSET_ENG="US-ASCII|ASCII|iso-ir-6|iso646-us|x-EBCDIC-(cp-us|UK)(-euro)?"
# Western European (English, but also French and many others.  Standard)
CHARSET_WESTEURO="WINDOWS-1252|ISO-?8859-(1|15)|iso-ir-100|(x-)?mac(-)?roman|latin-?(1|9)|macintosh|x-IA5(-German)?|x-ebcdic-(spain|italy|germany|france)(-euro)?|x-europa"
# Central/Eastern European (non-English)
CHARSET_SLAVIC="WINDOWS-1250|ISO-?8859-(2|16)|iso-ir-(87|102)|(x-)?mac(-)?(central-europe|ce|croatian)|latin-?2|CP870"
# Uncommon stuff and/or generally obsoleted.  Includes Maltese (eh, sorry if that's you)
CHARSET_FUNKYLATIN="ISO-?8859-[34]|iso-ir-109|latin-?3"
# Russian, et al.
# KOI8-T is Tajiki (Tajikistan)
# armscii-8 is Armenian
CHARSET_CYRILLIC="WINDOWS-1251|ISO-?8859-5|KOI8(-(RU|[RTU]))?|ISO-IR-(101|111|144|147)|IBM866|(x-)?mac(-)?(romanian?|cyrillic|ukrain(e|ian))|nunacom-8|armscii-8|x-EBCDIC-Cyrillic(SerbianBulgarian|Russian)"
# Arabic
CHARSET_ARABIC="WINDOWS-1256|ISO-?8859-6|iso-ir-127|(x-)?mac(-)?arabic|asmo-708|x-EBCDIC-Arabic"
# Greek
CHARSET_GREEK="WINDOWS-1253|ISO-?8859-7|(x-)?mac(-)?greek|iso-ir-(126|150)|x-EBCDIC-Greek(Modern)?"
# Hebrew
CHARSET_HEBREW="WINDOWS-1255|ISO-?8859-8(-i)?|(x-)?mac(-)?hebrew|iso-ir-138|x-EBCDIC-Hebrew"
# Turkish
CHARSET_TURKISH="WINDOWS-1254|ISO-?8859-9|(x-)?mac(-)?turkish|iso-ir-(109|148)|latin-?5|x-EBCDIC-Turkish|CP1026"
# Icelandic/Nordic (e.g. Iceland, Greenland, Norway, Sweden, ...)
CHARSET_NORDIC="ISO-?8859-10|(x-)?mac(-)?iceland(ic)?|iso-ir-60|x-IA5-(Norwegian|Swedish)|x-EBCDIC-(FinlandSweden|DenmarkNorway|Icelandic)(-euro)?"
# Thai (ISO not _actually_ used, but draft standard is same)
CHARSET_THAI="WINDOWS-874|TIS[-_]?620|ISO-?8859-11|mulelao-1|ibm-cp1133|(x-)?mac(-)?thai|x-EBCDIC-Thai"
# ISO-8859-12 is bogus (was suggested to be Vietnamese, but can't fit).
# However, I've seen this encoding specified in spam, and lacking an
# official designation, I'm hocking it here.
CHARSET_VIETNAM="WINDOWS-1258|ISO-?8859-12|viscii|tcvn5712|vps"
# Baltic Rim
CHARSET_BALTIC="WINDOWS-1257|ISO-?8859-13|iso-ir-110"
# Celtic (Irish and Welsh)
CHARSET_CELTIC="ISO-?8859-14"
# Other stuff which escapes categorization at this time
CHARSET_MISC="isiri-3342|x-iscii-(as|be|de|gu|ka|ma|or|pa|ta|te)"

CHARSETS="${CHARSET_CN}|${CHARSET_CN_BOGUS}|${CHARSET_KR}|${CHARSET_JP}|${CHARSET_BOGUS}|${CHARSET_SLAVIC}|${CHARSET_FUNKYLATIN}|${CHARSET_CYRILLIC}|${CHARSET_ARABIC}|${CHARSET_GREEK}|${CHARSET_HEBREW}|${CHARSET_TURKISH}|${CHARSET_THAI}|${CHARSET_VIETNAM}|${CHARSET_BALTIC}|${CHARSET_MISC}"
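
CHARSETS is just the per-group variables joined with "|", so you can trim it to the groups you actually want to block. A minimal sketch, assuming you only want to block the CJK and Cyrillic groups (that choice is an example, not a recommendation); it would take the place of the CHARSETS line above:

```procmail
# Hypothetical narrower block list: CJK and Cyrillic only.
# Every variable used here is one of the CHARSET_* groups defined earlier;
# any group left out of this line is simply never matched by the recipe.
CHARSETS="${CHARSET_CN}|${CHARSET_CN_BOGUS}|${CHARSET_KR}|${CHARSET_JP}|${CHARSET_CYRILLIC}"
```

Note that the full list already leaves out CHARSET_UNICODE, CHARSET_ENG, CHARSET_WESTEURO, CHARSET_NORDIC and CHARSET_CELTIC, so those groups are never blocked unless you add them yourself.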

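# wsstar is used below but not set anywhere in this listing; it is assumed
# to mean "optional whitespace" (the brackets hold a space and a tab).
wsstar="[ 	]*"

# Messages identifying the character set in the From: or Subject:
:0
* $ ^(From|Subject):${wsstar}=\?\/(${CHARSETS})\?[QB]
{
    # This scrubs the delimiters from the MATCH string,
    # leaving us with just the text of the matched charset descriptor.
    :0
    * MATCH ?? ()\/[^?]+
    {
        :0
        /dev/null
    }
}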
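
Delivering straight to /dev/null throws the message away for good, so it may be worth running a gentler variant for a while first. The sketch below logs the matched charset and files the message into a folder instead. It assumes CHARSETS and wsstar are already set earlier in the rc file; the folder name blocked-charsets and the NL helper are made up for this illustration.

```procmail
# Newline for LOG entries (the closing quote is on the next line on purpose).
NL="
"

:0
* $ ^(From|Subject):${wsstar}=\?\/(${CHARSETS})\?[QB]
{
    :0
    * MATCH ?? ()\/[^?]+
    {
        # MATCH now holds only the charset name, e.g. koi8-r.
        LOG="blocked charset: ${MATCH}${NL}"

        # Hypothetical mbox folder, relative to MAILDIR; the second colon
        # in ":0:" asks procmail to use a local lockfile while writing.
        :0:
        blocked-charsets
    }
}
```

Once the folder has stayed free of legitimate mail for a while, the delivery line can be switched back to /dev/null.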
