COVER STORY
Spam-proof Homepages
Private Address
Purveyors of unsolicited mail harvest
most of their addresses from Internet
homepages. This article tells you how
to publish your own address and at
the same time keep it hidden from
automatic harvesting programs.
BY TOBIAS EGGENDORFER
Spam is becoming more of a nuisance and really hard to fight. Although spam filters use more or less successful heuristics to separate the wheat from the chaff, spammers, in contrast to the situation with computer viruses, are often one step ahead of the defensive techniques and continually develop new methods to bypass the rules that the filters use.
The Root of the Problem
Strangely enough, over-eager administrators contribute to the downfall of their own defenses by sending discarded messages back to spammers, detailing why the filter treated each message as spam. Even without this help, spammers are continually on the lookout for new ways of bypassing filters. We should therefore avoid thinking of a filter as an all-encompassing solution.
Another approach is to squash spam at its source, as evidenced by a growing lobby in favor of making authenticated SMTP the standard. A user wanting to send an email message would then need to authenticate to the mail server using his IP address or a password; only a few providers have implemented this method so far.
Spammers mainly harvest target addresses from publicly accessible websites, according to a study by the "Center for Democracy and Technology" (CDT) [1]. In the course of its investigation, the CDT deliberately published email addresses, created specifically for the experiment, on homepages, in newsgroups, and with various Web services. Of the 8842 messages received by these addresses and classified as unsolicited advertising, 97.3% were addressed to the mail addresses published on Web pages (see Figure 1).
Based on the results of the investigation, it certainly makes sense to avoid publishing mail addresses that spammers can exploit when designing Web pages. In fact, the authors of the report state that it is worthwhile obfuscating mail addresses on existing Web pages, as the volume of spam dropped after the offending addresses were removed (see Figure 2).
Listing 1: Storing in the HTML head
<HTML>
<HEAD>
<TITLE>Sample Page</TITLE>
<SCRIPT LANGUAGE="JavaScript">
<!--
var mailaddress = 'user@example.com';
//-->
</SCRIPT>
</HEAD>
<BODY>
[...]
<SCRIPT LANGUAGE="JavaScript">
<!--
document.write('<A HREF="mailto:'+mailaddress+'">'+mailaddress+'</A>');
//-->
</SCRIPT>
[...]
</BODY>
</HTML>
26
September 2004
www.linux-magazine.com
Design spam-proof homepages
Address Harvesting
The approach that most spammers adopt is quite primitive. Starting on an arbitrary website, they simply store any mailto: links (that is, references to mail addresses) they find. They then follow any other links on the pages and repeat the procedure until that tree of page links is exhausted.
By continuing this process, spammers will eventually reach even pages that only a single link points to. In doing so, they use techniques typical of search engines to scan the whole of the Web. It is not difficult to write a program that automates this task, a so-called "spider" or "harvester". After removing duplicate addresses, the spammer is left with a collection of potential victims.
It is possible to build a simple email address harvester from standard Linux tools: wget, sed, tr, sort, and uniq. The results you can achieve are quite amazing.
wget walks across websites, sed searches the pages for email addresses, tr provides uniform capitalization, sort sorts the mail addresses alphabetically, and uniq eliminates any duplicates. When I tested this approach on my own homepage, I harvested over 90 different email addresses in only eight minutes, none of them pointing to me. Choosing a starting page with more links and ignoring the robots.txt [2] convention will return far more results in the same time.
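The following JavaScript imitates the sed, tr, sort, and uniq stages of such a pipeline. It is a minimal sketch, not the tool used in the test. It assumes the pages have already been downloaded, so the wget step is left out. The address pattern and the name harvestAddresses are our own simplifications.

```javascript
// Sketch of the sed | tr | sort | uniq stages of a simple harvester.
// Assumptions: pages are already fetched (no wget step), and the pattern
// below is a simplification, not a complete email address grammar.
const ADDRESS_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

function harvestAddresses(pages) {
  const found = new Set();                              // plays the role of uniq
  for (const html of pages) {
    const matches = html.match(ADDRESS_PATTERN) || []; // plays the role of sed
    for (const address of matches) {
      found.add(address.toLowerCase());                 // plays the role of tr
    }
  }
  return [...found].sort();                             // plays the role of sort
}

const pages = [
  '<a href="mailto:Alice@Example.com">Alice</a> <a href="mailto:bob@example.org">Bob</a>',
  '<p>Write to alice@example.com</p>',
  // Entity-encoded user@example.com: contains no literal "@", so no match
  '<p>&#117;&#115;&#101;&#114;&#064;&#101;&#120;&#097;&#109;&#112;&#108;&#101;&#046;&#099;&#111;&#109;</p>',
];

console.log(harvestAddresses(pages));
// prints [ 'alice@example.com', 'bob@example.org' ]
```

A simple pattern like this catches any address written in cleartext, whether or not it sits in a mailto: link. This is why the obfuscation techniques below all aim to keep the literal address out of the page source.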
Not supplying an email address at all is typically not the kind of solution that website owners prefer. After all, the idea of a website is to provide an additional communication channel. In some countries, website owners are required by law to provide an address (this is the case in Germany, for example).
Obfuscation Techniques
There are any number of popular techniques for obfuscating mail addresses. One is the remove_to_mail_me approach, which uses someone@remove_to_mail_me.example.com instead of someone@example.com. Not all users remember to remove the middle section before clicking on Reply, and sometimes it is not clear what is meant to be removed. Again, legal restrictions may prevent some commercial website owners from using obfuscated addresses.
There is also an increasing tendency for spammers to spoof sender addresses. Error messages then do not reach the spammer but an unsuspecting third party, and the resulting load can be heavy enough to crash mail servers.
The CDT report [1] mentioned previously suggests encoding mail addresses on homepages as HTML entities, where user@example.com would become:
&#117;&#115;&#101;&#114;&#064;&#101;&#120;&#097;&#109;&#112;&#108;&#101;&#046;&#099;&#111;&#109;
Browsers have no trouble reading the address, but harvesters fail to recognize the pattern they are searching for in the source code.
In the CDT report, addresses encoded in this way did not receive any spam. In our own test run, our primitive harvester found 10 addresses of this kind. As the use of this format is on the increase, it is to be expected that spider programs will soon be capable of automatically converting entity addresses back to valid email addresses. In the long term, the entity address offers no real advantage over cleartext as protection against harvesters.
Figure 1: Spammers mainly harvest target addresses off the Web.
JavaScript to the Rescue?
Most spider programs cannot handle JavaScript. This allows website owners to use JavaScript to obfuscate email addresses.
Listing 1 shows how you can specify an address in the head of an HTML page and then use the document.write() JavaScript function to output the address. With this variant, website visitors still see the address in cleartext on the page. However, a harvester that searches the source for mailto: links will come up empty-handed.
There are various options for assigning the email address to the variable. The easiest and most maintainer-friendly option is to load the value from an external JavaScript file, since harvesters do not currently evaluate this type of file. Unfortunately, some browsers only load external JavaScript files after displaying the page. In this case, document.write() will fail, because it tries to output a variable whose value has not yet been assigned.
Also, a cunning spammer might find a method to harvest all the JavaScript files on a page, and then search these files for email addresses.
There is a workaround for the document.write() problem that affects some browsers. Instead of using an HTML link (<a href="mailto:someone@example.com">) to point directly to the mail address, you can have JavaScript take care of the job. Rather than using JavaScript to output HTML, Listing 2 assigns a mailto: URL to the document.location.href property when the visitor clicks the link. As the variable is only read at that point, the external file will normally have been loaded by then.
Figure 2: It is worthwhile removing existing email addresses from a homepage.
Extremely basic XOR encryption is all it takes to prevent a search program from finding a mail address. From a cryptographic point of view, this method provides very little security. It is nonetheless extremely effective against spammers, who need maximum results in as short a time as possible. The address is stored in encrypted form, and decoding it requires time and brains, neither of which spammers or their harvesters have to spare. The spammer would need to understand, and reverse engineer, the JavaScript commands to obtain a cleartext address.
Listing 3 encrypts the user name in a mail address, that is, the name component before the @. The document.location.hostname property then retrieves the missing elements in cleartext from the browser's address line. This example only works if the server domain matches the domain of the email address. As an alternative, you could use the same algorithm to encrypt the remaining address components.
The encryption procedure is easily extensible. However, both the encryption procedure and the key are available within the script. This allows anyone who can run the JavaScript code to view the mail address in the clear, and of course this is necessary to allow human visitors to read the address.
Downsides
The obfuscation methods described so far work for email addresses and Web links. If these techniques were applied globally, address harvesters would have a hard time of it. On the downside, so would search engines like Google, and so would human surfers without JavaScript-capable browsers.
To avoid forcing the visitors to your website to use JavaScript, you might like to use a simple trick, which unfortunately again rules out users with text-only browsers. Include an image file on the page that displays your email address. As the address does not show up in the text content of the page, harvesters have no chance of gleaning it. Spammers cannot realistically resort to OCR software to harvest addresses from images automatically. After all, any image file on the Web could contain a mail address, so they would have to run every image through OCR. This also provides a loophole for website owners faced with a legal requirement to publish an electronic mail address.
To make the image clickable without revealing an email address, link it to a contact form. The form forwards legitimate messages from website visitors to the site owner without showing the target address.
Big Advantages
The biggest advantage of the JavaScript method is that clients that cannot interpret JavaScript will not recognize the email address. At present (and let's hope it stays that way), spider programs for harvesting addresses do not possess this ability.
You can easily combine the JavaScript snippets listed in this article, for example by adding the encryption facilities from Listing 3 to the JavaScript link in Listing 2.
It would be great to be able to say that only address harvesters trip up over JavaScript, but unfortunately text-based browsers like Lynx have the same problem. Also, many users disable JavaScript in graphical browsers for security reasons. Note that JavaScript-based addresses will prevent these users from getting in touch with you.
GLOSSARY
Heuristics: (From Greek "heuriskein": to find, discover) Searching for patterns based on rules of thumb that have a high probability of success. In theory this makes the results unreliable, but the search operation is a lot quicker than a precise computation.
SMTP: The "Simple Mail Transfer Protocol" accepts and forwards email messages.
robots.txt: Files of this name on Web servers contain, among other things, information on the pages that search engines should not automatically search.
ASCII: The "American Standard Code for Information Interchange" assigns a number, or ASCII code, to all letters, numbers and special characters.
JavaScript: Scripting language for use on websites. If JavaScript is enabled in a browser, the browser interprets the language and executes the embedded commands.
XOR: "Exclusive OR" is a bitwise operation common in binary arithmetic, represented by the ^ operator in JavaScript. Because applying it twice with the same key restores the original value, it can be used for symmetric data encryption.
HTML entity: This HTML encoding writes a character as the &# prefix, followed by its ASCII code and a semicolon: thus "&#117;" corresponds to "u".
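Both the entity encoding suggested by the CDT report and its reversal take only a few lines. The JavaScript below is a sketch that assumes zero-padded, three-digit decimal entities, as in the article's user@example.com example. The names toEntities and fromEntities are our own hypothetical helpers. The decoder shows why entity encoding is unlikely to stay effective: a single regular-expression replacement turns the entities back into a plain address.

```javascript
// Sketch: encode every character as a decimal HTML entity (&#NNN;),
// padded to three digits like the article's example.
function toEntities(address) {
  let out = '';
  for (const ch of address) {
    out += '&#' + String(ch.codePointAt(0)).padStart(3, '0') + ';';
  }
  return out;
}

// What a smarter harvester could do: replace every &#NNN; with its character.
function fromEntities(text) {
  return text.replace(/&#(\d+);/g, (_, digits) =>
    String.fromCodePoint(parseInt(digits, 10)));
}

const encoded = toEntities('user@example.com');
console.log(encoded);
console.log(fromEntities(encoded));  // prints user@example.com
```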
Listing 2: JavaScript with address links
<HTML>
<HEAD>
<TITLE>Sample Page</TITLE>
<SCRIPT LANGUAGE="JavaScript">
<!--
var mailaddress = 'user@example.com';
// Builds the mailto: URL only when the visitor clicks the link
function mailMe()
{
  document.location.href = "mailto:" + mailaddress;
}
//-->
</SCRIPT>
</HEAD>
<BODY>
[...]
<A HREF="javascript:mailMe();">Mail sender</A>
[...]
</BODY>
</HTML>
GLOSSARY
Flash: This proprietary format by Macromedia supports combinations of animation, video, sound, and images on websites. Browsers need a plug-in to view content in Flash format.
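The article suggests combining the snippets, for example by adding Listing 3's encryption to Listing 2's link. Below is a minimal sketch of that combination in plain JavaScript, intended for the page's head script. The helper name buildMailto is hypothetical. Like Listing 3, the sketch assumes that the mail domain matches the host name the page is served from.

```javascript
// Sketch combining Listing 3 (XOR-encrypted user name) with Listing 2
// (mailto: URL assigned only on click). buildMailto is a hypothetical helper.
var local = [194, 196, 210, 197];  // encrypted user name, as in Listing 3
var key = 183;                     // same XOR key as Listing 3

function buildMailto(hostname) {
  var local_part = '';
  for (var i = 0; i < local.length; i++) {
    local_part += String.fromCharCode(local[i] ^ key);  // undo the XOR
  }
  return 'mailto:' + local_part + '@' + hostname;
}

// Called from <A HREF="javascript:mailMe();"> exactly as in Listing 2
function mailMe() {
  document.location.href = buildMailto(document.location.hostname);
}
```

The address is only assembled when the link is clicked. As a result, neither the cleartext address nor a complete mailto: link ever appears in the page source.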
A Flash animation that displays a clickable address is yet another variant. However, you should be aware that this approach will exclude even more visitors.
Hitting Back
If you have your own Internet domain, you can use a dynamic website to discover the origins of address harvesters. To do so, create an email link that is continually updated on the page. Use the time of day, the date, and the IP address of the current visitor to generate the link, so that every visitor who loads your page is shown a unique address.
If this address is spammed, you can then read the origin of the harvest straight from the address the spam was sent to. This can be important if you need to take legal action against the spammers.
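Such a dynamic address can be sketched in JavaScript, assuming a server-side runtime such as Node.js that knows the visitor's IPv4 address. The trap- prefix, the base-36 timestamp, and the hexadecimal IP encoding are our own illustrative choices, not the author's scheme. The function names are also hypothetical. The mail server for the domain would also have to accept mail for any such local part, for example through a catch-all mailbox.

```javascript
// Sketch: encode visit time and visitor IPv4 address into a unique address.
// The "trap-" prefix and the encodings are illustrative assumptions.
function makeTrapAddress(ipv4, date, domain) {
  const ipHex = ipv4.split('.')
    .map(octet => Number(octet).toString(16).padStart(2, '0'))
    .join('');                                          // 4 octets -> 8 hex digits
  const stamp = Math.floor(date.getTime() / 1000).toString(36); // seconds since 1970
  return `trap-${stamp}-${ipHex}@${domain}`;
}

// Recover time and IP address from an address that received spam.
function decodeTrapAddress(address) {
  const match = /^trap-([0-9a-z]+)-([0-9a-f]{8})@/.exec(address);
  if (!match) return null;
  const seconds = parseInt(match[1], 36);
  const ip = match[2].match(/../g).map(h => parseInt(h, 16)).join('.');
  return { harvestedAt: new Date(seconds * 1000), ip };
}

const address = makeTrapAddress('192.0.2.17',
  new Date(Date.UTC(2004, 8, 1, 12, 0, 0)), 'example.com');
console.log(address);
console.log(decodeTrapAddress(address));
```

Anyone who knows the scheme can decode these addresses. If that matters, the local part would have to be signed or encrypted as well.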
Tobias Eggendorfer
has been working as a freelance IT consultant and lecturer since 1999. Tobias's main focus is IT security, spam prevention, networking technologies, and databases. In his leisure time Tobias works as a volunteer for the local ambulance service, particularly in crisis intervention. His homepage can be found at http://www.eggendorfer.info.
Listing 3: Encrypted mail address
<HTML>
<HEAD>
<TITLE>Sample page</TITLE>
<SCRIPT LANGUAGE="JavaScript">
<!--
// Encrypted user name: each character code XORed with the key 183
var local = new Array(194,196,210,197);
var local_part = '';
for (var i=0; i<local.length; local_part += String.fromCharCode(local[i] ^ 183), i++) ;
// 64 is the character code of '@'; the domain is taken from the page's host name
var mailaddress = local_part + String.fromCharCode(64) + document.location.hostname;
//-->
</SCRIPT>
</HEAD>
<BODY>
[...]
<SCRIPT LANGUAGE="JavaScript">
<!--
document.write('<A HREF="mailto:'+mailaddress+'">'+mailaddress+'</A>');
//-->
</SCRIPT>
[...]
</BODY>
</HTML>
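To use Listing 3 with your own user name, you first need the encrypted number array. The JavaScript sketch below produces that array and reverses it. The function names xorEncode and xorDecode are hypothetical. The key 183 is the one used in Listing 3. Because XORing twice with the same key restores the original value, a single key serves for both directions.

```javascript
// Sketch: produce and reverse the XOR-encrypted character codes used in Listing 3.
function xorEncode(text, key) {
  const codes = [];
  for (let i = 0; i < text.length; i++) {
    codes.push(text.charCodeAt(i) ^ key);  // encrypt one character code
  }
  return codes;
}

function xorDecode(codes, key) {
  return codes.map(code => String.fromCharCode(code ^ key)).join('');
}

console.log(xorEncode('user', 183));                // prints [ 194, 196, 210, 197 ]
console.log(xorDecode([194, 196, 210, 197], 183));  // prints user
```

The output matches the array in Listing 3, so you can paste the result of xorEncode for your own name directly into the new Array(...) line.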
INFO
[1] "Why am I getting all this spam?": http://www.cdt.org/speech/spam/030319spamreport.html
[2] robots.txt: http://www.robotstxt.org/