Internetseer is the devil

I apologize to everyone who got an email from internetseer reporting that my site had broken links. What actually happened is that I restructured part of my website when I moved it to my Linux server, and they are comparing old links against pages that have since moved.

My advice for email from internetseer, and for all spam for that matter, is to delete it and NEVER reply. Spammers will use your reply as acknowledgement that your account is active.

In response to this, I have made all blog comment fields optional. Your name is nice, but I usually know your email address. If you would like to leave an email address for me to get back to you, please use an address that is not your work address. For example, I use my Yahoo address for all web-related fields, and protect my other accounts from spammers by only giving them to friends.

Again my apologies.

5 Replies to “Internetseer is the devil”

  1. The best way to rid yourself of Internet Seer (AKA InternetSeer, etc.) is to block it in your .htaccess file, using their IP range to deny access. In other words:

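    RewriteEngine on
    # Match the InternetSeer address prefix or its user agent string
    RewriteCond %{REMOTE_ADDR} ^66\.150\.40 [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Internet\ Seer
    # Answer any matching request with 403 Forbidden
    RewriteRule ^.*$ - [F]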

    The above will deny server access to Internet Seer and send it off with a 403 error. I don’t trust InternetSeer, as it DOES NOT respect robots.txt and it also looks for other things besides “pinging” for uptime. What hits our servers is not a “ping”, just another creepy bit of cyberspace junk.

  2. There are two other very serious webmaster concerns out there besides Internet Seer. They are the following two bots: ia_archiver and TurnitinBot/1.5.

    The first one belongs to and this bot will, without your consent or permission, totally duplicate your site on their server. They try to pass it off as some type of a cyber history lesson, but the legal fact is that they are ripping off the graphics and content of sites in violation of copyright laws.

    The second bot, TurnitinBot/1.5 – owned by – essentially does the same thing as ia_archiver, except the content is privately held on their servers and then used FOR PROFIT by them. They charge money for schools and others to compare works turned in by students vs web content. The problem, of course, is that it is retaining information under copyright for commercial benefit.

    In all three cases, they do not appear to honor robots.txt and in two – possibly all three – your copyrighted material may be used for their profit.

    There is another bad bot out there called BaiDuSpider from China – that wonderful location that you can trace half of all your spam back to.

    This is what I recommend for denial of these bots and content siphons via .htaccess, since the bots disregard robots.txt:

    RewriteEngine on
    # Every condition except the last carries [OR], so any single match triggers the rule
    RewriteCond %{REMOTE_ADDR} ^66\.150\.40 [OR]
    RewriteCond %{HTTP_USER_AGENT} ^Internet\ Seer [OR]
    RewriteCond %{HTTP_USER_AGENT} ^BaiDuSpider [OR]
    RewriteCond %{HTTP_USER_AGENT} ^TurnitinBot/1\.5 [OR]
    RewriteCond %{HTTP_USER_AGENT} ^ia_archiver
    # Answer any matching request with 403 Forbidden
    RewriteRule ^.*$ - [F]

    If placed in the .htaccess file, it will keep these devil bad bots and siphons off of your site. Please be very careful with .htaccess and talk to your web hosting service if you have any questions about changing this file!

    The best thing to do is use the IP rewrite to combat InternetSeer, as it will change IP and agent, but using the open-ended ^66.150.40 will nail everything in their netblock that goes down to .99.
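The prefix match on the remote address can be tried offline before touching .htaccess. Below is a minimal Python sketch, assuming the pattern behaves like an ordinary regular expression (Python's `re` is close enough to Apache's regex engine for simple prefixes like these). The helper name `is_blocked`, the `LOOSE` and `STRICT` patterns, and the sample addresses are illustrative only and do not come from the comments above.

```python
import re

# Address prefix from the rule above, written with escaped dots so that
# each "." matches a literal dot only.
BLOCKED_PREFIX = re.compile(r"^66\.150\.40")

def is_blocked(remote_addr: str) -> bool:
    """Return True when the address starts with the blocked prefix."""
    return BLOCKED_PREFIX.match(remote_addr) is not None

# Why escaping matters: an unescaped "." matches any character.
# These two patterns are hypothetical and only illustrate the difference.
LOOSE = re.compile(r"^10.1.1")
STRICT = re.compile(r"^10\.1\.1\.")

for addr in ("66.150.40.17", "66.150.41.17", "10.101.1.5"):
    print(addr, is_blocked(addr), bool(LOOSE.match(addr)), bool(STRICT.match(addr)))
```

The last sample address shows the pitfall: the loose pattern accepts 10.101.1.5 because its second "." happily matches the digit "0", while the strict pattern rejects it.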

  3. We have found, at our law office, that there are several devil agents out there. Of course, InternetSeer is one of them. The other two agents that we have found to violate various user agreements (UA’s) and the DMCA are ia_archiver and Netcraft Survey.

    At first glance you may not feel that these agents violate your web space, but take a closer look. First of all, the Netcraft Survey agent belongs to a “computer security company” that attempts to drive traffic to their site by showing what server YOU are running. To be specific, they are hitting your site hundreds to thousands of times each month, using bandwidth YOU pay for, to promote their business. There are some thoughts about the agent hits that end up in logs (below). Some experts feel that there is the possibility that this is a spam agent or spam bot – especially since abuse contact e-mails listed in whois records are apparently fake. In any case, if you think it’s OK for a company like Netcraft to use bandwidth that you pay for as a way to promote their business, let them in… If not, there are two ways to deal with this evildoer.

    An example Netcraft “hit”:

    Referer: Agent: Mozilla/4.0 (compatible; Netcraft Web Server Survey)

    One of the best ways is to ask your technical people or host to shunt them at the gate. Several companies we work for actually turn them back on their own servers. For a “security for hire firm” they don’t seem to be very bright. Our law firm’s server on Netcraft is actually Netcraft’s own server! LOL! Security firm?

    The second best thing to do to companies like this is to take their bandwidth – make sure that they get nothing and that it costs them for even trying. Here is a sample of how you can do this:

    RewriteEngine on
    RewriteCond %{REMOTE_ADDR} ^195\.92\.95\. [OR]
    RewriteCond %{HTTP_USER_AGENT} netcraft [NC]
    RewriteRule ^.*$ [R]

    Translation of above – Go make money off of your own bandwidth, sucker!

    Important Netcraft note: We would highly recommend running a block of all of Energis and Planet OnLine IP’s from the UK – most of the junk hits that our American clients receive are from the 192., 193., and 194. ranges. It is very important, before you block a range, to review your client and marketing base. We recommend that if you don’t sell to Europe, Asia or Latin America, you block these IP’s from the start. Why expose yourself to network abuse when this is not your market area? (Many would be very surprised to learn that most abuse comes from Europe – NOT Asia or Latin America.)

    Then there is the case of ia_archiver, that bandwidth sucker that seems to grab everything you have in spite of your copyright notices and the DMCA and then displays them on and since the “collector” has no privacy notices on what they are doing with the data, who knows where it ends up! We should make a very interesting legal point here – If you read the copyright and legal notices at if you did to them what they do to your site it would be a violation of THEIR notice. Go figure!

    The ia_archiver story does not end with as the information is being collected by a firm called “United Layer”. I think we should point out that United Layer takes a lot of bandwidth and a lot of information from the World Wide Web without having any listed policy on what they do with this information. In addition to ia_archiver, we have logs from clients that show thousands of hits from browsers that appear normal, but take a lot of information and hits in hundreds to thousands of visits each month (aka BAD BOT). We advise all clients to block all of United Layer’s IP’s until they post a policy on what they do with their “information” obtained at the expense of your bandwidth.

    A United Layer cake walk off the plank would look like:

    RewriteEngine on
    # One condition per /24 block; the last one has no [OR] so the chain ends cleanly
    RewriteCond %{REMOTE_ADDR} ^209\.237\.224\. [OR]
    RewriteCond %{REMOTE_ADDR} ^209\.237\.225\. [OR]
    RewriteCond %{REMOTE_ADDR} ^209\.237\.226\. [OR]
    RewriteCond %{REMOTE_ADDR} ^209\.237\.227\. [OR]
    RewriteCond %{REMOTE_ADDR} ^209\.237\.228\. [OR]
    RewriteCond %{REMOTE_ADDR} ^209\.237\.229\. [OR]
    RewriteCond %{REMOTE_ADDR} ^209\.237\.230\. [OR]
    RewriteCond %{REMOTE_ADDR} ^209\.237\.231\. [OR]
    RewriteCond %{REMOTE_ADDR} ^209\.237\.233\. [OR]
    RewriteCond %{REMOTE_ADDR} ^209\.237\.234\. [OR]
    RewriteCond %{REMOTE_ADDR} ^209\.237\.235\. [OR]
    RewriteCond %{REMOTE_ADDR} ^209\.237\.236\. [OR]
    RewriteCond %{REMOTE_ADDR} ^209\.237\.237\. [OR]
    RewriteCond %{REMOTE_ADDR} ^209\.237\.238\. [OR]
    RewriteCond %{REMOTE_ADDR} ^209\.237\.239\.
    # Answer any matching request with 403 Forbidden
    RewriteRule ^.*$ - [F]

    Translation of the above: Where is your privacy policy, United Layer? (Or… Don’t use my bandwidth for???????).

    Of course, last but not least, there is the subject of this posting: our friends at InternetSeer. Yes, our spam friendly boyz and galz at the spam hole. We sent them a UA violation notice and they responded. The message was trapped as a “known source of spam”. Ha! Ha! Here are some key Google and whois search patterns you need to take a close look at when getting rid of the devil InternetSeer. Oh! There are some names here that are not InternetSeer, but they belong to the same spam pool (all aka: InternetSeer):

    Internap Network Services

    If you are using rewrite rules to take out only the InternetSeer user agent, forget it. They come in all forms and all shapes to suck your bandwidth and email addresses. It is best to do them in via their IP ranges:

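    RewriteEngine on
    # The last condition has no [OR] so the chain ends cleanly
    RewriteCond %{REMOTE_ADDR} ^64\.94\.206\. [OR]
    RewriteCond %{REMOTE_ADDR} ^66\.150\.41\. [OR]
    RewriteCond %{REMOTE_ADDR} ^66\.150\.42\. [OR]
    RewriteCond %{REMOTE_ADDR} ^66\.150\.43\.
    # Answer any matching request with 403 Forbidden
    RewriteRule ^.*$ - [F]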

    A word of CAUTION!!! Make sure that you check with your hosting company before altering your .htaccess files, as there are a couple of ways to write them depending on your server configuration. Sometimes a “\” is required before a “.” in IP ranges. There are also shorter ways to write the IP range for United Layer, but I put it up to show the designated IP’s for United Layer.
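As one illustration of a "shorter way" to express the United Layer list, the per-/24 prefixes can be merged into CIDR blocks. This is a rough sketch using Python's standard `ipaddress` module; it only reuses the third octets that appear in the rule above (209.237.232 is absent from that list, so it is left out here too). Whether your server accepts CIDR notation in its deny rules is an assumption to confirm with your host.

```python
import ipaddress

# The fifteen /24 prefixes from the United Layer rule above.
# 209.237.232 does not appear in that rule, so it is not included here.
third_octets = [224, 225, 226, 227, 228, 229, 230, 231,
                233, 234, 235, 236, 237, 238, 239]
blocks = [ipaddress.ip_network(f"209.237.{n}.0/24") for n in third_octets]

# collapse_addresses merges adjacent networks into the fewest CIDR blocks.
collapsed = list(ipaddress.collapse_addresses(blocks))

for net in collapsed:
    print(net)
```

Because of the gap at .232, the list does not fold into a single block; it comes out as four CIDR ranges instead of fifteen separate lines.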

    The best thing to do is to download your logs every month and take a close look at them. Do you see a “normal” browser hitting every page on your site in a period of a minute or two? Hmm… Smells like a bot. Do a reverse IP lookup and ban them if they are up to no good.
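The log check described above can be partly automated. Here is a minimal Python sketch, assuming the access log is in Common Log Format with English month abbreviations; the function name `find_fast_clients` and the thresholds (20 requests inside two minutes) are arbitrary choices for illustration, not recommendations from this thread.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Start of a Common Log Format line: client IP, two ignored fields, [timestamp].
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\]')

def find_fast_clients(lines, window=timedelta(minutes=2), min_hits=20):
    """Return IPs that made at least min_hits requests within any one window."""
    hits = defaultdict(list)
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that are not in the assumed format
        ip, stamp = m.groups()
        hits[ip].append(datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z"))

    suspects = []
    for ip, times in hits.items():
        times.sort()
        start = 0
        for end, t in enumerate(times):
            # Move the window start forward until it spans at most `window`.
            while t - times[start] > window:
                start += 1
            if end - start + 1 >= min_hits:
                suspects.append(ip)
                break
    return suspects
```

A typical use would be `find_fast_clients(open("access.log"))`, where the file name is a placeholder for wherever your host stores the log. Anything it returns is only a candidate; look up who owns the address before banning it.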

    The only way that you can get these people who freeload on your bandwidth to stop is to make it costly for them or block them from your site! Why should you pay so they can gain? I would also like to thank the web host for the other fine posts here. Many people think a website is something you put up and forget. You’d be surprised! Many a technical department spends more time keeping people OFF the server than driving people to their respective sites!

  4. I was looking at the information on ia_archiver and internetseer. I was curious if anyone still sees these bots on their sites? About the same time that ia_archiver and some of these other bots disappeared, I started to get hammered with this thing called (two, three or four times a month) and you will see it in your log like: “SurveyBot/2.3 (Whois Source)”

    Although it uses a .sc domain, all of the IP’s used trace to a location in Seattle, Washington. In addition, the site appears to have lifted copyrighted information in the same way that ia_archiver did. Checking them out, I also read that it does not obey robots.txt standards. If you look at the post below, it sucked a lot of MB off of someone’s site. It looks to me like a new form of ia_archiver. Here are their IP’s that I will block:
    and all ip’s owned by the below

    The other information is:

    Compass Communications, Inc. (CPCM)
    Compass Communications, Inc. CCOM-2001 (NET-66-228-192-0-1) –
    Compass Communications, Inc. CCOM (NET-208-192-40-0-1) –
    Compass Communications, Inc. UU-208-222-176 (NET-208-222-176-0-1) –
    Compass Communications, Inc. NETBLK-CCOM-1998 (NET-216-145-0-0-1) –
    Compass Communications, Inc. CCOM-2003 (NET-64-246-160-0-1) –

    OrgName: Compass Communications, Inc.
    OrgID: CPCM
    Address: 2001 6th Avenue
    Address: Suite 3205
    City: Seattle
    StateProv: WA
    PostalCode: 98121
    Country: US
    RegDate: 1996-05-12
    Updated: 2003-05-12

    AdminHandle: IC122-ARIN
    AdminPhone: +1-206-777-9988

    TechHandle: IC122-ARIN
    TechPhone: +1-206-777-9988

    The strange thing about this is that if you try to find a domain that will “resolve” to any of the IP’s listed in a search for Compass Communications, none of them match. The only one that resolves correctly is so the home office looks a bit suspect. Also, I could not find privacy or DMCA info, but I may not have looked very hard. I think I will block everything from all IP’s above unless somebody posts additional information that changes my mind. This looks really suspect to me.

    Other information I found connected to these people was a posting on a US GOVERNMENT SITE and this:

    Knut Grøneng

    Fri, 22 Nov 2002 20:58:27 +0100


    RE: [WG-Users] Site in WG reports.

    11/22/02 20:54:26 IP block
    Trying at ARIN
    Trying 216.145.5 at ARIN

    OrgName: Compass Communications, Inc.
    OrgID: CPCM

    NetRange: –
    NetName: NETBLK-CCOM-1998
    NetHandle: NET-216-145-0-0-1
    Parent: NET-216-0-0-0-0
    NetType: Direct Allocation
    NameServer: NS1.CCOM.NET
    NameServer: NS2.CCOM.NET
    RegDate: 1998-12-10
    Updated: 2002-08-07

    TechHandle: IC122-ARIN
    TechPhone: +1-206-777-9988

    # ARIN Whois database, last updated 2002-11-21 19:05
    # Enter ? for additional hints on searching ARIN’s Whois database.

    Knut Grøneng
    Senior advisor

    Hands ASA, Brynsengveien 10, P.O. Box 6534 Etterstad, N-0606 Oslo, Norway.

    > Original Message
    > From: Hutchinson, Mark []
    > Sent: Friday, November 22, 2002 8:52 PM
    > To: ‘Mike Newell’; ‘Briggs, Bruce’;
    > Subject: RE: [WG-Users] Site in WG reports.
    > My traceroutes (ran four from different locations) show BBN
    > Planet as the
    > most likely owner. I’d contact and ask them.
    > Mark Hutchinson
    > Network Adminstrator
    > NYS SBDC
    > Original Message
    > From: Mike Newell []
    > Sent: Friday, November 22, 2002 2:45 PM
    > To: ‘Briggs, Bruce’;
    > Subject: RE: [WG-Users] Site in WG reports.
    > Here’s what I see in my report.
    > Domain Bytes Transferred Percentage Hits
    > 109.94 MB 4.331% 16,303
    > Original Message
    > From: Briggs, Bruce []
    > Sent: Friday, November 22, 2002 11:45 AM
    > To: Mike Newell;
    > Subject: RE: [WG-Users] Site in WG reports.
    > and the IP addr is?????
    > Original Message
    > From: Mike Newell []
    > Sent: Friday, November 22, 2002 2:31 PM
    > To:
    > Subject: [WG-Users] Site in WG reports.
    > Hello,
    > I have a site showing up in the WG reports that won’t resolve
    > to a name, no
    > matter what tool I use to find it. When I open it in a
    > browser it says I
    > don’t have permissions to view the page. Is there a way to
    > find out what
    > this site is? It’s the number one site in my reports @ over 100MB
    > transferred.
    > I can see that a company called Compass Communications hosts
    > it but I would
    > like to know what’s eating up all that bandwidth.
    > Thanks again,
    > Mike.

    In any case, any other information, like who is would be appreciated as it looks really odd to me.

  5. ia_archiver runs on the legal principle that a cache is fair use. Has anybody sued Google for keeping a cached version of their page? No! IA follows standard data exclusion techniques. If you fail to make any reasonable attempt to keep your data from being archived, you deserve whatever you get.
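For readers unsure what the exclusion mechanism looks like in practice, it is robots.txt. The following is a small sketch using Python's standard `urllib.robotparser` to show how a compliant crawler would read such a file. The robots.txt content and the name "SomeOtherBot" are made up for illustration, and whether any particular crawler actually honors the file is exactly what this thread disputes.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out ia_archiver, allow everyone else.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A crawler that follows the standard asks this question before each fetch.
archiver_allowed = parser.can_fetch("ia_archiver", "/index.html")
other_allowed = parser.can_fetch("SomeOtherBot", "/index.html")

print("ia_archiver allowed:", archiver_allowed)
print("SomeOtherBot allowed:", other_allowed)
```

This only models a well-behaved client. A bot that ignores robots.txt never runs a check like this, which is why the earlier comments fall back on blocking at the server.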
