Thread: How can i kick off this: $/start/robots.txt


This question is answered.


Replies: 1 - Last Post: Feb 9, 2016 10:14 AM by Daniel Fields
Jose Nilton Pace


Posts: 122
Registered: 5/15/98
How can i kick off this: $/start/robots.txt  
  Posted: Feb 9, 2016 4:24 AM
I would like to get rid of these bot requests.

The 404 entry in my log:

2016-02-07 15:52:16 GET /online/sf.dll/$/start/robots.txt - 443 - 66.249.65.177 Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) - 404 0 0 203

Thank you.
Daniel Fields

Posts: 622
Registered: 11/29/04
Re: How can i kick off this: $/start/robots.txt
Helpful
  Posted: Feb 9, 2016 10:14 AM   in response to: Jose Nilton Pace
You have three options to protect yourself from bots and crawlers. You may need to combine more than one technique if you are being scanned heavily.

1. Use a robots.txt file. Bots and crawlers are supposed to respect the settings in robots.txt, which can specify which services may index your site. This works in theory, but it relies upon the services being honorable; it does not enforce any real protection. Here is a link describing its structure and use: http://www.robotstxt.org/robotstxt.html.
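For example, a minimal robots.txt that asks all crawlers to stay away from an application path would look like the following (the /online/ path here is just an illustration based on the URL in your log; adjust it to your own layout):

```text
# robots.txt - served from the site root
User-agent: *
Disallow: /online/
```

Again, this only works for well-behaved crawlers; anything malicious will simply ignore it.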

2. Use ServerController.SearchEngineOptions. I have not used this technique, but it is documented here: http://docs.atozed.com/docs.dll/development/Creating%20Custom%20Content%20Handlers.html.

3. Add code to the ServerController OnBaseBrowserCheck event. That is the first opportunity you have to act on bots and crawlers. You can start simple and use text recognition to identify the engines that you want to block. I wrote some code that I have used for a while. The example below shows how I use it. You will see references to MakeSessionRec(), which is code that logs sessions into a table. You can replace those calls with your own code to record the bot activity. I record it so I can monitor and respond to it.

procedure TIWServerController.IWServerControllerBaseBrowserCheck(aSession: TIWApplication; var rBrowser: TBrowser);
var
  msg: string;
begin
  // uas and UserAgent are set earlier from the request's User-Agent header;
  // caasExts and aThreadSessionRec are my own helper objects.
  if (rBrowser is TSearchEngine) then
  begin
    msg := 'Indexing of this resource by search engines is not allowed!';
    try
      MakeSessionRec('Search Engine', msg);  // my code to record session information
      aThreadSessionRec.Resume;
    finally
      caasExts.IncBotsBlocked;  // my code to count how many sessions have been blocked
      aSession.Terminate('403 Forbidden.  ' + msg);
      rBrowser.Free;
      rBrowser := TInternetExplorer.Create(9);  // leave the framework a valid browser object
    end;
  end
  else if caasExts.IsKnownBot(uas, UserAgent.AgentType) then
  begin
    msg := 'Indexing of this resource is not allowed!';
    try
      MakeSessionRec('Bot/Spider/Crawler', msg);  // my code to record session information
      aThreadSessionRec.Resume;
    finally
      caasExts.IncBotsBlocked;  // my code to count how many sessions have been blocked
      aSession.Terminate('403 Forbidden.  ' + msg);
      rBrowser.Free;
      rBrowser := TInternetExplorer.Create(9);
    end;
  end;
end;
 
function caasServerControllerExt.IsKnownBot(uas, uatype: string): boolean;
begin
  uas := Lowercase(uas);  // the substring checks below expect a lowercased agent string
  result := (Pos('baidu', uas) > 0)
            or (Pos('yandex', uas) > 0)
            or (Pos('naverbot', uas) > 0)
            or (Pos('yeti', uas) > 0)
            or (Pos('seznambot', uas) > 0)
            or (Pos('slurp', uas) > 0)
            or (Pos('teoma', uas) > 0)
            or (Pos('moget', uas) > 0)
            or (Pos('ichiro', uas) > 0)
            or (Pos('sogu', uas) > 0)
            or (Pos('bot', uas) > 0)
            or (Pos('spider', uas) > 0)
            or (Lowercase(uatype) = 'robot');
end;
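As a rough sketch of a call site for that helper, you would pass in the raw User-Agent header and whatever classification string you have; the exact property names here (Request.UserAgent on the session) are an assumption for illustration, so check them against your IntraWeb version:

```pascal
// Hypothetical call site - names are illustrative, not a definitive API.
if caasExts.IsKnownBot(aSession.Request.UserAgent, 'unknown') then
  aSession.Terminate('403 Forbidden.');
```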