BigSnarf blog

Infosec FTW

Extracting features out of web logs to identify Human vs. Robot


Classifying traffic intensity and temporal differences in access

  1. Total pages requested per IP address
  2. Percentage of images requested
  3. Percentage of binary files requested (e.g. PDF)
  4. Total requests for robots.txt
  5. Percentage of HTML pages requested
  6. Percentage of text files requested
  7. Percentage of zip files requested
  8. Percentage of video files requested
  9. Bounce rate
  10. Session time
  11. Standard deviation of time between clicks
  12. Percentage of night time requests
  13. Percentage of errors
  14. Percentage of garbage requests
  15. Percentage of GET requests
  16. Percentage of POST requests
  17. Percentage of HEAD requests
  18. URL traversal
  19. Depth of URL traversal
  20. Path length
  21. Referrer
  22. User Agents
  23. IP Address location
  24. Known crawler IP addresses
  25. Repeated requests
  26. Average time between clicks
  27. OS badges
  28. ARIN registration
  29. ASN analysis
  30. Geolocation
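
A handful of the features above can be sketched in a few lines of Python. This is only a rough illustration, not the method used for any particular classifier: it assumes the logs are in Apache/Nginx "combined" format, treats 00:00 to 05:59 in the log's own time zone as "night time", and counts any 4xx/5xx status as an error. The names `parse_line`, `extract_features`, `IMAGE_EXT` and the feature keys are made up for this example.

```python
import re
from collections import defaultdict
from datetime import datetime
from statistics import mean, pstdev

# Assumed input: combined log format, e.g.
# 1.2.3.4 - - [10/Oct/2020:02:00:00 +0000] "GET /index.html HTTP/1.1" 200 512
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3})'
)
IMAGE_EXT = (".png", ".jpg", ".jpeg", ".gif", ".ico")  # hypothetical list


def parse_line(line):
    """Return a dict for one log line, or None if it does not parse."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["ts"] = datetime.strptime(rec["ts"], "%d/%b/%Y:%H:%M:%S %z")
    rec["status"] = int(rec["status"])
    return rec


def pct(part, total):
    """Percentage helper that tolerates an empty total."""
    return 100.0 * part / total if total else 0.0


def extract_features(lines, night_hours=range(0, 6)):
    """Group requests by IP and compute a few per-IP features."""
    by_ip = defaultdict(list)
    for line in lines:
        rec = parse_line(line)
        if rec is not None:
            by_ip[rec["ip"]].append(rec)

    features = {}
    for ip, recs in by_ip.items():
        recs.sort(key=lambda r: r["ts"])
        n = len(recs)
        # Strip query strings before looking at file extensions.
        paths = [r["path"].split("?", 1)[0].lower() for r in recs]
        # Seconds between consecutive requests from the same IP.
        gaps = [(b["ts"] - a["ts"]).total_seconds()
                for a, b in zip(recs, recs[1:])]
        features[ip] = {
            "total_requests": n,
            "pct_images": pct(sum(p.endswith(IMAGE_EXT) for p in paths), n),
            "robots_txt_requests": sum(p == "/robots.txt" for p in paths),
            "pct_get": pct(sum(r["method"] == "GET" for r in recs), n),
            "pct_post": pct(sum(r["method"] == "POST" for r in recs), n),
            "pct_head": pct(sum(r["method"] == "HEAD" for r in recs), n),
            "pct_errors": pct(sum(r["status"] >= 400 for r in recs), n),
            "pct_night": pct(sum(r["ts"].hour in night_hours for r in recs), n),
            "avg_gap_s": mean(gaps) if gaps else 0.0,
            "std_gap_s": pstdev(gaps) if gaps else 0.0,
        }
    return features
```

Calling `extract_features(open("access.log"))` (the file name is a placeholder) returns one feature dict per IP address, which can then be fed to whatever classifier you prefer. Features that need outside data, such as known crawler IP lists, ARIN registration, ASN or geolocation, are left out of the sketch on purpose.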
