Wednesday, December 5, 2012

Screen Scraper in Python: article 201206

Below is a simple script, written as part of the SecurityTube Python Scripting Expert course, that scrapes the top X suspect IP addresses from the SANS Internet Storm Center.

Written in Python 2.7.2 with Beautiful Soup 4 and the lxml parser.

#!/usr/bin/python

import urllib   # fetches the ISC page
import re       # matches the 'ipinfo' links
from bs4 import BeautifulSoup
print "\n\n"
print "++++++++++++++++++++++++++++++++++++++++++++++++++++++++\n"
print """
        The following list of IPs is pulled from
        the SANS Internet Storm Center.  It shows
        a list of up to the top 100 IPs from which
        suspected malicious traffic was seen. It is
        not recommended to use this as a blacklist.
        source: http://isc.sans.edu/sources.html
        """
print "\n++++++++++++++++++++++++++++++++++++++++++++++++++++++++\n"

# Keep asking until the user enters a whole number between 1 and 100.
topX = 0
while ((topX < 1) or (topX > 100)):
        try:
                topX = int(raw_input("Enter the Top X amount, between 1 and 100, of flagged IPs you want to see: "))
        except ValueError:
                # Non-numeric input: treat it as out of range and ask again.
                topX = 0
        if ((topX < 1) or (topX > 100)):
                print "The amount must be a number between 1 and 100\n"

print "\nPlease be patient\n"
print "Retrieving the top %d IPs\n" % topX

iscPage = urllib.urlopen("http://isc.sans.edu/sources.html")

#print iscPage.code

iscSoup = BeautifulSoup(iscPage.read(), "lxml")

allAtag = iscSoup.find_all('a')

counter = 0


# Print the text of the first topX links whose markup mentions 'ipinfo'.
for item in allAtag:
        if counter >= topX:
                break
        if re.search('ipinfo', str(item)):
                print item.string
                counter = counter + 1
print "\n"
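
The two steps that matter in the script above, validating the Top X value and keeping only the 'ipinfo' links, can be tried offline without hitting the ISC site. The following is only a rough sketch, not the script itself: it is written for Python 3 rather than 2.7, it swaps Beautiful Soup for the standard library's html.parser so that nothing needs installing, and it matches 'ipinfo' in the href attribute only, whereas the script matches it anywhere in the tag. The names IpInfoLinkParser, top_ipinfo_links, parse_top_x and SAMPLE_HTML are made up for illustration, and the sample markup is hypothetical; the real page layout may differ.

```python
from html.parser import HTMLParser


class IpInfoLinkParser(HTMLParser):
    """Collect the text of <a> tags whose href mentions 'ipinfo'."""

    def __init__(self, limit):
        super().__init__()
        self.limit = limit        # maximum number of links to keep
        self.results = []         # link texts collected so far
        self._in_match = False    # True while inside a matching <a> tag
        self._text = []           # text fragments of the current link

    def handle_starttag(self, tag, attrs):
        # Ignore non-links, and stop collecting once the limit is reached.
        if tag != "a" or len(self.results) >= self.limit:
            return
        href = dict(attrs).get("href") or ""
        if "ipinfo" in href:
            self._in_match = True
            self._text = []

    def handle_data(self, data):
        if self._in_match:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_match:
            self.results.append("".join(self._text).strip())
            self._in_match = False


def top_ipinfo_links(html, limit):
    """Return the text of the first `limit` links pointing at 'ipinfo'."""
    parser = IpInfoLinkParser(limit)
    parser.feed(html)
    return parser.results


def parse_top_x(raw, low=1, high=100):
    """Return the string raw as an int in [low, high], or None if invalid."""
    try:
        value = int(raw)
    except ValueError:
        return None
    return value if low <= value <= high else None


# Hypothetical markup using documentation-only addresses (assumption).
SAMPLE_HTML = """
<a href="/index.html">Home</a>
<a href="/ipinfo.html?ip=192.0.2.1">192.0.2.1</a>
<a href="/ipinfo.html?ip=198.51.100.7">198.51.100.7</a>
<a href="/ipinfo.html?ip=203.0.113.9">203.0.113.9</a>
"""

if __name__ == "__main__":
    print(top_ipinfo_links(SAMPLE_HTML, 2))
```

Returning None from parse_top_x for bad input, instead of raising, keeps the retry loop simple: the caller just asks again until it gets a number back.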
