Batch-checking URL validity with Python

Source: Internet | Published by: 虚拟网络交易平台 | Editor: 程序博客网 | Date: 2024/06/10 07:50

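For work I previously wrote a few small Python scripts to batch-check whether URLs are valid, but they were incomplete and not very robust. Here is a tidied-up version: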

#!/usr/bin/python
# -*- coding:utf-8 -*-
import urllib2
from urllib2 import URLError

result_url = []
count = 0
not_200 = 0

f = open("img1.txt", "r")
img_not_200 = open("img_not_200.txt", "w+")

for line in f:
    url = line.strip()  # drop the trailing newline before requesting
    if not url:
        continue  # skip blank lines
    count += 1
    print "on scanning ", count
    try:
        response = urllib2.urlopen(url)
    except URLError, e:
        # Test 'code' first: on newer Python versions HTTPError also has 'reason'.
        if hasattr(e, 'code'):  # stands for HTTPError
            print "find http error, writing..."
        elif hasattr(e, 'reason'):  # stands for URLError
            print "can not reach a server, writing..."
        else:  # stands for unknown error
            print "unknown error, writing..."
        # Every failure is recorded the same way.
        result_url.append(url)
        not_200 += 1
        img_not_200.write(url + "\n")
        print "write url success!"
    else:
        # print "url is reachable!"
        # No need to check response.code here: if no exception was raised,
        # the request succeeded (normally 200), so just close the response.
        response.close()

print "scanning over, total", count, "; did not return 200:", not_200
f.close()
img_not_200.close()

The code works as follows:

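If a URL is valid, urlopen returns a response object and response.getcode() is 200 (redirects are followed automatically, so the final status is normally 200).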

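If a URL is invalid, whether because the server cannot be reached or because of some other HTTP error, urlopen raises an exception instead of returning a response. The type of that exception tells us whether it is a URL error or an HTTP error. The program above decides this from the attributes the exception carries: a "code" attribute means the error is an HTTPError, and a "reason" attribute means it is a URLError. Note that on newer Python versions HTTPError also exposes "reason", so the "code" test should come first.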
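The attribute test can be tried in isolation. The sketch below assumes Python 3, where urllib2 is split into urllib.request and urllib.error; classify_error is a hypothetical helper name, and the two exceptions are built by hand so no network access is needed.

```python
from urllib.error import HTTPError, URLError


def classify_error(err):
    """Label an exception raised by urlopen by inspecting its attributes."""
    # HTTPError carries both 'code' and 'reason', so 'code' is tested first.
    if hasattr(err, "code"):
        return "http error"
    if hasattr(err, "reason"):
        return "url error"
    return "unknown error"


# Construct the two error types manually (hypothetical URL and messages).
http_err = HTTPError("http://example.com/missing", 404, "Not Found", None, None)
url_err = URLError("no route to host")

print(classify_error(http_err))
print(classify_error(url_err))
```

If the two tests were swapped, the HTTPError instance would be labelled as a plain URL error, because it also has a "reason" attribute.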

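Alternatively, you can name the exception types in separate except clauses and handle each one differently. Because HTTPError is a subclass of URLError, the first except clause must catch HTTPError and the second must catch URLError. Otherwise the URLError clause catches HTTPError as well, and the HTTPError clause never runs.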
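A minimal sketch of the separate-except approach, again assuming Python 3 (urllib.request and urllib.error in place of urllib2). The function name check_url and its opener and timeout parameters are assumptions made for illustration; opener exists only so the function can be exercised without a real network call.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def check_url(url, opener=urlopen, timeout=10):
    """Return (ok, detail) for a single URL."""
    try:
        response = opener(url, timeout=timeout)
    except HTTPError as err:
        # Subclass first: the server answered, but with an error status.
        return False, "http error %d" % err.code
    except URLError as err:
        # Base class second: the server could not be reached at all.
        return False, "url error: %s" % err.reason
    else:
        response.close()
        return True, "ok"
```

Calling check_url on each stripped line of the input file would replace the try/except body in the script. This sketch does not handle a malformed line such as one with no scheme, for which urlopen raises ValueError.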

0 0