http://blog.csdn.net/yueguanghaidao/article/details/26449911
2014
I remember that back when I wrote crawlers, I edited /etc/hosts directly to avoid repeated DNS lookups. Recently I came across an elegant solution; my notes after adapting it are below (the code is Python 2):
import socket

_dnscache = {}

def _setDNSCache():
    """
    Makes a cached version of socket.getaddrinfo to avoid subsequent DNS requests.
    """
    def _getaddrinfo(*args, **kwargs):
        global _dnscache
        if args in _dnscache:
            print str(args) + " in cache"
            return _dnscache[args]
        else:
            print str(args) + " not in cache"
            # fall back to the real resolver, then memoize the result
            _dnscache[args] = socket._getaddrinfo(*args, **kwargs)
            return _dnscache[args]

    # patch only once: stash the original resolver, then replace it
    if not hasattr(socket, '_getaddrinfo'):
        socket._getaddrinfo = socket.getaddrinfo
        socket.getaddrinfo = _getaddrinfo

def test():
    _setDNSCache()
    import urllib
    urllib.urlopen('http://www.baidu.com')
    urllib.urlopen('http://www.baidu.com')

test()
The output:
('www.baidu.com', 80, 0, 1) not in cache
('www.baidu.com', 80, 0, 1) in cache
This approach is neat, but it has drawbacks:
1. It only patches socket.getaddrinfo; socket.gethostbyname and socket.gethostbyname_ex still resolve through the normal path.
2. It only affects the current process, whereas editing /etc/hosts affects every program, including ping.
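Drawback 1 can be worked around by patching the other resolver entry points the same way. Below is my own sketch of that idea in Python 3 (the function name set_dns_cache and the _orig_* backup attributes are illustrative, not from the post), using functools.lru_cache in place of the hand-rolled dict:

```python
import functools
import socket

# Sketch (Python 3): extend the post's idea to all three resolver
# entry points, memoizing each with functools.lru_cache.
# Names here are illustrative assumptions, not the post's code.
def set_dns_cache(maxsize=1024):
    for name in ("getaddrinfo", "gethostbyname", "gethostbyname_ex"):
        backup = "_orig_" + name
        if not hasattr(socket, backup):           # patch only once
            original = getattr(socket, name)
            setattr(socket, backup, original)     # keep the real resolver
            setattr(socket, name,
                    functools.lru_cache(maxsize=maxsize)(original))
```

After calling set_dns_cache(), repeated lookups of the same host are served from the in-process cache. Note that lru_cache entries never expire, so a long-running crawler may also want a TTL, which this sketch does not implement.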