Fixing the "cookie rejected" warnings in a Sina Weibo crawler

I recently wrote a Sina Weibo crawler using httpclient-4.3.3. The program runs fine, but it keeps logging "cookie rejected" warnings. The log looks like this:


2014-06-05 10:27:17.417 [main] WARN  o.a.h.c.p.ResponseProcessCookies - Cookie rejected [U_TRS1="000000ea.ef542aa2.538fd58b.ec8a8e2c", version:0, domain:.sina.com.cn, path:/, expiry:Sun Jun 02 10:27:23 CST 2024] Illegal domain attribute "sina.com.cn". Domain of origin: "passport.weibo.com"
2014-06-05 10:27:17.422 [main] WARN  o.a.h.c.p.ResponseProcessCookies - Cookie rejected [U_TRS2="000000ea.ef632aa2.538fd58b.c6dd669e", version:0, domain:.sina.com.cn, path:/, expiry:null] Illegal domain attribute "sina.com.cn". Domain of origin: "passport.weibo.com"
Login succeeded, nickname: 佩佩菜_52350
2014-06-05 10:27:20.019 [main] WARN  o.a.h.c.p.ResponseProcessCookies - Cookie rejected [U_TRS1="000000ea.75d37a79.538fd58d.077976a4", version:0, domain:.sina.com.cn, path:/, expiry:Sun Jun 02 10:27:25 CST 2024] Illegal domain attribute "sina.com.cn". Domain of origin: "passport.weibo.com"
2014-06-05 10:27:20.019 [main] WARN  o.a.h.c.p.ResponseProcessCookies - Cookie rejected [U_TRS2="000000ea.75e37a79.538fd58d.575a338c", version:0, domain:.sina.com.cn, path:/, expiry:null] Illegal domain attribute "sina.com.cn". Domain of origin: "passport.weibo.com"
Login succeeded, nickname: 通吃一条街呵呵
2014-06-05 10:27:29.119 [main] WARN  o.a.h.c.p.ResponseProcessCookies - Cookie rejected [U_TRS1="000000ea.9fcc12df.538fd597.fcf0e3af", version:0, domain:.sina.com.cn, path:/, expiry:Sun Jun 02 10:27:35 CST 2024] Illegal domain attribute "sina.com.cn". Domain of origin: "passport.weibo.com"
2014-06-05 10:27:29.120 [main] WARN  o.a.h.c.p.ResponseProcessCookies - Cookie rejected [U_TRS2="000000ea.9fd812df.538fd597.e804e263", version:0, domain:.sina.com.cn, path:/, expiry:null] Illegal domain attribute "sina.com.cn". Domain of origin: "passport.weibo.com"
Login succeeded, nickname: dxedf
log4j:WARN No appenders could be found for logger (com.mchange.v2.log.MLog).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
2014-06-05 10:27:30.247 [Thread-0] INFO  com.eurlanda.spider.global.Global - Reading system configuration: D:\Workspaces\eurlanda\DAP_EurlandaSpider\WebRoot\WEB-INF\classes\config.properties
2014-06-05 10:27:30.247 [Thread-0] INFO  com.eurlanda.spider.global.Global - jspider.weibo.dely=12
2014-06-05 10:27:30.247 [Thread-0] INFO  com.eurlanda.spider.global.Global - jspider.task.saveDely=1
2014-06-05 10:27:30.247 [Thread-0] INFO  com.eurlanda.spider.global.Global - jspider.task.dely=168
2014-06-05 10:27:30.247 [Thread-0] INFO  com.eurlanda.spider.global.Global - jspider.core.socket.retryCount=3
2014-06-05 10:27:30.247 [Thread-0] INFO  com.eurlanda.spider.global.Global - jspider.work_thread_num=10
2014-06-05 10:27:30.248 [Thread-0] INFO  com.eurlanda.spider.global.Global - jspider.core.socket.readTimeout=5
2014-06-05 10:27:30.248 [Thread-0] INFO  com.eurlanda.spider.global.Global - jspider.core.socket.serverPort=7077
2014-06-05 10:27:30.248 [Thread-0] INFO  com.eurlanda.spider.global.Global - jspider.core.socket.connectTimeout=5
2014-06-05 10:27:30.248 [Thread-0] INFO  com.eurlanda.spider.global.Global - jspider.work.schedule=* * 18-9 ? * 1-5|* * * ? * 1,7|* * * * * ?
2014-06-05 10:27:30.254 [Thread-0] INFO  c.e.s.c.sina_weibo.SinaWeiBoCrawler - ----------- Crawling data for date 2010-02-23 00:00:00 -----------
2014-06-05 10:27:30.869 [18721437752] WARN  o.a.h.c.p.ResponseProcessCookies - Cookie rejected [U_TRS1="000000ea.ae2f61ad.538fd599.2711e9ab", version:0, domain:.sina.com.cn, path:/, expiry:Sun Jun 02 10:27:37 CST 2024] Illegal domain attribute "sina.com.cn". Domain of origin: "s.weibo.com"
2014-06-05 10:27:30.870 [18721437752] WARN  o.a.h.c.p.ResponseProcessCookies - Cookie rejected [U_TRS2="000000ea.ae3b61ad.538fd599.cec3bfae", version:0, domain:.sina.com.cn, path:/, expiry:null] Illegal domain attribute "sina.com.cn". Domain of origin: "s.weibo.com"
2014-06-05 10:27:30.881 [zjweii@qq.com] WARN  o.a.h.c.p.ResponseProcessCookies - Cookie rejected [U_TRS1="000000ea.18d93dd.538fd599.add86b40", version:0, domain:.sina.com.cn, path:/, expiry:Sun Jun 02 10:27:37 CST 2024] Illegal domain attribute "sina.com.cn". Domain of origin: "s.weibo.com"
2014-06-05 10:27:30.882 [zjweii@qq.com] WARN  o.a.h.c.p.ResponseProcessCookies - Cookie rejected [U_TRS2="000000ea.18ee3dd.538fd599.d7522db2", version:0, domain:.sina.com.cn, path:/, expiry:null] Illegal domain attribute "sina.com.cn". Domain of origin: "s.weibo.com"
2014-06-05 10:27:31.089 [18721437752] INFO  c.e.s.c.sina_weibo.SinaWeiBoClient - Search returned no results.
2014-06-05 10:27:31.280 [pbz201402@126.com] WARN  o.a.h.c.p.ResponseProcessCookies - Cookie rejected [U_TRS1="000000ea.39486d50.538fd599.66e98262", version:0, domain:.sina.com.cn, path:/, expiry:Sun Jun 02 10:27:37 CST 2024] Illegal domain attribute "sina.com.cn". Domain of origin: "s.weibo.com"
2014-06-05 10:27:31.280 [pbz201402@126.com] WARN  o.a.h.c.p.ResponseProcessCookies - Cookie rejected [U_TRS2="000000ea.395a6d50.538fd599.84218ee8", version:0, domain:.sina.com.cn, path:/, expiry:null] Illegal domain attribute "sina.com.cn". Domain of origin: "s.weibo.com"

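Today I finally had enough and went looking for an answer online. Most of what I found was outdated or written for older versions, but after some trial and error I found a fix. The problem is the cookie policy: as the log shows, the cookies are set with domain .sina.com.cn while the responses come from passport.weibo.com or s.weibo.com, so the default policy's validation rejects them. Overriding the validation of the default policy makes the warnings go away.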

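import org.apache.http.client.config.CookieSpecs;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.config.Registry;
import org.apache.http.config.RegistryBuilder;
import org.apache.http.cookie.Cookie;
import org.apache.http.cookie.CookieOrigin;
import org.apache.http.cookie.CookieSpec;
import org.apache.http.cookie.CookieSpecProvider;
import org.apache.http.cookie.MalformedCookieException;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.cookie.BestMatchSpecFactory;
import org.apache.http.impl.cookie.BrowserCompatSpec;
import org.apache.http.impl.cookie.BrowserCompatSpecFactory;
import org.apache.http.protocol.HttpContext;

// A cookie spec that parses cookies like BrowserCompatSpec but skips validation,
// so cookies whose domain does not match the origin host are no longer rejected.
CookieSpecProvider easySpecProvider = new CookieSpecProvider() {
    @Override
    public CookieSpec create(HttpContext context) {
        return new BrowserCompatSpec() {
            @Override
            public void validate(Cookie cookie, CookieOrigin origin)
                    throws MalformedCookieException {
                // Oh, I am easy
            }
        };
    }
};

// Register the lenient spec under the name "mySpec" next to the standard specs.
Registry<CookieSpecProvider> reg = RegistryBuilder.<CookieSpecProvider>create()
        .register(CookieSpecs.BEST_MATCH,
            new BestMatchSpecFactory())
        .register(CookieSpecs.BROWSER_COMPATIBILITY,
            new BrowserCompatSpecFactory())
        .register("mySpec", easySpecProvider)
        .build();

// Make "mySpec" the cookie spec used by default for requests.
RequestConfig requestConfig = RequestConfig.custom()
        .setCookieSpec("mySpec")
        .build();

// Build a client that knows the registry and uses the request config above.
CloseableHttpClient httpclient = HttpClients.custom()
        .setDefaultCookieSpecRegistry(reg)
        .setDefaultRequestConfig(requestConfig)
        .build();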
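
To see why the warnings appear at all, it may help to look at the kind of check a cookie policy performs. The sketch below is a simplified, standard-library-only illustration and not HttpClient's actual implementation; the class name CookieDomainCheck and the method domainCovers are made up for this example. It assumes a cookie is acceptable only if its Domain attribute equals the origin host or is a parent domain of it. The first two checks use the domain and origin values from the log.

```java
import java.util.Locale;

public class CookieDomainCheck {

    // Simplified, hypothetical stand-in for a cookie policy's domain validation:
    // the cookie's Domain attribute must equal the origin host or be a parent
    // domain of it. This is NOT HttpClient's real code.
    public static boolean domainCovers(String cookieDomain, String originHost) {
        String domain = cookieDomain.toLowerCase(Locale.ROOT);
        String host = originHost.toLowerCase(Locale.ROOT);
        if (domain.startsWith(".")) {
            domain = domain.substring(1); // ".sina.com.cn" -> "sina.com.cn"
        }
        return host.equals(domain) || host.endsWith("." + domain);
    }

    public static void main(String[] args) {
        // Domain and origin values taken from the log.
        System.out.println(domainCovers(".sina.com.cn", "passport.weibo.com")); // false
        System.out.println(domainCovers(".sina.com.cn", "s.weibo.com"));        // false
        // A hypothetical cookie scoped to the origin's own parent domain passes.
        System.out.println(domainCovers(".weibo.com", "passport.weibo.com"));   // true
    }
}
```

Under this assumption, every U_TRS1 and U_TRS2 cookie in the log fails the check, which matches the "Illegal domain attribute" message. The empty validate override in the fix skips this kind of check for all cookies, not only the ones from sina.com.cn.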