Checking in Java whether a Baidu Cloud share link has expired

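I don't know how many people use net-disk search engines these days, but I have put a great deal of effort into 去转盘网 (Quzhuanpan), and it now has a decent number of users. Net-disk resources share one common problem: a resource may no longer be valid. Yet many search engines do no validity check at all, especially the ones built on Google Custom Search, which involve little engineering; their owners concentrate on making money and rarely think about user experience. This is another of my open technical posts. I have already published nearly all of the technical details behind 去转盘网, and this post adds one more: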

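First, a recap of the earlier posts: Baidu net-disk crawler, Java word segmentation algorithm, automatic database backup, crawling through proxy servers, invite-a-friend registration.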

# coding:utf-8
"""
@author:haoning
@create time:2015.8.5
"""
# Python 2 script (urllib2, print statements)
from __future__ import division  # true division
from Queue import Queue
from collections import OrderedDict
import copy
import datetime
import json
import math
import os
import random
import platform
import re
import threading, errno
import time
import urllib2
import MySQLdb as mdb


DB_HOST = '127.0.0.1'
DB_USER = 'root'
DB_PASS = 'root'


def gethtml(url):
    # Fetch the page body; on failure, print the error and return None
    try:
        print "url",url
        req = urllib2.Request(url)
        response = urllib2.urlopen(req,None,8)  # 8-second timeout; a proxy should be added here
        html = response.read()
        return html
    except Exception,e:
        print "e",e

if __name__ == '__main__':

   while 1:
       #url='http://pan.baidu.com/share/link?uk=1813251526&shareid=540167442'
       url="http://pan.baidu.com/s/1qXQD2Pm"
       html=gethtml(url)
       print html

Result: e HTTP Error 403: Forbidden. In other words, Baidu blocks crawlers on this URL form. After looking at many sites, I happened to try the following link:

http://pan.baidu.com/share/link?uk=1813251526&shareid=540167442

if __name__ == '__main__':

   while 1:
       url='http://pan.baidu.com/share/link?uk=1813251526&shareid=540167442'
       #url="http://pan.baidu.com/s/1qXQD2Pm"
       html=gethtml(url)
       print html

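Result: <title>百度云 网盘-链接不存在</title> (the title reads "Baidu Cloud net disk - link does not exist"). Any page with this title has certainly expired. So Baidu does not block crawlers on this URL form, which is good news.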
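The check described above can be sketched in Java as follows. This is a minimal illustration, not the site's production code: the class name TitleCheck and its method names are hypothetical, the title is pulled out with a regular expression instead of an HTML parser, and the marker is the phrase 链接不存在 written as Unicode escapes so the source file stays ASCII-only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TitleCheck {
    // "link does not exist" marker from the page title, written as Unicode escapes.
    private static final String DEAD_MARKER = "\u94FE\u63A5\u4E0D\u5B58\u5728";

    // Captures the text of the first <title> element, ignoring case and line breaks.
    private static final Pattern TITLE = Pattern.compile(
            "<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the trimmed text of the first title element, or "" if there is none. */
    public static String extractTitle(String html) {
        if (html == null) {
            return "";
        }
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).trim() : "";
    }

    /** True when the page title carries the "link does not exist" marker. */
    public static boolean looksExpired(String html) {
        return extractTitle(html).contains(DEAD_MARKER);
    }

    public static void main(String[] args) {
        String dead = "<html><head><title>\u767E\u5EA6\u4E91 \u7F51\u76D8-"
                + "\u94FE\u63A5\u4E0D\u5B58\u5728</title></head></html>";
        String live = "<html><head><title>some shared file</title></head></html>";
        System.out.println(looksExpired(dead));
        System.out.println(looksExpired(live));
    }
}
```

A null or title-less page is reported as not expired here, which is an assumption of this sketch; a failed download may deserve a separate "unknown" state in real use.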

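Baidu net-disk resources actually have two kinds of entry URL:

One is http://pan.baidu.com/s/1qXQD2Pm, which ends in a short code.

The other is http://pan.baidu.com/share/link?uk=1813251526&shareid=540167442, where the key parts are shareid and uk. The former is known to be protected against crawlers; the latter currently is not. So after testing in Python I ported the code to Java, because 去转盘 is written in Java. Here is the code: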
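To make the two URL forms concrete, here is a small Java sketch that tells them apart and rebuilds the long form from uk and shareid. The class ShareLinkForm and its methods are hypothetical names introduced for illustration. The sketch does not convert a short code into uk and shareid; the post does not describe such a mapping, so those values are assumed to be known already (for example, stored by the crawler).

```java
import java.util.regex.Pattern;

public class ShareLinkForm {
    // Short form: http://pan.baidu.com/s/<short code>
    private static final Pattern SHORT =
            Pattern.compile("^https?://pan\\.baidu\\.com/s/([A-Za-z0-9_-]+)$");
    // Long form: http://pan.baidu.com/share/link?uk=...&shareid=...
    private static final Pattern LONG_PREFIX =
            Pattern.compile("^https?://pan\\.baidu\\.com/share/link\\?");
    private static final Pattern UK = Pattern.compile("[?&]uk=(\\d+)");
    private static final Pattern SHAREID = Pattern.compile("[?&]shareid=(\\d+)");

    /** Returns "short", "long" or "unknown" for the given share URL. */
    public static String classify(String url) {
        if (url == null) {
            return "unknown";
        }
        if (SHORT.matcher(url).matches()) {
            return "short";
        }
        if (LONG_PREFIX.matcher(url).find()
                && UK.matcher(url).find()
                && SHAREID.matcher(url).find()) {
            return "long";
        }
        return "unknown";
    }

    /** Builds the long-form URL from a known uk and shareid. */
    public static String buildLongUrl(String uk, String shareId) {
        return "http://pan.baidu.com/share/link?uk=" + uk + "&shareid=" + shareId;
    }

    public static void main(String[] args) {
        System.out.println(classify("http://pan.baidu.com/s/1qXQD2Pm"));
        System.out.println(classify(buildLongUrl("1813251526", "540167442")));
    }
}
```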

package com.tray.common.utils;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Properties;
import java.util.Random;
import java.util.Set;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;
import org.junit.Test;

/**
 * Resource validation utility
 *
 * @author hui
 *
 */
public class ResourceCheckUtil {
    // rule name -> {tag name, operator, flag text}
    private static Map<String, String[]> rules;
    static {
        loadRule();
    }

    /**
     * Load the rule set from rule.properties on the classpath
     */
    public static void loadRule() {
        try {
            InputStream in = ResourceCheckUtil.class.getClassLoader()
                    .getResourceAsStream("rule.properties");
            Properties p = new Properties();
            p.load(in);
            Set<Object> keys = p.keySet();
            Iterator<Object> iterator = keys.iterator();
            String key = null;
            String value = null;
            String[] rule = null;
            rules = new HashMap<String, String[]>();
            while (iterator.hasNext()) {
                key = (String) iterator.next();
                value = (String) p.get(key);
                rule = value.split("\\|");
                rules.put(key, rule);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    /**
     * Fetch a page with a random User-Agent; returns null on failure
     */
    public static String httpRequest(String url) {
        try {
            URL u = new URL(url);
            Random random = new Random();
            HttpURLConnection connection = (HttpURLConnection) u
                    .openConnection();
            connection.setConnectTimeout(3000); // 3-second timeout
            connection.setReadTimeout(3000);
            // GET request: no request body, so setDoOutput is not needed
            connection.setDoInput(true);
            connection.setUseCaches(false);
            connection.setRequestMethod("GET");

            String[] user_agents = {
                    "Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11",
                    "Opera/9.25 (Windows NT 5.1; U; en)",
                    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)",
                    "Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.5 (like Gecko) (Kubuntu)",
                    "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.12) Gecko/20070731 Ubuntu/dapper-security Firefox/1.5.0.12",
                    "Lynx/2.8.5rel.1 libwww-FM/2.14 SSL-MM/1.4.1 GNUTLS/1.2.9",
                    "Mozilla/5.0 (X11; Linux i686) AppleWebKit/535.7 (KHTML, like Gecko) Ubuntu/11.04 Chromium/16.0.912.77 Chrome/16.0.912.77 Safari/535.7",
                    "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:10.0) Gecko/20100101 Firefox/10.0 "
            };
            // Use the array length so every entry can be picked
            // (the original nextInt(7) never chose the last one)
            int index = random.nextInt(user_agents.length);
            /*connection.setRequestProperty("Content-Type",
                    "text/html;charset=UTF-8");*/
            connection.setRequestProperty("User-Agent", user_agents[index]);
            /*connection.setRequestProperty("Accept-Encoding","gzip, deflate, sdch");
            connection.setRequestProperty("Accept-Language","zh-CN,zh;q=0.8");
            connection.setRequestProperty("Connection","keep-alive");
            connection.setRequestProperty("Host","pan.baidu.com");
            connection.setRequestProperty("Cookie","");
            connection.setRequestProperty("Upgrade-Insecure-Requests","1");*/
            InputStream in = connection.getInputStream();

            BufferedReader br = new BufferedReader(new InputStreamReader(in,
                    "utf-8"));
            StringBuffer sb = new StringBuffer();
            String line = null;
            while ((line = br.readLine()) != null) {
                sb.append(line);
            }
            return sb.toString();

        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }

        return null;
    }

    @Test
    public void test7() throws Exception {
        System.out.println(isExistResource("http://pan.baidu.com/s/1jGjBmyq",
                "baidu"));
        System.out.println(isExistResource("http://pan.baidu.com/s/1jGjBmyqa",
                "baidu"));

        System.out.println(isExistResource("http://yunpan.cn/cQx6e6xv38jTd", "360"));
        System.out.println(isExistResource("http://yunpan.cn/cQx6e6xv38jTdd",
                "360"));

        System.out.println(isExistResource("http://share.weiyun.com/ec4f41f0da292adb89a745200b8e8b57", "weiyun"));
        // The original passed "360" here, which looks like a copy-paste slip
        System.out.println(isExistResource("http://share.weiyun.com/ec4f41f0da292adb89a745200b8e8b57dd",
                "weiyun"));

        System.out.println(isExistResource("http://cloud.letv.com/s/eiGLzuSes", "leshi"));
        System.out.println(isExistResource("http://cloud.letv.com/s/eiGLzuSesdd",
                "leshi"));
    }

    /**
     * Get the text content of a tag on the given page
     *
     * @param url
     * @param tagName
     *            tag name
     * @return text of the first matching tag, or "" if none
     */
    private static String getHtmlContent(String url, String tagName) {
        String html = httpRequest(url);
        if (html == null) {
            return "";
        }
        Document doc = Jsoup.parse(html);
        //System.out.println("doc======"+doc);
        Elements tag = null;
        if (tagName.equals("<h3>")) { // for Weiyun
            tag = doc.select("h3");
        }
        else if (tagName.equals("class")) { // for 360
            tag = doc.select("div[class=tip]");
        }
        else {
            tag = doc.getElementsByTag(tagName);
        }
        //System.out.println("tag======"+tag);
        String content = "";
        if (tag != null && !tag.isEmpty()) {
            content = tag.get(0).text();
        }
        return content;
    }

    /**
     * Returns 1 if the resource appears to exist, 0 otherwise
     */
    public static int isExistResource(String url, String ruleName) {
        try {
            String[] rule = rules.get(ruleName);
            String tagName = rule[0];
            String opt = rule[1];
            String flag = rule[2];
            /*System.out.println("ruleName"+ruleName);
            System.out.println("tagName"+tagName);
            System.out.println("opt"+opt);
            System.out.println("flag"+flag);
            System.out.println("url"+url);*/
            String content = getHtmlContent(url, tagName);
            //System.out.println("content="+content);
            if (ruleName.equals("baidu")) {
                // "Baidu Cloud upgrade" notice page. The original comment says
                // "treat an upgrade as not existing", yet 1 is the "exists"
                // value everywhere else; the return value is kept as written.
                if (content.contains("百度云升级")) {
                    return 1;
                }
            }
            String regex = null;
            if ("eq".equals(opt)) {
                regex = "^" + flag + "$";
            } else if ("bg".equals(opt)) {
                regex = "^" + flag + ".*$";
            } else if ("ed".equals(opt)) {
                regex = "^.*" + flag + "$";
            } else if ("like".equals(opt)) {
                regex = "^.*" + flag + ".*$";
            } else if ("contain".equals(opt)) {
                // For "contain" rules the flag is a "gone" message:
                // finding it means the resource no longer exists
                if (content.contains(flag)) {
                    return 0;
                }
                else {
                    return 1;
                }
            }
            // Guard against an unknown operator leaving regex as null
            if (regex != null && content.matches(regex)) {
                return 1;
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return 0;
    }

//    @Test
//    public void testName() throws Exception {
//        System.out.println(new String("\u8BF7\u8F93\u5165\u63D0\u53D6\u7801".getBytes("utf-8"), "utf-8"));
//    }

}
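The operator names handled in isExistResource (eq, bg, ed, like, contain) can also be read as plain string predicates. The sketch below is one possible restatement, not the author's implementation: RuleOps is a hypothetical class, and unlike the regex-based original it treats the flag as literal text, so characters such as "." in a flag are not interpreted as regex syntax. It only answers whether the content satisfies the operator; mapping that answer to the 0/1 result (which the original inverts for contain) is left to the caller.

```java
public class RuleOps {
    /**
     * Returns true when content satisfies the operator against flag.
     * The flag is compared literally, never as a regular expression.
     */
    public static boolean matches(String opt, String content, String flag) {
        if (opt == null || content == null || flag == null) {
            return false;
        }
        switch (opt) {
            case "eq":      // whole content equals the flag
                return content.equals(flag);
            case "bg":      // content begins with the flag
                return content.startsWith(flag);
            case "ed":      // content ends with the flag
                return content.endsWith(flag);
            case "like":    // flag appears anywhere in the content
            case "contain":
                return content.contains(flag);
            default:        // unknown operator: never matches
                return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(matches("ed", "get file", "file"));
        System.out.println(matches("eq", "abc", "a.c"));
    }
}
```

Literal comparison is a design choice of this sketch: flags come from a properties file, and a flag that happens to contain regex metacharacters would otherwise change meaning silently.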

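Note that the code was originally meant to work with 360, 微盘 and other net disks as well. Some of those services have since shut down, as everyone knows, but the code should stay; that is how a programmer ought to think, namely about extensibility. Note also that the code relies on a configuration file, which I attach here: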

360=class|contain|\u5206\u4EAB\u8005\u5DF2\u53D6\u6D88\u6B64\u5206\u4EAB
baidu=title|contain|\u94FE\u63A5\u4E0D\u5B58\u5728
weiyun=<h3>|contain|\u5206\u4EAB\u8D44\u6E90\u5DF2\u7ECF\u5220\u9664
leshi=title|ed|\u63D0\u53D6\u6587\u4EF6

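Sorry, the values are Unicode-escaped; please decode them yourself. If you don't know how, search Baidu for "unicode转码工具" (Unicode conversion tool).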
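As an alternative to an online converter, java.util.Properties decodes these escapes by itself when it loads the file, which is why the Java code above can use the values directly. The following sketch shows that behaviour on one rule line; RuleDecode and parseRule are hypothetical names, and the rule text is passed in as a string here instead of being read from rule.properties.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class RuleDecode {
    /**
     * Parses properties-style text and returns the {tag, operator, flag}
     * triple for the given key, or null if the key is absent.
     */
    public static String[] parseRule(String propertiesText, String key) {
        Properties p = new Properties();
        try {
            // Properties.load turns Unicode escapes into real characters
            p.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String value = p.getProperty(key);
        return value == null ? null : value.split("\\|");
    }

    public static void main(String[] args) {
        // The doubled backslashes keep the escapes as plain text in the string,
        // exactly as they appear in the configuration file.
        String text = "baidu=title|contain|\\u94FE\\u63A5\\u4E0D\\u5B58\\u5728\n";
        String[] rule = parseRule(text, "baidu");
        System.out.println(rule[0] + " / " + rule[1] + " / " + rule[2]);
    }
}
```

For reference, the four flags decode to 分享者已取消此分享 (the sharer has cancelled this share), 链接不存在 (link does not exist), 分享资源已经删除 (the shared resource has been deleted) and 提取文件 (extract file).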

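That completes the link-validity check used by 去转盘网; I have now published the code in full. If you like this post, please bookmark it and follow me.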

I have set up a QQ group for technical discussion, and everyone is welcome. Group number: 512245829. If you prefer Weibo, just follow 转盘娱乐.
