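I was helping a friend scrape user comments from the WeChat Official Accounts Platform (微信公众平台).
This post covers only the core part: how to get at the comment data.
Looking at the HTML source, I found no tags for the comment section. That suggested the comments were generated dynamically by JavaScript, but I could not find any Ajax request that returned the data either.
A plain text search finally turned it up. The data is written directly into a JS block in the page: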
- <script type="text/javascript">
- wx.cgiData = {
- total_count : 91,
- latest_msg_id : '200325222',
- count : "20"*1 || 20,
- day : "7",
- frommsgid : "",
- can_search_msg : "1",
- offset : "",
- action : "",
- keyword : "",
- list : ({"msg_item":[{"id":200322761,"type":1,"fakeid":"593656935","nick_name":"Suang 1","date_time":1398854675,"content":"记得帮我查一下是不是这个电话!","source":"","msg_status":4,"has_reply":0,"refuse_reason":"","multi_item":[],"to_uin":3071594631,"send_stat":{"total":0,"succ":0,"fail":0}},{"id":200322760,"type":2,"fakeid":"593656935","nick_name":"Suang 1","date_time":1398854664,"source":"","msg_status":4,"has_reply":0,"refuse_reason":"","multi_item":[],"to_uin":3071594631,"send_stat":{"total":0,"succ":0,"fail":0}},{"id":200322759,"type":1,"fakeid":"593656935","nick_name":"Suang 1","date_time":1398854659,"content":"勐璇,我看到那人了!","source":"","msg_status":4,"has_reply":0,"refuse_reason":"","multi_item":[],"to_uin":3071594631,"send_stat":{"total":0,"succ":0,"fail":0}},{"id":200322344,"type":2,"fakeid":"1994400010","nick_name":"ABC的CBA","date_time":1398839849,"source":"","msg_status":4,"has_reply":0,"refuse_reason":"","multi_item":[],"to_uin":3071594631,"send_stat":{"total":0,"succ":0,"fail":0}},{"id":200321209,"type":1,"fakeid":"1591078101","nick_name":"倚(纺织服装)","date_time":1398788906,"content":"\/::<","source":"","msg_status":4,"has_reply":0,"refuse_reason":"","multi_item":[],"to_uin":3071594631,"send_stat":{"total":0,"succ":0,"fail":0}},{"id":200321206,"type":2,"fakeid":"1591078101","nick_name":"倚(纺织服装)","date_time":1398788859,"source":"","msg_status":4,"has_reply":1,"refuse_reason":"","multi_item":[],"to_uin":3071594631,"send_stat":{"total":0,"succ":0,"fail":0}},
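Pulling that value out of the page source by hand gets tedious, so here is a minimal sketch of one way to do it in Java. It assumes the page still contains a "list : (" marker followed by a JSON object, as in the snippet above. The class and method names (CgiDataExtractor, extractListJson) are made up for illustration, and the marker may well differ on other pages.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CgiDataExtractor {

    // Assumption: the JSON object follows a "list : (" marker, as in the snippet above.
    private static final Pattern LIST_MARKER = Pattern.compile("list\\s*:\\s*\\(");

    /**
     * Returns the first balanced {...} object after the marker,
     * or an empty Optional if no marker or no balanced object is found.
     */
    static Optional<String> extractListJson(String pageSource) {
        Matcher m = LIST_MARKER.matcher(pageSource);
        if (!m.find()) {
            return Optional.empty();
        }
        int start = pageSource.indexOf('{', m.end());
        if (start < 0) {
            return Optional.empty();
        }
        int depth = 0;
        boolean inString = false;
        boolean escaped = false;
        for (int i = start; i < pageSource.length(); i++) {
            char c = pageSource.charAt(i);
            if (inString) {
                // Braces inside string values must not affect the depth count.
                if (escaped) {
                    escaped = false;
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == '"') {
                    inString = false;
                }
            } else if (c == '"') {
                inString = true;
            } else if (c == '{') {
                depth++;
            } else if (c == '}') {
                depth--;
                if (depth == 0) {
                    return Optional.of(pageSource.substring(start, i + 1));
                }
            }
        }
        return Optional.empty(); // the object was never closed, e.g. truncated input
    }

    public static void main(String[] args) {
        String page = "wx.cgiData = {\n list : ({\"msg_item\":[{\"id\":1,\"content\":\"a } b\"}]})\n};";
        System.out.println(extractListJson(page).orElse("not found"));
    }
}
```

Counting braces while skipping string contents keeps a stray "}" inside a comment's text from cutting the object short, which a simple regex on the closing brace would not handle.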
The data is in JSON format, but the raw text is hard to read. After formatting it in Eclipse, the message list looks roughly like this:
- {"msg_item" :[ {
- "id" : 200322761,
- "type" : 1,
- "fakeid" : "593656935",
- "nick_name" : "Suang 1",
- "date_time" : 1398854675,
- "content" : "记得帮我查一下是不是这个电话!",
- "source" : "",
- "msg_status" : 4,
- "has_reply" : 0,
- "refuse_reason" : "",
- "multi_item" : [],
- "to_uin" : 3071594631,
- "send_stat" : {
- "total" : 0,
- "succ" : 0,
- "fail" : 0
- }
- }, {
- "id" : 200322760,
- "type" : 2,
- "fakeid" : "593656935",
- "nick_name" : "Suang 1",
- "date_time" : 1398854664,
- "source" : "",
- "msg_status" : 4,
- "has_reply" : 0,
- "refuse_reason" : "",
- "multi_item" : [],
- "to_uin" : 3071594631,
- "send_stat" : {
- "total" : 0,
- "succ" : 0,
- "fail" : 0
- }
- }
- ]
- }
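The above shows the objects in the list that msg_item maps to in the JSON string.
As you can see, msg_item is an array, and each comment is one object inside it. So how do we generate the corresponding Java classes?
There is an online tool for this: http://jsongen.byingtondesign.com/
It generates the corresponding Java classes from a JSON string:
Class 1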
- import java.util.List;
-
- public class MessageList{
- private List<Message> msg_item;
-
- public List<Message> getMsg_item() {
- return msg_item;
- }
-
- public void setMsg_item(List<Message> msgItem) {
- msg_item = msgItem;
- }
-
- }
Class 2. I removed some fields that were not needed.
- public class Message {
-
- private String content;
- private long date_time;
- private String fakeid;
- private int has_reply;
- private long id;
- private int msg_status;
- private String nick_name;
- private String refuse_reason;
- private String source;
- private long to_uin;
- private int type;
- // getters and setters omitted
- }
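The date_time values in the sample look like Unix timestamps in seconds. That is my reading of the numbers, not something the page states. Under that assumption, a small helper can turn them into readable times. The names MessageTime and format are hypothetical, and the Asia/Shanghai time zone is just a choice made for the example.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

class MessageTime {

    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    /** Formats a date_time value, assumed to be seconds since the Unix epoch. */
    static String format(long epochSeconds, ZoneId zone) {
        ZonedDateTime time = Instant.ofEpochSecond(epochSeconds).atZone(zone);
        return FORMAT.format(time);
    }

    public static void main(String[] args) {
        // date_time of the first message in the sample above
        System.out.println(format(1398854675L, ZoneId.of("Asia/Shanghai")));
        // prints 2014-04-30 18:44:35
    }
}
```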
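Now for a quick test. I used Google's Gson to parse the JSON string into Java objects.
- // jsonstr is the JSON string for msg_item, i.e. the {"msg_item": [...]} object shown above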
- MessageList msgList = new Gson().fromJson(jsonstr, MessageList.class);
- System.out.println(msgList.getMsg_item().size());
Parsing succeeded. All the objects are now in msgList.
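One thing to watch when reading the parsed objects: in the sample data the entries with type 2 have no content key at all, and Gson leaves a field null when its key is missing from the JSON. A rough sketch of skipping those entries follows. It uses a cut-down stand-in for the Message class so that it runs without Gson; the names TextMessageFilter and describeTextMessages are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TextMessageFilter {

    /** Stand-in for the Message class above, reduced to the two fields used here. */
    static class Message {
        final String nickName;
        final String content; // null when the JSON entry had no "content" key

        Message(String nickName, String content) {
            this.nickName = nickName;
            this.content = content;
        }
    }

    /** Returns "nick_name: content" for every message that actually has text. */
    static List<String> describeTextMessages(List<Message> messages) {
        List<String> lines = new ArrayList<>();
        for (Message m : messages) {
            if (m.content != null) {
                lines.add(m.nickName + ": " + m.content);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        List<Message> sample = Arrays.asList(
                new Message("Suang 1", "hello"),
                new Message("Suang 1", null)); // like a type 2 entry: no content
        for (String line : describeTextMessages(sample)) {
            System.out.println(line);
        }
    }
}
```

With the real classes the same loop would run over msgList.getMsg_item() and use the getters instead of the fields.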