本帖最后由 引子玄 于 2012-7-18 15:36 编辑 问:EXCEL 如何获取网站多页数据?地址 http://www.sdzs.gov.cn/score/gaokao/2012/0715/Book1.htm ================== 答: 先通过QueryTables获取分页URL 得出分页URL为: http://www.sdzs.gov.cn/score/gaokao/2012/0715/Book1.files/sheet001.htm http://www.sdzs.gov.cn/score/gaokao/2012/0715/Book2.files/sheet001.htm http://www.sdzs.gov.cn/score/gaokao/2012/0715/Book3.files/sheet001.htm http://www.sdzs.gov.cn/score/gaokao/2012/0715/Book4.files/sheet001.htm http://www.sdzs.gov.cn/score/gaokao/2012/0715/Book5.files/sheet001.htm 再做循环处理: 先QueryTables宏录制代码为: Sub Macro1() With ActiveSheet.QueryTables.Add(Connection:= _ "URL;http://www.sdzs.gov.cn/score/gaokao/2012/0715/Book1.files/sheet001.htm", _ Destination:=Range("A1")) .Name = "sheet001" .FieldNames = True .RowNumbers = False .FillAdjacentFormulas = False .PreserveFormatting = True .RefreshOnFileOpen = False .BackgroundQuery = True .RefreshStyle = xlInsertDeleteCells .SavePassword = False .SaveData = True .AdjustColumnWidth = True .RefreshPeriod = 0 .WebSelectionType = xlAllTables .WebFormatting = xlWebFormattingNone .WebPreFormattedTextToColumns = True .WebConsecutiveDelimitersAsOne = True .WebSingleBlockTextImport = False .WebDisableDateRecognition = False .WebDisableRedirections = False .Refresh BackgroundQuery:=False End With End Sub 然后修改: 1、如果按行植入,代码为 Sub 翻动网页按行植入() x = 0 Do x = x + 1 With ActiveSheet.QueryTables.Add(Connection:= _ "URL;http://www.sdzs.gov.cn/score/gaokao/2012/0715/Book" & x & ".files/sheet001.htm", _ Destination:=Cells(60 * (x - 1) + 1, 1)) .Name = "sheet001" .FieldNames = True .RowNumbers = False .FillAdjacentFormulas = False .PreserveFormatting = True .RefreshOnFileOpen = False .BackgroundQuery = True .RefreshStyle = xlInsertDeleteCells .SavePassword = False .SaveData = True .AdjustColumnWidth = True .RefreshPeriod = 0 .WebSelectionType = xlAllTables .WebFormatting = xlWebFormattingNone .WebPreFormattedTextToColumns = True .WebConsecutiveDelimitersAsOne = True .WebSingleBlockTextImport = False .WebDisableDateRecognition = False .WebDisableRedirections = False .Refresh BackgroundQuery:=False End With Loop Until x = 5 End Sub 2、如果按列植入,代码为 Sub 翻动网页按列植入() x = 0 Do x = x + 1 With ActiveSheet.QueryTables.Add(Connection:= _ "URL;http://www.sdzs.gov.cn/score/gaokao/2012/0715/Book" & x & ".files/sheet001.htm", _ Destination:=Cells(1, 12 * (x - 1) + 1)) .Name = "sheet001" .FieldNames = True .RowNumbers = False .FillAdjacentFormulas = False .PreserveFormatting = True .RefreshOnFileOpen = False .BackgroundQuery = True .RefreshStyle = xlInsertDeleteCells .SavePassword = False .SaveData = True .AdjustColumnWidth = True .RefreshPeriod = 0 .WebSelectionType = xlAllTables .WebFormatting = xlWebFormattingNone .WebPreFormattedTextToColumns = True .WebConsecutiveDelimitersAsOne = True .WebSingleBlockTextImport = False .WebDisableDateRecognition = False .WebDisableRedirections = False .Refresh BackgroundQuery:=False End With Loop Until x = 5 End Sub ==================== QueryTables网页抓取的批量处理,最主要的是解决好URL和Range两个基本变量的问题。 学会对这两个变量的处理,那么网页抓取的门槛,难度将随之降低、、、、、、 千万别小瞧QueryTables,帮助寻找URL作用可大呢! QueryTables结合F12键和IE功能,这样那样的隐藏看不见的URL,统统可以擒入囊中。(没有啥么装这装那工具的繁琐,只要一个IE浏览器就可以),而且对页面看得见的内容,统统都可以实现导入(直接或间接)。 QueryTables真可谓“只要看得见、就能抓得到”,是一款适众、通用、简易的抓取武器。 |
联系客服