
Python web scraper

  •   TechF1x · Oct 7, 2017 · 2414 views

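    Scraping images from zealer works, but scraping images from Apple's official site fails. Am I being blocked by anti-scraping measures, and do I need to disguise the request?

        import re
        import urllib2

        # Fetch the page HTML.
        req = urllib2.urlopen('https://www.apple.com/cn/iphone-x/')
        buf = req.read()

        # Collect every substring that looks like a .jpg URL.
        listurl = re.findall(r'https:.+.jpg', buf)
        print listurl

        # Download each match and save it as 0.jpg, 1.jpg, ...
        i = 0
        for url in listurl:
            f = open(str(i) + '.jpg', "wb")
            req = urllib2.urlopen(url)
            buf = req.read()
            f.write(buf)
            f.close()
            i = i + 1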
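Separately from any header issue, the pattern `https:.+.jpg` in the post is greedy and leaves the dot before `jpg` unescaped, so on a line holding several image URLs it can return one long span instead of individual URLs. Whether that contributes to the failure on Apple's page is not established in this thread. The following is a minimal sketch in Python 3 of a non-greedy alternative; the `extract_jpg_urls` helper and the sample HTML are hypothetical and only illustrate the matching behavior.

```python
import re

# Pattern from the original post: greedy, and the dot before "jpg" matches any character.
GREEDY = re.compile(r'https:.+.jpg')

# Assumed alternative: lazy match that cannot cross quotes, whitespace,
# parentheses, or angle brackets, and requires a literal ".jpg".
NON_GREEDY = re.compile(r'https://[^\s"\'()<>]+?\.jpg')


def extract_jpg_urls(html):
    """Return the distinct .jpg URLs found in html, in order of first appearance."""
    urls = []
    for url in NON_GREEDY.findall(html):
        if url not in urls:
            urls.append(url)
    return urls


# Hypothetical sample with two images on a single line.
sample = '<img src="https://example.com/a.jpg"><img src="https://example.com/b.jpg">'

greedy_matches = GREEDY.findall(sample)    # one span covering both tags
clean_matches = extract_jpg_urls(sample)   # two separate URLs
```

On the sample line the greedy pattern yields a single match that runs from the first `https:` to the last `jpg`, which is not a usable URL, while the lazy pattern yields each image URL on its own.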

    5 replies    2017-10-07 11:56:32 +08:00
    coderluan · Oct 7, 2017 · #1
        Yes, you need to disguise the request. Add headers and cookies first; if that still fails, try going through a proxy.
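The suggestion above (headers, then cookies, then a proxy) could be sketched roughly as follows. This uses Python 3's `urllib.request`, the successor to the `urllib2` module in the original post. The header values, the `make_opener` and `fetch` names, and the proxy address in the usage note are assumptions for illustration; the thread does not say which headers Apple's site actually requires.

```python
import urllib.request
from http.cookiejar import CookieJar

# Assumed browser-like headers; the exact values are illustrative only.
BROWSER_HEADERS = {
    "User-Agent": ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                   "AppleWebKit/537.36 (KHTML, like Gecko) "
                   "Chrome/61.0 Safari/537.36"),
    "Accept": "text/html,application/xhtml+xml,*/*;q=0.8",
    "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8",
}


def make_opener(proxy=None):
    """Build an opener that sends browser-like headers, keeps cookies,
    and optionally routes traffic through the given proxy URL."""
    handlers = [urllib.request.HTTPCookieProcessor(CookieJar())]
    if proxy:
        handlers.append(urllib.request.ProxyHandler({"http": proxy, "https": proxy}))
    opener = urllib.request.build_opener(*handlers)
    # Replaces the default "Python-urllib/x.y" User-agent header.
    opener.addheaders = list(BROWSER_HEADERS.items())
    return opener


def fetch(url, opener, timeout=10):
    """Download url with the given opener and return the raw bytes."""
    with opener.open(url, timeout=timeout) as resp:
        return resp.read()
```

A call such as `fetch('https://www.apple.com/cn/iphone-x/', make_opener())` tries headers and cookies only; passing a proxy URL of your own, for example `make_opener(proxy='http://127.0.0.1:8080')`, adds the last step.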
    crab · Oct 7, 2017 · #2
        Apple's official site probably has no anti-scraping measures; it is just stricter about HTTP request headers.
    windfarer · Oct 7, 2017 via Android · #3
        Remember to add a Referer header.
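A minimal Python 3 sketch of the Referer advice, applied to the image downloads: each image request names the page it was found on. The `image_request` and `save_image` helpers and the short User-Agent string are hypothetical; only the page URL comes from the original post, and the thread does not confirm that Apple checks this header.

```python
import urllib.request

# Page URL from the original post, used as the Referer for image requests.
PAGE_URL = "https://www.apple.com/cn/iphone-x/"


def image_request(image_url, referer=PAGE_URL):
    """Build a request for an image that names the referring page."""
    return urllib.request.Request(image_url, headers={
        "User-Agent": "Mozilla/5.0",  # assumed value, for illustration
        "Referer": referer,
    })


def save_image(image_url, path, timeout=10):
    """Download image_url with a Referer header and write it to path."""
    req = image_request(image_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp, open(path, "wb") as f:
        f.write(resp.read())
```

In the loop from the original post, `save_image(url, str(i) + '.jpg')` would take the place of the bare `urlopen(url)` call and the manual file handling.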
    TechF1x (OP) · Oct 7, 2017 · #4
        @coderluan OK, thanks, I'll give it a try.
    TechF1x (OP) · Oct 7, 2017 · #5
        @windfarer OK.