As a Python novice, and even more of a rookie at Scrapy, let me grumble a little about the official Scrapy documentation: the guide is something to consult, not to copy line by line. If you abandon the former approach for the latter, you will end up like me, desperately wanting to quit Scrapy in the early stages. Wanting to give up before even getting through the door is awkward indeed.
On Windows, I strongly recommend installing via the third-party distribution Anaconda.
Anaconda and its documentation can be downloaded from the official site: https://www.anaconda.com/
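With Anaconda in place, Scrapy can be pulled in through conda, which ships pre-built binaries for the troublesome compiled dependencies. The conda-forge channel below is the one the Scrapy install docs point to:

# installs Scrapy plus its compiled dependencies (lxml, Twisted, pyOpenSSL) as prebuilt packages
conda install -c conda-forge scrapy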
Note: if you have already carelessly run pip install scrapy
and it blew up, don't panic, young one! There are still remedies (see the sketch after this list):
(1) "All the dependencies pip needs to install Scrapy"
(2) "Read this if pip install twisted errors out"
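In case those links rot: the usual remedy for the Twisted build failure on Windows (and, I believe, what those posts describe) is to install a pre-built Twisted wheel before retrying Scrapy, since Twisted is the dependency that most often fails to compile. Roughly like this; the wheel filename is only an example and must match your Python version and architecture:

# first download the matching Twisted wheel from https://www.lfd.uci.edu/~gohlke/pythonlibs/#twisted
pip install Twisted-18.9.0-cp36-cp36m-win_amd64.whl
pip install scrapy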
Downloading and installing on Linux is much simpler, or at least free of the really odd errors. Try pip install scrapy
and that should be it; if you do run into problems, the methods above will sort them out.
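If pip does fail on Linux, it is almost always down to missing build dependencies. On Ubuntu, the packages the official Scrapy install guide lists are roughly these (names may vary slightly between releases):

sudo apt-get install python3-dev libxml2-dev libxslt1-dev zlib1g-dev libffi-dev libssl-dev
pip install scrapy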
Launch the project: scrapy crawl spider_name
It errored out as follows:
D:\me\python\scrapy\douban\douban\spiders>scrapy crawl douban_spider
2019-01-15 14:34:28 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: douban)
2019-01-15 14:34:28 [scrapy.utils.log] INFO: Versions: lxml 4.3.0.0, libxml2 2.9.5, cssselect 1.0.3, parsel 1.5.1, w3lib 1.20.0, Twisted 18.9.0, Python 3.6.5rc1 (v3.6.5rc1:f03c5148cf, Mar 14 2018, 03:12:11) [MSC v.1913 64 bit (AMD64)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0j 20 Nov 2018), cryptography 2.4.2, Platform Windows-10-10.0.16299-SP0
2019-01-15 14:34:28 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'douban', 'DOWNLOAD_DELAY': 0.5, 'NEWSPIDER_MODULE': 'douban.spiders', 'SPIDER_MODULES': ['douban.spiders']}
2019-01-15 14:34:28 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.logstats.LogStats']
2019-01-15 14:34:29 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2019-01-15 14:34:29 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2019-01-15 14:34:29 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2019-01-15 14:34:29 [scrapy.core.engine] INFO: Spider opened
2019-01-15 14:34:29 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2019-01-15 14:34:29 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2019-01-15 14:34:30 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://movie.douban.com/top250> (referer: None)
2019-01-15 14:34:30 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://movie.douban.com/top250>: HTTP status code is not handled or not allowed
2019-01-15 14:34:30 [scrapy.core.engine] INFO: Closing spider (finished)
2019-01-15 14:34:30 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 221,
 'downloader/request_count': 1,
 'downloader/request_method_count/GET': 1,
 'downloader/response_bytes': 248,
 'downloader/response_count': 1,
 'downloader/response_status_count/403': 1,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2019, 1, 15, 6, 34, 30, 269071),
 'httperror/response_ignored_count': 1,
 'httperror/response_ignored_status_count/403': 1,
 'log_count/DEBUG': 2,
 'log_count/INFO': 8,
 'response_received_count': 1,
 'scheduler/dequeued': 1,
 'scheduler/dequeued/memory': 1,
 'scheduler/enqueued': 1,
 'scheduler/enqueued/memory': 1,
 'start_time': datetime.datetime(2019, 1, 15, 6, 34, 29, 539961)}
2019-01-15 14:34:30 [scrapy.core.engine] INFO: Spider closed (finished)
The solution (it took me forever to find; I blindly fiddled with every setting in settings.py,
and the fix that finally worked came from an imooc course. Well done, imooc!):
"scrapy crawl spider_name errors: watch this video"
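For anyone who cannot watch the video: the usual culprit behind Douban's 403 is Scrapy announcing itself with its default User-Agent, which Douban blocks, and the usual cure is declaring a browser User-Agent in settings.py. A minimal sketch, assuming that is indeed what the video changes (the exact UA string below is just an example):

# settings.py
# Douban rejects Scrapy's default User-Agent, so identify as a regular browser instead.
USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36'

With that one line changed, rerunning scrapy crawl douban_spider should log Crawled (200) instead of the 403 above.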
For basic PyCharm configuration and usage, and for importing a Scrapy project into PyCharm, there are plenty of existing write-ups, so I won't reinvent the wheel here.
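The one PyCharm trick worth repeating, since scrapy crawl is a command-line tool with no script to point the debugger at: add a small launcher at the project root and use it as the run target. The file name main.py is my own choice, not something Scrapy requires:

# main.py -- place next to scrapy.cfg and set it as PyCharm's run/debug target
from scrapy.cmdline import execute

# equivalent to running "scrapy crawl douban_spider" from the project directory
execute(['scrapy', 'crawl', 'douban_spider'])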