
Why I recommend it:
-
Official introduction (very powerful!):
“Python’s standard urllib2 module provides most of the HTTP capabilities you need, but the API is thoroughly broken. It was built for a different time — and a different web. It requires an enormous amount of work (even method overrides) to perform the simplest of tasks.
Things shouldn’t be this way. Not in Python.”
-
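The Stack Overflow question "Should I use urllib or urllib2 or requests?"
also recommends it!
It is very pleasant to use. If you scrape web pages often, it is worth considering: fetching efficiency improves by roughly 10%.
Source code: https://github.com/kennethreitz/requests
Commonly used features are listed below.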
-
Authentication, status codes, headers, encoding, JSON
>>> r = requests.get('https://api.github.com/user', auth=('user', 'pass'))
>>> r.status_code
200
>>> r.headers['content-type']
'application/json; charset=utf8'
>>> r.encoding
'utf-8'
>>> r.text
u'{"type":"User"...'
>>> r.json()
{u'private_gists': 419, u'total_private_repos': 77, ...}
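To see what the auth=('user', 'pass') tuple does, the sketch below prepares a request without sending it and inspects the resulting header. It is written for Python 3, and the helper name basic_auth_header is an assumption made for illustration, not part of requests.

```python
import base64

import requests


def basic_auth_header(url, user, password):
    """Prepare (but do not send) a GET request and return its Authorization header."""
    prepared = requests.Request("GET", url, auth=(user, password)).prepare()
    return prepared.headers["Authorization"]


header = basic_auth_header("https://api.github.com/user", "user", "pass")

# HTTP Basic auth is "Basic " followed by base64("user:password").
expected = "Basic " + base64.b64encode(b"user:pass").decode("ascii")
print(header == expected)
```

Because the credentials are only base64-encoded, not encrypted, Basic auth should be used over HTTPS.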
-
Making requests
import requests
URL = "http://www.bsdmap.com/"
r = requests.get(URL)
r = requests.post(URL)
r = requests.put(URL)
r = requests.delete(URL)
r = requests.head(URL)
r = requests.options(URL)
-
Passing parameters in the URL
>>> payload = {'key1': 'value1', 'key2': 'value2'}
>>> r = requests.get("http://httpbin.org/get", params=payload)
>>> print(r.url)
http://httpbin.org/get?key2=value2&key1=value1
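The way params is turned into a query string can be checked offline by preparing the request instead of sending it. A minimal Python 3 sketch; build_url is a hypothetical helper name, and key3 is an extra key added only to show how None is handled.

```python
import requests


def build_url(base_url, params):
    """Return the URL requests would request, without touching the network."""
    return requests.Request("GET", base_url, params=params).prepare().url


# A list value becomes a repeated key; a None value is left out entirely.
payload = {"key1": "value1", "key2": ["a", "b"], "key3": None}
url = build_url("http://httpbin.org/get", payload)
print(url)
```

The order of keys in the printed URL follows the order of the dict, so it may differ between Python versions.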
-
Response content
>>> import requests
>>> r = requests.get('https://github.com/timeline.json')
>>> r.text
u'[{"repository":{"open_issues":0,"url":"https://github.com/...
>>> r.encoding
'utf-8'
>>> r.encoding = 'ISO-8859-1'
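Setting r.encoding only changes how r.text decodes the bytes already held in r.content; nothing is fetched again. As a rough sketch of the effect, plain Python 3 bytes stand in for r.content here, so this approximates rather than reproduces what requests does internally.

```python
# Stand-in for r.content: the raw bytes a server might send for "café".
raw = "café".encode("utf-8")

# Decoding with the right codec recovers the original text.
as_utf8 = raw.decode("utf-8")

# Decoding the same bytes as ISO-8859-1 never fails, but yields mojibake.
as_latin1 = raw.decode("ISO-8859-1")

print(as_utf8)
print(as_latin1)
```

So it is worth overriding r.encoding only when the server's declared charset is known to be wrong.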
-
Binary content
You can also access the response body as bytes, for non-text requests:
>>> r.content
b'[{"repository":{"open_issues":0,"url":"https://github.com/...
The gzip and deflate transfer-encodings are automatically decoded for you.
For example, to create an image from binary data returned by a request,
you can use the following code:
>>> from PIL import Image
>>> from io import BytesIO  # works on Python 2.6+ and Python 3
>>> i = Image.open(BytesIO(r.content))
-
JSON
>>> import requests
>>> r = requests.get('https://github.com/timeline.json')
>>> r.json()
[{u'repository': {u'open_issues': 0, u'url': 'https://github.com/...
-
Timeouts
>>> requests.get('http://github.com', timeout=0.001)
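With a timeout this short the call above will almost certainly raise an exception instead of returning a response. One possible way to handle that, sketched for Python 3; the fetch wrapper and its default of 5 seconds are assumptions chosen for illustration.

```python
import requests


def fetch(url, timeout=5):
    """Return the response, or None if the server does not answer in time."""
    try:
        return requests.get(url, timeout=timeout)
    except requests.exceptions.Timeout:
        # Covers both connect timeouts and read timeouts.
        return None
```

Calling fetch('http://github.com', timeout=0.001) would then most likely return None. The timeout argument can also be a (connect, read) tuple when the two phases need different limits.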
-
Custom headers
>>> import json
>>> url = 'https://api.github.com/some/endpoint'
>>> payload = {'some': 'data'}
>>> headers = {'content-type': 'application/json'}
>>> r = requests.post(url, data=json.dumps(payload), headers=headers)
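Newer releases of requests (2.4.2 and later) also accept a json= argument that does the serialization and sets the header in one step. The Python 3 sketch below prepares both variants without sending them, so the two can be compared; the endpoint is the same placeholder URL used above.

```python
import json

import requests

url = "https://api.github.com/some/endpoint"
payload = {"some": "data"}

# Manual approach from the text: serialize the body and set the header yourself.
manual = requests.Request(
    "POST", url,
    data=json.dumps(payload),
    headers={"content-type": "application/json"},
).prepare()

# Shortcut: json= serializes the payload and sets Content-Type for you.
shortcut = requests.Request("POST", url, json=payload).prepare()

print(manual.headers["Content-Type"], shortcut.headers["Content-Type"])
```

Header names are matched case-insensitively, which is why 'content-type' and 'Content-Type' refer to the same entry.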
For more, see the official documentation:
http://docs.python-requests.org/en/latest/user/quickstart/
http://docs.python-requests.org/en/latest/user/advanced/#advanced