流逝的是岁月,不变的是情怀.
坚持学习,是为了成就更好的自己.
公众号[中关村程序员]

# Requests 源码阅读-Day5

# Get方法

Session对象实例化后指向session,接着调用了其内部方法request:

从例子分析:

[jian@laptop requests]$ cat test.py 
import requests
url = "http://www.baidu.com"
resp = requests.get(url)

然后到requests/__init__.py 看到是导入api.py里面的get方法

from .api import request, get, head, post, patch, put, delete, options

继续到api.py找到get方法

def get(url, params=None, **kwargs):
    r"""Sends a GET request.

    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary, list of tuples or bytes to send
        in the query string for the :class:`Request`.
    :param \*\*kwargs: Optional arguments that ``request`` takes.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response
    """

    kwargs.setdefault('allow_redirects', True)
    return request('get', url, params=params, **kwargs)

从request方法的注释中可以看出,该方法的作用是构造Request对象,准备并发送它,最后返回Response对象。

# Request

我们查看一下Request:

def request(method, url, **kwargs):
    with sessions.Session() as session:
        return session.request(method=method, url=url, **kwargs)

为什么要这样做呢?

Session对象具有保留参数的功能,支持持久性的cookies以及urllib3的连接池功能

当我们向同一主机发送多个请求的时候,底层的TCP连接将会被重用,从而带来显著的性能提升,同时也为我们节省了很多工作量,不必为每次请求都去设置参数。

但是,不是每一次请求都需要保持长连接,保留参数,因为这会带来资源释放失败的风险,所以在我们常规方法中,引入了with 语句确保Session对象的正常退出。

通过代码演示一下两者的区别:

>>> s = requests.session()
>>> s.get("http://httpbin.org/cookies/set/sessioncookie/123456789")
<Response [200]>
>>> r = s.get("http://httpbin.org/cookies")
>>> print(r.text)
{
    "cookies": {
        "sessioncookie": "123456789"
    }
}

>>> requests.get("http://httpbin.org/cookies/set/sessioncookie/123456789")
<Response [200]>
>>> r = requests.get("http://httpbin.org/cookies")
>>> print(r.text)
{
    "cookies": {}
}

着重看下session.request方法: sessions.py

class Session(SessionRedirectMixin):
    ...
    def request(self, method, url, ...):
        ...
        # Create the Request.
        req = Request(
            method=method.upper(),
            url=url,
            headers=headers,
            files=files,
            data=data or {},
            json=json,
            params=params or {},
            auth=auth,
            cookies=cookies,
            hooks=hooks,
        )
        prep = self.prepare_request(req)
        ...
... 

再继续研究Request, 其实是导入models.py里面的Request

from .models import Request, PreparedRequest, DEFAULT_REDIRECT_LIMIT

继续往里面看 models.py

class Request(RequestHooksMixin):
    """A user-created :class:`Request <Request>` object.
    Used to prepare a :class:`PreparedRequest <PreparedRequest>`, which is sent to the server.
    ...
    """

    def __init__(self,
            method=None, url=None, headers=None, files=None, data=None,
            params=None, auth=None, cookies=None, hooks=None, json=None):
	...

类Request继承了类RequestHooksMixin,类RequestHooksMixin提供了hooks事件注册与注销的接口方法。

初始化方法实现了hooks事件注册,然后又是一波参数设置。

类Request对象是为后面的类PreparedRequest对象创建做准备。

Request对象实例构造完成后,继续调用了prepare_request方法:

sessions.py

class Session(SessionRedirectMixin):
    """A Requests session.
    Provides cookie persistence, connection-pooling, and configuration.
    ...
    """
    
    ...
    def prepare_request(self, request):
        """Constructs a :class:`PreparedRequest <PreparedRequest>` for
        transmission and returns it. The :class:`PreparedRequest` has settings
        merged from the :class:`Request <Request>` instance and those of the
        :class:`Session`.
        ...

prepare_request(self, request)方法的作用是构造用于传输的PreparedRequest对象并返回它。

那么PreparedRequest对象是如何构建的?

它是由Request实例对象与Session对象中的数据(如cookies,stream,verify,proxies等等)合并而来。

为什么参数分别放在Request对象与Session对象中呢?

猜测与Session的参数持久化与连接池等有关,可以充分利用之前请求时存储的参数。

PreparedRequest对象构造完成后,最后再通过send方法将其发送出去:

sessions.py

class Session(SessionRedirectMixin):
    """A Requests session.
    Provides cookie persistence, connection-pooling, and configuration.
    ...
    """
    ...
    def request(self, method, url, ...):
        """Constructs a :class:`Request <Request>`, prepares it and sends it.
        Returns :class:`Response <Response>` object.
        ...
        :rtype: requests.Response
        """
        ...
        resp = self.send(prep, **send_kwargs)
        return resp
        ...
    
        ...
        def send(self, request, **kwargs):
        """Send a given PreparedRequest.

        :rtype: requests.Response
        """
        ...
        # Get the appropriate adapter to use
        adapter = self.get_adapter(url=request.url)
				
        ...
        
        # Send the request
        r = adapter.send(request, **kwargs)
        ...
        
        return r

send方法:

def send(self, request, **kwargs):
        """Send a given PreparedRequest.

        :rtype: requests.Response
        """
        # Set defaults that the hooks can utilize to ensure they always have
        # the correct parameters to reproduce the previous request.
        kwargs.setdefault('stream', self.stream)
        kwargs.setdefault('verify', self.verify)
        kwargs.setdefault('cert', self.cert)
        kwargs.setdefault('proxies', self.proxies)

        # It's possible that users might accidentally send a Request object.
        # Guard against that specific failure case.
        if isinstance(request, Request):
            raise ValueError('You can only send PreparedRequests.')
           ...
           ...
            

send方法接收PreparedRequest对象,然后根据该对象的url参数获取对应的传输适配器。

这个传输适配器就是我们前面讲的HTTPAdapter,它的底层由强大的urllib3库实现,为requests提供了的HTTP和HTTPS全部接口方法,包括send等。

当send方法将请求发送给服务器后,等待服务器的响应,最后返回Response对象。

这里又涉及到NotImplementedError以及生成器的一些用法

到这里,一个完整的HTTP请求完成了。

总之:

sessions.py是最大的难点,因为里面包含了很多东西

上次更新: 7/20/2020, 2:09:44 AM