Python: Using requests to get web page lengths and status codes

With Python 3.6 and the third-party requests module (installed with pip install requests), it is easy to read data from the web. Here are a couple of basic things we can do with it.

Getting a Status code

import requests

r = requests.get('http://www.bluegalaxy.info')
print(r.status_code)

Which yields:

200

The meanings of the HTTP status codes are documented on many websites. For example:
http://www.restapitutorial.com/httpstatuscodes.html

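A status code of ‘200’ (OK) means that the request succeeded: the server received it and returned the page. Codes in the 400s and 500s indicate client and server errors, respectively.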
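
To look up what a status code means without leaving Python, the standard library's http.HTTPStatus enum can help. The following is a minimal sketch, not part of the original examples: describe_status is a hypothetical helper name, and the class labels in the dictionary are my own wording for the five standard code ranges.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a short description of an HTTP status code (hypothetical helper)."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # The code is not one of the values known to the standard library.
        phrase = "Unknown"
    # The first digit of a status code identifies its class.
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    kind = classes.get(code // 100, "unrecognized class")
    return "{0} {1} ({2})".format(code, phrase, kind)


for code in (200, 301, 400, 404, 500):
    print(describe_status(code))
```

The same function could be called with r.status_code from any of the requests examples on this page.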

Getting page length

import requests

r = requests.get('http://www.bluegalaxy.info')
print(len(r.content))

Which yields:

586

The length of the content here is the number of bytes in the body of the response, because r.content is a bytes object. For a page that contains only ASCII characters, this equals the number of characters in the page source. To count characters instead, take the length of the decoded string with len(r.text).
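
In requests, r.content holds the raw bytes of the response and r.text holds the decoded string, so the two lengths differ whenever the page contains non-ASCII characters. The sketch below shows the difference on a plain string, with no network access needed. The function name char_and_byte_lengths and the sample text are made up for illustration, and UTF-8 is assumed as the encoding.

```python
def char_and_byte_lengths(text, encoding="utf-8"):
    """Return (number of characters, number of encoded bytes) for text."""
    data = text.encode(encoding)  # analogous to r.content
    return len(text), len(data)   # len(text) is analogous to len(r.text)


sample = "naïve café"
chars, size = char_and_byte_lengths(sample)
print("Characters: {0}  Bytes: {1}".format(chars, size))
```

Here the two accented letters each take two bytes in UTF-8, so the byte count is larger than the character count.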

Here is a bigger example:

import requests

sites = [
    'http://www.python.org',
    'http://www.jython.org',
    'http://www.pypy.org',
    'http://www.drudgereport.com',
    'http://www.youtube.com',
    'http://www.phys.org',
    'http://www.bluegalaxy.info',
    'http://www.bluegalaxy.info/codewalk'
]

for url in sites:
    r = requests.get(url)
    print("URL: {0:40} Status code: {1:4} \t Page length: {2:8}"
          .format(url, r.status_code, len(r.content)))

Which yields:

URL: http://www.python.org                    Status code:  200          Page length:    48761
URL: http://www.jython.org                    Status code:  200          Page length:    19210
URL: http://www.pypy.org                      Status code:  200          Page length:     6133
URL: http://www.drudgereport.com              Status code:  200          Page length:    31485
URL: http://www.youtube.com                   Status code:  200          Page length:   516073
URL: http://www.phys.org                      Status code:  400          Page length:       91
URL: http://www.bluegalaxy.info               Status code:  200          Page length:      586
URL: http://www.bluegalaxy.info/codewalk      Status code:  200          Page length:    74648
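
Note that requests.get does not raise an exception for an error status such as the 400 above, which is why the loop kept going. It does raise one when a site cannot be reached at all, and without a timeout a slow server can stall the loop. Here is one possible way to guard against both, offered as a sketch rather than the definitive approach: check_site, format_row, and report are hypothetical helper names, and the 5 second timeout is an arbitrary choice.

```python
import requests


def check_site(url, timeout=5.0):
    """Return (status_code, page_length), or (None, 0) if the request fails."""
    try:
        r = requests.get(url, timeout=timeout)
    except requests.exceptions.RequestException:
        # Covers connection errors, timeouts, invalid URLs, and similar failures.
        return None, 0
    return r.status_code, len(r.content)


def format_row(url, status, length):
    """Format one result line in the same layout as the example above."""
    status_text = "n/a" if status is None else str(status)
    return "URL: {0:40} Status code: {1:>4} \t Page length: {2:8}".format(
        url, status_text, length)


def report(urls, timeout=5.0):
    """Check every URL and return the formatted lines."""
    return [format_row(url, *check_site(url, timeout)) for url in urls]
```

With the sites list from the bigger example, print("\n".join(report(sites))) would print one line per site, showing n/a instead of a status code for any site that could not be reached.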

For more information about the requests module see:
http://docs.python-requests.org/en/master/
