Python: Using requests to get web page lengths and status codes

With Python 3.6 and the requests module, it is easy to read data from the web. Here are a couple of basic things we can do with the requests module.

Getting a status code

import requests

r = requests.get('http://www.bluegalaxy.info')
print( r.status_code )

Which yields:

200

The meanings of the HTTP status codes are documented on many websites. For example:
http://www.restapitutorial.com/httpstatuscodes.html

A status code of ‘200’ means that the server handled the request successfully, which tells us that the website is up and running.
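Python's standard library also knows the standard status codes, so a quick lookup is possible without visiting a website. Here is a minimal sketch using http.HTTPStatus. The describe_status helper is a made-up name for illustration and is not part of the requests module:

```python
from http import HTTPStatus


def describe_status(code):
    """Return a short description such as '200 OK' for an HTTP status code."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # Not a code the standard library knows about.
        phrase = "Unknown"
    return "{0} {1}".format(code, phrase)


for code in (200, 400, 404):
    print(describe_status(code))
```

With a response object from requests, this could be called as describe_status(r.status_code).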

Getting page length

import requests

r = requests.get('http://www.bluegalaxy.info')
print( len(r.content) )

Which yields:

586

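The length here is the number of bytes in the response body, because r.content holds the raw source of the page as a bytes object. To count characters instead, use len(r.text), which is the same source decoded into a string. The two numbers match only when every character in the page is encoded as a single byte, as in plain ASCII.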
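The difference between counting bytes and counting characters can be seen without touching the network. In this small sketch the sample string is made up for illustration. With requests, the same distinction applies to r.content (raw bytes) and r.text (a decoded string):

```python
# Hypothetical page source; the "é" takes two bytes in UTF-8.
page_text = "<p>café</p>"

# Roughly what r.content would hold: the raw bytes sent by the server.
page_bytes = page_text.encode("utf-8")

char_count = len(page_text)   # characters, like len(r.text)
byte_count = len(page_bytes)  # bytes, like len(r.content)

print("Characters: {0}  Bytes: {1}".format(char_count, byte_count))
# prints: Characters: 11  Bytes: 12
```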

Here is a bigger example:

import requests

sites = [
    'http://www.python.org',
    'http://www.jython.org',
    'http://www.pypy.org',
    'http://www.drudgereport.com',
    'http://www.youtube.com',
    'http://www.phys.org',
    'http://www.bluegalaxy.info',
    'http://www.bluegalaxy.info/codewalk'
]

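# Fetch each site and print one aligned row per URL.
for url in sites:
    # requests has no default timeout, so set one to avoid waiting forever.
    r = requests.get(url, timeout=10)
    print("URL: {0:40} Status code: {1:4} \t Page length: {2:8}"
          .format(url, r.status_code, len(r.content)))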

Which yields:

URL: http://www.python.org                    Status code:  200          Page length:    48761
URL: http://www.jython.org                    Status code:  200          Page length:    19210
URL: http://www.pypy.org                      Status code:  200          Page length:     6133
URL: http://www.drudgereport.com              Status code:  200          Page length:    31485
URL: http://www.youtube.com                   Status code:  200          Page length:   516073
URL: http://www.phys.org                      Status code:  400          Page length:       91
URL: http://www.bluegalaxy.info               Status code:  200          Page length:      586
URL: http://www.bluegalaxy.info/codewalk      Status code:  200          Page length:    74648
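The aligned columns come from the width numbers in the format string: {0:40} pads the URL to 40 characters, {1:4} pads the status code to 4, and {2:8} pads the page length to 8. Strings are left-aligned by default and numbers are right-aligned. Here is a small sketch with hard-coded values taken from the output above, so it runs without a network connection:

```python
# Width specifiers as used in the loop: strings are padded on the right,
# numbers are padded on the left.
url_col = "{0:40}".format("http://www.pypy.org")
status_col = "{0:4}".format(200)
length_col = "{0:8}".format(6133)

# The | markers make the padding visible.
print("|" + url_col + "|" + status_col + "|" + length_col + "|")
```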

For more information about the requests module see:
http://docs.python-requests.org/en/master/
