Scraping the Best Time Magazine Covers (Python + Beautiful Soup + urllib)

Time.com doesn't make it easy to get their covers. If you were to do this manually, you would go to their cover archive page, click on the year you want, and download a thumbnail image from there. If you want the full cover, you then need to click "See Larger Cover" and download the image from that page. Then repeat that 83 times. That would take forever. This script automates all of that and runs in just over a minute on my slow MacBook. It grabs all the links from the first page, follows each one to the individual cover page, collects the "See Larger Cover" link from there, then downloads all the images to a specific folder. Enjoy!

from bs4 import BeautifulSoup
import urllib.request

# Python 3: urlopen and urlretrieve live in urllib.request.
# "lxml" names the parser, so it is passed to BeautifulSoup, not to urlopen.
r = urllib.request.urlopen('http://content.time.com/time/specials/packages/completelist/0,29569,1704183,00.html').read()
soup = BeautifulSoup(r, "lxml")

# Grab links to the individual cover pages
links = []
for ultag in soup.find_all('ul', {'class': 'items'}):
    for litag in ultag.find_all('li'):
        links.append(litag)
cover = []
prefix = "http://content.time.com"
for element in links:
    cover.append(prefix + element.a["href"])


# Grab links to the bigger picture
ilink = []
for link in cover:
    i = urllib.request.urlopen(link).read()
    soup2 = BeautifulSoup(i, "lxml")
    for a in soup2.find_all('a', href=True, string='See Larger Cover'):
        ilink.append(prefix + a['href'])

# Grab title and img src from each "See Larger Cover" page
images = []
title = []
for link in ilink:
    i = urllib.request.urlopen(link).read()
    soup3 = BeautifulSoup(i, "lxml")
    # Replace "YOUR DIRECTORY" with the target folder, ending in a path separator
    title.append("YOUR DIRECTORY" + soup3.find("title").get_text() + ".jpg")
    src = soup3.find("article", class_="art-cover-photo").find("img").get("src")
    images.append(src)

# Pair each image URL with its file name
final = zip(images, title)

# Save the data
for image_url, filename in final:
    urllib.request.urlretrieve(image_url, filename)
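
The script builds URLs by gluing the prefix onto each href, and it uses the raw page title as the file name. Both work as long as every href is relative and no title contains characters like "/" or ":". Here is a minimal sketch of two optional helpers that would make those steps a bit more forgiving. The names absolute_url and safe_filename are made up for this example and are not part of the original script, and the set of characters stripped from titles is an assumption about what file systems commonly reject.

```python
import re
from urllib.parse import urljoin

PREFIX = "http://content.time.com"  # same site root the script uses


def absolute_url(href, base=PREFIX):
    """Resolve an href against the site root.

    Relative paths get the root prepended; hrefs that are already
    absolute are returned unchanged.
    """
    return urljoin(base, href)


def safe_filename(title, extension=".jpg"):
    """Turn a page title into a file name.

    Path separators and a few other commonly rejected characters are
    replaced with spaces, then runs of whitespace are collapsed.
    """
    cleaned = re.sub(r'[\\/:*?"<>|]+', " ", title)
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    return cleaned + extension
```

With these in place, prefix + element.a["href"] could become absolute_url(element.a["href"]), and the title line could wrap soup3.find("title").get_text() in safe_filename before joining it to your directory.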
Eigencovers (Python + PCA + PIL + Pandas + NumPy)

Drop out and Biology, seriously they are related.... (TensorFlow + Python + Machine Learning + Dropout + Adam Optimizer)
