Eigencovers (Python + PCA + PIL + Pandas + NumPy)

Our eyes are super ADHD: they don't have enough attention to focus on every object in the visual field and gather all its information, so we have to select what's important to process. Our little human eyes are busy ALL the time.

This is partly due to the eye's limited focal range. It's a small, small, small camera, but luckily it's linked to the supremely powerful computer that is the brain, which processes the input and fills in the gaps to produce what we call peripheral vision. The quick movements between focus points are called saccades. Thus what we see does not depend entirely on what is out there, but also, to a considerable extent, on what the brain computes to be most probably out there.

Imagine your brain is a TV. It would make sense to send out every single detail of the picture for every pixel of the screen. TV engineers did exactly this; it's called an I-frame. BUT this is super computationally expensive and takes lots of processing power; you wouldn't want to update every pixel for every frame. So they also invented something called a P-frame. A P-frame predicts the most probable next picture from previously decoded frames, encoding only the differences. What you watch is a mixture of I-frames and P-frames. That's what our crazy brain does too!

The brain processes visual input starting at the back of the head and flowing forward along two pathways: one toward the top of the brain (the dorsal stream) and one along its lower side (the ventral stream). The signal goes through layers that bring out different features of what we are seeing: motion, color, etc. So can we visualize this?

Principal component analysis (PCA) is a technique used to emphasize variation and bring out strong patterns in a dataset. It's often used to make data easy to explore and visualize.

My dataset is ~100 Time magazine covers, a mixture of the best and worst covers. (Could probably do more if memory allowed; I'll look into pickling and hashing in another post.) Each image is 400x538 pixels.
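As a rough sketch of that pipeline, here's how flattening covers and extracting eigencovers looks. I'm assuming scikit-learn's `PCA` here (the post only says "PCA"), and I'm using random stand-in images in place of the real covers; the component count is also a placeholder.

```python
import numpy as np
from PIL import Image
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Stand-in for the real covers: random 400x538 grayscale PIL images.
# In the real pipeline these would come from Image.open(path).convert("L").
covers = [
    Image.fromarray(rng.integers(0, 256, size=(538, 400), dtype=np.uint8))
    for _ in range(20)
]

# Flatten each image into a row vector: one row per cover, one column per pixel.
X = np.array([np.asarray(im, dtype=float).ravel() for im in covers])

# Fit PCA; each row of components_ is one "eigencover" in pixel space.
pca = PCA(n_components=10)
scores = pca.fit_transform(X)   # each cover as 10 PCA coordinates
eigencovers = pca.components_
print(scores.shape, eigencovers.shape)  # (20, 10) (10, 215200)
```

Each eigencover can be reshaped back to 538x400 and viewed as an image, which is what makes the feature layers visualizable.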

Predictive Model (Logistic Regression) 

I built a logistic regression model to find the most and least attractive, interesting, and similar PCA components, based on TIME's self-published list of best and worst magazine covers. Some interesting finds.
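A minimal sketch of that kind of model, assuming scikit-learn's `LogisticRegression` over the PCA scores. The scores and labels here are random stand-ins, not TIME's actual data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Stand-in data: PCA scores for 100 covers, with labels from a
# best/worst list (1 = "best" cover, 0 = "worst" cover).
scores = rng.normal(size=(100, 10))
labels = rng.integers(0, 2, size=100)

# Fit logistic regression on the PCA coordinates.
clf = LogisticRegression(max_iter=1000).fit(scores, labels)

# Rank covers by predicted probability of being a "best" cover;
# the extremes give top-5 / bottom-5 lists like the ones below.
p = clf.predict_proba(scores)[:, 1]
most_attractive = np.argsort(p)[-5:][::-1]
least_attractive = np.argsort(p)[:5]
```

The model's coefficients also indicate which individual PCA components push a cover toward "best" or "worst".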

Top 5 Most Attractive. Mostly realistic paintings. Fully decorated framing. and an outlier of Nixon. 

Top 5 Ugliest. Mainly from the '90s, and crowded. 

Top 5 Least Interesting. Head shots of people's left sides. Hrmm... 

Top 5 Most Interesting. Odd collection of art and pastel colors. 

Top 5 Least Similar PCA components. Lots of blacks and whites. 

Top 5 Most similar PCA components. 

So now that we have feature layers extracted, it's super easy to recreate covers from random PCA components and get brand-new covers. Just interesting to look at, with so many features for our eyes to bounce around between. But all of them nonetheless have the TIME font at the top. Some things do stay constant over the years. 
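That recreation step can be sketched like this, again assuming scikit-learn's `PCA`: sample random coordinates in PCA space, then map them back to pixel space (mean image plus weighted eigencovers). The fitted model here uses random stand-in data rather than the real covers.

```python
import numpy as np
from PIL import Image
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)

# Stand-in for the PCA fitted on the real 400x538 grayscale covers.
X = rng.uniform(0, 255, size=(20, 538 * 400))
pca = PCA(n_components=10).fit(X)

# Draw random coordinates in PCA space...
z = rng.normal(scale=X.std(), size=(1, 10))
# ...and map them back to pixel space: mean cover + weighted eigencovers.
new_cover = pca.inverse_transform(z).reshape(538, 400)

# Clip to the valid pixel range and convert back to a PIL image.
img = Image.fromarray(np.clip(new_cover, 0, 255).astype(np.uint8))
print(img.size)  # (400, 538)
```

Because the PCA mean and leading components are shared across the dataset, constant elements like the masthead tend to survive the randomization, which is why the TIME font keeps showing up.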

Sources:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1079232/pdf/00980018.pdf

https://en.wikipedia.org/wiki/Visual_cortex

http://setosa.io/ev/principal-component-analysis/

Code derived from Joel Grus' blog
