One of my New Year’s resolutions for 2016 was to learn to code a bit, and with just over a week to go I feel like I’ve managed to come up with an idea for a piece of code, structure it, and get it working. From scratch. By myself.
I started learning Python using Automate the Boring Stuff, a website I found so useful that I bought the book to say thank you to the author. I’ve learnt all the basics from section one and started coming up with my own exercises to practise with, which is how SelenaBot came about.
I’ve had an idea for a science engagement stunt/thing (which, if it works, I’ll devote future blog posts to as it develops) that requires access to lots of Instagram posts, preferably by celebrities. So I’ve put together a piece of code that automatically visits celebrity Instagram accounts and downloads their 12 most recent pictures, and I’m going to talk you through it here.
I suspect this gross mess is because using BeautifulSoup is not the most intelligent way to scrape Instagram, but I’m new here, so that’s what we’re using.
PageJS, and specifically the element of PageJS that contains the information we need to access the pictures, happens to be the ugliest thing you’ve ever damn well seen, and line 14 of SelenaBot took me the best part of an evening to write.
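For anyone wondering how you find the right script element in the first place, here is a minimal sketch. It assumes the profile page kept its data in a `<script>` whose text contains `window._sharedData`; that marker string is my assumption about how the 2016 pages looked, and the class and function names are hypothetical. It also swaps BeautifulSoup for Python’s built-in `html.parser` so it runs without extra installs; with BeautifulSoup, `soup.find_all('script')` gives you the same list of script elements to pick from.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collects the text inside every <script> element on a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script text can arrive in chunks, so append rather than overwrite.
        if self._in_script:
            self.scripts[-1] += data


def find_page_js(html, marker="window._sharedData"):
    """Return the first script body containing `marker`, or None.

    ASSUMPTION: `marker` is a guess at how the page labelled its data blob.
    """
    collector = ScriptCollector()
    collector.feed(html)
    collector.close()
    for script in collector.scripts:
        if marker in script:
            return script
    return None


sample_html = """
<html><head>
<script>console.log("tracking");</script>
<script>window._sharedData = {"entry_data": {}};</script>
</head></html>
"""
print(find_page_js(sample_html))
```

Searching for a marker string rather than picking a script by its position means the code doesn’t break just because the page gains or loses an unrelated script.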
Here’s the full content of PageJS
and here’s line 14 again
After stripping off the beginning and end characters to turn it into something the JSON module can interpret, we can see that all the information we need (image URLs, timestamps and more) sits in a dictionary under the key “Nodes”. To get to “Nodes”, however, we have to pick our way through another dictionary nested in a third dictionary, nested in a fourth, nested in a list, which is itself nested in two further dictionaries. This took me a long time, painstakingly going through a text file I made of the JS, and I rather suspect there was a tool that would have got me there a lot quicker. But whatever: we can access the dictionary, and we’ve called it allPics. From here on out it’s plain sailing. The for loop iterates through allPics and saves each picture to the hard drive, naming the file with the username plus the Unix timestamp.
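Here is a rough sketch of the strip, parse and walk steps, plus the file naming. Everything about the data’s shape is an assumption: the `window._sharedData = ` prefix, the trailing `;`, the key path `entry_data`, `ProfilePage[0]`, `user`, `media`, `nodes` (lowercase here; it lines up with the nesting described above, but I can’t confirm it was SelenaBot’s exact path), and the `date` and `display_src` keys on each node. The helper names are hypothetical.

```python
import json
import urllib.request

PREFIX = "window._sharedData = "  # ASSUMPTION: text before the JSON object
SUFFIX = ";"                      # ASSUMPTION: character after the JSON object


def parse_page_js(page_js):
    """Strip the JavaScript wrapper so json.loads sees a bare JSON object."""
    text = page_js.strip()
    if text.startswith(PREFIX):
        text = text[len(PREFIX):]
    if text.endswith(SUFFIX):
        text = text[: -len(SUFFIX)]
    return json.loads(text)


def get_all_pics(data):
    """Walk two dicts, a list, then three more dicts down to "nodes".

    ASSUMPTION: this key path is a guess at the 2016 profile-page layout.
    """
    return data["entry_data"]["ProfilePage"][0]["user"]["media"]["nodes"]


def file_name(user, node):
    """Username plus the post's Unix timestamp (the .jpg extension is assumed)."""
    return f"{user}{node['date']}.jpg"


def save_all(user, all_pics):
    """Download each picture to the current directory (needs network access)."""
    for node in all_pics:
        urllib.request.urlretrieve(node["display_src"], file_name(user, node))


sample = ('window._sharedData = {"entry_data": {"ProfilePage": [{"user": '
          '{"media": {"nodes": [{"date": 1482000000, '
          '"display_src": "https://example.com/a.jpg"}]}}}]}};')
all_pics = get_all_pics(parse_page_js(sample))
print([file_name("selenagomez", node) for node in all_pics])
# prints ['selenagomez1482000000.jpg']
```

Using the timestamp in the file name also means two pictures from the same account don’t overwrite each other, since each post has its own time.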
The final two lines of code tell SelenaBot to open a list of the top 100 most-followed celebrity accounts and download the 12 most recent images from each one, and off it goes.
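The driver boils down to a loop over the account list. In this sketch both names are hypothetical stand-ins: `download_recent_pictures` represents SelenaBot’s per-account download step, and the list would really come from the top100gram module. Passing the download function in as a parameter is just a design choice that makes the loop easy to try out without touching the network.

```python
def download_recent_pictures(username):
    """Hypothetical stand-in for SelenaBot's per-account download step."""
    print(f"Downloading the 12 most recent pictures for {username}")


def run(top100, download=download_recent_pictures):
    """Call the download step once for every account, in list order."""
    for username in top100:
        download(username)


# In the real bot this list would come from the top100gram module.
run(["selenagomez", "another_account"])
```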
Just briefly, the top100gram module is another, smaller scraper I wrote that pulls a list of the top 100 most-followed Instagram accounts off a not particularly reliable-looking website called SocialBlade. SocialBlade hasn’t been updated since Justin Bieber quit and then rejoined Instagram, which meant the list was passing an incorrect username into SelenaBot. The poetic irony of a piece of code named after Selena Gomez reacting violently and refusing to work at the mention of Justin Bieber’s name was not lost on me, and it also forced me to write my favourite three lines of code yet.
```python
if 'justinbieber' in top100:
    top100.remove('justinbieber')
    top100.append('justnbieber')
```
It’s not too late to say sorry.
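For anyone curious about the top100gram side, here is a sketch of pulling usernames out of a ranking page and applying the Bieber fix as a lookup table. The link format `/instagram/user/<name>` and the sample markup are hypothetical, not SocialBlade’s confirmed structure, and the sketch again uses the standard library’s `html.parser` in place of BeautifulSoup. Unlike the three lines above, which remove the stale name and append the new one at the end of the list, this keeps each account in its ranking position, and any further renames only need a new dictionary entry.

```python
from html.parser import HTMLParser

USER_LINK_PREFIX = "/instagram/user/"  # HYPOTHETICAL link format on the ranking page

# Stale usernames from the ranking page mapped to the names SelenaBot should use.
RENAMES = {"justinbieber": "justnbieber"}


class UserLinkCollector(HTMLParser):
    """Collects usernames from <a href="/instagram/user/..."> links (hypothetical)."""

    def __init__(self):
        super().__init__()
        self.usernames = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # An attribute with no value comes through as None, hence the `or ""`.
        href = dict(attrs).get("href") or ""
        if href.startswith(USER_LINK_PREFIX):
            self.usernames.append(href[len(USER_LINK_PREFIX):])


def top100_from_html(html):
    collector = UserLinkCollector()
    collector.feed(html)
    collector.close()
    # Swap each stale name for its replacement, keeping the ranking order.
    return [RENAMES.get(name, name) for name in collector.usernames]


page = ('<a href="/instagram/user/selenagomez">1</a>'
        '<a href="/instagram/user/justinbieber">2</a>'
        '<a href="/about">About</a>')
print(top100_from_html(page))
# prints ['selenagomez', 'justnbieber']
```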