Well, on Christmas Eve, the day after I wrote and blogged about SelenaBot, I tried to run it again and got a bunch of “connection error” exceptions. When I ran this by some friends, we suspected that Facebook’s servers had detected that Instagram was being scraped, got angry about it, and started blocking me.
Changing my IP address – by coming back to my parents’ house for Christmas – seems to have worked: it runs smoothly again, although much more slowly, as 1Gbps internet isn’t a thing in the countryside. I’m hoping that a more subtle scraper I’m working on, which will only download images taken in the previous 24 hours, will be much kinder on the Instagram servers and won’t get blocked. Otherwise I might have to teach my code how to change its IP address midway through operation…
Well, things have gone very well, and SelenaBot will now only download files that were taken on the same calendar day that it is run. I need to change this so that it actually covers the preceding 24 hours, but I’m not sure how to do that yet. As a result, SelenaBot only scrapes around 150 pictures as opposed to 1,200, and the requests come in at more random time intervals (rather than one straight after the other); so far this hasn’t resulted in any blocks. However, all my scraping from this IP address has been done within a single 24-hour period, so it will be interesting to see whether I suddenly start getting “server no reply” exceptions tomorrow (i.e. does Facebook analyse all its traffic once every 24 hours and then block offending requesters from the next day’s traffic?).
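For anyone else stuck on the same calendar-day versus rolling-24-hours question, here is a minimal sketch of the difference. It is not SelenaBot’s actual code: it assumes each photo’s timestamp has already been turned into a timezone-aware Python `datetime` (how that timestamp is pulled from Instagram isn’t covered here), and the function names `taken_on_same_calendar_day`, `taken_in_last_24_hours` and `polite_pause` are hypothetical. The last one shows one way of getting the randomised gaps between requests.

```python
import random
import time
from datetime import datetime, timedelta, timezone


def taken_on_same_calendar_day(taken_at: datetime, now: datetime) -> bool:
    """Calendar-day filter: keep a photo only if its date matches today's date."""
    return taken_at.date() == now.date()


def taken_in_last_24_hours(taken_at: datetime, now: datetime) -> bool:
    """Rolling window: keep a photo taken at most 24 hours before `now`."""
    return now - timedelta(hours=24) <= taken_at <= now


def polite_pause(min_seconds: float = 2.0, max_seconds: float = 10.0) -> float:
    """Sleep for a random interval so requests aren't fired back to back.

    The default bounds are arbitrary placeholders, not tuned values.
    """
    delay = random.uniform(min_seconds, max_seconds)
    time.sleep(delay)
    return delay


if __name__ == "__main__":
    # Made-up timestamps: a photo from yesterday evening, checked this morning.
    now = datetime(2020, 1, 2, 9, 0, tzinfo=timezone.utc)
    yesterday_evening = datetime(2020, 1, 1, 20, 0, tzinfo=timezone.utc)
    print(taken_on_same_calendar_day(yesterday_evening, now))  # False
    print(taken_in_last_24_hours(yesterday_evening, now))      # True
```

The point of the example is that a photo posted at 8pm yesterday is missed by the calendar-day check when the bot runs at 9am, but caught by the rolling window. Comparing aware datetimes in one timezone (UTC here) avoids surprises around midnight.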
Another interesting thing happened between this morning and this afternoon: the account “repostapp” suddenly disappeared in the middle of the day (it worked this morning and stopped working this afternoon), which derailed the whole of SelenaBot, in much the same way Justin Bieber did the first time I ran the code. To stop this from happening in the future as other accounts get deleted or taken down, I’ve taught myself exception handling, so now when an account disappears, the code can carry on running.
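A rough sketch of the “skip the missing account and keep going” idea, assuming a Python scraper. `AccountNotFoundError`, `scrape_all` and `fake_scraper` are all hypothetical stand-ins: whatever exception the real scraping library raises for a deleted account would go in the `except` clause instead.

```python
from typing import Callable, Dict, Iterable, List, Tuple


class AccountNotFoundError(Exception):
    """Hypothetical error for an account that has been deleted or taken down."""


def scrape_all(
    accounts: Iterable[str],
    scrape_account: Callable[[str], List[str]],
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Scrape every account, skipping (and recording) any that have vanished."""
    results: Dict[str, List[str]] = {}
    skipped: List[str] = []
    for account in accounts:
        try:
            results[account] = scrape_account(account)
        except AccountNotFoundError as err:
            # One missing account shouldn't derail the whole run.
            print(f"Skipping {account!r}: {err}")
            skipped.append(account)
    return results, skipped


def fake_scraper(account: str) -> List[str]:
    """Stand-in for the real per-account scraper, used only for this demo."""
    if account == "repostapp":
        raise AccountNotFoundError("account no longer exists")
    return [f"{account}_photo1.jpg"]


if __name__ == "__main__":
    results, skipped = scrape_all(["account_a", "repostapp", "account_b"], fake_scraper)
    print(results)
    print(skipped)
```

Catching one specific exception, rather than using a bare `except:`, means genuine bugs (or the connection errors from earlier) still surface instead of being silently swallowed.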