Watching New Bike Inventory
Scenario: I want to buy another bike. But I don't really need one, and I don't want to spend the money for the bike just yet (because then I won’t be able to buy some other toy I don’t need). But there's a specific model in my size on sale, and the quantity is low. I'd like to be able to keep an eye on the inventory of that model without having to click through the website every day.
make sure pip is installed (on Debian/Ubuntu: sudo apt install python3-pip, or python-pip for Python 2)
make sure beautifulsoup is installed (sudo pip install bs4)
smtplib and re are part of the Python standard library, so there is nothing to install for them.
scrape the web page for the data
look for the quantity in stock
email the quantity value to myself
schedule the script to run daily
At first I tried my usual tools: the Python requests module to fetch the page, and BeautifulSoup to parse the HTML. However, the data I was looking for wasn't in the response to the HTTP GET request. It's weird, but there's some information you only get from visiting the page in a browser. The most likely reason is that the page fills in that data with JavaScript after the initial HTML loads, and a plain HTTP GET never runs those scripts; it only returns whatever HTML the server sends first.
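One quick way to confirm this is to search the raw HTML for the element you expect. The sketch below uses only the standard library. The items_left class name is the one used in the parsing code later in this post, and the two HTML strings are made-up stand-ins for a raw response and a browser-rendered page.

```python
from html.parser import HTMLParser


class ClassFinder(HTMLParser):
    """Records whether any tag carries the given CSS class."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; class may hold several names
        classes = (dict(attrs).get("class") or "").split()
        if self.wanted_class in classes:
            self.found = True


def has_class(html, wanted_class):
    finder = ClassFinder(wanted_class)
    finder.feed(html)
    return finder.found


# Hypothetical examples: what a plain GET might return vs. what the browser shows
raw_html = '<html><body><div id="stock"></div><script src="stock.js"></script></body></html>'
rendered_html = '<html><body><div id="stock"><div class="items_left">3</div></div></body></html>'

print(has_class(raw_html, "items_left"))       # False
print(has_class(rendered_html, "items_left"))  # True
```

If the check comes back False for the text returned by requests but the element is visible in the browser's inspector, the page is almost certainly building that part of the DOM on the client side.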
sudo pip install selenium
from selenium import webdriver
driver = webdriver.Chrome(executable_path='/path/to/chromedriver/')
driver.get('https://website.to.scrape.from/')
and then, somewhere before the end of the script, close the browser (for example with driver.quit()).
One quick thing to note: When using Selenium with a web browser as I am in this case, and specifically with Chrome, I need to download the chrome driver for Selenium to function properly. This can be found at:
http://chromedriver.chromium.org/downloads
Once downloaded, it needs to be unzipped, and then the path of the executable needs to go in the line that says “webdriver.Chrome(executable_path=...)”.
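A note for newer setups: recent Selenium 4 releases no longer accept the executable_path argument and expect the driver path to be wrapped in a Service object instead. Here is a minimal sketch of that style, assuming Selenium 4 and a chromedriver you have already downloaded. The function name fetch_rendered_html is my own, not something Selenium provides.

```python
def fetch_rendered_html(url, chromedriver_path):
    """Open url in Chrome and return the HTML after the browser has loaded it."""
    # Imported here so the function can be defined even where selenium is absent
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    driver = webdriver.Chrome(service=Service(chromedriver_path))
    try:
        driver.get(url)
        return driver.page_source
    finally:
        # Always close the browser, even if the page load fails
        driver.quit()


# Hypothetical use, with the placeholder URL and path from this post:
# html = fetch_rendered_html('https://website.to.scrape.from/', '/path/to/chromedriver')
```

Wrapping the work in try/finally means the browser window gets closed even when the page fails to load, which matters once the script runs unattended.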
Selenium is different from requests in that it doesn't just perform an HTTP GET. It actually opens a browser, lets the page load (scripts included), and then exposes the resulting HTML. From there, I could use BeautifulSoup to parse for the HTML tag that held the inventory count.
from bs4 import BeautifulSoup
soup = BeautifulSoup(driver.page_source, 'html.parser')
thread = soup.html.body.findAll('div',{'class':'items_left'})
for i in thread:
    print(i)
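The loop above prints the whole div, tag and all. If you only want the number, the re module can pull it out of the tag's text. This is a sketch under the assumption that the div contains a single count somewhere in its text; the sample strings are invented, and extract_count is a hypothetical helper name.

```python
import re


def extract_count(text):
    """Return the first integer found in text, or None if there is none."""
    match = re.search(r"\d+", text)
    return int(match.group()) if match else None


# In the loop above this would be called as extract_count(i.get_text())
print(extract_count("Only 3 left in stock"))  # 3
print(extract_count("Sold out"))              # None
```

Returning None when no number is found makes it easy to tell "sold out" or a changed page layout apart from a real count.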
And then to email the content to myself:
First, make sure that the variable i is converted to a string and stored in count (count = str(i)).
Next, make sure you have imported the smtplib module (import smtplib).
And then do the following:
msg = 'this is how many bikes left: '+count
subject = "bike inventory"
message = 'Subject: %s\n\n%s' % (subject,msg)
server = smtplib.SMTP('smtp.gmail.com:587')
server.starttls()
server.login(username,password)
server.sendmail(fromaddr, toaddr, message)
server.quit()
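The same email can also be built with the standard library's EmailMessage class, which handles the headers for you instead of formatting the 'Subject: ...' string by hand. This is only a sketch: the addresses are placeholders, build_message and send_message are names I made up, and Gmail may require an app password rather than your normal account password.

```python
import smtplib
from email.message import EmailMessage


def build_message(count, fromaddr, toaddr):
    """Build the inventory email; count may be an int or a string."""
    msg = EmailMessage()
    msg["Subject"] = "bike inventory"
    msg["From"] = fromaddr
    msg["To"] = toaddr
    msg.set_content("this is how many bikes left: " + str(count))
    return msg


def send_message(msg, username, password):
    """Send msg through Gmail's SMTP server over STARTTLS."""
    with smtplib.SMTP("smtp.gmail.com", 587) as server:
        server.starttls()
        server.login(username, password)
        server.send_message(msg)


# Placeholder addresses for illustration only
msg = build_message(3, "me@example.com", "me@example.com")
print(msg["Subject"])  # bike inventory
```

The with block closes the SMTP connection on exit, so there is no separate server.quit() call to forget.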
Good, now almost every piece is done. At this point you can test your script to make sure it actually gets the content from the webpage and emails it to you. I tested it for myself and it worked.
Lastly, I need to schedule this script so that it runs daily, and we do that with cron. First off, we need to make sure that the top of our Python script has a shebang line pointing at the Python interpreter (for example #!/usr/bin/python).
Then, normally the Cron syntax looks like:
00 14 * * * /usr/bin/python /path/to/my/script.py
But this time, because Selenium actually opens a web browser, we need to do it a little differently:
00 14 * * * export DISPLAY=:0; /usr/bin/python /home/rpartlan/Documents/python/selenium-t130.py
We need to say “export DISPLAY=:0” so that the cron job can reach the desktop's display. Cron runs jobs without a GUI environment, so otherwise the browser would have no display to open on.
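An alternative that should avoid the dependency on a running desktop is Chrome's headless mode, where the browser renders pages without opening a window. The sketch below only builds the options object. build_headless_options is a hypothetical helper, it assumes a Selenium version whose webdriver.Chrome accepts an options argument, and some sites render differently in headless mode, so check the output before relying on it.

```python
def build_headless_options():
    """Return Chrome options that run the browser without a visible window."""
    # Imported here so the function can be defined even where selenium is absent
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    return options


# Hypothetical use: webdriver.Chrome(options=build_headless_options())
```

With headless mode the cron entry would not need the DISPLAY export, since nothing is drawn on screen.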
And that’s it! So now I’ll be ready when there’s only 1 of the bikes left in stock. By then, hopefully, I’ll have either forgotten about the bike (and thus won’t want to buy it), or already spent the money on something else (and can’t afford to buy it).
This is definitely a strange use case. It’s sort of like the opposite of the ticket bots buying tickets as soon as they go on sale. But this could help for when you want to buy something down the road, and just want to know if it’s still in stock, so you can react before they run out.
And finally, here’s the GitHub gist, in case you’d like to see the whole script all together.
https://gist.github.com/partlan/debb86d4e1df1fea8baef1c5a1b9f54e