Advanced Twitter Streaming API Concepts in PHP
Twitter's Streaming API documentation is pretty slim for advanced uses. Sure, it gives you the bare concepts and some examples, but when it comes time to actually build a scalable application you can find yourself stuck with an inefficient product. Rather than walk through a full implementation, I am going to discuss certain concepts that will help you succeed.
First of all, these concepts can differ per application and may not apply directly to yours. In my experience, three concepts matter when building an effective application on Twitter's Streaming API.
First, note that you don't want to run the Streaming API consumer directly in your web application. Cron jobs work great for scheduled tasks but not for handling a constant stream. In my case I ended up using PEAR's System_Daemon. Daemons work pretty well and are fairly easy to set up in a UNIX environment.
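To make the difference from a cron job concrete, here is a minimal sketch of a long-running worker loop in plain PHP. It is not System_Daemon (which also takes care of forking, PID files, and signals); the function name runWorkerLoop and its parameters are hypothetical and only illustrate the shape of a process that stays alive until told to stop.

```php
<?php
// Hypothetical helper: keep calling $doWork until $shouldStop returns true.
// $doWork returns how many items it handled; when it returns 0 the loop
// sleeps briefly so an idle worker does not spin the CPU.
function runWorkerLoop(callable $doWork, callable $shouldStop, int $idleMicroseconds = 200000): int
{
    $iterations = 0;
    while (!$shouldStop()) {
        $handled = $doWork();
        $iterations++;
        if ($handled === 0) {
            usleep($idleMicroseconds);
        }
    }
    return $iterations;
}

// Demo with made-up work: pretend each pass handles one item, stop after three.
$passes = 0;
$iterations = runWorkerLoop(
    function () use (&$passes): int {
        $passes++;
        return 1;
    },
    function () use (&$passes): bool {
        return $passes >= 3;
    }
);
echo "Worker ran $iterations passes\n";
```

In a real daemon the stop condition would typically come from a signal handler or from the daemon library rather than a counter.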
Get out of being Rate Limited:
Option 1: Apply and become a Twitter Partner.
Option 2: If you are wondering how to track more than 400 keywords without becoming a Twitter partner, use your application to authorize more Twitter users, then use each user's auth tokens for a separate group of 400 (or 399 to be safe) tracked keywords.
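As a rough sketch of Option 2, the keyword list can be split into groups and each group paired with one authorized user's token. The helper name assignKeywordsToTokens, the token strings, and the keyword data are all made up for illustration; the limit defaults to 400 here, and you could pass 399 to stay on the safe side.

```php
<?php
// Hypothetical helper: split keywords into groups of at most $limit and
// pair each group with one set of user credentials.
function assignKeywordsToTokens(array $keywords, array $tokens, int $limit = 400): array
{
    $tokens = array_values($tokens);
    $chunks = array_chunk(array_values(array_unique($keywords)), $limit);

    if (count($chunks) > count($tokens)) {
        throw new InvalidArgumentException(sprintf(
            'Need %d authorized users but only %d are available.',
            count($chunks),
            count($tokens)
        ));
    }

    $assignments = [];
    foreach ($chunks as $i => $chunk) {
        // Each entry describes one streaming connection.
        $assignments[] = ['token' => $tokens[$i], 'track' => $chunk];
    }
    return $assignments;
}

// Made-up data: 900 keywords and three placeholder tokens.
$keywords = [];
for ($i = 1; $i <= 900; $i++) {
    $keywords[] = "keyword$i";
}
$tokens = ['token-a', 'token-b', 'token-c'];

$assignments = assignKeywordsToTokens($keywords, $tokens);
foreach ($assignments as $assignment) {
    echo $assignment['token'], ': ', count($assignment['track']), " keywords\n";
}
```

Each entry in the result would then drive its own streaming connection, one per authorized user.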
This concept, queuing, is directly related to speed and redundancy. If you are tracking a lot of keywords or users, Twitter is going to send matching tweets to your application rapidly and in great abundance. In one of my applications we were receiving 1000 tweets a second. The key to handling that amount of data is not to process every tweet the moment it arrives. You want to save incoming tweets as fast as possible and ensure that they will be processed later, and that involves doing just one thing: saving them into a database or file.
In a previous app we were using MongoDB, and that worked great. We basically converted each incoming tweet into an array and inserted that array directly into MongoDB. This was fast and reliable.
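The same "save now, process later" idea can be sketched with a plain file instead of MongoDB, which keeps the example free of extensions. This is only an illustration under assumptions: the function name enqueueTweet, the queue file path, and the sample tweets are made up, and it assumes each tweet arrives as a single-line JSON string. If you use Phirehose, its enqueueStatus() callback would be a natural place to call something like this.

```php
<?php
// Hypothetical helper: append one raw tweet (the JSON string as received)
// to a queue file, one tweet per line. No parsing happens here, so the
// collector spends as little time as possible on each tweet.
function enqueueTweet(string $rawJson, string $queueFile): bool
{
    $line = rtrim($rawJson, "\r\n") . "\n";
    // LOCK_EX prevents two writers from interleaving partial lines.
    return file_put_contents($queueFile, $line, FILE_APPEND | LOCK_EX) !== false;
}

// Made-up tweets written to a throwaway queue file.
$queueFile = sys_get_temp_dir() . '/tweet-queue-' . uniqid() . '.jsonl';
enqueueTweet('{"id_str":"1","text":"first made-up tweet"}', $queueFile);
enqueueTweet('{"id_str":"2","text":"second made-up tweet"}', $queueFile);

$queuedLines = file($queueFile);
echo count($queuedLines), " tweets queued\n";
```

A file is the simplest option; at very high volumes a database or a dedicated queue will likely cope better.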
Once the tweets are queued, you want to process them. To really benefit from Twitter's Streaming API, you want to process the tweets in the queue as fast as you can. If you can keep the queue count at zero, then you are golden.
Using a queue helps prevent your application from losing tweets and gives you an efficient way to process them.
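Here is one way the processing side could look, as a sketch rather than a definitive implementation. It assumes a file-based queue with one JSON tweet per line; the function name drainQueue, the file names, and the sample data are hypothetical. The worker renames the queue file before reading it, so the collector can keep appending to a fresh file while the batch is processed.

```php
<?php
// Hypothetical helper: take everything currently in the queue file, hand
// each decoded tweet to $handler, and return how many were processed.
function drainQueue(string $queueFile, callable $handler): int
{
    if (!file_exists($queueFile)) {
        return 0; // nothing queued
    }

    // Move the batch aside so new tweets land in a new queue file.
    $batchFile = $queueFile . '.' . uniqid('batch', true);
    if (!rename($queueFile, $batchFile)) {
        return 0;
    }

    $processed = 0;
    $handle = fopen($batchFile, 'r');
    while (($line = fgets($handle)) !== false) {
        $tweet = json_decode($line, true);
        if (is_array($tweet)) { // skip blank or malformed lines
            $handler($tweet);
            $processed++;
        }
    }
    fclose($handle);
    unlink($batchFile);

    return $processed;
}

// Made-up queue contents for the demo.
$queueFile = sys_get_temp_dir() . '/tweet-queue-' . uniqid() . '.jsonl';
file_put_contents($queueFile, '{"id_str":"1","text":"hello"}' . "\n" . '{"id_str":"2","text":"world"}' . "\n");

$texts = [];
$count = drainQueue($queueFile, function (array $tweet) use (&$texts) {
    $texts[] = $tweet['text']; // real processing would go here
});
echo "Processed $count tweets\n";
```

There is a small window in which a writer that opened the file just before the rename could append to a batch that has already been read, so a production setup would want a sturdier handoff, such as a database or a proper queue.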
Planning is key. Think about how you are going to integrate Twitter's Streaming API. Write clean and efficient code. Speed and redundancy are key.
Last but definitely not least, use Phirehose for ease of use.
If you have any questions or need help on a project, send me a message. I am available for hire.