Distributed Quota Management in an Ad Serving Environment: How Reduce Data Manages its budgets without overspending.

@azifali
Google's Open Bidder is Everything But Open
A few months ago (May 2013 to be exact), Google announced that it was going to roll out Open Bidder, built around RTB. Real-Time Bidding (RTB) is the protocol used by ad exchanges to buy and sell ads.
According to the Blog announcement:
"Thursday at Google I/O 2013, we announced Open Bidder, a fully customizable toolkit for building real-time bidding applications. By combining the speed and flexibility of the Google Cloud Platform with DoubleClick’s expertise in real-time bidding, Open Bidder provides developers a robust platform to quickly create innovative RTB solutions."
I was excited. DoubleClick is a large RTB exchange, and that was reason enough to trust them to build the best bidder in the market. However, if this was an open source project, why wasn't it simply published as one?
I was asking myself why I should be filling out a form. OK, maybe Google wants to run a closed test.
Anyway, I instantly applied to the Google Open Bidder and got this reply:
It has been 5 months since then and I have had no feedback or update from the Google Open Bidder team.
Turns out that Open Bidder was everything but open, and the project was not ready for primetime. Google's launch announcement had every self-respecting advertising magazine and blog writing about it. However, being part of an advertising platform (http://reducedata.com) that is writing custom bidders to connect to various exchanges, I was one of their key target audiences, and I am wondering why they did not bother to get back to me.
Google - I am sure that this is a real product, but there is no reason to call it "Open Bidder" when in fact it has been completely closed and continues to remain closed. Also, a piece of advice: please don't announce projects years in advance, or when you don't plan to actually release them anytime soon.
Asif Ali
Advice To Large Tech Startups
I've never actually worked in a very large company, but I've seen a few wrong things happen way too often in companies that I've known or heard about from friends in the industry.
Based on what I've learned and heard, I've compiled a list of things not to do, and a few to do instead.
1. Hire people with no sales experience into weird, long titles that have no meaning or sense to them - Head of Global Sales, Strategy - seriously?
2. Hire a failed product manager from one company and assume that he/she will be successful in yours. Failed executives from somewhere else do not automatically succeed because you have a lot of money to burn. You can give them a chance, but not in a role with significantly bigger responsibility.
3. Hire someone with no product or technology experience to lead a very high technology business unit = EPIC FAIL. It might have worked elsewhere.
4. Don't pay lip service to technology. Either you're a technology organization or you're not.
5. Honour your agreements and commitments to your partners.
6. Please don't acquire to screw. Small acquisitions cannot be based on retention + revenue goals + everything else under the sun.
7. Lastly, don't be a jerk.
Reduce Data Demand Side Platform Launch
I am happy to announce the Launch of Reduce Data Demand Side Platform [1] (DSP), an advertising platform that leverages big data technologies to deliver significantly optimized ad spends.
We believe that media buying is still inefficient and that there is a lot of room for optimization. Advertising fraud is more common than you think and this wastes billions of dollars of ad spending each year.
Reduce Data is built from the ground up using large scale data processing technologies (commonly known as Big Data technologies) to cut media waste, track campaign performance and optimize it at scale.
Advertisers can reach users across desktop, tablets and mobile devices with Reduce Data. We're rolling out the platform with a variety of new features and are extremely excited about the path ahead.
Please do watch our blog or our Facebook page for announcements.
--
1. A Demand Side Platform is an advertiser focussed, programmatic platform that uses Real-time Bidding as a mechanism to buy display ads through auctions.
You know you're in Silicon Valley when..
The guy on the street is talking about incubators and accelerators.
You bump into Dave McClure at your favorite coffee shop.
The car salesman sold software for 20 years.
People are talking about open compute in the public bathrooms.
You see the same faces at Starbucks at the same spots every day, using it like an office space.
The Google product manager you spoke to a couple of days ago lives right next door.
You meet more Founders than Employees.
Your car breaks down with a software bug.
You drive into an "Infinite Loop".
Dear Google, Facebook is moving fast to build a great partner ecosystem. Please respond (to me) and the market faster.
Dear Partner Manager at Google / DoubleClick - We've been trying to get to you for the past month, in order to get our platform, Reduce Data, certified. In the first response, I got a form within an email without a submit button.
In the second email a week later, I was asked to type out the responses line by line, which I did, and I have not had any response since.
I have tried to reach out to folks I know at Google, but it looks like it will take some time before I can get that channel to work for me.
In contrast, last year before the holidays, I got in touch with Facebook and immediately got a response from many senior folks in FB Partner Management. They were eager to sign on new advertising partners, got on calls the following week, and even replied through the weekend, with real names and contact info at the end of their emails in case I wanted to get in touch with them.
This brings me to one key question: is Google too big to respond to partners, customers and the marketplace in general? I wish it were not true, but it seems to me that it has already become too big to be nimble.
In any case, a request: I am still waiting for a response from an unnamed person in the Google DoubleClick certification team. We are losing valuable business because my company, Reduce Data, is not yet certified.
Hundreds of Companies have moved from India to Silicon Valley in the last couple of months
Almost everyday, I bump into Entrepreneurs from India in various parts of Silicon Valley. Mountain View, Sunnyvale, Palo Alto, San Francisco, Menlo Park ...they're everywhere.
Surprisingly, many of them are 'fresh off the boat'. That includes me: I came in late July 2012 and got a long-term work visa a couple of months later.
In my discussions with Mukund Mohan (EIR, Microsoft Accelerator), it seems that a little over 300 companies have moved in the last couple of months alone.
Many are gaining good traction, getting funded and hiring a lot of employees (both in India and in the US).
There are a few things driving this:
a) Cities like Bangalore have become very expensive.
b) It is probably easier to hire and retain in Silicon Valley than in Bangalore (and sometimes in Chennai).
c) Most software product companies are built for global markets; and there is no reason to build it out of Bangalore or Chennai when you can do it in Silicon Valley
d) Silicon Valley is probably the best place to be in the world to build a technology company (You have to be here to believe it); and the lure of Silicon Valley culture is very strong - It is probably the only place in the world that has a high concentration of technologists, entrepreneurs and venture capitalists. All of it being in one place helps!
e) VP-level jobs now pay around $100k in Bengaluru and Mumbai.
f) The infrastructure issues that you face (traffic, commute times, power cuts etc) are a serious impediment to building any kind of business.
Having said that, these companies, including mine, aren't really shutting down their India offices and moving everything to the Valley. We are maintaining an engineering team, and sometimes an operations team as well, back in India.
But the centre of gravity of these companies is definitely moving west to Silicon Valley and while this is really good for the companies, it is definitely not good news for India in general.
GoDaddy deletes my domains and charges me to restore them
I logged in to GoDaddy.com to change a domain and guess what? All the domains were missing.
I call customer service. They say I deleted the domains but won't confirm the IP from which they were deleted.
"GoDaddy is a large business and we track millions of customers across the world, so we can't possibly store the IP, region or computer name from which the domain names were deleted."
Another one says, "I didn't say we don't save this data, it is just that I don't have access to it".
Okay - I keep saying I didn't delete them. I did delete the debit cards that were on file, so it's possible that GoDaddy deleted all my domains along with my debit cards, but no - the agent on the phone doesn't accept that that's what actually happened.
"We will restore all of your domains except one domain which was about to expire - we have already released that and it costs $80 to restore it".
Me - "So you're holding my property illegally and asking me to pay money to release it."
Agent - "Sir that's the charge the registrar charges ...we can come halfway and discount $40"
Me - "Why did you release a domain that belonged to me? The registration was still active. And two days before the domain expired, I renewed the .co domain at $30 for a year."
The agent - "Since the domain was close to expiry, we released it."
Me - "So you released my private property back to the registrar without even reconfirming with me at that moment."
Agent - "Yes - You cancelled it"
Me - "But the other domains are still intact"
Agent - "Yes but the last domain was nearing expiry, so we released it"
Me- "But I just renewed that like 2 days ago, which means it should not have been released"
Agent - "Doesn't matter. You cancelled it again".
After nearly 30 minutes of argument, the agent doesn't relent. He offers a discount but will effectively charge me $30 + $11 for the domain, including registration.
I remember that I had requested them to "unlock" that domain so that I could transfer it to Namecheap.com. I never got around to transferring it, but GoDaddy either has a serious issue with their system or has found a way to make a quick buck out of customers abandoning ship.
Come to think of it, they might have done more harm by shutting down my key domains and simply releasing them (my private property) back to the registrar.
I cannot believe that in this day and age, someone could be so cheap with their own long-term customers. I've learned a valuable business lesson today, and that is to never trust a company like GoDaddy again.
Re-targeting Is Broken
How many times has this happened to you - You search and buy a product. Then you're constantly hounded again and again for the next several days with ads of the same product appearing all over. Chances are that it has happened more than once.
Would you buy the same vacuum cleaner twice? Would you want to buy that same book from Amazon or that amazing pair of shoes from Zappos again? I don’t think so, but today many of these ads tend to appear again even after a purchase.
Re-targeting is the method of displaying ads again to users who have seen something of interest but did not complete the transaction.
Re-targeting is useful, but displaying the same ad even after I (or any user) have already bought the product is a waste of advertiser spends and a great way to annoy any user.
Why does this happen? Many re-targeting solutions don’t really exchange necessary data that tells the Ad Networks if the user has purchased the item or not.
And without this, what happens is that you get a constant barrage of ads of the same item displayed on every other site.
This has been happening for a while now [http://www.adexchanger.com/data-driven-thinking/personalized-retargeting-overkill/] and is therefore not really a new issue.
To give you an example, I did an actual transaction yesterday evening (1/22/2013) by buying a Vonage line.
I saw the service yesterday morning, clicked through but dropped out at the payment step.
In the evening, I saw Vonage targeting me with Google ads. I clicked on a re-targeted ad which had an offer on it.
Next, I bought the product within a few minutes and received a confirmation email [see slide 1 at the bottom of this article].
Even 10 minutes after the purchase, I saw the ad again on the same website. [see slide 2]
I really don’t want two phone connections today. Please forgive me for even buying the first one :-).
This is a classic case of overuse of re-targeting, and probably just an inefficient tool wasting advertiser spend. What’s troubling is that this has happened on a prominent ad network [Google] and not some small random player.
So how does one fix this?
The solution to this problem is a little difficult but doable. Advertisers can exchange data with ad networks (using cross browser cookies or other mechanisms) to avoid such re-targeting issues.
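As a minimal sketch of that data exchange, assume the advertiser shares a feed of completed purchases keyed by user id, and the ad selection step filters against it. The names `purchasesByUser` and `selectRetargetingAds`, and the shape of the records, are hypothetical and not part of any ad network's API.

```javascript
// Hypothetical purchase feed shared by the advertiser: user id -> set of purchased product ids.
const purchasesByUser = new Map([
  ['user-1', new Set(['vonage-line'])],
]);

// Return only the retargeting candidates the user has not already bought.
function selectRetargetingAds(userId, candidateAds, purchases) {
  const bought = purchases.get(userId) || new Set();
  return candidateAds.filter((ad) => !bought.has(ad.productId));
}

const candidates = [
  { adId: 'a1', productId: 'vonage-line' },
  { adId: 'a2', productId: 'headset' },
];

// user-1 already bought the phone line, so only the other ad remains eligible.
console.log(selectRetargetingAds('user-1', candidates, purchasesByUser));
// user-2 has no recorded purchases, so both ads remain eligible.
console.log(selectRetargetingAds('user-2', candidates, purchasesByUser));
```

The hard part in practice is getting the purchase feed to the ad network quickly and matching user ids across systems; the filter itself is trivial once that data is available.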
A simpler way would be to use a tool / network that has solutions to such problems.
Another option (which also happens to be a shameless plug ;-))
It is also always recommended that an advertiser use an independent audit system such as Reduce Data to verify ad spends.
Beyond that, Reduce Data can help identify steps in the campaign funnel causing media waste and give specific recommendations that can help optimize ad spends.
If you would like more information about Reduce Data, please head to our website at http://www.reducedata.com or our blog at http://blog.reducedata.com.
Google should put a SOPA like blackout to Free's French consumers
The majority of Internet services are supported by advertising. I think it is OK if a few customers decide to take steps to block ads in their browsers. But when an ISP does that, it creates a ridiculous situation (Fast Company: http://www.fastcompany.com/3004452/french-isp-free-blocks-all-web-advertising).
I think Google and other large players should limit access to free services (Gmail, Search etc.) by putting up a SOPA-like blackout for at least 1 day. This blockade should not be seen as a use of force or coercion but rather as a gentle reminder to users that ad revenue is important to free Internet-based services, and that they should ask their ISP not to unilaterally block advertising. I also suggest that the services be limited only briefly and that the user can go past the message and continue to use the product after seeing it.
I believe that an action like this will raise awareness of ad-supported services, and hopefully French consumers will force the French ISP to reconsider its decision.
The Case For Bullshit Metrics
This blog entry is written in response to a blog by Suhail Doshi of Mixpanel titled Bullshit Metrics (http://sufficientlyadvanced.net/bullshit-metrics)
Suhail Doshi of Mixpanel calls user signups, page views and other similar metrics bullshit metrics, saying that they don’t really correlate with the success of a startup.
Maybe Suhail forgot that only in May, Mixpanel itself had touted that it measured 7 billion actions (http://gigaom.com/2012/05/10/mixpanel-raises-10m-in-bid-to-dominate-data-geekery/). What does that mean anyway? Isn’t that a bullshit metric as well?
Metrics such as user signups and page views are important. Getting enough users to sign up does matter. Retention, active users and other such metrics can only be measured after the signups have occurred in the first place.
Page views cannot be wished away when most Internet businesses are still dependent on advertising as a source of revenue:
The blog talks about Tumblr’s 20B impressions each month as one example of a bullshit metric. Now, if we all agree that revenue is an important way to identify a key metric, then I guess we all agree that advertising revenue, which is driven by ad impressions, is a key metric. Ad impressions are directly proportional to page views. This means that page views are a metric that cannot be wished away. For many, this may not be the single most important metric, but it is an important metric nevertheless.
Newer ways of measuring engagement, especially in media, such as likes or re-tweets, are also important, but again, they aren’t necessarily the only metrics that can be correlated with success.
Many Internet businesses depend on advertising revenue (including social media giants like Facebook and Twitter), and what matters to them is page views, which lead to ad impressions, clicks, conversions, cost per conversion, brand lift, etc.
The survey results below clearly highlight the industry standard:
Metrics used by Brand Marketers in North America to determine effectiveness of online ads:
Active users are an important metric for Facebook. But page views are important in order to justify ad impressions: page views are measured even for photo flips, because this helps Facebook justify the number of ad impressions it serves up to each user. As the pages are flipped, the ads change, because FB follows the standard practice that publishers set a long time ago: display different ads on different pages and earn more.
Facebook’s charges are impression based or sometimes click based (Cost Per Mille, CPM, which is the price of 1000 ad impressions, or Cost Per Click, CPC). Page views are inherently a very important metric.
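To make the two pricing models concrete, here is a small arithmetic sketch. The function names and the numbers are made up for illustration and are not Facebook's actual rates.

```javascript
// Cost of a CPM campaign: the price is quoted per 1000 impressions.
function cpmCost(impressions, cpm) {
  return (impressions / 1000) * cpm;
}

// Cost of a CPC campaign: the advertiser pays per click.
function cpcCost(clicks, cpc) {
  return clicks * cpc;
}

// Effective CPM: what a campaign cost per 1000 impressions, whatever the pricing model.
function effectiveCpm(totalCost, impressions) {
  return (totalCost * 1000) / impressions;
}

// Illustrative numbers only.
console.log(cpmCost(200000, 2));        // 200,000 impressions at a $2 CPM
console.log(cpcCost(500, 0.5));         // 500 clicks at $0.50 per click
console.log(effectiveCpm(250, 200000)); // $250 spent over 200,000 impressions
```

Either way, the bill is a function of impressions served or clicks on those impressions, which is why page views stay in the picture.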
Facebook ad screen showing how the ads are sold:
What is changing is that advertisers have been saying that things like Click Through Rate (CTR) or eCPM don’t matter, but none of that has led to any kind of One Key Metric. It is all leading to several different metrics depending upon the kind of campaign run, but again they are all linked to page views.
Engagement, for example videos played, is an important metric. I do not disagree. But it may not be the only key metric. The survey below clearly shows that no single metric is important to the ad agencies / advertisers who are the primary source of revenue for someone like (Google) YouTube.
Conclusion:
I don’t think everyone is going to start dropping existing metrics and rush to find that single key metric that they need to focus on. There are a lot of people who do not agree with this and are already labeling it a fad. I don’t have an opinion on One Key Metric.
But calling key metrics that are directly related to a business’s success “bullshit metrics” is either stupid or just ignorant.
Microsoft Research - Demos of some awesome technologies...must see!
Performance Tuning of Web Apps
There are thousands of blogs and books on performance optimization. Yet, I thought it might be appropriate to put some of my learning into a blog post. So here goes: a starter guide to performance tuning of web apps.
Simple Page Specific Optimizations
There are a lot of simple optimizations that can be done at the page level. They are:
Load fewer resources (such as images, CSS, JS), or combine them into a single JS or CSS file.
Optimize image sizes, if possible use sprites.
Minify JS.
Load only the JS or CSS necessary.
Use tools like YSlow to test page performance and optimize.
Cache all static resources (more on this later).
Don't serve static assets through your application server.
Do not use Rails, Play or any other app server to serve static assets. Not even Apache. Instead, use Nginx or another reverse proxy that can handle this with ease, cache the assets (on client browsers) and gzip them whenever possible.
Nginx code to serve static assets given below:
location ~* ^.+\.(jpg|jpeg|gif|png|ico|css|zip|js|mov|html)$ {
    autoindex on;
    root /home/yourpath;
    expires 30d;
    # If the file is not available, route the request to your app server configuration
    if (!-f $request_filename) {
        proxy_pass http://yourdynamicserver;
    }
    break;
}
Compress your files using Gzip in your response.
Often, reverse proxies like Nginx offer simple gzip compression, where data is sent in a compressed format. Most modern browsers support this, and it can be enabled with simple configuration. An example for Nginx is given below.
Gzip compression
# output compression saves bandwidth
gzip on;
gzip_min_length 1000;
gzip_proxied expired no-cache no-store private auth;
gzip_types text/plain application/xml text/css text/javascript; # text/html is always compressed and must not be listed again
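To get a feel for what gzip does to a text payload, here is a small sketch using Node's built-in `zlib` module. The sample markup is invented, and real pages will compress less dramatically than this highly repetitive string.

```javascript
const zlib = require('zlib');

// A repetitive text payload, similar in spirit to HTML, CSS or JS.
const body = '<li class="item">Hello, world</li>\n'.repeat(200);

// Compare the size on the wire with and without gzip.
const originalBytes = Buffer.byteLength(body);
const compressedBytes = zlib.gzipSync(body).length;

console.log('original bytes:', originalBytes);
console.log('gzipped bytes:', compressedBytes);
```

In production you would still let Nginx do the compression, as configured above; this only illustrates why text formats are worth compressing.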
HTTP limitation: 2-6 connections per domain
The HTTP 1.1 spec recommends that a client keep no more than 2 simultaneous connections per domain (browsers today typically allow around 6), which means additional requests to the same domain wait in a queue. One way to tune this (disclaimer: I have not specifically tried this) is to serve different resources in your page from different domains.
http://www.stevesouders.com/blog/2008/03/20/roundup-on-parallel-connections/
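A rough sketch of how the multiple-domain idea could look when generating asset URLs. The hostnames below are placeholders, and the hash is just a simple deterministic one chosen for illustration; like the tip itself, this is untested in production.

```javascript
// Hypothetical asset hostnames; all of them would point at the same static server.
const shardHosts = ['static1.example.com', 'static2.example.com', 'static3.example.com'];

// Simple deterministic string hash, so a given path always maps to the same host
// (otherwise the browser cache would be split across hostnames).
function hashPath(path) {
  let hash = 0;
  for (let i = 0; i < path.length; i++) {
    hash = (hash * 31 + path.charCodeAt(i)) >>> 0;
  }
  return hash;
}

// Pick a host for the asset and build its full URL.
function shardedUrl(path, hosts) {
  const host = hosts[hashPath(path) % hosts.length];
  return 'http://' + host + path;
}

console.log(shardedUrl('/img/logo.png', shardHosts));
console.log(shardedUrl('/css/site.css', shardHosts));
```

Each extra hostname costs a DNS lookup, so a small number of shards is usually the sensible starting point.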
Dynamic apps are mostly connected to databases. Or are they?
Most people who start building simple applications start by building dynamic, database-connected applications. My suggestion: don't build dynamic pages unless you have to. Let’s consider a use case. You have a news website. Guess how often news needs to be updated? Not very often. If so, your app should be a CMS that outputs HTML files that are served statically. These static files, or other kinds of caching, can be used to speed up your web app.
Caching
A cache is typically a temporary data storage area on disk or in memory (RAM).
Caches can be used to speed up web applications.
A brief Overview of Memcached:
Memcached is a distributed in-memory store for storing and retrieving data really fast. Read more about memcached here: http://www.slideshare.net/azifali/memcached-presentation-5729628
HTTP caching, also sometimes referred to as page caching, is the process of putting a cache in front of your application server and caching pieces of your application. Caches like Varnish and Squid are available. For a detailed article on HTTP caching, visit: http://blog.octo.com/en/http-caching-with-nginx-and-memcached/. I will, however, talk about some forms of page caching below, using the file system and using memcached.
The key point to consider in building scalable web apps is to minimize computing power used per request.
Caching And Serving Files using the File System:
The easiest thing to do in situations where the content is the same for all users is to cache the entire page and deliver it via the file system.
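A minimal sketch of that idea in Node.js: render the page once, write it to a cache directory, and let the web server serve the file. The function names, the sample headlines and the temp directory are all assumptions made for the example; a real generator would pull content from your database or CMS.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Render the page once and write it to disk so the web server can serve it as a static file.
function cachePageToDisk(cacheDir, fileName, renderPage) {
  const filePath = path.join(cacheDir, fileName);
  fs.mkdirSync(cacheDir, { recursive: true });
  fs.writeFileSync(filePath, renderPage(), 'utf8');
  return filePath;
}

// Hypothetical renderer; a real one would query the database or CMS.
function renderNewsPage() {
  const headlines = ['First story', 'Second story'];
  const items = headlines.map((h) => '<li>' + h + '</li>').join('');
  return '<html><body><ul>' + items + '</ul></body></html>';
}

// Written under the OS temp directory so the sketch runs anywhere.
const cacheDir = path.join(os.tmpdir(), 'page-cache-example');
const written = cachePageToDisk(cacheDir, 'news.html', renderNewsPage);
console.log('cached page written to', written);
```

You would rerun the generator whenever the content changes, and point the static `root` of your Nginx location at the cache directory.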
Caching And Serving Files using the Nginx and Memcached:
You can also use the above pattern to store data in memcached and have Nginx connect directly to memcached and serve the files. The result can be much faster than fetching files from disk. Remember, the ‘HTML Generator’ is a component that you will need to write. This component will fetch results and store them in memcached with the file name as the key. We can then modify the Nginx configuration to look in memcached first when a request comes in. If the file is not available in memcached, the request can be passed to the file system or the underlying application server.
location ~* \.(html)$ {
    access_log off;
    expires max;
    add_header Last-Modified "Thu, 26 Mar 2000 17:35:45 GMT";
    set $memcached_key $uri;
    memcached_pass 127.0.0.1:11211;
    error_page 404 = /fetch;
}
location /fetch {
    internal;
    access_log off;
    expires max;
    add_header Last-Modified "Thu, 26 Mar 2000 17:35:45 GMT";
    proxy_pass http://backend;
    break;
}
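The generator side could be sketched roughly as follows. The important detail is that the key must match what Nginx looks up, which in the configuration above is the request URI. `storePage` and `createFakeClient` are hypothetical names; the fake client only mimics a `set(key, value, callback)` call so the sketch runs without a memcached server.

```javascript
// Store a rendered page under the key Nginx will look up ($memcached_key is set to $uri).
function storePage(cacheClient, uri, html, callback) {
  cacheClient.set(uri, html, callback);
}

// In-memory stand-in with a set(key, value, callback) method,
// so the sketch runs without a memcached server.
function createFakeClient() {
  const store = new Map();
  return {
    store: store,
    set: function (key, value, callback) {
      store.set(key, value);
      callback(null, 'STORED');
    },
  };
}

const fakeClient = createFakeClient();
storePage(fakeClient, '/news/index.html', '<html><body>News</body></html>', function (error) {
  if (error) { throw error; }
  console.log('stored keys:', Array.from(fakeClient.store.keys()));
});
```

With a real memcached client in place of the fake one, a request for /news/index.html would be answered by Nginx straight from memcached, and only a miss would reach the /fetch location.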
Caching Data into Memory
There are scenarios where the entire page cannot be cached for all users. In such scenarios, it is better to cache a specific data set, a specific part of the page, a precomputed result and so forth. This cache could be specific to each user, or a common piece that is used for all users.
A cache is typically an in-memory system such as memcached.
Example: Loading some dataset into memory using nodeJS code:
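var memcache = require('memcache');
var client = new memcache.Client(11211, '127.0.0.1');
client.connect();
// "connection" is an open MySQL connection created elsewhere
connection.query("SELECT business_name, id FROM business", function (error, rows, fields) {
    if (error) throw error;
    rows.forEach(function (row) {
        // cache each business name under its id
        client.set(String(row.id), row.business_name, function (error, result) {});
    });
});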
----
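The read side of this pattern, often called cache-aside, can be sketched as below: look in the cache first, fall back to the database on a miss, then populate the cache. The `cache` object and `loadBusinessName` are in-memory stand-ins I made up so the example runs without memcached or MySQL; only the shape of `getOrLoad` is the point.

```javascript
// Cache-aside lookup: try the cache first, fall back to the loader, then populate the cache.
function getOrLoad(cache, key, loadFromDb, callback) {
  cache.get(key, function (error, cached) {
    if (!error && cached !== undefined && cached !== null) {
      return callback(null, cached, 'cache');
    }
    loadFromDb(key, function (dbError, value) {
      if (dbError) { return callback(dbError); }
      cache.set(key, value, function () {
        callback(null, value, 'database');
      });
    });
  });
}

// In-memory stand-ins so the sketch runs without memcached or MySQL.
const memory = new Map();
const cache = {
  get: function (key, cb) { cb(null, memory.get(key)); },
  set: function (key, value, cb) { memory.set(key, value); cb(null); },
};
let dbCalls = 0;
function loadBusinessName(id, cb) {
  dbCalls += 1;
  cb(null, 'Business #' + id);
}

// The first lookup misses and hits the "database"; the second is served from the cache.
getOrLoad(cache, '42', loadBusinessName, function (err, name, source) {
  console.log(name, 'from', source);
  getOrLoad(cache, '42', loadBusinessName, function (err2, name2, source2) {
    console.log(name2, 'from', source2);
  });
});
```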
A partial segment could be a table that is repeatedly shown to the users. This could be generated and stored into memcached for use anytime later..
Example: Creating a list of links and storing it into memcached using Javascript / NodeJS:
var news = "";
for (var i = 0; i < items.length; i++)
{
    news = news + items[i] + "<br/>";
}
memcachedvariable.set("news_links", news); // set news links into memcached
----
Database Performance Tuning
For a detailed guide on MySQL performance mistakes, please read: http://www.slideshare.net/techdude/how-to-kill-mysql-performance.
Note: Some tips from this slideshow have been compiled into these notes.
Databases are generally harder to tune, for the simple reason that they need to be tuned differently for different use cases. For example, a high-volume read database needs different optimization than a high-volume write database.
There is no exact set of things to do when tuning a database, but I will try to list some of the most basic things that you would do to ensure that the database is up and running well.
Avoid joins while fetching data in a web app. If you are looking up data from one or more tables, ensure that the data is available in a single data structure (de-normalize if necessary). If you do run joins, ensure that enough join buffer and other query buffer space is available (more on this in the variables section).
Fetch limited datasets: Don't write queries that look like "select * from entities" and then try to reduce the number of items displayed in your code. You're still forcing the database to return the full dataset. Write queries that name their columns and limit the result set fetched: "select entity_name, description from entities order by id asc limit 10" is better.
Use the right storage engine. For the most part MyISAM works, but there are different storage engines available for different purposes. Use what is right for your application's use case. If your application needs high read performance, the MyISAM engine would be suitable. On the contrary, if you’re looking for high write performance, then InnoDB is recommended. If you're using MyISAM as the engine and have the key cache enabled, watch out for frequent updates, as this invalidates the key cache and makes reads significantly slower. Also watch out for write lock contention.
For both InnoDB and MyISAM (both of which are commonly used engines), ensure that your buffer variables have enough memory allocated. The default configuration that ships with MySQL won't do, and many people make the mistake of assuming that this is well taken care of. Memory here does not refer to a single variable but to a range of variables: those that can be configured at setup (server variables) and those that can be monitored at run time (status variables).
Use the EXPLAIN statement to understand and optimize your queries. An ORM layer generally hides the queries it runs against the database; if you use one, check the slow query log for the list of queries that are slowing down your app.
Have enough indexes on the right columns: having too few or too many indexes are both problems and can lead to performance issues.
Ensure that your columns use the correct data types and that their sizes are as small as possible: this goes a long way in reducing the amount of storage required and improving the speed of queries. For example, do you need a CHAR(255) when you could store the value in a CHAR(2)?
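As a sketch of the "fetch limited datasets" advice, here is a tiny helper that always produces a bounded, ordered query with named columns. `buildPageQuery` is a hypothetical helper, not part of any MySQL driver; the `?` placeholders assume a driver that supports parameter binding, and the table and column names must come from your own code, never from user input.

```javascript
// Build a bounded query: named columns, a stable sort order, and LIMIT/OFFSET placeholders.
function buildPageQuery(table, columns, pageSize, pageNumber) {
  const sql = 'SELECT ' + columns.join(', ') +
    ' FROM ' + table +
    ' ORDER BY id ASC LIMIT ? OFFSET ?';
  // Page numbers start at 1, so page 1 has an offset of 0.
  const params = [pageSize, (pageNumber - 1) * pageSize];
  return { sql: sql, params: params };
}

const query = buildPageQuery('entities', ['entity_name', 'description'], 10, 3);
console.log(query.sql);
console.log(query.params);
```

The resulting `sql` and `params` would then be handed to the driver's query call, so the database only ever returns one page of rows.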
Database Variables
Please note: Most of these notes are from MySQL Help with some explanation where necessary
innodb_flush_log_at_trx_commit:
Set this to 0 if you have a high write environment; otherwise set it to 1. By setting it to 0, you're asking InnoDB to write to its log and flush it to disk once a second instead of on every transaction commit. The trade-off is that a crash can lose up to about a second of transactions.
innodb_additional_mem_pool_size:
The size in bytes of a memory pool InnoDB uses to store data dictionary information and other internal data structures. The more tables you have in your application, the more memory you need to allocate here. If InnoDB runs out of memory in this pool, it starts to allocate memory from the operating system and writes warning messages to the MySQL error log. The default value is 1MB.
innodb_buffer_pool_size:
The size in bytes of the memory buffer InnoDB uses to cache data and indexes of its tables. The default value is 8MB. The larger you set this value, the less disk I/O is needed to access data in tables.
innodb_commit_concurrency:
The number of threads that can commit at the same time. A value of 0 (the default) permits any number of transactions to commit simultaneously.
innodb_file_per_table:
Enable a separate file per table using this variable.
innodb_lock_wait_timeout:
The timeout in seconds an InnoDB transaction may wait for a row lock before giving up. The default value is 50 seconds. A transaction that tries to access a row that is locked by another InnoDB transaction will hang for at most this many seconds before issuing the following error:
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
transaction-isolation:
InnoDB supports each of the standard transaction isolation levels using different locking strategies. You can enforce a high degree of consistency with the default REPEATABLE READ level, for operations on crucial data where ACID compliance is important. Otherwise READ COMMITTED works for most use cases.
With READ COMMITTED, for locking reads (SELECT with FOR UPDATE or LOCK IN SHARE MODE), InnoDB locks only index records, not the gaps before them, and thus permits the free insertion of new records next to locked records.
innodb_log_file_size:
The size in bytes of each log file in a log group. The default value is 5MB. The larger the value, the less checkpoint flush activity is needed in the buffer pool, saving disk I/O. But larger log files also mean that recovery is slower in case of a crash.
Most importantly, use the EXPLAIN command to ensure that queries are well tested and that enough indexes are available.
innodb_thread_concurrency:
InnoDB tries to keep the number of operating system threads concurrently inside InnoDB less than or equal to the limit given by this variable. Once the number of threads reaches this limit, additional threads are placed into a wait state within a FIFO queue for execution. Threads waiting for locks are not counted in the number of concurrently executing threads.
innodb_flush_method:
This variable decides how InnoDB data is written to disk. O_DIRECT is recommended, except in the case of SAN-based storage.
innodb_lock_wait_timeout:
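The default value is 50 seconds. If you want your app to respond faster when rows are locked by writes, set this value lower. Note that transactions will then fail with a lock wait timeout more often, and data consistency issues can occur unless your application handles the error, for example by retrying.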
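If you do lower the timeout, the application has to cope with the resulting errors. Below is a rough sketch of a retry wrapper for callback-style operations. It assumes the database driver reports the MySQL error number 1205 (lock wait timeout) in an `errno` property on the error object; `withLockRetry` and `flakyUpdate` are names made up for the example.

```javascript
// Retry an operation when it fails with a lock wait timeout (MySQL error 1205).
function withLockRetry(operation, maxAttempts, callback) {
  let attempt = 0;
  function run() {
    attempt += 1;
    operation(function (error, result) {
      const isLockTimeout = Boolean(error) && error.errno === 1205;
      if (isLockTimeout && attempt < maxAttempts) {
        return run();
      }
      // Either success, a different error, or we ran out of attempts.
      callback(error || null, result, attempt);
    });
  }
  run();
}

// Hypothetical operation that times out twice and then succeeds.
let calls = 0;
function flakyUpdate(done) {
  calls += 1;
  if (calls < 3) {
    const error = new Error('Lock wait timeout exceeded; try restarting transaction');
    error.errno = 1205;
    return done(error);
  }
  done(null, 'updated');
}

withLockRetry(flakyUpdate, 5, function (error, result, attempts) {
  console.log(error, result, 'after', attempts, 'attempts');
});
```

A real implementation would normally wait a little between attempts and rerun the whole transaction rather than a single statement.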
Key status variables to watch out for:
There are a number of variables that you need to watch while running your MySQL databases. These are called STATUS variables. Here are a few important ones:
Created_tmp_disk_tables:
The number of internal on-disk temporary tables created by the server while executing statements.
If an internal temporary table is created initially as an in-memory table but becomes too large, MySQL automatically converts it to an on-disk table. The maximum size for in-memory temporary tables is the minimum of the tmp_table_size and max_heap_table_size values. If Created_tmp_disk_tables is large, you may want to increase the tmp_table_size or max_heap_table_size value to lessen the likelihood that internal temporary tables in memory will be converted to on-disk tables.
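To make the arithmetic concrete, here is a small Python sketch. It assumes the counters have already been read (for example from SHOW GLOBAL STATUS) into a plain dictionary; the function names and the sample numbers are made up for illustration. Created_tmp_tables is the companion status variable that counts all internal temporary tables, in memory or on disk.

```python
def effective_in_memory_tmp_limit(tmp_table_size, max_heap_table_size):
    """In-memory temporary tables are capped by the smaller of the two settings."""
    return min(tmp_table_size, max_heap_table_size)


def tmp_disk_table_ratio(status):
    """Fraction of internal temporary tables that ended up on disk."""
    created = status.get("Created_tmp_tables", 0)
    on_disk = status.get("Created_tmp_disk_tables", 0)
    return on_disk / created if created else 0.0


# Illustrative values only, not measurements from a real server.
limit = effective_in_memory_tmp_limit(16 * 1024 * 1024, 64 * 1024 * 1024)
ratio = tmp_disk_table_ratio(
    {"Created_tmp_tables": 1000, "Created_tmp_disk_tables": 250}
)
```

Note that raising only one of the two settings has no effect if the other one is the smaller value.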
Handler_read_rnd:
The number of requests to read a row based on a fixed position. This value is high if you are doing a lot of queries that require sorting of the result. You probably have a lot of queries that require MySQL to scan entire tables or you have joins that do not use keys properly.
Handler_read_rnd_next:
The number of requests to read the next row in the data file. This value is high if you are doing a lot of table scans. Generally this suggests that your tables are not properly indexed or that your queries are not written to take advantage of the indexes you have.
Innodb_buffer_pool_read_ahead_rnd:
The number of “random” read-aheads initiated by InnoDB. This happens when a query scans a large portion of a table but in random order.
Innodb_row_lock_time:
The total time spent in acquiring row locks, in milliseconds.
Innodb_row_lock_time_avg:
The average time to acquire a row lock, in milliseconds. If this value is high, it means your queries are waiting on each other and the database needs optimization.
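These counters are cumulative, so an average taken over the whole uptime can hide a recent spike. A rough Python sketch of one way around that is to take two snapshots of the counters and average only over the interval between them. The function name and sample numbers are hypothetical; Innodb_row_lock_waits is the status variable that counts how many times a row lock had to be waited for.

```python
def row_lock_avg_ms(before, after):
    """Average row lock wait (ms) between two snapshots of status counters."""
    waits = after["Innodb_row_lock_waits"] - before["Innodb_row_lock_waits"]
    time_ms = after["Innodb_row_lock_time"] - before["Innodb_row_lock_time"]
    # No waits in the interval means there is nothing to average.
    return time_ms / waits if waits > 0 else 0.0


# Illustrative snapshots, e.g. taken a minute apart.
snapshot_1 = {"Innodb_row_lock_waits": 100, "Innodb_row_lock_time": 5000}
snapshot_2 = {"Innodb_row_lock_waits": 150, "Innodb_row_lock_time": 9000}
interval_avg = row_lock_avg_ms(snapshot_1, snapshot_2)
```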
Qcache_hits:
The number of query cache hits. A high value is good: it means you are reading the same data frequently and the cache is serving it. If the value is low, check whether the query cache is enabled and whether queries are actually being written to the cache.
Qcache_inserts:
The number of queries added to the query cache. If this value is high and increasing quickly, cache invalidation is high. If so, consider keeping your queries out of the query cache by setting query_cache_type=0 in the MySQL configuration file (which is loaded at startup), or set it to 2 to enable the query cache only for queries that begin with SELECT SQL_CACHE.
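One common rule of thumb for judging the query cache is to compare cache hits against all SELECT statements. A SELECT answered from the cache increments Qcache_hits but not Com_select, so the total is approximately the sum of the two. The Python sketch below assumes the counters are already in a dictionary; the function names and sample numbers are illustrative only.

```python
def query_cache_hit_rate(status):
    """Share of SELECTs served from the query cache (0.0 to 1.0)."""
    hits = status.get("Qcache_hits", 0)
    selects = status.get("Com_select", 0)  # SELECTs that were actually executed
    total = hits + selects
    return hits / total if total else 0.0


def inserts_per_hit(status):
    """How many cache inserts were needed per cache hit; lower is better."""
    hits = status.get("Qcache_hits", 0)
    inserts = status.get("Qcache_inserts", 0)
    return inserts / hits if hits else float("inf")


# Illustrative counters only.
sample = {"Qcache_hits": 900, "Com_select": 100, "Qcache_inserts": 90}
hit_rate = query_cache_hit_rate(sample)
churn = inserts_per_hit(sample)
```

A low hit rate combined with many inserts per hit suggests that entries are being invalidated before they are reused.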
Asynchronous code
Write asynchronous, non-blocking code wherever possible: As with all apps, a user is typically held up whenever an IO operation blocks, and people don't realize in how many places blocking IO can happen.
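As a rough illustration, here is a minimal Python sketch using the standard asyncio module. The three "IO" calls are simulated with asyncio.sleep, and the names fake_io and handle_request are made up for this example. The point is that independent IO operations are started together, so the request waits roughly as long as the slowest one rather than the sum of all three.

```python
import asyncio


async def fake_io(name, delay):
    """Stand-in for a network or disk call; asyncio.sleep yields to the event loop."""
    await asyncio.sleep(delay)
    return f"{name} done"


async def handle_request():
    """Start independent IO operations together instead of one after another."""
    return await asyncio.gather(
        fake_io("db", 0.05),
        fake_io("cache", 0.05),
        fake_io("api", 0.05),
    )
```

Calling asyncio.run(handle_request()) returns the three results in the same order the calls were listed.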
Lastly
I don’t think this is a complete list of web page optimization techniques, and I believe there are more modern techniques that use asynchronous approaches to write high-performing web apps.
However, I hope that this list has been a good starter guide to help a small startup get its optimization off the ground. If you have comments or questions, please write to me on Twitter at @azifali or send me an email at asif.ali [at] outlook.com.
Facebook's analytics using Hbase
Other reference links:
http://www.slideshare.net/larsgeorge/realtime-analytics-with-hadoop-and-hbase
http://highscalability.com/blog/2010/11/16/facebooks-new-real-time-messaging-system-hbase-to-store-135.html
Title Inflation
It is really funny when start-ups, or even companies that have moved beyond the initial stages, start giving fancy titles to their employees.
Consider these titles, "Head of Strategy", "Global Director and Vice President of Business Development", "Vice President, Strategy and Sales - Asia and the World" and more.
Titles like these are for companies that have no clue what job description needs to be given. Start-ups need to be clear about the job description they're giving to folks. To that effect, each role should have a simple, clear title indicating position and role, reflecting the hierarchical or flat structure, whichever you choose to follow.
Confusing titles signify that start-ups are adding people to the payroll without knowing what their jobs are going to be. If that is the case then these folks should not be hired in the first place.
CNET has got it wrong about Facebook Advertising
I came across this article titled, "How Zuckerberg's wedding reveals Facebook's problem" written by Chris Matyszczyk today. Though this makes for a catchy title, I think that CNET has got it completely wrong about Facebook advertising. It accurately represents how much, or rather how little, Chris and CNET both understand advertising networks and how they work. CNET has projected that this is a long-term problem and that users might very soon even have to pay to use Facebook.
Let me explain why: Ad networks typically deliver ads on targeting and relevance. Targeting refers to learning about the user (location, device, carrier, etc) and serving an ad that matches those parameters.
Relevance represents understanding the user via user profiling, social graph, etc. and serving the ad that is most relevant to the user.
Now, while these are the most desired ways to serve ads, several other factors affect ad serving. It is not a simple topic, but I will try my best to simplify it.
To make targeting and relevance work well - both supply and demand must match. With Facebook, there is no doubt a huge supply (of users and inventory) but there is a dearth of ads. This explains why their revenue was 'only' $1B even though their reach was more than 750m users in 2011.
Another important factor is that various campaigns sometimes 'bid' in real time for such inventory, and sometimes less relevant or less targeted campaigns might end up winning the bid. So, while the ad (about the heart attack) that Chris saw was not entirely relevant, it was correctly targeted to his age group.
Another factor to consider is Run of Network (RON) campaigns, where advertisers buy large volumes of inventory at guaranteed flat (low) pricing just to get maximum exposure to as many users as possible.
The primary problem that Facebook faces today is that the market for social advertising is at a really early stage. Demand and supply need to be evenly matched for advertising to work well.
Facebook might not serve the most relevant and the most targeted ad today because both the advertising demand and its ad serving technology are in their infancy. But they have built the best social networking platform ever.
Similarly, I do not think that in its early days Google managed to deliver the most relevant ads. What they did well was serve the most relevant search results, and that was what was important in its early days.
Conclusion: Facebook advertising is in its infancy and this is really a short term problem which will be addressed as the marketplace expands.
Hiring offshore in Spain
For those who don't know - we (Komli Mobile / ZestADZ) have a small presence in Spain. Our lead engineer (he refuses to call himself the CTO) sits in Spain while managing the entire engineering team in Chennai.
I know that it might sound crazy, but it's an experiment that we've run successfully, and we've rolled out about 4 major projects with this structure.
Startups don't normally hire outside their own offices, let alone outside the cities or countries. I am not going to try to convince you about offshore hiring for Indian startups. That is best left to another blog. Let me get back to our experience and talk about why I think Spain is a good choice if you're planning to hire offshore.
The current economy in Spain has left it with a jobless rate of about 22-25%. Spain's salaries grew well in the boom years and topped European charts. But because of the dearth of jobs, it is easy to find great talent - people who have worked with interesting companies and projects and have pan-European experience.
In addition - the time difference between Spain and India is not as wide as that between India and the US, which is a big plus. You won't have to wait until the next morning to read responses to emails that you wrote the previous night :-).
So what do you look out for?
For one - hire through someone you already know. That would allow you to lower risk. If you know someone in the same time zone / region - that's an added advantage.
Europeans are notorious for their short work weeks: Don't be surprised. Give them their space and set mutually acceptable deadlines.
Work-life balance is important for most Europeans: They are not going to skip dinner with family to sit on a call with you, so pre-schedule calls and meetings and don't stretch unless really needed. They could be faster than you are or a bit slower. Understand that the culture is different. Learn to accommodate that.
Watch out for language issues: Spain is not necessarily all Spanish, as the educated public does speak enough English. But it won't be the accent that you are used to, so follow up verbal communications with a written confirmation, just in case!
So what are the payscales looking like?
Part-time engineers are probably available for € 12,000-15,000 at the least. These are well-experienced engineers to whom you won't have to explain what Maven / Twitter Bootstrap is, or hand-hold through the whole process. Remember - the productivity of an experienced part-time engineer could be as high as that of a full-time engineer (albeit one with less experience).
Full-timers are available from € 40,000 upwards. If you negotiate hard, you could actually get a great deal.
Where to start? I found our hire through twitter (incidentally it was his first tweet :-) ) ..though LinkedIn would be a great place to start.