Need for speed: loop optimalisation
I like these part from the Monty Python film Monty Python and the Holy Grail:
“A Reading from the Book of Armaments, Chapter 4, Verses 16 to 20:
Then did he raise on high the Holy Hand Grenade of Antioch, saying, "Bless this, O Lord, that with it thou mayst blow thine enemies to tiny bits, in thy mercy." And the people did rejoice and did feast upon the lambs and toads and tree-sloths and fruit-bats and orangutans and breakfast cereals ... Now did the Lord say, "First thou pullest the Holy Pin. Then thou must count to three. Three shall be the number of the counting and the number of the counting shall be three. Four shalt thou not count, neither shalt thou count two, excepting that thou then proceedeth to three. Five is right out. Once the number three, being the number of the counting, be reached, then lobbest thou the Holy Hand Grenade in the direction of thine foe, who, being naughty in my sight, shall snuff it."
King Arthur: Right. One... two... five.
Galahad: Three, sir.
King Arthur: Three.“
If you like to optimize your code, you will found articles about the differences between the
.
But I like to know which is the difference (if exist) between this loops:
count from zero to integer
count from integer to zero
For a first sight I don`t found too big differences but maybe....
I wrote one easy code in C with different types of the looping but the cores inside the loops was similar; do something because if the loop core is empty, then some optimizing function can drop the whole loop.
In these cases the core inside the loops run with time O (ordo) every time, the only important differences are the comparisons in the third parameters of the for command.
fortest.c:
#include <stdio.h>
#include <time.h>
#include <sys/time.h>
#include <time.h>
#define timersub(a, b, result) do { \
(result)->tv_sec = (a)->tv_sec - (b)->tv_sec; \
(result)->tv_usec = (a)->tv_usec - (b)->tv_usec; \
if ((result)->tv_usec < 0) { --(result)->tv_sec; (result)->tv_usec += 1000000; } \
} while (0);
void test1(unsigned long limit) { int x=0;unsigned long d;
for (d=0 ; d !=limit; ++d) { ++x; if (x<0) { return ; }}
}
void test2(unsigned long limit) {int x=0;unsigned long d;
for (d=0; d < limit; ++d) { ++x; if (x<0) { return ; }}
}
void test3(unsigned long limit) {int x=0;unsigned long d;
for (d= limit; d != 0; --d) { ++x; if (x<0) {return ;}}
}
void test4(unsigned long limit) {int x=0;unsigned long d;
for (d= limit; d > 0; --d) { ++x; if (x<0) { return ; }}
}
void test11(unsigned long limit) { int x=0;unsigned long d;
for (d=0 ; d !=limit; ++d) { ++x; if (x==0) { return ; }}
}
void test12(unsigned long limit) {int x=0;unsigned long d;
for (d=0; d < limit; ++d) { ++x; if (x==0) { return ; }}
}
void test13(unsigned long limit) {int x=0;unsigned long d;
for (d= limit; d != 0; --d) { ++x; if (x==0) {return ;}}
}
void test14(unsigned long limit) {int x=0;unsigned long d;
for (d= limit; d > 0; --d) { ++x; if (x==0) { return ; }}
}
void main () {
struct timeval t1,t2,t3,t4,t5,t6,t7,t21,t32,t43,t54,t65,t76;
int x;
unsigned long limit = 300000000;
for (int a=30; a!=0; --a) {
gettimeofday(&t1, NULL); test1(limit); gettimeofday(&t2, NULL); timersub(&t2, &t1, &t21); printf("%ld.%06ld\t", (long int)t21.tv_sec, (long int)t21.tv_usec);
gettimeofday(&t1, NULL); test2(limit); gettimeofday(&t2, NULL); timersub(&t2, &t1, &t21); printf("%ld.%06ld\t", (long int)t21.tv_sec, (long int)t21.tv_usec);
gettimeofday(&t1, NULL); test3(limit); gettimeofday(&t2, NULL); timersub(&t2, &t1, &t21); printf("%ld.%06ld\t", (long int)t21.tv_sec, (long int)t21.tv_usec);
gettimeofday(&t1, NULL); test4(limit); gettimeofday(&t2, NULL); timersub(&t2, &t1, &t21); printf("%ld.%06ld\t", (long int)t21.tv_sec, (long int)t21.tv_usec);
gettimeofday(&t1, NULL); test11(limit); gettimeofday(&t2, NULL); timersub(&t2, &t1, &t21); printf("%ld.%06ld\t", (long int)t21.tv_sec, (long int)t21.tv_usec);
gettimeofday(&t1, NULL); test12(limit); gettimeofday(&t2, NULL); timersub(&t2, &t1, &t21); printf("%ld.%06ld\t", (long int)t21.tv_sec, (long int)t21.tv_usec);
gettimeofday(&t1, NULL); test13(limit); gettimeofday(&t2, NULL); timersub(&t2, &t1, &t21); printf("%ld.%06ld\t", (long int)t21.tv_sec, (long int)t21.tv_usec);
gettimeofday(&t1, NULL); test14(limit); gettimeofday(&t2, NULL); timersub(&t2, &t1, &t21); printf("%ld.%06ld\t", (long int)t21.tv_sec, (long int)t21.tv_usec);
printf("\n");
}
}
After the compilation ( gcc -std=c99 -o fortest ./fortest.c ) I ran the program (DELL notebook with Intel COREi5, Debian, 8G ram , nothing interesting) here is the result:
./fortest
0.819198 0.723209 0.547804 0.548121 0.715225 0.739564 0.548305 0.548662
0.749286 0.729770 0.548365 0.547611 0.716539 0.742632 0.547106 0.551919
0.749886 0.719612 0.548613 0.547090 0.717064 0.742903 0.547392 0.550720
0.750686 0.726905 0.549612 0.548379 0.719603 0.740837 0.547805 0.551163
0.750537 0.723098 0.549605 0.548025 0.718798 0.740684 0.548263 0.549131
0.752201 0.731910 0.547758 0.548353 0.728089 0.739966 0.548394 0.549994
0.747660 0.725875 0.547534 0.547997 0.721134 0.737523 0.557083 0.548719
0.746503 0.719368 0.547573 0.548859 0.724068 0.739599 0.548079 0.551764
0.745228 0.725389 0.547391 0.548952 0.717829 0.738182 0.547810 0.551259
0.749495 0.723209 0.548815 0.547587 0.716831 0.739543 0.547409 0.551856
.
.
I had some conception about the result but the was a big surprise for me:
The differences between the comparisons are very significant!
The speed of the comparisons in C (from fastest to slowers):
The difference is up to ~40 percent.....
The origin of the differences came from the cpu code: when the cpu want compare the variable with the limit then they need access the ram (I know: the cache) but the comparison with the zero not.
I compiled without any compiler-level speed optimization.
If I have some time then I rewrite and rerun the code in Java, Javascript, Ada, Pascal, basic, BASH, Perl, php, python, (Excell macro???) and I will update this.
This is more interesting what I thought :D