I was playing around with this on the WP-34S and found that how one codes the loop makes a big difference.
If you run two tests, one with a single addition inside the loop and the second with two additions inside the same loop, the difference between the two timings is the cost of the extra addition. Subtracting that from the first timing leaves the cost of the loop itself, so the cost of the addition operation can be isolated.
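The subtraction described above can be sketched in a few lines of Python (this is not calculator code). The function name `split_costs`, the iteration count, and the timings are all made up for illustration; they are not my measurements.

```python
def split_costs(t_one_op, t_two_ops, iterations):
    """Separate per-operation cost from loop overhead.

    t_one_op  : total seconds for the loop with one operation in the body
    t_two_ops : total seconds for the same loop with two operations
    iterations: number of times the loop ran in each test
    Returns (seconds per operation, seconds of loop overhead per iteration).
    """
    # The only difference between the two runs is one extra operation
    # per iteration, so the time difference is the cost of that operation.
    per_op = (t_two_ops - t_one_op) / iterations
    # T1 = loop + op and T2 = loop + 2*op, hence loop = 2*T1 - T2.
    overhead = (2 * t_one_op - t_two_ops) / iterations
    return per_op, overhead


# Hypothetical timings, assumed only for this example.
per_op, overhead = split_costs(t_one_op=3.0, t_two_ops=4.0, iterations=1000)
print(f"per operation: {per_op:.6f} s, loop overhead: {overhead:.6f} s")
```

With real stopwatch timings the same two lines of arithmetic apply; only the inputs change.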
I ran some experiments on the WP-34S to see how consistent this was. I ran a loop with 1, 2, 3, 4, 5, and 6 additions inside it, and the deduced cost of the loop was consistent across the runs.
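With more than two tests, one way to check the consistency is to fit a straight line through the timings: the slope is the cost of one addition and the intercept is the cost of the loop. This is a rough Python sketch of that idea, not necessarily how the spreadsheet does it. The helper `fit_line`, the value of `ITERATIONS`, and the timings are placeholders I made up.

```python
def fit_line(counts, times):
    """Ordinary least-squares fit of times = slope * counts + intercept."""
    n = len(counts)
    mean_x = sum(counts) / n
    mean_y = sum(times) / n
    sxx = sum((x - mean_x) ** 2 for x in counts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(counts, times))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


ITERATIONS = 1000  # assumed loop count per test (hypothetical)
counts = [1, 2, 3, 4, 5, 6]  # additions inside the loop body
# Placeholder timings in seconds, NOT measured on a calculator.
times = [2.5, 3.0, 3.5, 4.0, 4.5, 5.0]

slope, intercept = fit_line(counts, times)
seconds_per_add = slope / ITERATIONS     # cost of a single addition
adds_per_second = 1 / seconds_per_add    # throughput figure
loop_seconds = intercept / ITERATIONS    # loop overhead per iteration
print(f"{adds_per_second:.0f} additions/s, loop overhead {loop_seconds:.6f} s")
```

If the points fall close to the fitted line, the loop cost really is constant across the tests.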
When one takes this approach, the setup code and the loop can be as complex as needed to set up the benchmark of the individual operation. Thus, for multiplication or division, one can make sure the stack is set up with valid numbers. Because the setup and loop cost is the same in every test, it cancels out when the tests are compared.
I ran six tests to benchmark addition and multiplication on the WP-34S. On average, the calculator can do 2428 additions per second and 2422 multiplications per second.
Here is the spreadsheet with my numbers and calculations:
Spreadsheet with results (LibreOffice/OpenOffice)
I would find it interesting to use this kind of approach to compare the most common operations on all the calculators to get an idea of their relative speed. I have played with different modes on the WP-34S and can see the difference they make.
I will continue to benchmark the WP-34S instructions in different modes and will post the results in the next few days.