After someone did a naive analysis of x86 instruction diversity in typical binaries, I had my suspicions. I have looked at enough disassembled x86 code to have a good feel for what the balance is. Sure enough, another analysis was done, and its numbers look much more in line with what I’ve seen. It basically boils down to ‘move’ and ‘function call’ instructions, followed by ‘comparison and branch’ instructions. This mostly coincides with current beliefs about actual instruction streams, so the strong correlation isn’t surprising.
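For anyone who wants to try this kind of static count themselves, here is a minimal sketch. It assumes the disassembly is in the tab-separated layout that `objdump -d` typically prints (address, raw bytes, then mnemonic and operands). The function name `mnemonic_histogram` and the sample lines are made up for illustration and are not taken from either analysis.

```python
from collections import Counter


def mnemonic_histogram(disassembly: str) -> Counter:
    """Count instruction mnemonics in objdump-style disassembly text."""
    counts = Counter()
    for line in disassembly.splitlines():
        fields = line.split("\t")
        # Assumed layout: address, raw bytes, then "mnemonic operands".
        # Labels, blank lines, and byte-only continuation lines have
        # fewer than three fields and are skipped.
        if len(fields) < 3 or not fields[2].strip():
            continue
        counts[fields[2].split()[0]] += 1
    return counts


# Hypothetical sample, not taken from any real binary.
SAMPLE = "\n".join([
    "  401000:\t55                   \tpush   %rbp",
    "  401001:\t48 89 e5             \tmov    %rsp,%rbp",
    "  401004:\te8 17 00 00 00       \tcall   401020 <f>",
    "  401009:\t85 c0                \ttest   %eax,%eax",
    "  40100b:\t74 05                \tje     401012 <main+0x12>",
    "  40100d:\t48 89 c7             \tmov    %rax,%rdi",
])

hist = mnemonic_histogram(SAMPLE)
for mnemonic, n in hist.most_common():
    print(mnemonic, n)
```

Note that this treats a prefix such as `rep` or `lock` as the mnemonic, which is one of the ways a naive count can go wrong.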
What would be interesting to see is how the breakdown of instruction streams varies across programming languages. Specifically, if a basket of programs were analyzed at runtime, what would the pie chart look like then?
Another interesting takeaway from the numbers above is the old 80/20, or Pareto principle, effect. Looking at the actual breakdown, it is easy to see that 16 instructions can do nearly 75–80% of the job. A processor whose instruction encoding was optimized for size on just those instructions might produce much smaller executables.
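The Pareto check itself is easy to sketch: sort the mnemonics by frequency and walk down the list until the running total reaches the target share. The helper name `instructions_needed` and the counts below are invented purely to show the arithmetic; they are not the numbers from the analysis.

```python
from collections import Counter


def instructions_needed(counts: Counter, target: float) -> int:
    """Return how many of the most frequent mnemonics are needed
    to cover at least `target` (0..1) of all counted instructions."""
    total = sum(counts.values())
    covered = 0
    for rank, (_, n) in enumerate(counts.most_common(), start=1):
        covered += n
        if covered / total >= target:
            return rank
    return len(counts)


# Hypothetical counts, chosen only to illustrate the calculation.
counts = Counter({"mov": 40, "call": 20, "cmp": 15,
                  "jmp": 10, "add": 8, "xor": 7})

print(instructions_needed(counts, 0.75))
print(instructions_needed(counts, 0.80))
```

Fed a real histogram, the same function would show directly whether 16 mnemonics really reach the 75–80% range.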