I just need the result in theory. I’m not an expert in computer architecture, just a software developer.
Most of my friends think a 64-bit CPU is twice as fast as a 32-bit CPU. However, I think a 64-bit CPU is 2^2 = 4 times faster than a 32-bit CPU (in theory).
Which is the right answer to this question? And, if there were 128-bit CPUs, how many times faster would it be than a 32-bit CPU?
PS: I searched with Google, and found a link that referred to benchmarks that supported my answer.
32-bit, 64-bit, and 128-bit all refer to the word length of the processor, which can be thought of as its “fundamental data type”. Often, this is also the number of bits transferred to/from the RAM of the system, and the width of pointers (although nothing stops you from using software to access more RAM than a single pointer can reach).
Assuming a constant clock speed (and everything else in the architecture held constant), and assuming memory reads/writes are the same speed (we assume 1 clock cycle here, but this is far from the case in real life), there is no direct speed advantage of a 64-bit processor over a 32-bit one, except when higher-precision numbers are required. For example, if I need to add two 64-bit numbers, I can do it in a single clock cycle on a 64-bit machine (three if you count fetching the numbers from RAM):
    ADDA [NUM1], [NUM2]
    STAA [RESULT]
However, on a 32-bit machine, I need many clock cycles, since I first have to add the lower 32 bits, then compensate for any carry out of that addition, and then add the upper 32 bits:
     ADDA [NUM1_LOWER], [NUM2_LOWER]
     STAA [RESULT_LOWER]
     CLRA              ; I'm assuming the condition flags are not modified by this.
     BRNO CMPS         ; Branch to CMPS if there was no overflow.
     ADDA #1           ; If there was overflow, compensate the value of A.
CMPS ADDA [NUM1_UPPER], [NUM2_UPPER]
     STAA [RESULT_UPPER]
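The same carry-propagation idea can be sketched in real, runnable code. This is a minimal Python sketch of adding two 64-bit values as pairs of 32-bit words, the way a 32-bit machine must; the function name and the masking constant are my own, not part of the made-up assembly above:

```python
MASK32 = 0xFFFFFFFF  # one 32-bit machine word

def add64_via_32bit_words(num1, num2):
    """Add two 64-bit integers using only 32-bit word operations (mod 2**64)."""
    lower = (num1 & MASK32) + (num2 & MASK32)   # add the lower 32-bit words
    carry = lower >> 32                         # 1 if the low addition overflowed
    upper = ((num1 >> 32) + (num2 >> 32) + carry) & MASK32  # upper words + carry
    return (upper << 32) | (lower & MASK32)

# The low words overflow here, so the carry must propagate into the upper word:
assert add64_via_32bit_words(0xFFFFFFFF, 1) == 0x100000000
```

Note that a single conceptual addition turned into two word-sized additions plus carry handling, mirroring the extra instructions in the assembly above.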
Going through my made-up assembly syntax, you can easily see how higher-precision operations require many more instructions on a machine with a smaller word length. This is the real key to 64-bit and 128-bit processors: they allow us to handle larger numbers of bits in a single operation.
Now, to extend this point: even if you gave a 32-bit processor the ability to add two 64-bit numbers in a single clock cycle, you would still need more than one clock cycle to fetch those numbers from RAM, since the word length (again) is often the fundamental size of memory operations. So, let’s assume we have two 64-bit registers (A64 and B64), and an operation called ADDAB64 which adds A and B and stores the result in A:
    LDDA64 [NUM1]    ; Takes 2 clock cycles, since this number is fetched 32 bits at a time.
    LDAB64 [NUM2]    ; Again, two more clock cycles.
    ADDAB64          ; This only takes 1.
    STAA64 [RESULT]  ; However, this takes two again, since we need to store a 64-bit result 32 bits at a time.
As you can see, even a hardware implementation of 64-bit addition on a 32-bit processor still takes 7 clock cycles at minimum (and this assumes all memory reads/writes take a single clock cycle). Where I’m going with this: word length has performance implications specifically for pointers.
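The 7-cycle tally above can be checked with a few lines of Python. This is a sketch under the same assumptions as the example: one clock per 32-bit memory transfer, and one clock for the hypothetical ADDAB64; the function name is invented for illustration:

```python
WORD_BITS = 32  # width of the 32-bit memory bus

def transfer_cycles(operand_bits):
    # Fetching or storing an operand costs one cycle per 32-bit machine word
    # (assuming operand sizes are multiples of the word size).
    return operand_bits // WORD_BITS

total = (transfer_cycles(64)    # LDDA64 [NUM1]
         + transfer_cycles(64)  # LDAB64 [NUM2]
         + 1                    # ADDAB64
         + transfer_cycles(64)) # STAA64 [RESULT]
assert total == 7
```

On a true 64-bit machine each transfer would take one cycle instead of two, so the same sequence would cost 4 cycles rather than 7.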
On a 32-bit machine, pointers can address ~4 GB of RAM, whereas on a 64-bit machine they can address over 16.7 million TB. If you needed to address past 4 GB on a 32-bit machine, you would have to compensate, much like we did when adding 64-bit numbers above: many extra clock cycles spent fetching and combining those wider values, whereas the operations run much faster on a processor that can handle them in one word.
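The address-space figures above are just powers of two (assuming byte-addressable memory and flat pointers):

```python
TB = 2**40  # one terabyte (binary)

addr_32 = 2**32  # bytes addressable with a 32-bit pointer
addr_64 = 2**64  # bytes addressable with a 64-bit pointer

assert addr_32 == 4 * 2**30          # exactly 4 GB
assert addr_64 // TB == 16_777_216   # ~16.7 million TB
```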
Also, while increasing the number of bits in an ALU increases propagation delays for most operations, this delay is very manageable in today’s processors (or else we couldn’t keep the same clock speeds as our 32-bit variants). It matters less in synchronous digital circuits anyway: since everything is clocked together, a propagation delay longer than the clock period would simply make the processor malfunction, which is also why there are limits to overclocking.
The bottom line: larger word lengths mean we can process more data per operation, which is greatly needed as we advance computing technology. This is why so many instruction set extensions (MMX, SSE, etc.) have been created: to process larger amounts of data in less time.
A larger word length does not directly increase the performance of a system, but when dealing with larger (or higher-precision) values, substantial performance gains can be realized. While the average consumer may not notice these gains, they are greatly appreciated in numeric computing, scientific analysis, video encoding, and encryption/compression.