Why are newer generations of processors faster at the same clock speed?

Why, for example, would a 2.66 GHz dual-core Core i5 be faster than a 2.66 GHz Core 2 Duo, which is also dual-core? Is this because of newer instructions that can process information in fewer clock cycles? What other architectural changes are involved?

This question comes up often and the answers are usually the same. This post is meant to provide a definitive, canonical answer for this question. Feel free to edit the answers to add additional details.

asked Jan 29, 2013 at 23:56

Wow, both Breakthrough's and David's are great answers. I don't know which to pick as correct :P Commented Jan 30, 2013 at 2:42

Also a better instruction set and more registers, e.g. MMX (very old now) and x86_64. When AMD invented x86_64 they added some compatibility-breaking improvements for 64-bit mode, since they realised compatibility would be broken anyway.

Commented Jul 21, 2015 at 20:34

For really big improvements to the x86 architecture, a new instruction set would be needed, but then it would not be x86 any more. It would be a PowerPC, MIPS, Alpha, … or ARM.

Commented Jul 21, 2015 at 20:37

5 Answers

It's usually not because of newer instructions. It's because the processor requires fewer clock cycles to execute the same instructions. This can happen for a large number of reasons:

  1. Larger caches mean less time wasted waiting for memory.
  2. More execution units mean less time waiting to start operating on an instruction.
  3. Better branch prediction means less time wasted speculatively executing instructions that never actually needed to be executed.
  4. Execution unit improvements mean less time waiting for instructions to complete.
  5. Shorter pipelines mean pipelines fill up faster.
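These factors all reduce the average cycles per instruction (CPI), which is why the classic performance equation separates clock speed from cycle efficiency. A minimal sketch of that equation, using made-up CPI values for an older and a newer core (the 1.5 and 1.0 figures are illustrative assumptions, not measured numbers for any real chip):

```python
# Back-of-envelope model: at a fixed clock, performance is set by how many
# clock cycles each instruction costs on average (CPI).

def exec_time_seconds(instructions, cpi, clock_hz):
    # Classic performance equation: time = instructions * CPI / frequency.
    return instructions * cpi / clock_hz

CLOCK = 2.66e9   # both hypothetical CPUs run at 2.66 GHz
WORK = 1e9       # same program: one billion instructions

older = exec_time_seconds(WORK, cpi=1.5, clock_hz=CLOCK)  # assumed older core
newer = exec_time_seconds(WORK, cpi=1.0, clock_hz=CLOCK)  # assumed newer core

print(f"older: {older:.3f} s, newer: {newer:.3f} s")
print(f"speedup at the same clock: {older / newer:.2f}x")
```

With these assumed CPIs, the newer core is 1.5x faster without the clock moving at all.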
answered Jan 29, 2013 at 23:58 by David Schwartz

I believe the Core architecture has a 14-15 stage pipeline (ref), and the Nehalem/Sandy Bridge has roughly a 14-17 stage pipeline (ref).

Commented Jan 30, 2013 at 0:10

Shorter pipelines are easier to keep full and reduce the penalties of pipeline flushes. Longer pipelines generally permit higher clock speeds.

Commented Jan 30, 2013 at 0:11

That's what I mean, I think the pipeline depth itself has remained the same or has increased. Also in the Intel 64 and IA-32 SW Dev Manuals, the last mention of a pipeline change is in Vol. 1, Ch. 2.2.3/2.2.4 (the Intel Core/Atom microarchitectures).

Commented Jan 30, 2013 at 0:13

The effort to raise clock speeds has resulted in longer pipelines. That got ridiculous (as many as 31 stages!) towards the end of the NetBurst era. These days, it's a delicate engineering decision with advantages and disadvantages both ways.

Commented Jan 30, 2013 at 0:19
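The tradeoff in this comment thread can be sketched with a toy cost model. The assumptions are mine, for illustration only: a mispredicted branch flushes the whole pipeline, costing roughly one refill of `depth` cycles, and the core otherwise retires one instruction per cycle.

```python
# Toy model of the pipeline-depth tradeoff: deeper pipes pay a larger
# penalty for every flush, which is part of why NetBurst-era depths
# became ridiculous.

def run_cycles(n_instructions, depth, mispredicts):
    fill = depth - 1                      # cycles to fill the pipeline once
    flush_penalty = mispredicts * depth   # each flush refills the whole pipe
    return n_instructions + fill + flush_penalty

short = run_cycles(1000, depth=14, mispredicts=50)
deep = run_cycles(1000, depth=31, mispredicts=50)  # NetBurst-era depth

print(short, deep)  # the deeper pipe pays far more for the same mispredicts
```

In a real design the deeper pipe would also clock higher, which is exactly why this is the delicate engineering decision described above.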

Also branch prediction improvements, instruction reordering/optimization/mux unit improvements, miniaturization (reduced heat), and die design (improved on-die paths/circuits, etc.).

Commented Jun 22, 2016 at 18:59

Designing a processor to deliver high performance is far more than just increasing the clock rate. There are numerous other ways to increase performance, enabled through Moore's law and instrumental to the design of modern processors.

Clock rates can't increase indefinitely.

Graph of stock clock speeds in cutting-edge enthusiast PCs over the years. Image source

Seemingly sequential instruction streams can often be parallelized.

Pipelining breaks instructions into smaller pieces which can be executed in parallel.

Diagram of a five-stage instruction pipeline. Image source
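The payoff of pipelining can be sketched with simple cycle counts. Assuming an ideal five-stage pipe with no hazards (a simplification; real pipelines stall), one instruction still takes five cycles, but a stream of instructions retires one per cycle once the pipe is full:

```python
# Minimal sketch: S pipeline stages, one instruction entering per cycle.

def pipelined_cycles(n, stages=5):
    return stages + (n - 1)   # fill the pipe once, then 1 instruction/cycle

def unpipelined_cycles(n, stages=5):
    return stages * n         # each instruction occupies all stages alone

n = 100
print(pipelined_cycles(n), unpipelined_cycles(n))  # 104 vs 500
```

For long streams the pipelined machine approaches a fivefold throughput gain at the same clock.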

However, pipelining can introduce hazards which must be resolved to ensure correct program execution.

Branch prediction is used to resolve control hazards which can disrupt the entire pipeline.

Caches are used to speed up memory accesses.
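A direct-mapped cache, the simplest organization, can be sketched in a few lines: each memory block maps to exactly one cache line (index = block number mod number of lines). The sizes here are tiny made-up values chosen to make the behaviour easy to follow.

```python
# Sketch of a direct-mapped cache with 4 lines.

NUM_LINES = 4

def simulate(addresses):
    lines = [None] * NUM_LINES   # each line holds the tag of the cached block
    hits = misses = 0
    for addr in addresses:
        index = addr % NUM_LINES
        tag = addr // NUM_LINES
        if lines[index] == tag:
            hits += 1
        else:
            misses += 1
            lines[index] = tag
    return hits, misses

# Reusing recently touched addresses hits; two addresses that alias to the
# same line evict each other ("thrashing").
print(simulate([0, 1, 0, 1]))   # second pass hits
print(simulate([0, 4, 0, 4]))   # 0 and 4 both map to line 0 and keep evicting
```

The point of larger caches and higher associativity in newer cores is precisely to make the second pattern rarer.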

Out-of-order execution reduces stalls due to hazards by allowing independent instructions to execute first.
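A toy scheduling model (my own simplification: single issue per cycle, made-up latencies) shows the benefit. An in-order core stalls behind a slow load even when later instructions don't need its result; an out-of-order core runs the independent work during the wait.

```python
# (name, depends_on, latency_in_cycles) — a slow load followed by its
# consumer, then three independent instructions.
program = [
    ("load", None, 10),   # cache-missing load
    ("use",  "load", 1),  # needs the load's result
    ("a",    None, 1),
    ("b",    None, 1),
    ("c",    None, 1),
]

def inorder_total(prog):
    done, next_issue = {}, 0
    for name, dep, lat in prog:
        issue = max(next_issue, done.get(dep, 0))  # stall until dep is ready
        done[name] = issue + lat
        next_issue = issue + 1
    return max(done.values())

def ooo_total(prog):
    done, issued, cycle = {}, set(), 0
    while len(issued) < len(prog):
        # issue the first waiting instruction whose operands are ready
        for name, dep, lat in prog:
            if name not in issued and (dep is None or done[dep] <= cycle):
                issued.add(name)
                done[name] = cycle + lat
                break
        cycle += 1
    return max(done.values())

print(inorder_total(program), ooo_total(program))
```

In this sketch the out-of-order machine finishes in 11 cycles instead of 14 because `a`, `b`, and `c` execute while the load is outstanding.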

Superscalar architectures allow multiple instructions within an instruction stream to execute at the same time.
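In the ideal case, throughput scales with issue width. A minimal sketch, assuming fully independent instructions (real code rarely is, which is why wider machines also need the reordering and prediction machinery above):

```python
import math

# With W issue slots per cycle, up to W independent instructions start together.
def cycles(n_independent_instructions, issue_width):
    return math.ceil(n_independent_instructions / issue_width)

print(cycles(100, 1), cycles(100, 4))  # 100 vs 25
```
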

Diagram of the Haswell execution engine. Image source

More advanced instructions are added which perform complex operations in less time.
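SIMD extensions are the clearest example: one instruction operates on several data elements at once. A sketch with a lane width of 4 (an arbitrary choice for illustration; real ISAs range from 4 x 32-bit lanes in SSE up to 16 x 32-bit in AVX-512):

```python
# Model of a 4-lane vector add: each "instruction" produces 'lanes' results.

def simd_add(a, b, lanes=4):
    assert len(a) == len(b) and len(a) % lanes == 0
    out, instructions = [], 0
    for i in range(0, len(a), lanes):
        # one vector instruction adds a whole group of lanes at once
        out.extend(x + y for x, y in zip(a[i:i + lanes], b[i:i + lanes]))
        instructions += 1
    return out, instructions

res, n_instr = simd_add(list(range(8)), list(range(8)))
print(res, n_instr)   # 8 sums in just 2 vector instructions
```

A scalar loop would have needed 8 add instructions for the same work, so at the same clock the vector unit finishes in a quarter of the instruction slots.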

So how do these techniques improve processor performance over time?