Intel Lunar Lake: New P-Core, Enter Lion Cove

Diving straight in: Lion Cove, Intel's new Performance core (commonly called the P-core), has received major architectural updates aimed at increasing both power efficiency and performance.

Key among these improvements is a significant overhaul of Intel's traditional P-core cache hierarchy. The new design for Lion Cove uses a multi-tier data cache containing a 48KB L0D cache with 4-cycle load-to-use latency, a 192KB L1D cache with 9-cycle latency, and an extended L2 cache of up to 3MB with 17-cycle latency. In total, this puts 240KB of cache within 9 cycles' latency of the CPU cores, whereas Redwood Cove before it could only reach 48KB of cache within the same latency.
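To make the 240KB figure concrete, here is a minimal sketch that adds up the capacity of every data cache level whose load-to-use latency fits within a given cycle budget. The capacities and latencies are the ones quoted above (with the L2 taken at its 3MB maximum); the names `LION_COVE_DATA_CACHES` and `capacity_within` are made up for this illustration, and simply summing capacities mirrors the article's own arithmetic rather than saying anything about how the levels share data.

```python
# Data cache levels as quoted in the text:
# (name, capacity in KB, load-to-use latency in cycles).
# The L2 entry uses the "up to 3MB" maximum.
LION_COVE_DATA_CACHES = [
    ("L0D", 48, 4),
    ("L1D", 192, 9),
    ("L2", 3 * 1024, 17),
]


def capacity_within(levels, cycle_budget):
    """Sum the capacity (KB) of all levels reachable within cycle_budget cycles."""
    return sum(size_kb for _, size_kb, latency in levels if latency <= cycle_budget)


if __name__ == "__main__":
    for budget in (4, 9, 17):
        total = capacity_within(LION_COVE_DATA_CACHES, budget)
        print(f"within {budget} cycles: {total} KB")
```

With a 9-cycle budget the function returns 48 + 192 = 240, matching the figure in the text.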

The data translation lookaside buffer (DTLB) has also been revised, increasing its depth from 96 to 128 pages to improve its hit rate.
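As a rough illustration of why a deeper DTLB helps, the sketch below estimates how much memory the DTLB can map at once (its "reach"). It assumes standard 4 KiB pages, which the article does not state; the helper name `tlb_reach_kib` is hypothetical.

```python
PAGE_SIZE_KIB = 4  # assumption: 4 KiB pages; larger pages would scale the reach


def tlb_reach_kib(entries, page_size_kib=PAGE_SIZE_KIB):
    """Memory (KiB) covered when every TLB entry maps one page."""
    return entries * page_size_kib


if __name__ == "__main__":
    old_reach = tlb_reach_kib(96)   # previous DTLB depth from the text
    new_reach = tlb_reach_kib(128)  # Lion Cove DTLB depth from the text
    print(f"96 entries:  {old_reach} KiB")
    print(f"128 entries: {new_reach} KiB")
```

Under that assumption the reach grows from 384 KiB to 512 KiB, so more of a working set can be translated without a DTLB miss.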

Intel has also added a third Address Generation Unit (AGU)/Store Unit pair to further boost the performance of data write operations. As CPU complexity grows, so does the reliance on the cache subsystem to keep the core fed, and Intel has responded by throwing more cache at the problem. The core-level cache subsystem now includes an intermediate data cache (IDC) between the 48 KB first-level cache and the L2. The original 48 KB L1D cache is now called the L0 D-cache internally, while the new 192 KB intermediate level takes over the L1 D-cache name.

The latest Lion Cove P-core design also includes a new front-end for handling instructions. The prediction block is 8x larger, fetch is wider, decode bandwidth is higher than on Raptor Cove, and both uop cache capacity and uop cache read bandwidth have increased substantially. The revised uop queue capacity is likewise intended to improve overall throughput.

The out-of-order engine in Lion Cove is partitioned into separate Integer (INT) and Vector (VEC) execution domains, each with independent renaming and scheduling. This partitioning leaves room for future expansion, lets each domain grow independently, and reduces power consumption for workloads that exercise only one domain. The out-of-order engine is also wider and deeper: allocation/rename goes from 6-wide to 8-wide, retirement from 8-wide to 12-wide, the deep instruction window from 512 to 576 entries, and the number of execution ports from 12 to 18.

Lion Cove's integer execution units have also been improved over Raptor Cove, with execution resources grown from 5 to 6 integer ALUs, 2 to 3 jump units, and 2 to 3 shift units. The number of 64x64-to-64-bit multiply units scales from 1 to 3, giving more compute power for the heavier parts of a workload. Another significant development is the transformation of the P-core database from a 'sea of fubs' to a 'sea of cells.' Migrating the sub-organization of the P-core's structure from fubs to more organized cells essentially increases density.

Intel has removed Hyper-Threading (HT) from its Lunar Lake SoC, with one potential reason being to enhance power efficiency and single-thread performance. By eliminating HT, Intel reduces power consumption and simplifies thermal management, which should extend battery life in ultra-thin notebooks. Intel claims that the Lion Cove P-cores offer approximately 15% better performance-to-power and performance-to-area ratios than cores with HT. Intel's hybrid architecture, which uses E-cores for multi-threaded tasks, also reduces the need for HT, as the Intel Thread Director can distribute workloads more efficiently across the cores.

Power management has also been refined, with AI self-tuning controllers replacing the static thermal guard bands. This lets the system respond adaptively to real-time operating conditions and achieve higher sustained performance. Intel also sets Lion Cove P-core clock speeds in finer 16.67MHz steps rather than the traditional 100MHz steps. This allows more accurate power management and finer tuning to squeeze as much as possible from the power budget.
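The benefit of finer clock steps can be sketched with a little arithmetic: if the power budget would allow some frequency that falls between two steps, the core has to settle for the highest step at or below it. The example below is an illustration only. It assumes that 16.67MHz is 100/6 MHz rounded, the 3960MHz limit is an invented number, and `highest_step_at_or_below` is a hypothetical helper, not anything Intel has described.

```python
from fractions import Fraction
import math

COARSE_STEP_MHZ = Fraction(100)    # traditional 100MHz granularity
FINE_STEP_MHZ = Fraction(100, 6)   # assumption: "16.67MHz" is 100/6 MHz rounded


def highest_step_at_or_below(limit_mhz, step_mhz):
    """Return the highest multiple of step_mhz that does not exceed limit_mhz."""
    steps = math.floor(Fraction(limit_mhz) / step_mhz)
    return steps * step_mhz


if __name__ == "__main__":
    limit = 3960  # hypothetical frequency the power budget would permit, in MHz
    coarse = highest_step_at_or_below(limit, COARSE_STEP_MHZ)
    fine = highest_step_at_or_below(limit, FINE_STEP_MHZ)
    print(f"100MHz steps:   {float(coarse):.2f} MHz, {float(limit - coarse):.2f} MHz unused")
    print(f"16.67MHz steps: {float(fine):.2f} MHz, {float(limit - fine):.2f} MHz unused")
```

For this made-up limit, 100MHz steps land on 3900MHz while 16.67MHz steps land on 3950MHz, so less of the available headroom is left on the table.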

Intel's Lion Cove P-Core microarchitecture looks like a nice upgrade over Golden Cove. Lion Cove incorporates improved memory and cache subsystems and better power management, raising IPC rather than relying solely on faster P-core frequencies to boost performance.


91 Comments


  • mode_13h - Thursday, June 6, 2024 - link

    The way I see it, the only defense Intel has for comparing Skymont to the LP Crestmont cores is to defend their decision not to include a separate LP version of Skymont, in Lunar Lake.

    In fact, I'll bet what happened is that someone internally made this pitch and the marketing goon who produced the public-facing slides for Lunar Lake opted to reuse that data, since it made Skymont look even better (it's already quite impressive)!
  • kwohlt - Tuesday, June 4, 2024 - link

    The MTL E cores shared a ringbus with the P cores. The LNL E cores are completely separated from the P cores and function much more similarly to current LP-E cores.
  • name99 - Wednesday, June 5, 2024 - link

    So how much faster are they than the MTL E-cores (as opposed to the LP-E cores)?

    Sure it's nice that the dumbness of MTL is fixed, but the question is the one I'm interested in.
  • mode_13h - Tuesday, June 4, 2024 - link

    So is L0D just a new name for what they previously called L1D? The two seem virtually identical, at least in terms of the information they disclosed.
  • mode_13h - Tuesday, June 4, 2024 - link

    The thing they're *now* calling L1D is what seems to be the new part.
  • Dante Verizon - Tuesday, June 4, 2024 - link

    AMD will be swimming ahead, how pathetic.
  • lmcd - Monday, June 17, 2024 - link

    AMD doesn't build a package that competes with this product. If Intel delivers with Xe2 (and there's no reason to believe they will, to be clear), this product would win the entire handheld gaming category for the generation in about 30 seconds flat. Lunar Lake wouldn't actually be impossible to stuff into a phablet-style phone, though it obviously wouldn't be easy.
  • kkilobyte - Tuesday, June 4, 2024 - link

    what about the i9-14900KS test redo with Intel Default settings? You told us 20 days ago that you'd redo them :

    Gavin Bonshor - Friday, May 10, 2024 - link
    Don't worry; I will be testing Intel Default settings, too. I'm testing over the weekend and adding them in.

    So, will this promise ever be fulfilled?
  • kn00tcn - Tuesday, June 4, 2024 - link

    are you confirming that the chart images have not changed or were you waiting for an announcement?
  • kkilobyte - Tuesday, June 4, 2024 - link

    Unless I'm mistaken, the charts don't seem to have changed, and include only a single set of data (without the Intel Default Settings). The text doesn't suggest they were, though I didn't read the whole article again, I admit.
