Intel Lunar Lake: New P-Core, Enter Lion Cove

Starting with the Performance core, commonly referred to as the P-core, Lion Cove brings major architectural updates aimed at improving both power efficiency and performance.

Key among these improvements is a significant overhaul of Intel's traditional P-core cache hierarchy. The new design for Lion Cove uses a multi-tier data cache consisting of a 48KB L0D cache with 4-cycle load-to-use latency, a 192KB L1D cache with 9-cycle latency, and an extended L2 cache of up to 3MB with 17-cycle latency. In total, this puts 240KB of cache within 9 cycles of latency of the core, whereas Redwood Cove before it could only reach 48KB of cache in the same number of cycles.
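The 240KB figure can be checked with some quick arithmetic. The snippet below is only a minimal sketch: `CacheLevel` and `capacity_within` are hypothetical names made up for illustration, and the sizes and latencies are simply the numbers quoted above, with the L2 taken at its "up to 3MB" upper bound.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheLevel:
    name: str
    size_kb: int
    latency_cycles: int


# Figures as quoted in the article; the L2 entry uses the 3MB upper bound.
LION_COVE = [
    CacheLevel("L0D", 48, 4),
    CacheLevel("L1D", 192, 9),
    CacheLevel("L2", 3 * 1024, 17),
]


def capacity_within(levels, max_latency_cycles):
    """Sum the capacity (in KB) of every level at or under the latency budget."""
    return sum(
        level.size_kb
        for level in levels
        if level.latency_cycles <= max_latency_cycles
    )


print(capacity_within(LION_COVE, 9))  # 240
```

Note that this treats the levels as simply additive, which is how the 240KB figure is arrived at; it says nothing about whether the levels are inclusive of one another.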

The data translation lookaside buffer (DTLB) has also been revised, increasing its depth from 96 to 128 pages to improve its hit rate.
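One rough way to see what the extra entries buy is "TLB reach", the amount of memory the DTLB can map at once. The sketch below assumes standard 4KB pages, which is an assumption on our part and not something Intel has stated for this comparison; larger pages would scale the result accordingly. The helper name `tlb_reach_kb` is hypothetical.

```python
PAGE_SIZE_KB = 4  # assumption: 4KB pages; large pages would change the result


def tlb_reach_kb(entries, page_size_kb=PAGE_SIZE_KB):
    """Memory (in KB) that can be mapped by a fully populated TLB."""
    return entries * page_size_kb


old_reach = tlb_reach_kb(96)
new_reach = tlb_reach_kb(128)
print(f"{old_reach} KB -> {new_reach} KB")  # 384 KB -> 512 KB
```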

Intel has also added a third Address Generation Unit (AGU)/Store Unit pair to further boost the performance of data write operations. More cache has been thrown at the problem, too: as CPU cores grow more complex, so does their reliance on the cache subsystem to keep them fed. To that end, Intel has reworked the core-level cache subsystem by adding an intermediate data cache (IDC) between the existing 48 KB L1 and the L2. The original L1D cache is now called the L0 D-cache internally, and the new intermediate level becomes the 192 KB L1 D-cache.

The latest Lion Cove P-core design also includes a new front end for handling instructions. The prediction block is 8x larger, fetch is wider, decode bandwidth is higher than on Raptor Cove, and there has been an enormous increase in uop cache capacity and read bandwidth. The change in uop queue capacity is designed to improve overall throughput.

The out-of-order engine in Lion Cove is partitioned into separate Integer (INT) and Vector (VEC) execution domains, each with independent renaming and scheduling. This partitioning allows for future expandability and independent growth of each domain, and it reduces power consumption for domain-specific workloads. The out-of-order engine has also been widened, going from 6-wide to 8-wide allocation/rename and from 8-wide to 12-wide retirement, with the deep instruction window growing from 512 to 576 entries and the number of execution ports rising from 12 to 18.
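To put those out-of-order changes on a common scale, the following sketch converts each before/after pair into a relative increase. The numbers are the ones quoted above; the `OOO_CHANGES` table and `percent_increase` helper are hypothetical names used only for this illustration.

```python
# (old, new) pairs as quoted in the article
OOO_CHANGES = {
    "allocation/rename width": (6, 8),
    "retirement width": (8, 12),
    "instruction window entries": (512, 576),
    "execution ports": (12, 18),
}


def percent_increase(old, new):
    """Relative growth from old to new, expressed in percent."""
    return (new - old) / old * 100.0


for name, (old, new) in OOO_CHANGES.items():
    print(f"{name}: {old} -> {new} (+{percent_increase(old, new):.1f}%)")
```

Viewed this way, the retirement width and port count grow far more than the instruction window does, which expands by only 12.5%.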

Lion Cove's integer execution units have also been improved over Raptor Cove, with execution resources growing from 5 to 6 integer ALUs, 2 to 3 jump units, and 2 to 3 shift units. The number of 64x64-to-64-bit multiply units scales from 1 to 3, adding more compute power for heavier integer work. Another significant development is the transformation of the P-core design database from a 'sea of fubs' to a 'sea of cells.' Migrating the sub-organization of the P-core's structure from fubs to more organized cells essentially increases density.

Intel has removed Hyper-Threading (HT) from its Lunar Lake SoC, with one potential reason being to enhance power efficiency and single-thread performance. By eliminating HT, Intel reduces power consumption and simplifies thermal management, which should extend battery life in ultra-thin notebooks. Intel claims that the Lion Cove P-cores offer approximately 15% better performance-to-power and performance-to-area ratios than cores with HT. Intel's hybrid architecture, which uses E-cores for multi-threaded tasks, also reduces the need for HT, allowing workloads to be distributed more efficiently by the Intel Thread Director.

Power management has also been refined, with AI self-tuning controllers replacing the static thermal guard bands. This lets the system respond adaptively to real-time operating conditions and achieve higher sustained performance. Intel also steps Lion Cove P-core clock speeds in finer 16.67MHz intervals rather than the traditional 100MHz. This allows more accurate power management and finer tuning to squeeze as much as possible from the power budget.
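The benefit of the finer clock steps is easiest to see with a worked example. The sketch below is purely illustrative: the `quantize_down` helper and the 3390MHz target are hypothetical, and it assumes the 16.67MHz step is exactly one sixth of 100MHz. It rounds a frequency the power budget could sustain down to the nearest available step under each scheme.

```python
import math


def quantize_down(target_mhz, steps_per_100mhz):
    """Highest available clock step that does not exceed target_mhz.

    steps_per_100mhz=1 models 100MHz granularity; steps_per_100mhz=6
    models ~16.67MHz granularity (assumed to be exactly 100/6 MHz).
    """
    step_count = math.floor(target_mhz * steps_per_100mhz / 100)
    return step_count * 100 / steps_per_100mhz


target = 3390.0  # hypothetical frequency the power budget could sustain

coarse = quantize_down(target, 1)  # 100MHz steps
fine = quantize_down(target, 6)    # ~16.67MHz steps

print(f"100MHz steps:   {coarse:.2f} MHz, {target - coarse:.2f} MHz left unused")
print(f"16.67MHz steps: {fine:.2f} MHz, {target - fine:.2f} MHz left unused")
```

With coarse steps, the core in this example would have to settle at 3300MHz; with the finer steps it can run at roughly 3383MHz, leaving much less of the budget on the table.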

Intel's Lion Cove P-Core microarchitecture looks like a nice upgrade over Golden Cove. Lion Cove incorporates improved memory and cache subsystems and better power management while not relying solely on faster P-core frequencies to boost performance.


91 Comments


  • mode_13h - Tuesday, June 4, 2024

    Yeah, it definitely comes across as two-faced for Intel to be pitching its foundry business to others, while it's not even using it for its own cutting-edge CPUs!
  • kn00tcn - Tuesday, June 4, 2024

    1) bob (or brian?) made a deal with tsmc and they need fill the required capacity

    2) tsmc chiplet packaging requires all tiles to come from tsmc (but mixing tile foundries is fine as long as someone else packages)

    3) lunar lake isnt high power high core desktop/server, there's plenty else to make themselves, and obviously they've been ramping cutting edge future nodes

    4) these things take years, why would a recent subsidy relate to old deals
  • mode_13h - Thursday, June 6, 2024

    > 1) bob (or brian?) made a deal with tsmc and they need fill the required capacity

    This is probably the dumbest claim I've seen in a while. There's guaranteed to be an escape clause in that contract, although Intel would be stuck with some fee.

    Given the current demand for cutting-edge nodes, I'm sure Intel could probably work out an agreement with another fab customer to buy their excess wafer capacity and probably even turn a profit by it.

    > 2) tsmc chiplet packaging requires all tiles to come from tsmc

    Second dumbest claim in the thread. Lunar Lake uses Foveros, not TSMC's technology, and Intel is making the base layer on their own 22 nm node.

    > 3) lunar lake isnt high power high core desktop/server

    What does that have to do with anything? It still needs to compete on performance and efficiency!

    > why would a recent subsidy relate to old deals

    Who said anything about that?
  • kwohlt - Tuesday, June 4, 2024

    Intel's foundry service doesn't have a full suite of nodes to choose from and is currently building out fabs. In the meantime, client will be using some of the TSMC N3B allocation that Intel carved out years ago. Expect 2024-2025 to be peak TSMC usage.

    What other options were realistically available? Intel 3 is just hitting the market and fully allocated to Xeon 6 initially. Intel 4 isn't library complete and wouldn't work for a tile that also contains NPU and GPU. Intel 7 is heavy DTCO'd for ADL/RPL and has poor low wattage performance. 18A isn't ready yet.

    By the time 14A releases, Intel will have a selection of 18A and the Intel 3 family of nodes to pick from for their other CPU tiles.
  • mode_13h - Thursday, June 6, 2024

    > Intel 3 is just hitting the market and fully allocated to Xeon 6 initially

    The Lunar Lake CPU tiles can't be very big. They should've been a good "pipe cleaner" product for Intel to ramp up their "3" node, before making the huge Xeon dies.

    I hadn't noticed the GPU was on the same tile. If true, I think they could've kept it on its own tile, as Meteor Lake did.
  • lmcd - Wednesday, June 12, 2024

    Intel has not shipped an Xe product on an Intel process since DG1. We don't know that it ports.

    Adding a separate die might have increased the package size, and part of the point of this product was to be a small package that could supplant Qualcomm designs easily (and the PMIC callout was specifically targeted at vendors that got burned by Qualcomm's power shenanigans, if you believe Charlie).
  • andrewaggb - Thursday, June 6, 2024

    yeah, it's not a great look on the fab side, but honestly I hope it's an amazing chip and worth upgrading. I hope Qualcomm's chip is great as well and get some actual innovation/competition going on.
  • eonsim - Tuesday, June 4, 2024

    Is Intel comparing their new E-cores to the LP-E cores here (the ones on the SoC with no L3), rather than the main E-cores for Meteor Lake?
  • mode_13h - Tuesday, June 4, 2024

    +1
  • name99 - Wednesday, June 5, 2024

    Exactly.
    And judging from what I've seen on the internet, plenty of people were fooled. And don't like to be told that they were fooled...
