AMD Slims Down Compute With Radeon Pro W7900 Dual Slot For AI Inference

While the bulk of AMD’s Computex presentation was on CPUs and their Instinct lineup of dedicated AI accelerators, the company also has a small product refresh for the professional graphics and workstation AI crowd. AMD is releasing a dual-slot version of their high-end Radeon Pro W7900 card – aptly named the W7900 Dual Slot – with the intent being to improve compute density in workstations by making it possible to install 4 of the cards inside a single chassis.

The release of a dual-slot version of the card comes after the original Radeon Pro W7900 was the first time AMD went with a larger, triple-slot form factor for their flagship workstation card. With the W7000 generation bringing an all-around increase in power consumption, pushing the W7900 to 295 Watts, AMD originally opted to release a larger card for improved acoustics. However this came at the cost of compute density, as most systems could only fit 2 of the thicker cards. As a result, AMD is opting to release a dual-slot version of the hardware as well, to offer a more competitive product for high-density workstation systems – particularly those doing local AI inference.

AMD Radeon Pro Specification Comparison

                        W7900DS             W7900               W7800               W6800
ALUs                    12288 (96 CUs)      12288 (96 CUs)      8960 (70 CUs)       3840 (60 CUs)
ROPs                    192                 192                 128                 96
Boost Clock             2.495GHz            2.495GHz            2.495GHz            2.32GHz
Peak Throughput (FP32)  61.3 TFLOPS         61.3 TFLOPS         45.2 TFLOPS         17.8 TFLOPS
Memory Clock            18Gbps GDDR6        18Gbps GDDR6        18Gbps GDDR6        16Gbps GDDR6
Memory Bus Width        384-bit             384-bit             256-bit             256-bit
Memory Bandwidth        864GB/sec           864GB/sec           576GB/sec           512GB/sec
VRAM                    48GB                48GB                32GB                32GB
ECC                     Yes (DRAM)          Yes (DRAM)          Yes (DRAM)          Yes (DRAM)
Infinity Cache          96MB                96MB                64MB                128MB
Total Board Power       295W                295W                260W                250W
Manufacturing Process   GCD: TSMC 5nm       GCD: TSMC 5nm       GCD: TSMC 5nm       TSMC 7nm
                        MCD: TSMC 6nm       MCD: TSMC 6nm       MCD: TSMC 6nm
Architecture            RDNA3               RDNA3               RDNA3               RDNA2
GPU                     Navi 31             Navi 31             Navi 31             Navi 21
Form Factor             Dual Slot Blower    Triple Slot Blower  Dual Slot Blower    Dual Slot Blower
Launch Date             06/2024             Q2'2023             Q2'2023             06/2021
Launch Price (MSRP)     $3499               $3999               $2499               $2249

Other than the narrower cooler, the Radeon Pro W7900DS is for all intents and purposes identical to the original W7900, with the same Navi 31 GPU being driven to the same clockspeeds, and the overall board being run to the same 295W Total Board Power (TBP) limit. This is paired with the same 18Gbps GDDR6 as before, giving the card 48GB of VRAM.
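The card's 864GB/second figure follows directly from the memory specs above; a quick sanity check:

```python
# Peak memory bandwidth from a GDDR interface's published specs:
# bandwidth (GB/s) = per-pin data rate (Gbps) * bus width (bits) / 8 (bits per byte)

def gddr_bandwidth_gb_s(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Peak theoretical bandwidth in GB/s for a GDDR memory interface."""
    return data_rate_gbps * bus_width_bits / 8

# W7900DS: 18 Gbps GDDR6 on a 384-bit bus
print(gddr_bandwidth_gb_s(18, 384))  # 864.0 GB/s

# W6800, for comparison: 16 Gbps on a 256-bit bus
print(gddr_bandwidth_gb_s(16, 256))  # 512.0 GB/s
```

Both results match the bandwidth figures on AMD's spec sheet.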

Officially, AMD doesn’t have a noise specification for these cards. But you can expect that the W7900DS will be louder than its triple-slot senior. By all appearances, AMD is just using the cooler from the W7800, which was a dual-slot card from the start, so that cooler is being tasked with handling another 35W of heat dissipation.

As the W7800 was also AMD’s fastest dual-slot card up until now, it’s an apt point of comparison for compute density. With its full-fat Navi 31 GPU, the W7900DS will offer about 36% more compute/pixel throughput than its sibling/predecessor. So it’s a not-insubstantial improvement for the very specific niche AMD has in mind for the card.
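The ~36% figure comes straight from the two cards' rated FP32 throughput:

```python
# Relative FP32 uplift of the W7900DS over the W7800, from the table above
w7900ds_tflops = 61.3  # full Navi 31, 96 CUs
w7800_tflops = 45.2    # cut-down Navi 31, 70 CUs

uplift = w7900ds_tflops / w7800_tflops - 1
print(f"{uplift:.0%}")  # 36%
```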

And like so many other things being announced at Computex this year, that niche is AI. While AMD offers PCIe versions of their Instinct MI210 accelerators, those cards are geared at servers, with fully-passive coolers to match. So workstation-level compute is largely picked up by AMD’s Radeon Pro workstation cards, which are intended to go into a traditional PC chassis and use active cooling (blowers). In this case, AMD is specifically going after local inference workloads, as that’s what the Radeon hardware and its significant VRAM pool are best suited for.

The Radeon Pro W7900 Dual Slot will drop on June 19th. Notably, AMD is introducing the card at a slightly lower price tag than they launched the original W7900 at last year, with the W7900DS hitting retail shelves at $3499, down from the W7900’s original $3999 price tag.

ROCm 6.1 For Radeons Coming as Well

Alongside the release of the W7900DS, AMD is also promoting the upcoming Radeon release of ROCm 6.1, their software stack for GPU computing. While baseline ROCm 6.1 was introduced back in April, the Windows version of AMD's software stack is still a trailing (and feature-limited) release. So that is slated to finally get bumped up to a ROCm 6.1 release on June 19th, the same day the W7900DS launches.

ROCm 6.1 for Radeons is slated to bring a couple of major changes/improvements to the stack, particularly when it comes to expanding the scope of available features. Notably, AMD will finally be shipping Windows Subsystem for Linux 2 (WSL2) support, albeit at a beta level, allowing Windows users to access the much richer feature set and software ecosystem of ROCm under Linux. This release will also incorporate improved support for multi-GPU configurations, perfect timing for the launch of the Radeon Pro W7900DS.

Finally, ROCm 6.1 sees TensorFlow integrated into the ROCm software stack as a first-class citizen. While this matter involves more complexities than can be summarized in a simple news story, native TensorFlow support under Windows was previously blocked by a lack of a Windows version of AMD’s MIOpen machine learning library. Combined with WSL2 support, developers will have two ways to access TensorFlow on Windows systems going forward.

AMD Plans Massive Memory Instinct MI325X for Q4'24, Lays Out Accelerator Roadmap to 2026

In a packed presentation kicking off this year’s Computex trade show, AMD CEO Dr. Lisa Su spent plenty of time focusing on the subject of AI. And while the bulk of that focus was on AMD’s impending client products, the company is also currently enjoying the rapid growth of their Instinct lineup of accelerators, with the MI300 continuing to break sales projections and growth records quarter after quarter. It’s no surprise then that AMD is looking to move quickly in the AI accelerator space, both to capitalize on the market opportunities amidst the current AI mania, and to stay competitive with the many chipmakers, large and small, who are also trying to stake a claim in the space.

To that end, as part of this evening’s announcements, AMD laid out their roadmap for their Instinct product lineup for both the short and long term, with new products and new architectures in development to carry AMD through 2026 and beyond.

On the product side of matters, AMD is announcing a new Instinct accelerator, the HBM3E-equipped MI325X. Based on the same computational silicon as the company’s MI300X accelerator, the MI325X swaps out HBM3 memory for faster and denser HBM3E, allowing AMD to produce accelerators with up to 288GB of memory, and local memory bandwidths hitting 6TB/second.
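If the MI325X keeps the same eight-stack HBM layout as the MI300X (an assumption on our part; AMD's slides only quote the totals), the per-stack HBM3E figures work out as follows:

```python
# Back-of-the-envelope per-stack math for the MI325X, assuming eight HBM
# stacks as on the MI300X (AMD has only published the aggregate figures)
total_capacity_gb = 288
total_bandwidth_tb_s = 6.0
stacks = 8  # assumption, carried over from MI300X

print(total_capacity_gb / stacks)    # 36.0 GB per stack
print(total_bandwidth_tb_s / stacks) # 0.75 TB/s per stack
```

36GB stacks and 0.75TB/second per stack are in line with the densest and fastest HBM3E configurations memory vendors have announced.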

Meanwhile, AMD also showcased their first new CDNA architecture/Instinct product roadmap in two years, laying out their plans through 2026. Over the next two years AMD will be moving very quickly indeed, launching two new CDNA architectures and associated Instinct products in 2025 and 2026, respectively. The CDNA 4-powered MI350 series will be released in 2025, and that will be followed up by the even more ambitious MI400 series in 2026, which will be based on the CDNA "Next" architecture.

NVIDIA Intros RTX A1000 and A400: Entry-Level ProViz Cards Get Ray Tracing

With NVIDIA’s Turing architecture turning six years old this year, the company has been retiring many of the remaining Turing products from its video card lineup. And today that spirit of spring cleaning is coming to the entry-level segment of NVIDIA’s professional visualization lineup, where NVIDIA is introducing a pair of new desktop cards based on their low-end Ampere hardware.

The new RTX A1000 and RTX A400 cards will be replacing the T1000/T600/T400 lineup, which was released three years ago in 2021. The new cards slot into the same entry-level category and finally finish fleshing out the RTX A series of proviz cards, offering NVIDIA’s Ampere-generation professional graphics technologies in the lowest-power, lowest-performance, lowest-cost configuration possible.

Notably, since the entry-level T-series were based on NVIDIA’s feature-limited TU11x silicon, which lacked ray tracing and tensor core support – the basis of NVIDIA’s RTX technologies and associated branding – this marks the first time these technologies will be available in NVIDIA’s entry-level desktop proviz cards. And accordingly, these are being promoted to RTX-branded video cards, ending the odd overlap with NVIDIA’s compute cards, which never carry RTX branding.

It goes without saying that as low-end cards, the ray tracing performance of either part is nothing to write home about, but it gives NVIDIA’s current proviz lineup a consistent set of graphics features from top to bottom.

NVIDIA Professional Visualization Card Specification Comparison

                        A1000            A400             T1000            T400
CUDA Cores              2304             768              896              384
Tensor Cores            72               24               N/A              N/A
Boost Clock             1460MHz          1755MHz          1395MHz          1425MHz
Memory Clock            12Gbps GDDR6     12Gbps GDDR6     10Gbps GDDR6     10Gbps GDDR6
Memory Bus Width        128-bit          64-bit           128-bit          64-bit
VRAM                    8GB              4GB              8GB              4GB
Single Precision        6.74 TFLOPS      2.7 TFLOPS       2.5 TFLOPS       1.09 TFLOPS
Tensor Performance      53.8 TFLOPS      21.7 TFLOPS      N/A              N/A
TDP                     50W              50W              50W              30W
Cooling                 Active, SS       Active, SS       Active, SS       Active, SS
Outputs                 4x mDP 1.4a      4x mDP 1.4a      4x mDP 1.4a      3x mDP 1.4a
GPU                     GA107            GA107            TU117            TU117
Architecture            Ampere           Ampere           Turing           Turing
Manufacturing Process   Samsung 8nm      Samsung 8nm      TSMC 12nm        TSMC 12nm
Launch Date             04/2024          05/2024          05/2021          05/2021

Both the A1000 and A400 are based on the same board design, with NVIDIA doing away with any pretense of physical feature differentiation this time around (T400 was missing its 4th Mini DisplayPort). This means both cards are based on the GA107 GPU, sporting different core and memory configurations.

RTX A1000 is a not-quite-complete configuration of GA107, with 2304 CUDA cores and 72 tensor cores. This is paired with 8GB of GDDR6, which runs at 12Gbps, for a total of 192GB/second of memory bandwidth. The TDP of the card is 50 Watts, matching its predecessor.
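Both of the A1000's headline numbers can be derived from the core specs:

```python
# Deriving the A1000's rated figures from its core specs
cuda_cores = 2304
boost_clock_ghz = 1.460
mem_rate_gbps = 12
bus_width_bits = 128

# FP32: one FMA (2 FLOPs) per CUDA core per clock
fp32_tflops = cuda_cores * 2 * boost_clock_ghz / 1000
mem_bw_gb_s = mem_rate_gbps * bus_width_bits / 8

print(round(fp32_tflops, 2))  # ~6.73 TFLOPS (NVIDIA quotes 6.74)
print(mem_bw_gb_s)            # 192.0 GB/s
```

The small gap on the FP32 figure comes down to rounding of the boost clock.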

Meanwhile RTX A400 is far more cut down, offering about a third of the active hardware of the A1000, and half the memory bandwidth – or 96GB/second. On paper this gives it around 40% of the A1000’s performance. Notably, despite the hardware cut-down, the official TDP is still 50 Watts, versus the 30 Watts of its predecessor. So once the remaining Turing cards are retired, NVIDIA will no longer offer a desktop proviz card below 50 Watts.

As noted before, both cards otherwise feature the same physical design, with a half-height half-length (HHHL) board with active cooling. As you’d expect from such low-TDP cards, these are single-slot cooler designs. Both cards feature a quartet of Mini DisplayPorts, with the same DP 1.4a functionality that we’ve seen across all of NVIDIA’s products for the last several years.

Finally, video-focused users will want to make note that the A1000/A400 have slightly different video capabilities. While A1000 gets access to both of GA107’s NVDEC video decode blocks, A400 only gets access to a single block – one more cutback to differentiate the two cards. Otherwise, both video cards get access to the GPU’s sole NVENC block.

According to NVIDIA, the RTX A1000 will be available starting today through its distribution partners. Meanwhile the RTX A400 will hit distribution channels in May, and with OEMs expected to begin offering the cards as part of their pre-built systems this summer.

Intel Introduces Gaudi 3 AI Accelerator: Going Bigger and Aiming Higher In AI Market

Intel this morning is kicking off the second day of their Vision 2024 conference, the company’s annual closed-door business and customer-focused get-together. While Vision is not typically a hotbed for new silicon announcements from Intel – that’s more of an Innovation thing in the fall – attendees of this year’s show are not coming away empty handed. With a heavy focus on AI going on across the industry, Intel is using this year’s event to formally introduce the Gaudi 3 accelerator, the next-generation of Gaudi high-performance AI accelerators from Intel’s Habana Labs subsidiary.

The latest iteration of Gaudi will be launching in the third quarter of 2024, and Intel is already shipping samples to customers now. The hardware itself is something of a mixed bag in some respects (more on that in a second), but with 1835 TFLOPS of FP8 compute throughput, Intel believes it’s going to be more than enough to carve off a piece of the expansive (and expensive) AI market for themselves. Based on their internal benchmarks, the company expects to be able to beat NVIDIA’s flagship Hx00 Hopper architecture accelerators in at least some critical large language models, which will open the door to Intel grabbing a larger piece of the AI accelerator market at a critical time in the industry, and a moment when there simply isn’t enough NVIDIA hardware to go around.

Introspect Intros GDDR7 Test System For Fast GDDR7 GPU Design Bring Up

Introspect this week introduced its M5512 GDDR7 memory test system, which is designed for testing GDDR7 memory controllers, physical interfaces, and GDDR7 SGRAM chips. The tool will enable memory and processor manufacturers to verify that their products perform as specified by the standard.

One of the crucial phases of processor bring-up is testing its standard interfaces – such as PCIe, DisplayPort, or GDDR – to ensure that they behave as specified both logically and electrically, and achieve their designated performance. Introspect's M5512 GDDR7 memory test system is designed to do just that: test new GDDR7 memory devices, troubleshoot protocol issues, assess signal integrity, and conduct comprehensive memory read/write stress tests.

The product will be useful for designers of GPUs/SoCs, graphics cards, PCs, network equipment, and memory chips, and should speed up development of products that rely on GDDR7 memory. For now, GPU and SoC designers as well as memory makers use highly custom setups consisting of many tools to characterize signal integrity and conduct detailed memory read/write functional stress testing – both important tasks at this phase of development. A single tool that covers all of this greatly speeds up the process and gives specialists a more comprehensive picture.

The M5512 GDDR7 Memory Test System is a desktop testing and measurement device that is equipped with 72 pins capable of functioning at up to 40 Gbps in PAM3 mode, as well as offering a virtual GDDR7 memory controller. The device features bidirectional circuitry for executing read and write operations, and every pin is equipped with an extensive range of analog characterization features, such as skew injection with femtosecond resolution, voltage control with millivolt resolution, programmable jitter injection, and various eye margining features critical for AC characterization and conformance testing. Furthermore, the system integrates device power supplies with precise power sequencing and ramping controls, providing a comprehensive solution for both AC characterization and memory functional stress testing on any GDDR7 device.
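For a sense of what a 40 Gbps PAM3 pin implies, a rough sketch based on public descriptions of the GDDR7 standard (which encodes 3 bits across every 2 PAM3 symbols):

```python
# Rough per-pin and per-device numbers implied by a 40 Gbps PAM3 data rate.
# The 3-bits-per-2-symbols encoding is per public descriptions of GDDR7;
# exact details are in the JEDEC specification.
data_rate_gbps = 40
bits_per_symbol = 3 / 2  # GDDR7 PAM3: 3 bits across 2 symbols

symbol_rate_gbaud = data_rate_gbps / bits_per_symbol
device_bw_gb_s = data_rate_gbps * 32 / 8  # a standard 32-bit-wide GDDR7 device

print(round(symbol_rate_gbaud, 1))  # ~26.7 GBaud per pin
print(device_bw_gb_s)               # 160.0 GB/s per device
```

In other words, the signaling the tester must generate and capture runs well below the nominal bit rate, which is the point of PAM3's denser encoding.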

Introspect's M5512 has been designed in close collaboration with JEDEC members working on the GDDR7 specification, so it promises to meet all of their requirements for compliance testing. Notably, however, the device does not eliminate the need for interoperability tests, and still requires companies to develop their own test algorithms. But it's still a significant tool for bootstrapping device development and getting it to the point where chips can begin interop testing.

“In its quest to support the industry on GDDR7 deployment, Introspect Technology has worked tirelessly in the last few years with JEDEC members to develop the M5512 GDDR7 Memory Test System,” said Dr. Mohamed Hafed, CEO at Introspect Technology.

AMD Announces FSR 3.1: Seriously Improved Upscaling Quality

AMD's FidelityFX Super Resolution 3 technology package introduced a plethora of enhancements to the FSR technology on Radeon RX 6000 and 7000-series graphics cards last September. But perfection has no limits, so this week, the company is rolling out its FSR 3.1 technology, which improves upscaling quality, decouples frame generation from AMD's upscaling, and makes it easier for developers to work with FSR.

Arguably, AMD's FSR 3.1's primary enhancement is its improved temporal upscaling image quality: compared to FSR 2.2, the image flickers less at rest and no longer ghosts in movement. This is a significant improvement, as flickering and ghosting artifacts are particularly annoying. Meanwhile, FSR 3.1 has to be implemented by game developers on a per-title basis, and the first title slated to support the new technology later this year is Ratchet & Clank: Rift Apart.

[Image comparisons: Temporal Stability and Ghosting Reduction – AMD FSR 2.2 vs. AMD FSR 3.1]

Another significant development brought by FSR 3.1 is its decoupling from the Frame Generation feature introduced by FSR 3. This capability relies on a form of AMD's Fluid Motion Frames (AFMF) optical flow interpolation. It uses temporal game data like motion vectors to add an additional frame between existing ones. This ability can lead to a performance boost of up to two times in compatible games, but it was initially tied to FSR 3 upscaling, which is a limitation. Starting from FSR 3.1, it will work with other upscaling methods, though AMD refrains from saying which methods and on which hardware for now. Also, the company does not disclose when it is expected to be implemented by game developers.
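To make the idea concrete, here is a deliberately naive sketch of frame interpolation – synthesizing an in-between frame from two rendered frames and a per-pixel motion estimate. This is only an illustration of the concept on a 1-D "frame" of grayscale values, not AMD's AFMF algorithm, which uses optical flow and game-supplied motion vectors on full images:

```python
# Naive midpoint-frame interpolation: move each pixel half-way along its
# estimated motion, then blend the previous and next frames equally.
# Purely illustrative; not AMD's actual frame generation algorithm.

def interpolate_frame(prev_frame, next_frame, motion):
    """Synthesize the frame halfway between prev_frame and next_frame.
    motion[x] is the estimated pixel displacement (in pixels) at position x."""
    n = len(prev_frame)
    mid = [0.0] * n
    for x in range(n):
        # sample the previous frame half a motion step back, clamped to bounds
        src = min(max(x - motion[x] // 2, 0), n - 1)
        mid[x] = 0.5 * prev_frame[src] + 0.5 * next_frame[x]
    return mid

prev_f = [10, 20, 30, 40]
next_f = [20, 30, 40, 50]
print(interpolate_frame(prev_f, next_f, motion=[0, 0, 0, 0]))  # [15.0, 25.0, 35.0, 45.0]
```

Real implementations must additionally handle occlusion, disocclusion, and UI elements, which is where most of the engineering effort goes.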

In addition, AMD is bringing support for FSR 3 to Vulkan and the Xbox Game Development Kit, enabling game developers on those platforms to use it. The company is also adding FSR 3.1 to the FidelityFX API, which simplifies debugging and enables forward compatibility with updated versions of FSR.

Upon its release in September 2023, AMD FSR 3 was initially supported by two titles, Forspoken and Immortals of Aveum, with ten more games poised to join them at the time. Fast forward six months, and the lineup has expanded to an impressive roster of 40 games either currently supporting or set to incorporate FSR 3. As of March 2024, FSR 3 is supported by games like Avatar: Frontiers of Pandora, Starfield, and The Last of Us Part I, with Cyberpunk 2077, Dying Light 2 Stay Human, Frostpunk 2, and Ratchet & Clank: Rift Apart set to follow shortly.

Source: AMD

NVIDIA Blackwell Architecture and B200/B100 Accelerators Announced: Going Bigger With Smaller Data

Already solidly in the driver’s seat of the generative AI accelerator market at this time, NVIDIA has long made it clear that the company isn’t about to slow down and check out the view. Instead, NVIDIA intends to continue iterating along its multi-generational product roadmap for GPUs and accelerators, to leverage its early advantage and stay ahead of its ever-growing coterie of competitors in the accelerator market. So while NVIDIA’s ridiculously popular H100/H200/GH200 series of accelerators are already the hottest ticket in Silicon Valley, it’s already time to talk about the next generation accelerator architecture to feed NVIDIA’s AI ambitions: Blackwell.
