Coin Digest Daily

NVIDIA Enhances Llama 3.1 405B Performance with TensorRT Model Optimizer

29 August 2024
Lawrence Jengar
Aug 29, 2024 16:10

NVIDIA’s TensorRT Model Optimizer significantly boosts the performance of Meta’s Llama 3.1 405B large language model on H200 GPUs.

Meta’s Llama 3.1 405B large language model (LLM) is reaching new levels of performance thanks to NVIDIA’s TensorRT Model Optimizer, according to the NVIDIA Technical Blog. The enhancements deliver up to a 1.44x increase in throughput when running on NVIDIA H200 GPUs.

Excellent Llama 3.1 405B Inference Throughput with TensorRT-LLM

TensorRT-LLM has already delivered remarkable inference throughput for Llama 3.1 405B since the model’s release. This was achieved through various optimizations, including in-flight batching, KV caching, and optimized attention kernels. These techniques have accelerated inference performance while keeping compute at lower precision.

TensorRT-LLM added support for the official Llama FP8 quantization recipe, which calculates static and dynamic scaling factors to preserve maximum accuracy. Additionally, user-defined kernels such as matrix multiplications from FBGEMM are optimized via plug-ins inserted into the network graph at compile time.
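The difference between static and dynamic scaling factors can be illustrated with a minimal pure-Python sketch. It is not the official Llama recipe or any TensorRT-LLM API: the helper names are hypothetical, scaling is per tensor, and rounding to the FP8 grid is omitted. The only fixed fact it relies on is that the largest finite value in the FP8 E4M3 format is 448.

```python
# Illustrative only: per-tensor FP8 (E4M3) scaling, not the official Llama recipe.
FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3


def amax(values):
    """Largest absolute value in a sequence of floats."""
    return max(abs(v) for v in values)


def static_scale(calibration_batches):
    """Static scale: fixed ahead of time from calibration data."""
    return max(amax(batch) for batch in calibration_batches) / FP8_E4M3_MAX


def dynamic_scale(tensor):
    """Dynamic scale: recomputed at run time from the tensor being quantized."""
    return amax(tensor) / FP8_E4M3_MAX


def scale_and_clamp(tensor, scale):
    """Map values into the FP8 range; rounding to the FP8 grid is omitted."""
    return [max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, v / scale)) for v in tensor]


# Hypothetical data: the run-time input exceeds anything seen in calibration.
calibration = [[0.5, -2.0, 1.0], [3.5, -0.25, 0.75]]
runtime_input = [7.0, -1.0, 0.5]

s_static = static_scale(calibration)       # 3.5 / 448
s_dynamic = dynamic_scale(runtime_input)   # 7.0 / 448

static_q = scale_and_clamp(runtime_input, s_static)    # 7.0 saturates at 448
dynamic_q = scale_and_clamp(runtime_input, s_dynamic)  # 7.0 maps exactly to 448
```

With a static scale, an outlier larger than the calibration range saturates at the FP8 maximum; a dynamic scale adapts to it at the cost of computing the maximum at run time.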

Boosting Performance Up to 1.44x with TensorRT Model Optimizer

NVIDIA’s custom FP8 post-training quantization (PTQ) recipe, available through the TensorRT Model Optimizer library, enhances Llama 3.1 405B throughput and reduces latency without sacrificing accuracy. This recipe incorporates FP8 KV cache quantization and self-attention static quantization, reducing inference compute overhead.
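To see why FP8 KV cache quantization matters at long sequence lengths, a rough size estimate helps. The sketch below is back-of-the-envelope arithmetic, not a measurement: the attention shape (126 layers, 8 key/value heads, head dimension 128) is an assumption about Llama 3.1 405B that this article does not state, and the function name is hypothetical.

```python
# Assumed Llama 3.1 405B attention shape (not stated in this article).
NUM_LAYERS = 126
NUM_KV_HEADS = 8
HEAD_DIM = 128


def kv_cache_bytes(num_tokens, bytes_per_value):
    """Bytes needed to hold keys and values across all layers for one sequence."""
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * bytes_per_value
    return num_tokens * per_token


# Longest benchmarked case: 120,000 input tokens plus 2,048 output tokens.
tokens = 120_000 + 2_048
fp16_gb = kv_cache_bytes(tokens, 2) / 1e9
fp8_gb = kv_cache_bytes(tokens, 1) / 1e9
print(f"FP16 KV cache: {fp16_gb:.1f} GB, FP8 KV cache: {fp8_gb:.1f} GB")
```

Under these assumptions, storing the cache in FP8 halves the per-sequence footprint, which leaves more memory for batching additional requests.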

Table 1 shows the maximum throughput performance, with significant improvements across various input and output sequence lengths on an 8-GPU HGX H200 system. The system features eight NVIDIA H200 Tensor Core GPUs with 141 GB of HBM3e memory each and four NVLink Switches, providing 900 GB/s of GPU-to-GPU bandwidth.




Maximum Throughput Performance – Output Tokens/Second, 8 NVIDIA H200 Tensor Core GPUs

Input | Output Sequence Lengths     2,048 | 128    32,768 | 2,048    120,000 | 2,048
TensorRT Model Optimizer FP8        463.1          320.1             71.5
Official Llama FP8 Recipe           399.9          230.8             49.6
Speedup                             1.16x          1.39x             1.44x

Table 1. Maximum throughput performance of Llama 3.1 405B with NVIDIA internal measurements
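The speedup row is the ratio of the two throughput rows. A short sketch that recomputes it from the published Table 1 figures (the variable and function names are illustrative):

```python
# Output tokens/second from Table 1 (8x H200), keyed by "input | output" lengths.
optimizer_fp8 = {"2,048 | 128": 463.1, "32,768 | 2,048": 320.1, "120,000 | 2,048": 71.5}
official_fp8 = {"2,048 | 128": 399.9, "32,768 | 2,048": 230.8, "120,000 | 2,048": 49.6}


def speedups(new, baseline):
    """Throughput ratio of `new` over `baseline` for each sequence-length pair."""
    return {lengths: new[lengths] / baseline[lengths] for lengths in new}


table1_speedups = speedups(optimizer_fp8, official_fp8)
for lengths, ratio in table1_speedups.items():
    print(f"{lengths}: {ratio:.2f}x")
```

The gain grows with sequence length, from 1.16x at 2,048 input tokens to 1.44x at 120,000.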

Similarly, Table 2 presents the minimum latency performance using the same input and output sequence lengths.




Batch Size = 1 Performance – Output Tokens/Second, 8 NVIDIA H200 Tensor Core GPUs

Input | Output Sequence Lengths     2,048 | 128    32,768 | 2,048    120,000 | 2,048
TensorRT Model Optimizer FP8        49.6           44.2              27.2
Official Llama FP8 Recipe           37.4           33.1              22.8
Speedup                             1.33x          1.33x             1.19x

Table 2. Minimum latency performance of Llama 3.1 405B with NVIDIA internal measurements
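Because Table 2 reports output tokens per second at batch size 1, the figures can be inverted to get an approximate time per output token. The sketch below does that conversion; treating the inverse as mean per-token latency is an interpretation of the published numbers, not something the source states, and the names are illustrative.

```python
# Table 2 output tokens/second at batch size 1 (8x H200):
# (TensorRT Model Optimizer FP8, official Llama FP8 recipe)
table2_tokens_per_second = {
    "2,048 | 128": (49.6, 37.4),
    "32,768 | 2,048": (44.2, 33.1),
    "120,000 | 2,048": (27.2, 22.8),
}


def ms_per_token(tokens_per_second):
    """Approximate mean time per output token, in milliseconds."""
    return 1000.0 / tokens_per_second


for lengths, (optimizer, official) in table2_tokens_per_second.items():
    print(
        f"{lengths}: {ms_per_token(optimizer):.1f} ms/token (Model Optimizer FP8) "
        f"vs {ms_per_token(official):.1f} ms/token (official FP8)"
    )
```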

These results indicate that H200 GPUs with TensorRT-LLM and TensorRT Model Optimizer deliver superior performance in both latency-optimized and throughput-optimized scenarios. The TensorRT Model Optimizer FP8 recipe also achieved accuracy comparable to the official Llama 3.1 FP8 recipe on the Massive Multitask Language Understanding (MMLU) and MT-Bench benchmarks.

Fitting Llama 3.1 405B on Just Two H200 GPUs with INT4 AWQ

For developers with hardware resource constraints, the INT4 AWQ technique in TensorRT Model Optimizer compresses the model, allowing Llama 3.1 405B to fit on just two H200 GPUs. This method significantly reduces the required memory footprint by compressing the weights down to 4-bit integers while encoding activations in FP16.
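A rough weight-only estimate shows why 4-bit weights make a two-GPU deployment possible. The sketch assumes a nominal 405 billion parameters and the 141 GB per H200 quoted above, and it ignores the quantization scales that AWQ stores alongside the weights, the KV cache, and activations, so real requirements are higher. The function names are illustrative.

```python
import math

PARAMS = 405e9          # nominal parameter count implied by the model name
H200_MEMORY_GB = 141    # HBM3e capacity per GPU, as quoted in the article


def weight_gb(bits_per_weight):
    """Approximate weight storage in GB (10^9 bytes), weights only."""
    return PARAMS * bits_per_weight / 8 / 1e9


def min_gpus_for_weights(bits_per_weight):
    """Smallest GPU count whose combined memory holds the weights alone."""
    return math.ceil(weight_gb(bits_per_weight) / H200_MEMORY_GB)


for label, bits in [("FP16", 16), ("FP8", 8), ("INT4", 4)]:
    print(f"{label}: {weight_gb(bits):.1f} GB of weights, "
          f"at least {min_gpus_for_weights(bits)} H200 GPU(s)")
```

At 4 bits the weights take roughly 202.5 GB, which fits within the 282 GB offered by two H200 GPUs, whereas FP8 weights alone (about 405 GB) would not.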

Tables 4 and 5 show the maximum throughput and minimum latency performance measurements. NVIDIA reports that the INT4 AWQ method provides accuracy scores comparable to the official Llama 3.1 FP8 recipe from Meta.




Maximum Throughput Performance – Output Tokens/Second, 2 NVIDIA H200 Tensor Core GPUs

Input | Output Sequence Lengths     2,048 | 128    32,768 | 2,048    60,000 | 2,048
TensorRT Model Optimizer INT4 AWQ   75.6           28.7              16.2

Table 4. Maximum throughput performance of Llama 3.1 405B with NVIDIA internal measurements




Batch Size = 1 Performance – Output Tokens/Second, 2 NVIDIA H200 Tensor Core GPUs

Input | Output Sequence Lengths     2,048 | 128    32,768 | 2,048    60,000 | 2,048
TensorRT Model Optimizer INT4 AWQ   21.6           18.7              12.8

Table 5. Minimum latency performance of Llama 3.1 405B with NVIDIA internal measurements

NVIDIA’s advancements in TensorRT Model Optimizer and TensorRT-LLM are paving the way for better performance and efficiency when running large language models like Llama 3.1 405B. These improvements offer developers more flexibility and cost-efficiency, whether they have extensive hardware resources or more constrained environments.

Copyright © 2024 Coin Digest Daily.
Coin Digest Daily is not responsible for the content of external sites.
