Nemotron-4 340B Technical Report

B Adler, N Agarwal, A Aithal, DH Anh… - arXiv preprint arXiv …, 2024 - arxiv.org
… In this section we report results for Nemotron-4-340B-Base. We compare our model against
other open access base foundation models like Llama-3 70B (MetaAI, 2024), Mistral 8x22 (…

Compact language models via pruning and knowledge distillation

S Muralidharan, ST Sreenivas, R Joshi… - arXiv preprint arXiv …, 2024 - arxiv.org
… Finally, we apply our findings to prune the Nemotron-4 15B … instruction-tuning data used
for Nemotron-4 340B [38] to create … Phi-3 technical report: A highly capable language model …
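
The pruning-and-distillation recipe summarized here centers on a standard teacher-student objective: the pruned student is trained to match the original model's output distribution. A minimal PyTorch sketch of the logit-distillation loss (function names and the temperature value are illustrative assumptions, not the authors' exact setup):

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both distributions; higher temperature spreads probability mass.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # batchmean reduction plus T^2 scaling keeps gradient magnitudes
    # comparable across temperature choices.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2

# Example: distill a pruned (student) model against the original (teacher).
teacher_logits = torch.randn(4, 32000)
student_logits = torch.randn(4, 32000, requires_grad=True)
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()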

NVIDIA Blackwell Platform: Advancing Generative AI and Accelerated Computing

A Tirumala, R Wong - 2024 IEEE Hot Chips 36 Symposium (HCS), 2024 - computer.org
Nemotron-4 15B … Nemotron-4 340B … Nemotron-4 340B Technical Report https://d1qx31qr3h6wln.cloudfront.net/publications/Nemotron_4_340B_8T_0.pdf …

Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts

H Wang, W Xiong, T Xie, H Zhao, T Zhang - arXiv preprint arXiv …, 2024 - arxiv.org
… Notably, the performance of our model surpasses the LLM-as-a-judge method with GPT-4
judges by a margin, and approaches the performance of the much larger Nemotron-4 340B …
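
The mixture-of-experts idea in this paper combines per-objective reward scores through a gating network whose weights remain inspectable, which is what makes the preferences interpretable. A hedged PyTorch sketch of that pattern (layer sizes, the objective count, and all module names are assumptions for illustration, not the released architecture):

import torch
import torch.nn as nn

class GatedMultiObjectiveReward(nn.Module):
    def __init__(self, hidden_size=4096, num_objectives=5):
        super().__init__()
        # One scalar reward head per objective.
        self.objective_heads = nn.Linear(hidden_size, num_objectives)
        # Gating network produces a weight per objective from the prompt.
        self.gate = nn.Sequential(
            nn.Linear(hidden_size, hidden_size // 4),
            nn.ReLU(),
            nn.Linear(hidden_size // 4, num_objectives),
        )

    def forward(self, prompt_hidden, response_hidden):
        objective_scores = self.objective_heads(response_hidden)   # (B, K)
        weights = torch.softmax(self.gate(prompt_hidden), dim=-1)  # (B, K)
        # The scalar reward is the gated sum; the weights expose how much
        # each objective contributed, keeping the score interpretable.
        return (weights * objective_scores).sum(dim=-1)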

Self-Directed Synthetic Dialogues and Revisions Technical Report

N Lambert, H Schoelkopf, A Gokaslan… - arXiv preprint arXiv …, 2024 - arxiv.org
… This technical report details the data collection process and properties of Self-Directed …
This could be fixed with the stronger open models available today in Nemotron 340B and …

LLM Pruning and Distillation in Practice: The Minitron Approach

ST Sreenivas, S Muralidharan, R Joshi… - arXiv preprint arXiv …, 2024 - arxiv.org
… According to the Llama 3.1 tech report [3], the 8B model is pretrained on 15T … -Aligner [11]
with the instruction tuning dataset used for Nemotron-4 340B [12]. As shown in Table 2, we …

HelpSteer2: Open-source dataset for training top-performing reward models

Z Wang, Y Dong, O Delalleau, J Zeng, G Shen… - arXiv preprint arXiv …, 2024 - arxiv.org
… Scaling up the base model to Nemotron-4 340B with the same dataset results in the trained
reward model topping the Reward Bench primary leaderboard with an overall performance …
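
HelpSteer2 annotates each response with scalar attribute scores (helpfulness, correctness, coherence, complexity, verbosity) rather than pairwise preferences, which supports a regression-style reward head on top of the base model. A minimal sketch under that assumption (the hidden size and class name are illustrative):

import torch
import torch.nn as nn

ATTRIBUTES = ["helpfulness", "correctness", "coherence",
              "complexity", "verbosity"]

class AttributeRewardHead(nn.Module):
    # Maps the final-token hidden state to one scalar per attribute;
    # trained with MSE against the human-annotated scores.
    def __init__(self, hidden_size=4096):
        super().__init__()
        self.proj = nn.Linear(hidden_size, len(ATTRIBUTES))

    def forward(self, last_hidden_state):
        return self.proj(last_hidden_state)

head = AttributeRewardHead()
hidden = torch.randn(2, 4096)  # stand-in for base-model features
target = torch.tensor([[3., 4., 4., 2., 1.],   # 0-4 Likert-style labels
                       [1., 2., 3., 2., 4.]])
loss = nn.functional.mse_loss(head(hidden), target)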

Ruri: Japanese General Text Embeddings

H Tsukagoshi, R Sasano - arXiv preprint arXiv:2409.07737, 2024 - arxiv.org
… To address this, we used the reward model of Nemotron-4 340B to score the generated
outputs and removed the bottom 20% of examples based on low helpfulness scores. …
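
The cleaning step described here reduces to a percentile cut over reward-model scores. A minimal sketch, with a hypothetical score_helpfulness stub standing in for an actual call to the Nemotron-4 340B reward model:

import numpy as np

def filter_bottom_fraction(examples, score_fn, drop_fraction=0.20):
    # Score every example, then keep those above the drop_fraction
    # percentile, mirroring a "remove the bottom 20%" cleaning pass.
    scores = np.array([score_fn(ex) for ex in examples])
    threshold = np.percentile(scores, drop_fraction * 100)
    return [ex for ex, s in zip(examples, scores) if s > threshold]

def score_helpfulness(example):
    # Placeholder: a real pipeline would query the reward model here.
    return len(example["output"])

data = [{"output": "short"}, {"output": "a much longer response"},
        {"output": "medium length"}, {"output": "ok"}, {"output": "fine"}]
kept = filter_bottom_fraction(data, score_helpfulness)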

The Llama 3 herd of models

A Dubey, A Jauhri, A Pandey, A Kadian… - arXiv preprint arXiv …, 2024 - arxiv.org
… This paper presents a new set of foundation models, called … This paper presents an extensive
empirical evaluation of Llama … The paper also presents the results of experiments in which …

Data-Centric AI in the Age of Large Language Models

X Xu, Z Wu, R Qiao, A Verma, Y Shu, J Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
… This position paper proposes a data-centric viewpoint of AI research, focusing on large language
models (LLMs). We start by … 2023] and the recently released Nemotron-4 family [NVIDIA…