1
00:00:00,000 --> 00:00:03,640
I'm excited to share this exclusive interview with fellow investors
2
00:00:03,640 --> 00:00:06,320
Most people think of NVIDIA as a hardware company
3
00:00:06,320 --> 00:00:08,900
that makes the chips used to train large AI models
4
00:00:08,900 --> 00:00:10,599
but you're about to get an inside look
5
00:00:10,599 --> 00:00:13,019
at the other side of the story
6
00:00:13,019 --> 00:00:17,509
I had the privilege of sitting down with Joe DeLaere, product lead for AI infrastructure at NVIDIA
7
00:00:17,509 --> 00:00:21,548
For the past four years, Joe has been deploying the hardware and software
8
00:00:21,548 --> 00:00:24,260
that powers the most powerful AI models on the planet
9
00:00:24,260 --> 00:00:28,420
and he shared several surprising insights about where AI is headed
10
00:00:28,420 --> 00:00:31,719
But this is just one of many technologies I'll be covering live
11
00:00:31,719 --> 00:00:33,899
at GTC in a few weeks
12
00:00:33,899 --> 00:00:36,920
GTC is NVIDIA's big AI conference
13
00:00:36,920 --> 00:00:40,869
showcasing major breakthroughs in robotics and autonomous vehicles
14
00:00:40,869 --> 00:00:43,030
AI agents and the chips that power them
15
00:00:43,030 --> 00:00:44,409
and much more
16
00:00:44,409 --> 00:00:48,628
Register for GTC's free virtual sessions using my link
17
00:00:48,628 --> 00:00:52,048
for a chance to win an NVIDIA RTX 5090 graphics card
18
00:00:52,048 --> 00:00:53,649
Just attend any session
19
00:00:53,649 --> 00:00:56,990
take a screenshot as proof, and send it to me after the conference
20
00:00:56,990 --> 00:00:58,500
using the link below
21
00:00:58,500 --> 00:01:01,320
GTC should be on every investor's radar
22
00:01:01,320 --> 00:01:04,849
and so should NVIDIA's AI inference ecosystem
23
00:01:04,849 --> 00:01:05,888
Your time is valuable
24
00:01:05,888 --> 00:01:07,888
so let's get right into it
25
00:01:07,888 --> 00:01:09,528
I'm really glad to be here with you
26
00:01:09,528 --> 00:01:10,429
Thank you for your time
27
00:01:10,429 --> 00:01:11,159
By the way
28
00:01:11,159 --> 00:01:14,400
Jensen covered a lot of great material in his keynote
29
00:01:14,400 --> 00:01:17,159
One of the big things he went into detail on is that
30
00:01:17,159 --> 00:01:23,129
NVIDIA actually co-designed six chips for the Vera Rubin generation
31
00:01:23,319 --> 00:01:24,439
That's a lot to unpack
32
00:01:24,439 --> 00:01:26,379
and I'd love to go through all of it with you
33
00:01:26,379 --> 00:01:31,549
starting with the GPU itself and working up to the rack-scale system
34
00:01:31,549 --> 00:01:32,328
If that works for you
35
00:01:32,328 --> 00:01:35,370
let's start with the Rubin chip
36
00:01:35,370 --> 00:01:38,560
How is Rubin different from Blackwell?
37
00:01:38,560 --> 00:01:39,099
Oh
38
00:01:39,099 --> 00:01:42,719
Rubin differs from Blackwell in several ways
39
00:01:42,719 --> 00:01:47,530
We co-designed six chips
40
00:01:47,530 --> 00:01:50,250
We looked at the needs of the data center
41
00:01:50,250 --> 00:01:52,209
and then worked backward
42
00:01:52,209 --> 00:01:56,349
to figure out what we needed across the six chips to get the best performance
43
00:01:56,349 --> 00:01:57,930
the best energy efficiency
44
00:01:57,930 --> 00:01:59,060
and the lowest cost
45
00:01:59,060 --> 00:01:59,579
Right
46
00:01:59,579 --> 00:02:03,260
That's what defines Rubin
47
00:02:03,260 --> 00:02:04,760
this extreme co-design
48
00:02:04,760 --> 00:02:07,700
All the chips built together
49
00:02:07,700 --> 00:02:08,949
designed together
50
00:02:08,949 --> 00:02:11,788
working together for the best performance
51
00:02:11,788 --> 00:02:17,039
When you talk about looking at data center needs, are those driven by today's AI models
52
00:02:17,039 --> 00:02:18,699
or what's really driving them?
53
00:02:18,699 --> 00:02:23,639
Models are definitely the main driver of compute demand, especially MoE models
54
00:02:23,639 --> 00:02:29,079
mixture-of-experts models in particular, which need to generate many, many tokens
55
00:02:29,079 --> 00:02:29,639
A big factor:
56
00:02:29,639 --> 00:02:32,938
more tokens come from their reasoning
57
00:02:32,938 --> 00:02:35,780
At the same time, models keep getting bigger
58
00:02:35,780 --> 00:02:38,430
and the bigger the model, the more intelligent it is
59
00:02:38,430 --> 00:02:39,830
plus the intelligence that comes from reasoning
60
00:02:39,830 --> 00:02:44,330
All of this is creating enormous demand for compute
61
00:02:44,330 --> 00:02:47,919
and Rubin was designed for exactly that. Got it
62
00:02:47,919 --> 00:02:51,400
Talk about the difference between Blackwell and Rubin
63
00:02:51,400 --> 00:02:55,050
specifically in terms of GPU power and performance
64
00:02:55,050 --> 00:02:59,210
In terms of power and performance on inference workloads
65
00:02:59,210 --> 00:03:04,569
Rubin delivers up to a 10x improvement over Blackwell
66
00:03:04,569 --> 00:03:05,520
Wow
67
00:03:05,520 --> 00:03:07,340
10x performance per watt
68
00:03:07,340 --> 00:03:10,919
That means at a fixed latency
69
00:03:10,919 --> 00:03:11,659
you can see it
70
00:03:11,659 --> 00:03:15,210
in the Pareto chart we showed in Jensen's keynote
71
00:03:15,210 --> 00:03:16,590
at a particular latency level
72
00:03:16,590 --> 00:03:17,090
very
73
00:03:17,090 --> 00:03:17,590
you know
74
00:03:17,590 --> 00:03:19,069
high latency, but
75
00:03:19,069 --> 00:03:19,629
Uh
76
00:03:19,629 --> 00:03:21,649
that's great for the people using the model
77
00:03:21,649 --> 00:03:22,169
So yeah
78
00:03:22,169 --> 00:03:24,568
that 10x improvement is at full rack scale
79
00:03:24,568 --> 00:03:26,609
Is that at the rack-scale level?
80
00:03:26,609 --> 00:03:30,430
What we have here is a Blackwell Ultra generation compute tray
81
00:03:30,430 --> 00:03:35,650
I can show you the components and how it breaks down
82
00:03:35,650 --> 00:03:37,689
We have two superchips
83
00:03:37,689 --> 00:03:38,770
Two superchips
84
00:03:38,770 --> 00:03:38,969
OK
85
00:03:38,969 --> 00:03:42,769
Each superchip has two Blackwell Ultra GPUs
86
00:03:42,769 --> 00:03:47,008
along with one CPU per superchip
87
00:03:47,008 --> 00:03:48,188
Then the two superchips together
88
00:03:48,188 --> 00:03:51,110
give you four GPUs plus the CPUs
89
00:03:51,110 --> 00:03:56,899
We also have the ConnectX-8 SuperNIC as a component
90
00:03:56,899 --> 00:04:00,338
That will be a key difference when we talk about Vera Rubin later
91
00:04:00,338 --> 00:04:01,829
in how these components move around
92
00:04:01,829 --> 00:04:03,528
Mm-hmm
93
00:04:03,528 --> 00:04:05,929
You can see this is a hybrid-cooled system
94
00:04:05,929 --> 00:04:06,348
OK
95
00:04:06,348 --> 00:04:07,528
These are cold plates
96
00:04:07,528 --> 00:04:10,569
that liquid-cool the superchips and their components
97
00:04:10,569 --> 00:04:14,699
while the front half
98
00:04:14,699 --> 00:04:16,699
I should point out, is air-cooled
99
00:04:16,699 --> 00:04:18,620
So these are what I'm actually seeing
100
00:04:18,620 --> 00:04:20,480
all the fans at the top
101
00:04:20,480 --> 00:04:23,040
There are eight fans here. Got it, so eight fans
102
00:04:23,040 --> 00:04:24,860
And then we have a BlueField
103
00:04:24,860 --> 00:04:25,800
DPU
104
00:04:25,800 --> 00:04:26,300
Uh
105
00:04:26,300 --> 00:04:27,860
which is also part of this tray
106
00:04:27,860 --> 00:04:30,199
That's for north-south traffic
107
00:04:30,199 --> 00:04:30,879
uh
108
00:04:30,879 --> 00:04:31,860
connecting to storage
109
00:04:31,860 --> 00:04:36,310
and bringing data into the compute rack
110
00:04:36,310 --> 00:04:38,209
So it feeds
111
00:04:38,209 --> 00:04:40,110
the GPUs. Got it, so yeah
112
00:04:40,110 --> 00:04:42,120
the DPU handles data in and data out
113
00:04:42,120 --> 00:04:43,360
and then all the processing
114
00:04:43,360 --> 00:04:44,759
all the magic, happens in
115
00:04:44,759 --> 00:04:46,790
the superchips themselves. Got it
116
00:04:46,790 --> 00:04:48,470
So there are two kinds of network traffic
117
00:04:48,470 --> 00:04:51,329
north-south is within the same rack
118
00:04:51,329 --> 00:04:53,930
and east-west connects multiple racks
119
00:04:53,930 --> 00:04:55,290
Is that how we should think about it?
120
00:04:55,290 --> 00:04:56,730
That's the right way to think about it
121
00:04:56,730 --> 00:04:57,110
Yes
122
00:04:57,110 --> 00:04:57,490
Yeah
123
00:04:57,490 --> 00:05:01,360
Hmm, I thought NVIDIA only designed GPUs
124
00:05:01,360 --> 00:05:03,120
but Grace is actually a CPU, right?
125
00:05:03,120 --> 00:05:05,519
So what does the CPU do?
126
00:05:05,519 --> 00:05:07,600
The CPU handles a lot of management tasks
127
00:05:07,600 --> 00:05:08,180
So
128
00:05:08,180 --> 00:05:08,920
for example
129
00:05:08,920 --> 00:05:10,209
when you're doing
130
00:05:10,209 --> 00:05:10,889
uh
131
00:05:10,889 --> 00:05:12,410
you're running inference
132
00:05:12,410 --> 00:05:14,910
and you want your model to
133
00:05:14,910 --> 00:05:15,250
uh
134
00:05:15,250 --> 00:05:16,298
generate some code for you
135
00:05:16,298 --> 00:05:17,519
and you want to make
136
00:05:17,519 --> 00:05:18,678
maybe a little app
137
00:05:18,678 --> 00:05:23,589
a Python app that needs to run; the CPU can execute that app
138
00:05:23,589 --> 00:05:27,290
The GPU can't run an app generated by the model
139
00:05:28,370 --> 00:05:31,029
But it also handles other kinds of tasks
140
00:05:31,029 --> 00:05:35,819
such as database analytics and other functions better suited to a CPU
141
00:05:35,819 --> 00:05:38,319
and it can accelerate those tasks
142
00:05:38,319 --> 00:05:39,079
Oh, so really
143
00:05:39,079 --> 00:05:40,420
the whole idea is kind of
144
00:05:40,420 --> 00:05:41,579
let the GPUs handle
145
00:05:41,579 --> 00:05:42,480
what they do best
146
00:05:42,480 --> 00:05:44,279
and then the CPU handles everything else
147
00:05:44,279 --> 00:05:47,519
Obviously, CPUs are far better than GPUs at certain things
148
00:05:47,519 --> 00:05:51,680
so you can assign each task to the chip that suits it
149
00:05:51,680 --> 00:05:51,920
Exactly
150
00:05:51,920 --> 00:05:52,420
Correct
151
00:05:52,420 --> 00:05:54,850
You also mentioned something called a DPU
152
00:05:54,850 --> 00:05:55,810
Can you walk us through that?
153
00:05:55,810 --> 00:05:56,910
So, the DPU
154
00:05:56,910 --> 00:06:01,639
The BlueField DPU, the data processing unit, handles north-south traffic
155
00:06:01,639 --> 00:06:02,720
North-south traffic
156
00:06:02,720 --> 00:06:03,019
Yes
157
00:06:03,019 --> 00:06:04,839
when you connect to storage
158
00:06:04,839 --> 00:06:06,639
which sits in a different rack
159
00:06:06,649 --> 00:06:10,178
there's compression and encryption
160
00:06:10,658 --> 00:06:14,259
and all of that is handled on the BlueField
161
00:06:14,259 --> 00:06:14,999
BlueField-3
162
00:06:14,999 --> 00:06:16,978
The goal is to make sure
163
00:06:16,978 --> 00:06:20,069
the CPU and GPU don't have to do those tasks
164
00:06:20,069 --> 00:06:20,810
Offload them
165
00:06:20,810 --> 00:06:23,750
offload all of those functions from the CPU and GPU
166
00:06:23,750 --> 00:06:26,120
and accelerate them in hardware
167
00:06:26,120 --> 00:06:26,639
Mm
168
00:06:26,639 --> 00:06:32,100
so you get the fastest possible data access to feed the GPUs. That makes sense
169
00:06:32,100 --> 00:06:32,399
Got it
170
00:06:32,399 --> 00:06:35,160
So far, these are three of the six chips
171
00:06:35,160 --> 00:06:35,899
CPU
172
00:06:35,899 --> 00:06:39,189
GPU, and DPU, plus ConnectX
173
00:06:39,189 --> 00:06:39,410
Right
174
00:06:39,410 --> 00:06:41,269
Tell us a bit more about ConnectX-8
175
00:06:41,269 --> 00:06:43,750
That's the east-west connection
176
00:06:43,750 --> 00:06:46,798
It's the SuperNIC that connects east-west
177
00:06:46,798 --> 00:06:52,129
It also has features like inline encryption for handling east-west traffic
178
00:06:52,129 --> 00:06:55,769
and it connects GPUs across racks
179
00:06:55,769 --> 00:06:56,670
Got it
180
00:06:56,670 --> 00:06:58,490
So we have the GPU
181
00:06:58,490 --> 00:06:59,509
CPU
182
00:06:59,509 --> 00:07:03,149
DPU, and ConnectX chips on this board
183
00:07:03,149 --> 00:07:04,430
Where are the other two chips?
184
00:07:04,430 --> 00:07:08,259
The NVLink switch is another chip
185
00:07:08,259 --> 00:07:09,019
Mm
186
00:07:09,019 --> 00:07:12,560
There are two of them on this switch tray
187
00:07:12,560 --> 00:07:13,220
Um
188
00:07:13,220 --> 00:07:16,548
This is fifth-generation NVLink
189
00:07:16,548 --> 00:07:19,809
These communicate over the NVLink network
190
00:07:19,809 --> 00:07:21,928
at 1,800 gigabytes per second
191
00:07:21,928 --> 00:07:23,108
1,800, that is
192
00:07:23,108 --> 00:07:24,699
1.8 terabytes per second
193
00:07:24,699 --> 00:07:27,000
which is very fast
194
00:07:27,000 --> 00:07:34,189
This is the central nervous system of Blackwell
195
00:07:34,189 --> 00:07:35,470
GB300 NVL
196
00:07:35,470 --> 00:07:36,629
72. Got it
197
00:07:36,629 --> 00:07:37,009
So
198
00:07:37,009 --> 00:07:39,110
So these are two completely different trays
199
00:07:39,110 --> 00:07:39,329
Right
200
00:07:39,329 --> 00:07:41,100
This is the compute tray
201
00:07:41,100 --> 00:07:43,540
where the magic happens when processing data
202
00:07:43,540 --> 00:07:45,418
and this is the switch tray
203
00:07:45,418 --> 00:07:50,639
which, as you mentioned earlier, connects all the GPUs together
204
00:07:50,639 --> 00:07:52,788
It connects all the GPUs together
205
00:07:52,788 --> 00:07:55,028
And there are several of these trays inside a rack?
206
00:07:55,028 --> 00:07:55,408
Yes
207
00:07:55,408 --> 00:07:55,968
Uh
208
00:07:55,968 --> 00:07:58,088
there are 72 GPUs in total
209
00:07:58,088 --> 00:08:00,569
72 of them in a rack
210
00:08:00,569 --> 00:08:02,490
with all-to-all connectivity
211
00:08:02,490 --> 00:08:07,339
Every GPU has to be able to talk to every other GPU at full bandwidth
212
00:08:07,339 --> 00:08:10,439
That's what the switch does, so
213
00:08:10,439 --> 00:08:11,439
at 1.8 terabytes per second
214
00:08:11,439 --> 00:08:13,480
any GPU can talk to any other GPU
215
00:08:13,480 --> 00:08:15,660
Is that why it's called a compute fabric?
216
00:08:15,660 --> 00:08:19,209
That's how I'd picture the network diagram
217
00:08:19,209 --> 00:08:19,550
Right
218
00:08:19,550 --> 00:08:20,589
So yes
219
00:08:20,589 --> 00:08:21,970
it's called a compute fabric
220
00:08:21,970 --> 00:08:25,790
not just because it connects all the GPUs
221
00:08:25,790 --> 00:08:30,189
Our NVLink switch chips also include some compute capability
222
00:08:30,189 --> 00:08:33,710
for what we call all-reduce, or collective operations
223
00:08:33,710 --> 00:08:39,019
During training, certain operations have to be shared across the network
224
00:08:39,019 --> 00:08:40,860
Instead of sending them to every GPU
225
00:08:40,860 --> 00:08:43,220
those operations are performed inside the switch
226
00:08:43,220 --> 00:08:44,080
Oh, that's great
227
00:08:44,080 --> 00:08:44,379
OK
228
00:08:44,379 --> 00:08:46,240
So the switch isn't just connecting devices
229
00:08:46,240 --> 00:08:49,659
it's actually doing some of the compute too
230
00:08:49,659 --> 00:08:51,318
That's amazing
231
00:08:51,318 --> 00:08:51,958
OK
232
00:08:51,958 --> 00:08:54,278
I think we've covered five chips
233
00:08:54,278 --> 00:08:54,479
Right
234
00:08:54,479 --> 00:08:55,178
Is that right?
235
00:08:55,178 --> 00:08:56,039
Exactly right
236
00:08:56,039 --> 00:08:57,720
What's the sixth chip?
237
00:08:57,720 --> 00:08:59,480
The sixth is Spectrum-X
238
00:09:00,759 --> 00:09:02,240
Can we take a look at the racks?
239
00:09:02,240 --> 00:09:03,068
Sure
240
00:09:03,068 --> 00:09:04,769
Let's go take a look
241
00:09:05,879 --> 00:09:08,039
There are ten trays at the top
242
00:09:08,039 --> 00:09:09,399
these are compute trays
243
00:09:09,399 --> 00:09:11,110
nine network trays
244
00:09:11,110 --> 00:09:13,149
nine NVLink switch trays
245
00:09:13,149 --> 00:09:13,610
I should say
246
00:09:13,610 --> 00:09:19,149
Their job is to connect the GPUs in the ten trays above and the eight below
247
00:09:19,549 --> 00:09:21,570
tying the compute trays together, right?
248
00:09:21,570 --> 00:09:23,330
What's up top?
249
00:09:23,330 --> 00:09:26,059
That's the top of the rack
250
00:09:26,059 --> 00:09:29,340
There's a gigabit switch here for telemetry
251
00:09:29,340 --> 00:09:31,059
That's just for
252
00:09:31,059 --> 00:09:33,419
system management
253
00:09:33,419 --> 00:09:34,179
Very low speed
254
00:09:34,179 --> 00:09:34,779
Ethernet
255
00:09:34,779 --> 00:09:36,139
It's just a
256
00:09:36,139 --> 00:09:38,779
It's just for managing the system itself
257
00:09:38,779 --> 00:09:39,100
It doesn't
258
00:09:39,100 --> 00:09:42,299
It doesn't handle any AI compute data
259
00:09:42,299 --> 00:09:42,879
It's doing management
260
00:09:42,879 --> 00:09:44,320
If a GPU goes down
261
00:09:44,320 --> 00:09:44,960
like
262
00:09:44,960 --> 00:09:46,559
Help me understand what telemetry means
263
00:09:46,559 --> 00:09:47,960
and what that telemetry data represents
264
00:09:47,960 --> 00:09:49,220
I'm just looking at
265
00:09:49,220 --> 00:09:50,899
the functions of the rack itself
266
00:09:50,899 --> 00:09:52,000
I'm looking at
267
00:09:52,000 --> 00:09:52,940
uptime
268
00:09:52,940 --> 00:09:54,360
I'm looking at health and status
269
00:09:54,360 --> 00:09:55,220
basically health
270
00:09:55,220 --> 00:09:56,139
checks
271
00:09:56,139 --> 00:09:58,159
and running full diagnostics
272
00:09:58,159 --> 00:10:02,629
You mentioned there's another kind of rack that sits alongside this one
273
00:10:02,629 --> 00:10:03,549
Yeah, so
274
00:10:03,549 --> 00:10:06,350
you'd have a set of compute racks
275
00:10:06,350 --> 00:10:08,049
GB300 compute racks
276
00:10:08,049 --> 00:10:11,250
and then dedicated racks with Spectrum-X
277
00:10:11,250 --> 00:10:13,730
east-west network switches
278
00:10:13,730 --> 00:10:15,149
We don't have one
279
00:10:15,149 --> 00:10:16,600
here, but uh
280
00:10:16,600 --> 00:10:18,559
that's how it would work
281
00:10:18,559 --> 00:10:19,740
We call that a cabinet
282
00:10:19,740 --> 00:10:20,659
You have
283
00:10:20,659 --> 00:10:23,860
maybe eight GB300 racks
284
00:10:23,860 --> 00:10:27,919
and then a few switch racks with Spectrum-X
285
00:10:27,919 --> 00:10:28,539
Yes
286
00:10:28,539 --> 00:10:32,700
That's a great overview of the current Blackwell system
287
00:10:32,700 --> 00:10:34,359
I'd like to understand how
288
00:10:34,359 --> 00:10:37,639
things change going from Blackwell to Rubin
289
00:10:37,639 --> 00:10:38,078
Sure
290
00:10:38,078 --> 00:10:39,558
We can go over and look
291
00:10:39,558 --> 00:10:42,158
Let's go look at the trays
292
00:10:42,419 --> 00:10:45,679
Now we're looking at the components on the wall
293
00:10:45,679 --> 00:10:47,700
We talked earlier about, in the compute tray
294
00:10:47,700 --> 00:10:48,879
the BlueField DPU
295
00:10:48,879 --> 00:10:49,879
BlueField-4
296
00:10:49,879 --> 00:10:50,120
Yes
297
00:10:50,120 --> 00:10:51,460
so you can see it
298
00:10:51,460 --> 00:10:52,500
here on the wall
299
00:10:52,500 --> 00:10:55,759
This board is part of a modular system
300
00:10:55,759 --> 00:10:59,090
it can be slid into or out of the compute tray for easy servicing
301
00:10:59,090 --> 00:10:59,570
Uh
302
00:10:59,570 --> 00:11:00,629
and then likewise
303
00:11:00,629 --> 00:11:03,149
ConnectX-9 is in the middle
304
00:11:03,149 --> 00:11:03,690
uh
305
00:11:03,690 --> 00:11:06,950
There are two ConnectX-9s on this board
306
00:11:06,950 --> 00:11:10,710
eight in total per compute tray
307
00:11:10,710 --> 00:11:13,750
so each GPU gets one
308
00:11:13,750 --> 00:11:16,149
ConnectX-9 runs at 1.6 terabits per second
309
00:11:16,149 --> 00:11:22,950
And then we have photonics co-packaged optics
310
00:11:22,950 --> 00:11:24,610
This is really cool
311
00:11:24,610 --> 00:11:24,990
Yeah
312
00:11:24,990 --> 00:11:31,220
So what is it? Instead of using pluggable optical modules
313
00:11:31,220 --> 00:11:34,000
the optics are integrated directly with the chip
314
00:11:34,000 --> 00:11:35,500
co-packaged with the chip
315
00:11:35,500 --> 00:11:38,328
That's a huge improvement in energy efficiency
316
00:11:38,328 --> 00:11:43,428
and reliability improves significantly too; those are the two main benefits
317
00:11:43,428 --> 00:11:46,629
Before, we used optical transceivers
318
00:11:46,629 --> 00:11:48,789
fiber-optic transceivers
319
00:11:48,789 --> 00:11:51,259
with fiber cables connected at each end
320
00:11:51,259 --> 00:11:53,899
Those transceivers have lasers built in
321
00:11:53,899 --> 00:11:54,379
Right
322
00:11:54,379 --> 00:11:56,139
which need power, right?
323
00:11:56,139 --> 00:11:58,059
That's what we're eliminating
324
00:11:58,059 --> 00:12:01,320
by co-packaging the optics with the chip
325
00:12:01,320 --> 00:12:04,629
What does that actually mean for performance or power?
326
00:12:04,629 --> 00:12:06,769
From a performance standpoint
327
00:12:06,769 --> 00:12:08,049
performance stays the same
328
00:12:08,049 --> 00:12:08,450
Yes
329
00:12:08,450 --> 00:12:09,809
but you get
330
00:12:09,809 --> 00:12:13,850
lower power and higher reliability
331
00:12:13,850 --> 00:12:17,210
because pluggable lasers can be
332
00:12:17,210 --> 00:12:17,789
you know
333
00:12:17,789 --> 00:12:19,070
pretty unreliable at times
334
00:12:19,070 --> 00:12:21,470
and need to be replaced often
335
00:12:21,470 --> 00:12:23,269
But if they're co-packaged here
336
00:12:23,269 --> 00:12:24,009
on
337
00:12:24,009 --> 00:12:24,919
on the chip
338
00:12:24,919 --> 00:12:26,220
reliability goes way up
339
00:12:26,220 --> 00:12:26,759
like
340
00:12:26,759 --> 00:12:28,720
maybe around 10x
341
00:12:28,720 --> 00:12:28,960
Awesome
342
00:12:28,960 --> 00:12:29,980
That's a huge difference
343
00:12:29,980 --> 00:12:32,429
Where do these components sit in the rack?
344
00:12:32,429 --> 00:12:37,089
They go in a separate switch tray or switch server
345
00:12:37,089 --> 00:12:38,808
in a separate rack
346
00:12:38,808 --> 00:12:40,808
So it's a separate rack
347
00:12:40,808 --> 00:12:43,340
separate from the NVL72
348
00:12:43,340 --> 00:12:46,059
That's the east-west switch rack
349
00:12:47,059 --> 00:12:47,580
Amazing
350
00:12:47,580 --> 00:12:48,929
Quantum-X
351
00:12:48,929 --> 00:12:49,330
Mm
352
00:12:49,330 --> 00:12:51,409
There's also a part for InfiniBand
353
00:12:51,409 --> 00:12:54,009
which is an alternative to Ethernet
354
00:12:54,009 --> 00:12:57,610
and there's Quantum InfiniBand co-packaged optics
355
00:12:57,610 --> 00:12:59,090
So these two chips are functionally equivalent
356
00:12:59,090 --> 00:13:00,690
one for Spectrum-X Ethernet
357
00:13:00,690 --> 00:13:02,070
and the other for Quantum InfiniBand
358
00:13:02,070 --> 00:13:02,818
Correct
359
00:13:02,818 --> 00:13:06,599
And then there's the Spectrum-X Ethernet Photonics switch
360
00:13:06,599 --> 00:13:10,629
so that co-packaged optics chip is built into
361
00:13:10,629 --> 00:13:12,730
the Ethernet photonics switch
362
00:13:12,730 --> 00:13:14,070
That's the photonics component
363
00:13:14,070 --> 00:13:16,250
co-packaged optics. Got it
364
00:13:16,250 --> 00:13:18,470
But these components go into a side unit
365
00:13:18,470 --> 00:13:19,549
These go into
366
00:13:19,549 --> 00:13:19,870
um
367
00:13:19,870 --> 00:13:20,570
the switch rack
368
00:13:20,570 --> 00:13:21,210
Yes
369
00:13:21,210 --> 00:13:24,149
Got that part too
370
00:13:24,149 --> 00:13:28,379
If you use Quantum InfiniBand as the east-west protocol
371
00:13:28,379 --> 00:13:31,340
you'd use the InfiniBand one as the side unit
372
00:13:31,340 --> 00:13:32,460
So these are
373
00:13:32,460 --> 00:13:33,889
these are equivalent options
374
00:13:33,889 --> 00:13:35,110
one for InfiniBand
375
00:13:35,110 --> 00:13:36,529
one for Ethernet
376
00:13:36,529 --> 00:13:37,570
Correct. Got it
377
00:13:37,570 --> 00:13:38,049
Yes
378
00:13:38,049 --> 00:13:38,909
Exactly
379
00:13:38,909 --> 00:13:40,389
So what we just covered
380
00:13:40,389 --> 00:13:43,960
is essentially the state of the art in data centers today
381
00:13:43,960 --> 00:13:44,299
That's right
382
00:13:44,299 --> 00:13:49,320
Blackwell Ultra is currently the top of the line in the data center
383
00:13:49,320 --> 00:13:51,820
And then Jensen introduced Vera Rubin
384
00:13:51,820 --> 00:13:53,399
with the six chips we mentioned
385
00:13:53,399 --> 00:13:55,340
We talked about the Blackwell lineup
386
00:13:55,340 --> 00:13:58,779
This is a completely different compute module from before
387
00:13:58,779 --> 00:14:00,840
Can you walk us through the differences?
388
00:14:00,840 --> 00:14:01,440
Oh, yes
389
00:14:01,440 --> 00:14:01,980
there are a lot of differences
390
00:14:01,980 --> 00:14:03,120
So um
391
00:14:03,120 --> 00:14:05,179
the whole design
392
00:14:05,179 --> 00:14:06,639
is modular
393
00:14:06,639 --> 00:14:07,139
Got it
394
00:14:07,139 --> 00:14:09,700
That means there are slots here
395
00:14:09,700 --> 00:14:12,720
and these components slide in and out easily
396
00:14:12,720 --> 00:14:14,309
and lock into place
397
00:14:14,309 --> 00:14:17,350
No need for lots of cables
398
00:14:17,350 --> 00:14:21,840
to handle all the connections between modules
399
00:14:21,840 --> 00:14:25,000
and the cabling has been streamlined too
400
00:14:25,000 --> 00:14:25,429
Yes
401
00:14:25,429 --> 00:14:27,789
So there's a manifold in the middle
402
00:14:27,789 --> 00:14:31,590
that handles a lot of the liquid distribution
403
00:14:31,590 --> 00:14:34,070
Overall, on GB
404
00:14:34,070 --> 00:14:34,470
300
405
00:14:34,470 --> 00:14:36,070
there are 43 hoses
406
00:14:36,070 --> 00:14:38,350
There's a fan section here
407
00:14:38,350 --> 00:14:39,960
because it's a hybrid-cooled system
408
00:14:39,960 --> 00:14:40,559
Uh
409
00:14:40,559 --> 00:14:43,549
the lower half of the GB300 is fan-cooled
410
00:14:43,549 --> 00:14:44,149
On this one
411
00:14:44,149 --> 00:14:45,389
we've removed that
412
00:14:45,389 --> 00:14:45,830
Uh
413
00:14:45,830 --> 00:14:48,070
Now we're 100% liquid-cooled
414
00:14:48,070 --> 00:14:50,850
Eight fans down to zero fans
415
00:14:50,850 --> 00:14:52,190
zero hoses
416
00:14:52,190 --> 00:14:55,429
and we also removed a lot of cables
417
00:14:55,429 --> 00:14:57,049
so it's cableless
418
00:14:57,049 --> 00:15:01,090
I'm trying to piece together
419
00:15:01,090 --> 00:15:01,769
what I'm looking at
420
00:15:01,769 --> 00:15:03,970
These spots should be where the two superchips go
421
00:15:03,970 --> 00:15:05,250
These are the superchips
422
00:15:05,250 --> 00:15:06,269
They slide in and out
423
00:15:06,269 --> 00:15:07,318
and lock into place automatically
424
00:15:07,318 --> 00:15:09,658
So there are two Rubin chips here
425
00:15:09,658 --> 00:15:11,879
with one Vera mounted on it
426
00:15:11,879 --> 00:15:12,918
So, yeah
427
00:15:12,918 --> 00:15:16,429
Another important point is that now, with the modular design
428
00:15:16,429 --> 00:15:18,549
all the modules slide in and out
429
00:15:22,100 --> 00:15:24,299
assembling this
430
00:15:24,299 --> 00:15:27,000
is 20 times faster
431
00:15:27,000 --> 00:15:30,629
Assembling this on GB300 used to take two hours
432
00:15:30,629 --> 00:15:32,190
and now it takes just five minutes
433
00:15:32,190 --> 00:15:32,629
Yes
434
00:15:32,629 --> 00:15:34,110
for this particular one
435
00:15:34,110 --> 00:15:35,830
That's the assembly process
436
00:15:35,830 --> 00:15:35,950
Like
437
00:15:35,950 --> 00:15:37,190
if there's a maintenance issue
438
00:15:37,190 --> 00:15:38,549
it's also easier to service
439
00:15:38,549 --> 00:15:38,809
Exactly
440
00:15:38,809 --> 00:15:41,750
It's much faster to work on
441
00:15:41,750 --> 00:15:43,730
and serviceability is significantly better
442
00:15:43,730 --> 00:15:45,440
by a pretty big factor
443
00:15:45,440 --> 00:15:45,840
No,
444
00:15:45,840 --> 00:15:46,600
that makes complete sense
445
00:15:46,600 --> 00:15:48,340
If you don't have all these cables and hoses
446
00:15:48,340 --> 00:15:50,129
you just pull it out quickly
447
00:15:50,129 --> 00:15:51,129
fix the problem
448
00:15:51,129 --> 00:15:52,090
resolve whatever's wrong
449
00:15:52,090 --> 00:15:53,110
and slide it right back in
450
00:15:53,110 --> 00:15:54,830
That's what this modular design is like
451
00:15:54,830 --> 00:15:56,830
We'll talk about the other components below in a bit
452
00:15:56,830 --> 00:15:59,788
So, two Vera Rubin superchips
453
00:15:59,788 --> 00:16:00,328
Uh
454
00:16:00,328 --> 00:16:03,208
We also have CX9, ConnectX-9
455
00:16:03,208 --> 00:16:08,620
the next generation of that SuperNIC, on these boards and modules
456
00:16:08,620 --> 00:16:13,480
Before, they were attached underneath the GB300 superchips
457
00:16:13,480 --> 00:16:16,389
but now they have their own modules, cards that plug in and pull out
458
00:16:16,389 --> 00:16:19,669
So now you can service the different components individually
459
00:16:19,669 --> 00:16:20,578
Yes
460
00:16:20,578 --> 00:16:25,298
The next-generation BlueField DPU is also a module here
461
00:16:25,298 --> 00:16:26,418
that plugs in and pulls out
462
00:16:26,418 --> 00:16:30,399
Got it, so this isn't just about performance
463
00:16:30,399 --> 00:16:32,980
it's also about higher availability, right?
464
00:16:32,980 --> 00:16:35,080
So that's another multiplier
465
00:16:35,080 --> 00:16:40,279
The output of the entire AI factory depends on uptime, which we call goodput
466
00:16:40,279 --> 00:16:41,470
What you want is for
467
00:16:41,470 --> 00:16:41,970
the
468
00:16:41,970 --> 00:16:44,350
percentage of time actually spent generating tokens
469
00:16:44,350 --> 00:16:45,409
to be maximized
470
00:16:45,409 --> 00:16:45,990
Yes
471
00:16:45,990 --> 00:16:46,950
That makes a lot of sense
472
00:16:46,950 --> 00:16:48,009
OK
473
00:16:48,009 --> 00:16:50,549
So that's the equivalent compute tray
474
00:16:50,549 --> 00:16:53,389
and there's also an equivalent switch tray
475
00:16:53,389 --> 00:16:53,769
Right
476
00:16:53,769 --> 00:16:54,629
Exactly
477
00:16:54,629 --> 00:16:56,309
It looks a lot cleaner and sleeker
478
00:16:56,309 --> 00:16:57,980
Walk me through what changed here
479
00:16:57,980 --> 00:16:59,840
In terms of changes
480
00:16:59,840 --> 00:17:00,419
um
481
00:17:00,419 --> 00:17:02,700
on top we have these switches
482
00:17:02,700 --> 00:17:04,150
100% liquid-cooled
483
00:17:04,150 --> 00:17:04,630
Um
484
00:17:04,630 --> 00:17:05,710
There are four switch chips here
485
00:17:05,710 --> 00:17:06,789
This is NVLink
486
00:17:06,789 --> 00:17:09,378
sixth generation
487
00:17:09,378 --> 00:17:11,898
twice as fast as the Blackwell version
488
00:17:11,898 --> 00:17:12,638
double the speed
489
00:17:12,638 --> 00:17:15,489
now 3.6 terabytes per second
490
00:17:15,489 --> 00:17:18,869
That significantly boosts our performance
491
00:17:18,869 --> 00:17:23,230
the 10x performance per watt, or per megawatt or per gigawatt, that I mentioned earlier
492
00:17:23,230 --> 00:17:24,470
depending on how you want to measure it
493
00:17:25,549 --> 00:17:31,089
The faster NVLink speed is part of that
494
00:17:31,089 --> 00:17:35,019
along with other GPU features we can talk about later
495
00:17:35,019 --> 00:17:35,660
and more
496
00:17:35,660 --> 00:17:40,210
So is the total GPU count in the Blackwell rack the same
497
00:17:40,210 --> 00:17:41,490
as in the Rubin rack?
498
00:17:41,490 --> 00:17:43,789
Yes, with NVL72
499
00:17:43,789 --> 00:17:46,259
the 72 stands for the number of GPUs
500
00:17:46,259 --> 00:17:47,059
So GB
501
00:17:47,059 --> 00:17:48,660
300 NVL72
502
00:17:48,660 --> 00:17:50,259
Um, and now we have Vera Rubin
503
00:17:50,259 --> 00:17:52,440
NVL72, same number of GPUs
504
00:17:52,440 --> 00:17:53,960
That also makes it
505
00:17:53,960 --> 00:17:57,400
very easy for customers to move from one platform to the next
506
00:17:57,400 --> 00:17:57,920
Mm
507
00:17:57,920 --> 00:18:01,529
That's one of the goals of keeping the same GPU count
508
00:18:01,529 --> 00:18:03,730
and the same MGX rack architecture
509
00:18:04,930 --> 00:18:07,730
It makes things much easier for customers
510
00:18:07,730 --> 00:18:09,490
The ecosystem has been
511
00:18:09,490 --> 00:18:10,210
you know
512
00:18:10,210 --> 00:18:12,670
working with these racks for two generations
513
00:18:12,670 --> 00:18:13,049
now
514
00:18:13,049 --> 00:18:14,329
and now we're on the third
515
00:18:14,329 --> 00:18:18,579
so they'll be able to get up and running quickly and deploy efficiently
516
00:18:18,579 --> 00:18:19,259
on our side
517
00:18:19,259 --> 00:18:19,759
Customers don't
518
00:18:19,759 --> 00:18:20,819
That makes total sense
519
00:18:20,819 --> 00:18:21,240
OK
520
00:18:21,240 --> 00:18:23,539
Can we take a look at Vera Rubin now?
521
00:18:23,539 --> 00:18:25,279
Yes
522
00:18:25,279 --> 00:18:26,920
This is Vera Rubin
523
00:18:26,920 --> 00:18:27,500
Um
524
00:18:27,500 --> 00:18:28,480
This is the Vera Rubin
525
00:18:28,480 --> 00:18:29,710
NVL72 rack
526
00:18:29,710 --> 00:18:31,109
You can see
527
00:18:31,109 --> 00:18:31,509
you know
528
00:18:31,509 --> 00:18:36,509
it's very similar to GB300 in form factor and appearance
529
00:18:36,509 --> 00:18:38,189
The biggest
530
00:18:38,189 --> 00:18:40,969
the biggest difference is in the compute trays
531
00:18:40,969 --> 00:18:42,809
You'll notice there are no vents
532
00:18:42,809 --> 00:18:44,789
GB300 had vents
533
00:18:44,789 --> 00:18:48,400
because the lower half of the compute tray still had fans
534
00:18:48,400 --> 00:18:49,039
OK
535
00:18:49,039 --> 00:18:49,500
Yes
536
00:18:49,500 --> 00:18:50,900
And we removed those fans
537
00:18:50,900 --> 00:18:53,400
the compute trays are 100% liquid-cooled
538
00:18:53,400 --> 00:18:54,720
That's also why the front panel looks like this
539
00:18:54,720 --> 00:18:56,679
you don't see vents anymore
540
00:18:56,679 --> 00:19:01,160
But overall, it still has nine switch trays
541
00:19:01,160 --> 00:19:05,130
still ten compute trays on top and eight on the bottom
542
00:19:05,130 --> 00:19:07,430
The top still has the same telemetry setup
543
00:19:07,430 --> 00:19:11,170
still top-of-rack telemetry with a gigabit switch
544
00:19:11,170 --> 00:19:16,720
Now, comparing Blackwell to Rubin at the rack level
545
00:19:16,720 --> 00:19:20,298
talk about the performance gain at rack scale
546
00:19:20,298 --> 00:19:25,230
At rack scale, the improvement is 10x, ten times
547
00:19:25,230 --> 00:19:25,670
Um
548
00:19:25,670 --> 00:19:28,650
in tokens per second per megawatt, or per watt
549
00:19:28,650 --> 00:19:29,219
Um
550
00:19:29,219 --> 00:19:31,038
That's the rack-level performance metric
551
00:19:31,038 --> 00:19:33,179
one way to measure performance
552
00:19:33,179 --> 00:19:35,278
This is for a mixture-of-experts model
553
00:19:35,278 --> 00:19:37,400
like Kimi K2 Thinking
554
00:19:37,400 --> 00:19:40,719
which is a very large model with a trillion parameters
555
00:19:40,719 --> 00:19:45,880
and it fits entirely, and is optimized, in a single rack
556
00:19:45,880 --> 00:19:47,059
Um, thanks to
557
00:19:47,059 --> 00:19:47,740
you know
558
00:19:47,740 --> 00:19:49,579
thanks to NVLink
559
00:19:49,579 --> 00:19:55,890
the experts in the mixture-of-experts model are spread across 72 GPUs
560
00:19:55,890 --> 00:20:00,660
which increases the number of tokens per second
561
00:20:00,660 --> 00:20:02,599
What we have here is the Kyber rack
562
00:20:02,599 --> 00:20:02,859
OK
563
00:20:02,859 --> 00:20:05,200
This will be for Rubin Ultra
564
00:20:05,200 --> 00:20:06,539
which comes after Rubin
565
00:20:06,539 --> 00:20:09,348
Rubin is the 2026 product, and in 2027
566
00:20:09,348 --> 00:20:11,128
we'll launch Rubin Ultra
567
00:20:11,128 --> 00:20:11,588
OK
568
00:20:11,588 --> 00:20:14,729
This will be a different rack architecture from before
569
00:20:14,729 --> 00:20:16,490
different from the previous three generations
570
00:20:16,809 --> 00:20:19,170
We'll be putting in a lot more compute
571
00:20:19,170 --> 00:20:19,630
Yeah
572
00:20:19,630 --> 00:20:22,910
I can see there are a lot more trays here
573
00:20:22,910 --> 00:20:27,869
Each canister holds 18 compute trays
574
00:20:27,869 --> 00:20:29,440
There are four canisters here
575
00:20:29,440 --> 00:20:32,539
each with up to 72 GPUs
576
00:20:32,539 --> 00:20:34,299
so that's 288
577
00:20:34,299 --> 00:20:37,969
going from 144 to 288
578
00:20:37,969 --> 00:20:39,269
or rather, from 72
579
00:20:39,269 --> 00:20:40,608
72 up to 288
580
00:20:40,608 --> 00:20:40,848
OK
581
00:20:40,848 --> 00:20:42,689
a 4x increase in GPU count
582
00:20:42,689 --> 00:20:44,249
Each canister
583
00:20:44,249 --> 00:20:47,980
each of the four sections I mentioned is equivalent to a whole rack
584
00:20:47,980 --> 00:20:51,180
so this is four racks' worth of GPU compute
585
00:20:51,180 --> 00:20:54,230
Each canister holds 72 compute units
586
00:20:54,230 --> 00:20:56,230
Extremely high compute density
587
00:20:56,230 --> 00:20:56,910
Yes
588
00:20:56,910 --> 00:20:58,990
That's why the architecture is different
589
00:20:58,990 --> 00:21:01,029
It uses a blade architecture
590
00:21:01,029 --> 00:21:02,549
rather than trays
591
00:21:03,589 --> 00:21:08,710
Each canister has 18 compute blades
592
00:21:08,710 --> 00:21:08,970
Sorry
593
00:21:08,970 --> 00:21:10,109
these are all compute
594
00:21:10,109 --> 00:21:12,179
The front is all compute
595
00:21:12,179 --> 00:21:12,719
Right
596
00:21:12,719 --> 00:21:18,460
and in the back are the switch blades for NVLink
597
00:21:18,460 --> 00:21:18,960
Got it
598
00:21:18,960 --> 00:21:22,868
That's the key to the leap in performance
599
00:21:22,868 --> 00:21:26,828
That's the performance breakthrough Rubin Ultra brings
600
00:21:26,828 --> 00:21:28,699
Rubin Ultra in Kyber
601
00:21:28,939 --> 00:21:32,618
We haven't shown any performance numbers for Rubin Ultra yet
602
00:21:32,618 --> 00:21:37,000
but the gain from generation to generation will be several-fold
603
00:21:38,000 --> 00:21:40,440
There will be significant performance gains at the chip level
604
00:21:40,440 --> 00:21:41,220
At the chip level
605
00:21:41,220 --> 00:21:42,680
at the superchip level
606
00:21:42,680 --> 00:21:43,599
at the rack level
607
00:21:43,599 --> 00:21:47,259
and with 4x the GPUs, plus extreme co-design
608
00:21:47,259 --> 00:21:47,660
Right
609
00:21:47,660 --> 00:21:48,640
extreme co-design
610
00:21:48,640 --> 00:21:52,069
all the chips designed for higher performance
611
00:21:52,069 --> 00:21:53,400
working together
612
00:21:53,424 --> 00:21:55,900
co-designed from the ground up
613
00:21:55,900 --> 00:22:00,279
Should we expect extreme co-design across six chips every generation?
614
00:22:00,279 --> 00:22:03,549
From now on, should we see six new chips
615
00:22:03,549 --> 00:22:04,970
for every generation?
616
00:22:04,970 --> 00:22:09,380
There'll now be a new GPU generation every year
617
00:22:09,380 --> 00:22:11,740
but not all six chips will be co-designed every year
618
00:22:11,740 --> 00:22:14,619
That probably won't be the norm
619
00:22:14,619 --> 00:22:18,890
but with each flagship generation
620
00:22:18,890 --> 00:22:21,029
like Rubin, with six new chips
621
00:22:21,029 --> 00:22:21,429
Yes
622
00:22:21,509 --> 00:22:24,449
there will also be new chips to go with Rubin Ultra
623
00:22:24,449 --> 00:22:26,068
not all six
624
00:22:26,068 --> 00:22:26,449
For example
625
00:22:26,449 --> 00:22:28,420
we might see the Vera CPU
626
00:22:28,420 --> 00:22:29,940
and the Rubin Ultra
627
00:22:29,940 --> 00:22:32,359
GPU. Exactly right
628
00:22:32,359 --> 00:22:32,720
Yeah
629
00:22:32,720 --> 00:22:33,859
I'm really looking forward to this
630
00:22:33,859 --> 00:22:35,690
I can't wait to see it in action
631
00:22:35,690 --> 00:22:38,369
When will we learn more about it?
632
00:22:38,369 --> 00:22:41,859
Is that coming this year or next?
633
00:22:41,859 --> 00:22:42,859
So yeah
634
00:22:42,859 --> 00:22:44,779
Jensen will talk about it
635
00:22:44,779 --> 00:22:46,279
over the coming year
636
00:22:46,279 --> 00:22:47,180
Um
637
00:22:47,180 --> 00:22:48,619
I don't have specific dates
638
00:22:48,619 --> 00:22:49,680
but yes
639
00:22:49,680 --> 00:22:51,289
I'm really excited about it
640
00:22:51,289 --> 00:22:54,609
What are you most looking forward to? What excites you most?
641
00:22:54,609 --> 00:22:58,569
This rapid evolution, a new generation every year
642
00:22:58,569 --> 00:23:02,089
and the degree of innovation that comes from extreme co-design
643
00:23:02,089 --> 00:23:05,230
is what's most impressive
644
00:23:05,230 --> 00:23:05,589
Right
645
00:23:05,589 --> 00:23:05,970
Right
646
00:23:05,970 --> 00:23:09,829
From one GPU generation to the next
647
00:23:09,829 --> 00:23:13,338
there's only so much the process technology can improve
648
00:23:13,338 --> 00:23:15,239
the technology can only be refined so far
649
00:23:16,318 --> 00:23:16,598
You know
650
00:23:16,598 --> 00:23:20,970
it's not about the increase in transistor count
651
00:23:20,970 --> 00:23:23,079
from one generation to the next
652
00:23:23,079 --> 00:23:24,240
For example
653
00:23:24,240 --> 00:23:25,019
uh
654
00:23:25,019 --> 00:23:27,359
between Vera Rubin and Blackwell
655
00:23:27,359 --> 00:23:31,319
the transistor count went up by about 70%
656
00:23:31,319 --> 00:23:34,888
across the various chips we co-designed
657
00:23:34,888 --> 00:23:38,209
but we got a 10x performance improvement
658
00:23:38,209 --> 00:23:39,729
If we were just following Moore's Law
659
00:23:39,729 --> 00:23:41,769
it would only be a 70% increase
660
00:23:41,769 --> 00:23:44,049
not a 10x increase
661
00:23:44,049 --> 00:23:44,390
Yeah
662
00:23:44,390 --> 00:23:44,890
That's really notable
663
00:23:44,890 --> 00:23:46,369
So this kind of
664
00:23:46,369 --> 00:23:46,769
uh
665
00:23:46,769 --> 00:23:48,829
all these co-designed chips
666
00:23:48,829 --> 00:23:51,539
working together to maximize performance
667
00:23:51,539 --> 00:23:55,519
that's what's most amazing about this generation and the ones to come
668
00:23:55,519 --> 00:23:56,099
Yes
669
00:23:56,099 --> 00:23:57,079
That's really exciting
670
00:23:57,079 --> 00:23:58,569
Thank you so much for your time
671
00:23:58,569 --> 00:23:59,970
A huge thanks to Joe
672
00:23:59,970 --> 00:24:03,069
DeLaere for breaking down NVIDIA's Blackwell ecosystem
673
00:24:03,069 --> 00:24:08,210
giving us a deep look at Rubin, and explaining how it will make AI models
674
00:24:08,210 --> 00:24:10,089
smarter and more efficient
675
00:24:10,089 --> 00:24:11,390
not just language models
676
00:24:11,390 --> 00:24:14,509
but everything from image and video models to medicine
677
00:24:14,509 --> 00:24:16,470
robotics, and more
678
00:24:16,470 --> 00:24:19,650
If you want to dig deeper into the science behind this technology
679
00:24:19,650 --> 00:24:21,769
join me at NVIDIA GTC
680
00:24:21,769 --> 00:24:27,420
Register for free using the link below and attend the virtual sessions
681
00:24:27,420 --> 00:24:30,400
I'll announce the winner of the RTX 5090 giveaway
682
00:24:30,400 --> 00:24:32,130
a few days after the conference
683
00:24:32,130 --> 00:24:34,390
so be sure to check back
684
00:24:34,390 --> 00:24:37,789
Thanks to NVIDIA for sponsoring my travel and media access
685
00:24:37,789 --> 00:24:39,400
to cover GTC live
686
00:24:39,400 --> 00:24:41,460
And thank you for your support
687
00:24:41,460 --> 00:24:42,440
Thanks for watching
688
00:24:42,440 --> 00:24:45,329
Until next time, this is Ticker Symbol: YOU
689
00:24:45,329 --> 00:24:46,430
My name is Alex
690
00:24:46,430 --> 00:24:51,250
reminding you that the best investment you can make is in you