This year’s Nvidia GPU Technology Conference (GTC) could not have been better timed for Nvidia. The hottest topics in technology today are the artificial intelligence (AI) behind ChatGPT, other related large language models (LLMs), and their use in generative AI applications. The foundation of all this new AI technology is the Nvidia GPU. Nvidia CEO Jensen Huang has doubled down on support for LLMs and on the future of generative AI built on them. He calls it the “iPhone moment for AI.” Using LLMs, AI computers can learn the languages of people, programs, images, or chemistry, then draw on that large knowledge base to create new and unique works in response to queries. This is generative AI.
Jumbo-sized LLMs take this capability to a new level, most notably the latest, GPT-4, which was introduced just before GTC. Training these complex models requires thousands of GPUs, and applying these models to specific problems requires even more GPUs for inference. Nvidia’s latest Hopper GPU, the H100, is known for training, but each GPU can also be split into multiple instances (up to seven). Nvidia calls this feature MIG (Multi-Instance GPU), and it allows multiple inference models to run on a single GPU. It is in this inference mode that the GPU uses pre-trained LLMs to transform queries into new outputs.
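To make the MIG arithmetic concrete, here is a minimal Python sketch of how many GPUs a given number of inference models would occupy if each model ran in its own instance. The seven-instance limit comes from the text above; the function names, the one-model-per-instance rule, and the simple fill-in-order placement are illustrative assumptions, not Nvidia's scheduler or API.

```python
import math

# "Up to seven" instances per GPU, as stated in the article.
MAX_MIG_INSTANCES_PER_GPU = 7


def gpus_needed(num_models, instances_per_gpu=MAX_MIG_INSTANCES_PER_GPU):
    """Return how many GPUs are needed if each model gets one instance.

    Hypothetical helper: assumes every model fits in a single instance.
    """
    if num_models < 0:
        raise ValueError("num_models must be non-negative")
    if not 1 <= instances_per_gpu <= MAX_MIG_INSTANCES_PER_GPU:
        raise ValueError("instances_per_gpu must be between 1 and 7")
    return math.ceil(num_models / instances_per_gpu)


def assign_models(model_names, instances_per_gpu=MAX_MIG_INSTANCES_PER_GPU):
    """Map each model name to a (gpu_index, instance_index) pair.

    Hypothetical placement: fills each GPU's instances in order
    before moving on to the next GPU.
    """
    if not 1 <= instances_per_gpu <= MAX_MIG_INSTANCES_PER_GPU:
        raise ValueError("instances_per_gpu must be between 1 and 7")
    return {
        name: divmod(index, instances_per_gpu)
        for index, name in enumerate(model_names)
    }
```

Under these assumptions, seven models fit on one GPU and an eighth spills onto a second. A real deployment would also have to account for each model's memory footprint, which this sketch ignores.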
Nvidia is using its leadership position to create new business opportunities by becoming a full-stack AI supplier, providing chips, software, accelerator cards, systems and even services. The company has started a service business in areas such as biology, for example. Its pricing may be based on hours of use, or on the value of the end product built with its services.
By using these large training sets, AI computers will be able to understand ordinary language better. For example, a trained AI computer can receive a request to create a program and use its knowledge of programming structures to construct the requested program. However, there is still a large gap between the best efforts of AI and a well-built, validated and documented program. At least, that is the case today. But even today, this approach democratizes programming by enabling better “no-code” tools, making it possible for people to build their own applications using human language without requiring a deep understanding of programming constructs.
However, AI’s knowledge of ground truth is limited. AI models only work from the data they were trained on. Streaming real-time information from the world and retraining on that steady stream of new information on the fly is not yet possible.
Scaling LLMs to new levels brings new capabilities. The bigger the “brain,” the more capabilities emerge. But there is still a need to adopt ethical structures, avoid bias, and add guardrails. Much more work needs to be done on AI ethics.
Nvidia sees training as the production of intelligence, or the creation of an “AI factory.” Inference takes that knowledge and makes it actionable, deployable, and scalable from the cloud to end devices. Nvidia’s service model allows businesses to use and customize pre-trained base models created by Nvidia. Enterprises can add guardrails to inference results and incorporate internal or proprietary data. Training takes place in the cloud, with Nvidia offering its DGX Cloud service directly or through a hyperscaler cloud. This training-as-a-service is tailored for businesses that do not have these capabilities in-house.
Today’s capabilities are already shaking things up. Startups are on the bandwagon, and all the major hyperscalers are developing strategies around LLMs. Microsoft built the data center for OpenAI, which developed ChatGPT using the A100 GPU, the predecessor of the H100. Microsoft is building AI technology into Bing for search and chat, and into Microsoft 365 as a Copilot assistant for work tasks. All these services run on Nvidia GPUs. Microsoft’s Azure cloud service already offers the new Nvidia H100 GPU in private preview.
In addition to Microsoft’s Azure, Nvidia has multiple cloud vendors lined up for the H100. Oracle Cloud Infrastructure offers it in limited availability, and it is generally available through Cirrascale and CoreWeave. Nvidia said the H100 will be available on AWS in limited preview in the coming weeks. Additionally, Google Cloud plans to offer H100 cloud services, as do Nvidia’s cloud partners Lambda, Paperspace and Vultr.
Nvidia offers three new LLM modalities/services:
1. For human languages, the service is NeMo.
2. For images (including videos), the service is Picasso.
3. For biology, the language of proteins, the service is BioNeMo.
BioNeMo can be used to teach AI the language of proteins for biology and drug discovery research. Nvidia’s Huang hopes this will lead to more widespread use of AI in medicine and drug discovery. AI has many applications in data filtering, drug trial prediction, and other drug discovery tasks. Use cases include predicting 3D protein structures, molecular docking, and more. The BioNeMo service allows companies to build custom models based on their own data. Using this tool, model training time can be reduced from 6 months of computation to just 4 weeks. Amgen is an early Nvidia partner. Nvidia is also partnering with Medtronic on intelligent medical devices. AI is proving very useful in automating routine tasks and reducing complexity, which appeals to medical professionals.
Building a digital twin world
Another big announcement from Nvidia concerned the Omniverse platform. This digitization of the real world has many uses. Automotive and other manufacturing industries, in particular, seem to be embracing digital representations of real-world machines (the so-called digital twins).
Focusing on automakers in his keynote, Huang said Omniverse is being used from the factory to the customer experience. Omniverse speeds up the entire workflow, from design to manufacturing to customer service. GM uses digital twins of automotive designs to help model aerodynamics. Toyota, Mercedes and BMW use Omniverse for factory design. Lucid uses 3D VR cars to attract customers. At GTC it was also announced that Microsoft Azure will offer the Omniverse service.
Nvidia also debuted the Isaac Sim platform, designed to enable global teams to collaborate remotely to build, train, simulate, validate and deploy robots.
New data center hardware
To support advanced generative AI, Nvidia also announced four inference platforms. These include the Nvidia L4 for AI video, the Nvidia L40 for 2D/3D image generation, and the Nvidia H100 NVL for deploying large language models. Nvidia also announced that its Grace Hopper “superchip,” which connects Arm-based Grace CPUs and Hopper GPUs via a high-speed coherent interface, is now sampling. Grace Hopper is intended for recommendation systems that work from very large datasets.
The low-profile L4 PCIe card delivers up to 120x the AI-powered video performance of CPUs and is far more energy efficient. With its announcement of G2 virtual machines, available in private preview, Google Cloud is the first cloud service provider to offer Nvidia’s L4 Tensor Core GPUs.
The L40 PCIe card serves as the engine of Omniverse. The company says it offers 7x the inference performance in Stable Diffusion and 12x the Omniverse performance of previous-generation Nvidia products. The L4 and L40 products are based on the Ada Lovelace GPU architecture.
Based on the Nvidia Hopper GPU computing architecture, the H100 incorporates the Transformer Engine. It is optimized for developing, training, and deploying generative AI, large language models (LLMs), and recommender systems. Part of the H100’s performance improvement over the A100 comes from its use of FP8 precision arithmetic. This enables 9x faster AI training and up to 30x faster AI inference on LLMs compared to the previous-generation A100.
The Nvidia H100 NVL is aimed at large-scale deployment of large LLMs like ChatGPT. The new H100 NVL features 94 GB of memory with Transformer Engine acceleration, delivering up to 12x faster inference performance on GPT-3 compared to the previous-generation A100 at data center scale. The H100 NVL PCIe card has two H100 GPUs connected via the NVLink coherent bus. The H100 NVL GPU is expected in the second half of this year.
Building better chips with GPUs
Another application for Nvidia’s GPU technology is computational lithography. In this chip-manufacturing application, GPUs correct for optical diffraction (blurring) when generating the optical masks used to project photolithographic layers onto silicon. Nvidia has created cuLitho, which allows optical and EUV mask makers to use the new Hopper H100 GPU to accelerate computations nearly 40x over traditional CPU computations. One of the first partners is Nvidia’s longtime foundry partner TSMC. Synopsys and ASML have also signed on as partners.
Nvidia is riding the wave of advanced AI and digital twins. Its first-mover advantage has been undiminished since the very early days of accelerated AI processing. The company is expanding its business model to cloud services, even through other companies’ clouds. Huang described Nvidia’s approach as a new model, a cloud within a cloud.
The key to Nvidia’s success is that it continues to move its business further up the value chain, building on years of software development and the industry’s most mature AI platform and broadest ecosystem. Somehow the company always seems to be in the right place at the right time.
Tirias Research tracks and consults for companies across the electronics ecosystem, from semiconductors to systems to sensors to the cloud. Members of the Tirias Research team have consulted across the CPU and GPU IP ecosystem including AMD, Arm, Intel, Nvidia, Qualcomm, and SiFive.