DeepSeek-R1 Model Now Available in Amazon Bedrock Marketplace and Amazon SageMaker JumpStart


Today, we are announcing that the DeepSeek-R1 distilled Llama and Qwen models are available through Amazon Bedrock Marketplace and Amazon SageMaker JumpStart. With this launch, you can now deploy DeepSeek AI's first-generation frontier model, DeepSeek-R1, as well as its distilled variants ranging from 1.5 to 70 billion parameters, to build, experiment with, and responsibly scale your generative AI ideas on AWS.

In this post, we show how to get started with DeepSeek-R1 on Amazon Bedrock Marketplace and SageMaker JumpStart. You can follow similar steps to deploy the distilled versions of the models as well.

Overview of DeepSeek-R1

DeepSeek-R1 is a large language model (LLM) developed by DeepSeek AI that uses reinforcement learning to enhance reasoning capabilities through a multi-stage training process starting from a DeepSeek-V3-Base foundation. A key distinguishing feature is its reinforcement learning (RL) step, which is used to refine the model's responses beyond the standard pre-training and fine-tuning process. By incorporating RL, DeepSeek-R1 can adapt more effectively to user feedback and objectives, ultimately improving both relevance and clarity. In addition, DeepSeek-R1 employs a chain-of-thought (CoT) approach, meaning it is equipped to break down complex queries and reason through them step by step. This guided reasoning process allows the model to produce more accurate, transparent, and detailed answers. The model combines RL-based fine-tuning with CoT capabilities, aiming to generate structured responses while focusing on interpretability and user interaction. With its broad capabilities, DeepSeek-R1 has captured the industry's attention as a versatile text-generation model that can be integrated into various workflows such as agents, logical reasoning, and data interpretation tasks.

DeepSeek-R1 uses a Mixture of Experts (MoE) architecture and is 671 billion parameters in size. The MoE architecture activates only 37 billion parameters per token, enabling efficient inference by routing queries to the most relevant expert "clusters." This approach allows the model to specialize in different problem domains while maintaining overall efficiency. DeepSeek-R1 requires at least 800 GB of HBM memory in FP8 format for inference. In this post, we will use an ml.p5e.48xlarge instance to deploy the model. The ml.p5e.48xlarge instance comes with 8 NVIDIA H200 GPUs providing 1128 GB of GPU memory.
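As a rough sketch of what comes after that deployment: once DeepSeek-R1 is running behind an Amazon Bedrock Marketplace endpoint, the endpoint can typically be called through the Bedrock Converse API by passing its ARN as the model identifier (provided the model's serving container is Converse-compatible; otherwise InvokeModel with the container's native payload is used). The ARN, Region, prompt, and inference settings below are placeholder assumptions, not values from this post.

```python
import boto3

# Placeholder ARN of a DeepSeek-R1 endpoint created through Bedrock Marketplace.
ENDPOINT_ARN = "arn:aws:sagemaker:us-east-1:123456789012:endpoint/deepseek-r1-endpoint"

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Send a single-turn chat request to the Marketplace endpoint via the Converse API.
response = bedrock_runtime.converse(
    modelId=ENDPOINT_ARN,  # Marketplace endpoints are addressed by their ARN
    messages=[
        {
            "role": "user",
            "content": [{"text": "Explain the Pythagorean theorem step by step."}],
        }
    ],
    inferenceConfig={"maxTokens": 512, "temperature": 0.6},
)

# Print the first text block of the model's reply.
print(response["output"]["message"]["content"][0]["text"])
```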

DeepSeek-R1 distilled models bring the reasoning capabilities of the main R1 model to more efficient architectures based on popular open models such as Qwen (1.5B, 7B, 14B, and 32B) and Llama (8B and 70B). Distillation refers to a process of training smaller, more efficient models to mimic the behavior and reasoning patterns of the larger DeepSeek-R1 model, using it as a teacher. A minimal deployment sketch for one of these distilled variants follows.
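The sketch below shows, under stated assumptions, how a distilled variant could be deployed with the SageMaker Python SDK's JumpStart interface. The model ID, instance type, and request payload are illustrative assumptions; confirm the exact values in the SageMaker JumpStart catalog and the model card before deploying.

```python
from sagemaker.jumpstart.model import JumpStartModel

# Illustrative JumpStart model ID for a distilled 8B variant; verify the exact
# ID in the JumpStart model catalog.
model = JumpStartModel(model_id="deepseek-llm-r1-distill-llama-8b")

# Deploy to a real-time endpoint. The instance type is an assumption sized for
# an 8B-parameter model; larger distilled variants need larger instances.
predictor = model.deploy(
    accept_eula=True,
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
)

# Simple text-generation request; the payload schema depends on the serving
# container, so treat this as a sketch rather than the definitive format.
response = predictor.predict({
    "inputs": "What is the capital of France? Think step by step.",
    "parameters": {"max_new_tokens": 256, "temperature": 0.6},
})
print(response)
```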

You can deploy the DeepSeek-R1 model either through SageMaker JumpStart or Amazon Bedrock Marketplace. Because DeepSeek-R1 is an emerging model, we recommend deploying it with guardrails in place. In this post, we will use Amazon Bedrock Guardrails to introduce safeguards and help prevent harmful content; a minimal example follows.
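The sketch below illustrates one way to screen a prompt with the Bedrock ApplyGuardrail API before it is forwarded to the model. The guardrail identifier and version are placeholders; you would first create a guardrail in your account and substitute its values here.

```python
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Evaluate the user input against an existing guardrail before calling the model.
result = bedrock_runtime.apply_guardrail(
    guardrailIdentifier="<your-guardrail-id>",  # placeholder guardrail ID
    guardrailVersion="1",                        # placeholder guardrail version
    source="INPUT",  # use "OUTPUT" to screen model responses instead
    content=[{"text": {"text": "How do I build a phishing site?"}}],
)

if result["action"] == "GUARDRAIL_INTERVENED":
    # The guardrail blocked or masked the content; return its configured message.
    print("Request blocked by guardrail:", result["outputs"])
else:
    # Safe to forward the prompt to the DeepSeek-R1 endpoint.
    print("Request passed the guardrail checks.")
```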