
IBM just released Granite 4.0, an open-source LLM family that swaps the conventional transformer stack for a hybrid Mamba-2/transformer design to cut memory use while maintaining quality. Sizes span a 3B dense “Micro”, a 3B hybrid “H-Micro”, a 7B hybrid MoE “H-Tiny” (~1B active) and a 32B hybrid MoE “H-Small” (~9B active). The models are Apache-2.0 licensed, cryptographically signed, and (per IBM) the first open model family covered by an accredited ISO/IEC 42001:2023 AI Management System certification. They are available on watsonx.ai and via Docker Hub, Hugging Face, LM Studio, NVIDIA NIM, Ollama, Replicate, Dell Pro AI Studio/Enterprise Hub, Kaggle, and Azure AI Foundry.

So, what is new?

Granite 4.0 introduces a hybrid design that interleaves a small number of transformer self-attention blocks with a majority of Mamba-2 state-space layers (roughly a 9:1 ratio). According to the IBM Technology Blog, Granite 4.0-H reduces RAM usage by more than 70% compared to conventional transformer LLMs for long-context, multi-session inference, which translates into fewer GPUs for a given throughput/latency target. IBM’s internal comparisons also show the smallest Granite 4.0 models outperforming Granite 3.3-8B despite using fewer parameters.
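To make the 9:1 interleaving concrete, here is a minimal sketch (not IBM’s actual configuration; the 40-layer depth is a hypothetical choice for illustration) of how such a hybrid layer schedule could be laid out:

```python
# Illustrative sketch only: build a layer-type schedule that interleaves
# Mamba-2 state-space layers with self-attention blocks at the ~9:1
# ratio described for Granite 4.0-H. Depth of 40 is hypothetical.

def hybrid_schedule(n_layers: int, ssm_per_attn: int = 9) -> list[str]:
    """Return a list of layer types: `ssm_per_attn` Mamba-2 layers
    followed by one attention layer, repeated up to n_layers."""
    block = ["mamba2"] * ssm_per_attn + ["attention"]
    return [block[i % len(block)] for i in range(n_layers)]

schedule = hybrid_schedule(40)
print(schedule.count("mamba2"), schedule.count("attention"))  # → 36 4
```

Because the state-space layers carry a fixed-size recurrent state instead of a per-token KV cache, pushing most of the depth into Mamba-2 layers is what drives the memory savings at long context lengths.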

Which variants are being released?

IBM is shipping Base and Instruct variants of four initial models:


  • Granite-4.0-H-Small: 32B total, ~9B active (hybrid MoE).
  • Granite-4.0-H-Tiny: 7B total, ~1B active (hybrid MoE).
  • Granite-4.0-H-Micro: 3B (hybrid dense).
  • Granite-4.0-Micro: 3B (conventional dense transformer, for platforms that do not yet support the hybrid stack).
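The total-vs-active split matters for sizing: the MoE variants keep all expert weights resident, but only the active parameters participate in each token’s forward pass. A rough back-of-envelope sketch (my estimates from the sizes above, using the common ~2 FLOPs per active parameter per token heuristic):

```python
# Back-of-envelope sizing sketch for the Granite 4.0 variants listed above.
# Assumptions: BF16 weights (2 bytes/param) and ~2 FLOPs per active
# parameter per generated token; real numbers will differ.

VARIANTS = {            # (total params, active params)
    "H-Small": (32e9, 9e9),
    "H-Tiny":  (7e9, 1e9),
    "H-Micro": (3e9, 3e9),   # dense hybrid: all parameters active
}

for name, (total, active) in VARIANTS.items():
    weight_gb = total * 2 / 1e9          # BF16 resident weight memory
    gflops_per_token = 2 * active / 1e9  # rough per-token compute
    print(f"{name}: ~{weight_gb:.0f} GB weights, "
          f"~{gflops_per_token:.0f} GFLOPs/token")
```

This is why H-Tiny can run with roughly 7B-model memory but closer to 1B-model per-token compute.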

All are Apache-2.0 licensed and cryptographically signed; IBM states Granite is the first open model family covered by accredited ISO/IEC 42001 certification of its AI Management System (AIMS). Reasoning-optimized (“thinking”) variants are scheduled for later in 2025.

How was it trained, and what about context length and dtype?

Granite 4.0 was trained on samples up to 512K tokens and evaluated at up to 128K tokens. The public checkpoints on Hugging Face are BF16 (quantized and GGUF conversions have also been released), while FP8 is an execution option on supporting hardware, not the format of the published weights.
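For local evaluation planning, the dtype choice dominates the weight footprint. A small sketch for a 3B-parameter checkpoint, using approximate bits-per-weight figures for the common GGUF quantizations (the GGUF values are ballpark, not exact):

```python
# Approximate weight-only footprint of a 3B-parameter checkpoint under
# the formats mentioned. GGUF bits-per-weight values are approximate
# (quantization adds per-block scales), and activations/KV state are
# excluded, so treat these as lower-bound estimates.

BITS = {"fp32": 32, "bf16": 16, "fp8": 8, "gguf_q8_0": 8.5, "gguf_q4_k_m": 4.8}

def weight_gb(n_params: float, bits: float) -> float:
    """Weight memory in GB for n_params parameters at `bits` bits each."""
    return n_params * bits / 8 / 1e9

for fmt, bits in BITS.items():
    print(f"{fmt}: ~{weight_gb(3e9, bits):.1f} GB")
```

So the BF16 release of a 3B model needs roughly 6 GB for weights alone, and a 4-bit GGUF conversion brings that under 2 GB.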

Let’s look at its performance signals (enterprise-relevant)

IBM highlights the following instruction-following and tool-use benchmarks:

IFEval (HELM): Granite-4.0-H-Small leads most open-weights models, trailing only the much larger Llama 4 Maverick.

https://www.ibm.com/new/announcements/ibm-granite-4-0-hyper-felf-high-performance-hybrid-models

BFCLv3 (function calling): H-Small is highly competitive with both open and closed models.

MTRAG (multi-turn RAG): improved reliability on complex retrieval workflows.

How should I access it?

Granite 4.0 is available on IBM watsonx.ai and via Dell Pro AI Studio/Enterprise Hub, Docker Hub, Hugging Face, Kaggle, LM Studio, NVIDIA NIM, Ollama, OPAQUE, and Replicate. IBM also notes continued runtime support in vLLM, llama.cpp, NexaML, and MLX for serving the hybrid models.
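Since vLLM (and Ollama) expose an OpenAI-compatible API, querying a locally served checkpoint is a standard chat-completions call. A minimal sketch that just builds the request body; the model id and the localhost endpoint in the comment are assumptions for illustration:

```python
# Sketch: construct an OpenAI-compatible chat-completions request for a
# locally served Granite 4.0 model. The model id and endpoint below are
# illustrative assumptions, not confirmed values.
import json

payload = {
    "model": "ibm-granite/granite-4.0-micro",  # hypothetical local model id
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize Granite 4.0 in one line."},
    ],
    "max_tokens": 128,
}
body = json.dumps(payload)

# To send it against a vLLM server (assuming it listens on localhost:8000):
#   requests.post("http://localhost:8000/v1/chat/completions", data=body,
#                 headers={"Content-Type": "application/json"})
```

Because the API surface is the standard OpenAI one, existing client code can be pointed at the local server by swapping the base URL.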

I think Granite 4.0’s hybrid Mamba-2/transformer stack and active-parameter MoE are a practical route to lower TCO: the >70% memory reduction and long-context throughput gains translate directly into smaller GPU fleets without sacrificing instruction-following or tool-use accuracy (IFEval, BFCLv3, MTRAG). BF16 checkpoints with GGUF conversions simplify local evaluation pipelines, and ISO/IEC 42001 certification plus signed artifacts address the provenance/compliance gaps that often stall enterprise deployments. Net result: a lean, auditable base-model family (1B–9B active) that is easier to productionize than prior 8B-class transformers.


Check out the Model Cards on Hugging Face and the technical details. Feel free to check out our GitHub Page for Tutorials, Codes and Notebooks. Also, follow us on Twitter, join our 100K+ ML SubReddit, and subscribe to our Newsletter. And yes, you can also join us on Telegram.


Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for social good. His most recent endeavor is the launch of Marktechpost, an artificial intelligence media platform that stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among audiences.

