Neuphonic Open-Sources NeuTTS Air: A 748M-Parameter On-Device Speech Language Model with Real-Time Voice Cloning

Neuphonic has released NeuTTS Air, an open-source text-to-speech (TTS) speech language model designed to run in real time on CPU. The Hugging Face model card lists 748M parameters (Qwen2 architecture), and the model ships in GGUF quantizations (Q4/Q8), so it can run via llama.cpp/llama-cpp-python with no cloud dependency. It is licensed under Apache-2.0 and includes a runnable demo and examples.

So, what is new?

NeuTTS Air couples a 0.5B-class Qwen backbone with Neuphonic's NeuCodec audio codec. Neuphonic positions the system as a “super-realistic, on-device” TTS LM that clones a voice from ~3 seconds of reference audio and synthesizes speech in that style, targeting voice agents and privacy-sensitive applications. The model card and repository explicitly emphasize real-time CPU generation and small-footprint deployment.

Key Features

Realism at sub-1B scale: ~0.7B (Qwen2-class) text-to-speech LM with human-like prosody and timbre preservation.
On-device deployment: Distributed in GGUF (Q4/Q8) with a CPU-first path; suitable for laptops, phones, and Raspberry Pi-class boards.
Instant speaker cloning: Style transfer from ~3 seconds of reference audio (reference WAV + transcript).
Compact LM + codec stack: Qwen 0.5B backbone paired with NeuCodec (0.8 kbps / 24 kHz) to balance latency, footprint, and output quality.
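
To give a rough sense of why Q4/Q8 quantization matters on laptops and single-board computers, here is a back-of-the-envelope estimate of weight storage for a 748M-parameter model. It is only a sketch: the bit widths are nominal assumptions, real GGUF files add per-block scales and metadata, and the actual file sizes of the released checkpoints are not stated in this article.

```python
PARAMS = 748_000_000  # parameter count reported on the model card


def weight_footprint_mb(params: int, bits_per_weight: float) -> float:
    """Approximate weight storage in megabytes (10**6 bytes)."""
    return params * bits_per_weight / 8 / 1e6


# Nominal bit widths (assumption); actual quantized files are somewhat larger.
for name, bits in [("fp16", 16), ("Q8 (nominal)", 8), ("Q4 (nominal)", 4)]:
    print(f"{name:>13}: ~{weight_footprint_mb(PARAMS, bits):.0f} MB")
```

Under these assumptions, the nominal 4-bit weights need about a quarter of the fp16 storage, which is the main reason a CPU-first path becomes practical on memory-constrained boards.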

What are the model architecture and runtime paths?

Backbone: Qwen 0.5B is used as a lightweight LM to condition speech generation; the hosted artifact reports 748M parameters under the qwen2 architecture on Hugging Face.
Codec: NeuCodec provides low-bitrate acoustic tokenization/decoding; it targets 0.8 kbps with 24 kHz output, enabling compact representations for efficient on-device use.
Quantization and formats: Prebuilt GGUF backbones (Q4/Q8) are available; the repository includes instructions for llama-cpp-python and an optional ONNX decoder path.
Dependencies: Uses espeak for phonemization; examples and a Jupyter notebook are provided for end-to-end synthesis.
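
The 0.8 kbps / 24 kHz figures are easier to appreciate with a little arithmetic. The sketch below compares the codec's token stream against uncompressed audio at the same sample rate. The 16-bit mono PCM baseline is an assumption chosen for illustration; the article does not say what NeuCodec is benchmarked against.

```python
CODEC_BITRATE_BPS = 800     # 0.8 kbps, as stated for NeuCodec
SAMPLE_RATE_HZ = 24_000     # 24 kHz output, as stated
PCM_BITS_PER_SAMPLE = 16    # assumption: 16-bit mono PCM baseline


def codec_bytes(seconds: float) -> float:
    """Bytes needed to represent `seconds` of speech at the codec bitrate."""
    return CODEC_BITRATE_BPS * seconds / 8


def pcm_bytes(seconds: float) -> float:
    """Bytes needed for the same duration as raw 16-bit mono PCM."""
    return SAMPLE_RATE_HZ * PCM_BITS_PER_SAMPLE * seconds / 8


compression_ratio = pcm_bytes(1) / codec_bytes(1)
print(f"10 s of speech: {codec_bytes(10):.0f} B coded vs {pcm_bytes(10):.0f} B raw")
print(f"compression ratio vs 16-bit PCM: {compression_ratio:.0f}x")
```

A representation this compact means the LM has far fewer bits to predict per second of audio, which is consistent with the article's point about balancing latency and footprint.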

On-device performance focus

NeuTTS Air showcases ‘real-time generation on mid-range devices’ and offers CPU-first defaults; the GGUF quantizations are intended for laptops and single-board computers. Although no fps/RTF numbers are published on the model card, the distribution targets local inference without a GPU and demonstrates a working flow through the provided examples and Space.
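
Since no RTF numbers are published, readers may want to measure it themselves. The real-time factor is the synthesis time divided by the duration of the audio produced, so values below 1.0 mean faster than real time. The helper below is a minimal sketch: `synthesize` is a hypothetical callable standing in for whatever TTS entry point you use, and `fake_synthesize` is a placeholder that returns silence, not the NeuTTS Air API.

```python
import time
from typing import Callable, Sequence


def real_time_factor(
    synthesize: Callable[[str], Sequence[float]],
    text: str,
    sample_rate: int = 24_000,  # 24 kHz output, per the article
) -> float:
    """Return synthesis time divided by the duration of the generated audio."""
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    audio_seconds = len(samples) / sample_rate
    if audio_seconds == 0:
        raise ValueError("synthesizer returned no audio")
    return elapsed / audio_seconds


def fake_synthesize(text: str) -> list:
    """Placeholder (hypothetical): returns one second of silence."""
    return [0.0] * 24_000


rtf = real_time_factor(fake_synthesize, "Hello from a local TTS model.")
print(f"RTF: {rtf:.4f} (below 1.0 is faster than real time)")
```

For a meaningful number, run several utterances of different lengths on the target CPU and report the spread, since a single short sentence can be dominated by warm-up cost.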

Voice cloning workflow

NeuTTS Air requires (1) a reference WAV and (2) the transcript text for that reference. It encodes the reference into style tokens and then synthesizes arbitrary text in the reference speaker's timbre. The Neuphonic team recommends 3–15 s of clean, mono audio and provides pre-encoded samples.
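
Before handing a clip to the cloning step, it can help to check it against the recommendations above. The sketch below uses only Python's standard `wave` module to verify duration and channel count. The function name `check_reference` is hypothetical and not part of the NeuTTS Air repository, and it cannot judge whether the audio is actually clean.

```python
import wave

MIN_SECONDS, MAX_SECONDS = 3.0, 15.0  # range recommended in the text


def check_reference(path: str) -> list[str]:
    """Return a list of problems; an empty list means the clip looks usable."""
    problems = []
    with wave.open(path, "rb") as wav:
        duration = wav.getnframes() / wav.getframerate()
        channels = wav.getnchannels()
    if channels != 1:
        problems.append(f"expected mono audio, found {channels} channels")
    if not MIN_SECONDS <= duration <= MAX_SECONDS:
        problems.append(
            f"duration {duration:.1f} s is outside {MIN_SECONDS:.0f}-{MAX_SECONDS:.0f} s"
        )
    return problems
```

Calling `check_reference("reference.wav")` on your own file returns the issues found, so a caller can warn the user before spending time on synthesis.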

Privacy, responsibility, and watermarking

Neuphonic frames the model around on-device privacy (no audio/text leaves the machine without the user's approval) and states that all generated audio includes a Perth (Perceptual Threshold) watermark to support responsible use and provenance.

How does it compare?

Open, local TTS systems exist (e.g., GGUF-based pipelines), but NeuTTS Air is notable for packaging a small LM + neural codec with instant cloning, CPU-first quantizations, and watermarking under a permissive license. The “world's first super-realistic, on-device speech LM” phrasing is the vendor's claim; the verifiable facts are the size, formats, cloning procedure, license, and provided runtimes.

The focus is on system trade-offs: a ~0.7B Qwen-class backbone with GGUF quantization, paired with NeuCodec at 0.8 kbps/24 kHz, is a pragmatic recipe for real-time, CPU-only TTS that keeps latency and memory predictable while preserving timbre from ~3–15 s style references. The Apache-2.0 license and built-in watermarking are deployment-friendly, but publishing RTF/latency on commodity CPUs and curves of cloning quality versus reference length would enable rigorous benchmarking against existing local pipelines. Operationally, an offline path with minimal dependencies (eSpeak, llama.cpp/ONNX) lowers privacy/compliance risk for edge agents without sacrificing intelligibility.

Michal Sutter is a data science professional with a master’s degree in data science from the University of Padua. With a solid foundation in statistical analysis, machine learning, and data engineering, Michal excels in transforming complex datasets into actionable insights.
