AEON-7

NVFP4 quantizations · Abliterated LLMs · DGX Spark deployments · Apple Silicon MLX · AI media production



I build deployment-ready open releases for next-gen hardware: NVFP4-quantized abliterated LLMs (Gemma 4, Qwen 3.6, Nemotron 3) on NVIDIA DGX Spark (GB10 / Blackwell / sm_121a), DFlash + EAGLE speculative decoding, Apple Silicon MLX builds for M-series Macs, a real-time voice-AI stack, and a complete agent-driven AI media production toolchain.

Everything below is public, MIT/Apache-licensed, and reproducible: Docker stacks, pre-built vLLM images, deployment guides, and benchmark numbers included. Model weights live on 🤗 Hugging Face.


📑 Contents

| Section | What you'll find there |
|---|---|
| 🎬 AEON Media Production | Agent-driven media generation: music, radio drama, music videos, cinematic film, and the ComfyUI base stack that powers them |
| 🎤 Voice AI Stack | Real-time speech on one DGX Spark: OpenAI-compatible TTS + ASR servers, Matrix VoIP bridge, AI persona builder. ~2.1 s end-to-end voice turns |
| 💎 Gemma 4 Models | Abliterated Gemma 4 NVFP4 quantizations, EAGLE drafters, and a 3.5×-faster DFlash serving container |
| 🍎 Apple Silicon MLX | Gemma-4-12B AEON Abliterated on M-series Macs: MLX quants + one-paste OpenAI-compatible multimodal server |
| 🐉 Qwen 3.6 Models | The flagship line: lossless-abliterated Qwen 3.6 dense + MoE at NVFP4, production DFlash path and the DDTree research track |
| 🌌 Nemotron Models | Abliterated multimodal Nemotron 3 reasoning for Blackwell-class hardware |
| 🔧 Inference and Optimization Tools | The engine room: AEON vLLM Ultimate unified image, DFlash, TurboQuant KV compression, modelopt tooling |
| 📦 Pre-built Docker Images | Every public ghcr.io/aeon-7 container mapped to its repo; docker pull and go |
| 🧪 Apps and Utilities | AI network management, digital gardens, and small sharp tools |
| 📊 Stats · ☕ Support · 🤝 Contact | Numbers, tips, and how to reach me |

🎬 AEON Media Production

Open-source AI-driven media production toolchain. Five focused repositories, each generating one kind of media (music, radio drama, music video, cinematic video, or the base ComfyUI stack), all designed for AI agents through skill MD files and CLI scripts. No node-graph wrangling.

| Repo | What it does |
|---|---|
| comfyui-aeon-spark | Bleeding-edge ComfyUI for DGX Spark (CUDA 13 + SageAttention v3 + NVFP4 + 14 custom-node packs + Flux 2 Dev / LTX 2.3 22B / ACE-Step v1.5 XL Turbo pre-bundled). Foundation for every other repo in this section. |
| aeon-music-maker | ACE Step 1.5 XL music generation with dynamics-preserving mastering chain (HPF → EQ → tape sat → LUFS gain-match → true-peak ceiling). FLAC-lossless output, auto-detected mastering presets, CLI-driven. |
| aeon-radio-drama | Full-pipeline radio drama / audiobook production: dialogue (Qwen3-TTS) + music (ACE Step) + SFX (MMAudio / Stable Audio Open / ACE) + sidechain mix in one command. Three-Lock voice persistence. Bundles standalone music_maker.py + sfx_maker.py for one-shot music or SFX generation. |
| aeon-music-video | Audio-reactive music video builder. librosa-driven beat / onset / RMS / spectral-centroid detection drives ffmpeg filter chains for synced visual effects. CPU-only: no GPU, no ComfyUI, no model downloads required. |
| aeon-movie-maker | Fast cinematic video via LTX 2.3 22B. Single clips, full screenplays with character continuity (last-frame carry-forward + per-character seed offsets), and sidechain-mixed final cuts. CLI-tunable LoRA strengths + saturation troubleshooting guide. |

What every AEON Media Production repo ships with

| File | What it is |
|---|---|
| README.md | Quick start, configuration table, local-vs-remote ComfyUI execution modes, env-var reference, model-installation paths |
| AGENTS.md | Step-0 execution-mode detection guide for AI agents, invocation contract, recovery patterns |
| SKILL.md | Full prompt-engineering recipes, troubleshooting decision trees, the canonical agent skill definition |
| ATTRIBUTION.md | Upstream credits: every model, library, and custom node properly attributed |
| .env.example | Verbose, self-documenting: every variable has inline instructions on where to get values (HF tokens, Civitai tokens, ComfyUI URL patterns) |
| setup.sh | First-time install: validates ComfyUI reachability, installs Python deps, inventories model files, prints download commands for missing pieces |
| sync.sh | Incremental update: diff preview, auto-stash local edits, ff-only pull, refresh deps, re-run model check. Supports --dry-run / --yes / --no-models |
| .gitignore | Standard: never commits output/, models/, .env, `__pycache__`, etc. |
| LICENSE | MIT |

Lifecycle (every repo, same pattern)

git clone https://github.com/AEON-7/<repo>     →   ./setup.sh        →   start using
                                                       │
                                                       ▼
                                              copy .env.example → .env
                                              edit COMFYUI_URL etc.
                                                       │
                                                       ▼
                                          python scripts/<tool>.py ...
                                                       │
                            (later, when upstream updates) ▼
                                                ./sync.sh
                                              (preview → confirm → pull → refresh)

Local vs remote ComfyUI

Every tool that uses ComfyUI supports two execution modes, documented per-repo:

  • Mode A (Local): CLI runs on the same machine as ComfyUI. Just python scripts/<tool>.py ....
  • Mode B (Remote): ComfyUI on a GPU box (DGX Spark, headless server). Either invoke the CLI over SSH (ssh user@gpu-host 'cd repo && python ...') or hit the remote ComfyUI HTTP API directly via SSH tunnel or --listen 0.0.0.0.

aeon-movie-maker has additional constraints documented (I2V + screenplay carry-forward needs filesystem-level access; pure HTTP-only remote works for T2V single clips only).
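
The two execution modes differ only in how the same CLI call is wrapped. A minimal sketch of that wrapping follows, assuming a hypothetical helper named `build_invocation`; the tool name `music_maker` comes from the repo table above, but the `--prompt` flag and the `repo` directory are placeholders and not the repos' documented interface.

```python
import shlex
from typing import Optional, Sequence


def build_invocation(
    tool: str,
    args: Sequence[str],
    ssh_target: Optional[str] = None,  # e.g. "user@gpu-host"; None means Mode A
    repo_dir: str = "repo",            # hypothetical checkout path on the GPU box
) -> list[str]:
    """Return the argv for running scripts/<tool>.py locally or over SSH."""
    local_cmd = ["python", f"scripts/{tool}.py", *args]
    if ssh_target is None:
        # Mode A: the CLI runs on the same machine as ComfyUI.
        return local_cmd
    # Mode B: run the same command on the remote box. shlex.join quotes each
    # argument so the remote shell sees them unchanged.
    remote_line = f"cd {shlex.quote(repo_dir)} && {shlex.join(local_cmd)}"
    return ["ssh", ssh_target, remote_line]


if __name__ == "__main__":
    print(build_invocation("music_maker", ["--prompt", "slow jazz"]))
    print(build_invocation("music_maker", ["--prompt", "slow jazz"], "user@gpu-host"))
```

The returned list can be passed to `subprocess.run` as-is. This sketch covers only the SSH variant of Mode B; talking to the remote ComfyUI HTTP API directly is a separate path.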


🎤 Voice AI Stack

Real-time speech AI for DGX Spark. Three composable sidecars turn any Spark into a voice agent host: pair two OpenAI-compatible audio endpoints (TTS + ASR) with the Matrix WebRTC bridge to dial your AI directly from any Matrix client, then compose all three into fully-embodied AI personas with create-agentic-personas. End-to-end voice turn: ~2.1 s on Spark.

| Repo | What it does |
|---|---|
| qwen3-tts-server | OpenAI-compatible /v1/audio/speech server backed by Qwen3-TTS-12Hz-1.7B-VoiceDesign. CUDA + bf16 + flash-attn 2 (sm_120 wheel). RTF 1.30× hot path (1.48 s synthesis for ~2 s of speech). Pre-built ghcr image, deploy scripts covering 5 model variants (VoiceDesign / CustomVoice / Base @ 1.7B & 0.6B). |
| qwen3-asr-server | OpenAI-compatible /v1/audio/transcriptions server: Qwen3-ASR-0.6B served by vLLM. 30 spoken languages + 22 zh dialects. RTF 16× hot path (120 ms transcription for 2 s of audio). Pre-built ghcr image, deploy scripts for 0.6B / 1.7B variants. |
| matrix-voip-agent | Headless Matrix WebRTC voice agent: auto-answers VoIP calls and bridges audio to any AI agent via PipeWire. The recommended bridge for AI-on-Matrix-VoIP: combine with any Matrix homeserver (Synapse / Conduit) and the two sidecars above to dial your AI directly from any Matrix client. |
| create-agentic-personas | Build fully-embodied AI personas on OpenClaw + Matrix, each with a chat identity, a knowledge corpus (RAG), a cloned-or-designed Qwen3-TTS voice, and a live WebRTC call line. Composes the three sidecars above into a one-command-per-persona roster builder: secret-free templates, a new-persona.sh scaffold, and a create-agentic-persona agent skill so an agent can spin up new personas itself. |
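
Because the TTS sidecar exposes an OpenAI-compatible /v1/audio/speech route, a client request can be sketched with the standard library alone. The example below only builds the request and does not send it. The base URL, model name, and voice name are placeholders (assumptions, not values from the repo), and the `model` / `input` / `voice` body fields follow the OpenAI speech API shape, which the server may extend or restrict.

```python
import json
import urllib.request


def build_speech_request(base_url: str, text: str, model: str, voice: str) -> urllib.request.Request:
    """Build (but do not send) a POST to an OpenAI-compatible /v1/audio/speech route."""
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder host, port, model, and voice; check the repo's docs for real values.
    req = build_speech_request("http://localhost:8000", "Hello from Spark", "qwen3-tts", "default")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

To actually synthesize audio, pass the request to `urllib.request.urlopen` and write the response bytes to a file.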

Recommended pairing: full voice-AI stack on a single Spark

The three voice sidecars + the Qwen3.6-27B AEON Ultimate MTP-XS vLLM main on one Docker bridge = a complete sub-3-second voice agent. Latency budget (measured, hot path on DGX Spark):

| Stage | Wall time |
|---|---|
| inbound RTP packet → matrix-voip-agent | ~5 ms |
| ASR (1.92 s clip → text) | 120 ms |
| LLM (Qwen3.6-27B chat completion, ~10 tokens) | ~480 ms |
| TTS (text → 1.92 s WAV) | ~1.48 s |
| outbound RTP → Matrix client | ~5 ms |
| End-to-end voice turn | ~2.1 s |
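
The end-to-end figure is the sum of the stage timings, and the real-time factors (RTF) quoted for the sidecars follow from the same numbers. The arithmetic below treats the approximate ("~") values as exact, which is an assumption, and uses RTF as audio duration divided by processing time, matching how the sidecar table quotes it.

```python
# Stage timings from the latency budget, in milliseconds.
# Values marked "~" in the table are treated as exact here (an assumption).
stages_ms = {
    "inbound RTP -> matrix-voip-agent": 5,
    "ASR (1.92 s clip -> text)": 120,
    "LLM (~10 tokens)": 480,
    "TTS (text -> 1.92 s WAV)": 1480,
    "outbound RTP -> Matrix client": 5,
}

total_ms = sum(stages_ms.values())  # 2090 ms, i.e. about 2.1 s

clip_s = 1.92              # clip length used in the measurements
tts_rtf = clip_s / 1.48    # audio seconds produced per second of synthesis
asr_rtf = clip_s / 0.120   # audio seconds transcribed per second of compute

# Fraction of the whole turn spent in TTS.
tts_share = stages_ms["TTS (text -> 1.92 s WAV)"] / total_ms

print(f"end-to-end: {total_ms / 1000:.2f} s")
print(f"TTS RTF: {tts_rtf:.2f}x, ASR RTF: {asr_rtf:.0f}x")
print(f"TTS share of the turn: {tts_share:.0%}")
```

On these numbers TTS takes roughly 70% of the turn, so it is the stage where a speed-up would shorten the voice turn the most.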

Each repo ships with a README.md, agents.md (autonomous bring-up runbook), docs/MODELS.md (variant catalog), docs/ARCHITECTURE.md (full topology), and docs/INTEGRATIONS.md (Matrix + OpenAI SDK + OpenWebUI + Home Assistant + raw HTTP).


💎 Gemma 4 Models

Abliterated Gemma 4 deployments at NVFP4 precision (4-bit weights) for NVIDIA DGX Spark / Blackwell GPUs: quantized weights, EAGLE speculative-decoding drafters, and a validated DFlash serving container that more than triples single-stream throughput.

| Repo | Model | Architecture | Description |
|---|---|---|---|
| Gemma-4-31B-Uncensored-NVFP4-DFlash | Gemma 4 31B Deckard Heretic | Serving container + DFlash | Validated vLLM container pairing the 31B DECKARD NVFP4 weights with the official z-lab DFlash drafter. 3.5× single-stream (11 → 39 tok/s) and up to 427 tok/s aggregate @ c=32 on DGX Spark, with reasoning, tool calling, vision/video input, and structured output fully intact. |
| Gemma-4-31B-DECKARD-HERETIC-Uncensored-NVFP4 | Gemma 4 31B DECKARD HERETIC | Dense, thinking | NVFP4-quantized abliterated 31B dense reasoning model. AWQ_FULL + SVDQuant variants. 🤗 weights |
| Gemma-4-E4B-DECKARD-HERETIC-Uncensored-NVFP4 | EAGLE drafter for 31B DECKARD | Speculative decoding | EAGLE E4B speculative-decoding drafter for the 31B DECKARD HERETIC. 🤗 weights |
| Gemma-4-26B-A4B-it-Uncensored-NVFP4 | Gemma 4 26B A4B-it | MoE | NVFP4-quantized 26B MoE. 50 tok/s single, 1430 tok/s aggregate on DGX Spark. 🤗 weights |
| Gemma-4-E4B-it-Uncensored-NVFP4 | EAGLE drafter for 26B MoE | Speculative decoding | EAGLE E4B speculative-decoding drafter for the Gemma 4 26B MoE. NVFP4 AWQ. 🤗 weights |
| supergemma4-26b-abliterated-multimodal-nvfp4 | SuperGemma4 26B Multimodal | Multimodal · 🗄️ archived | NVFP4 AWQ full quantization of SuperGemma4-26B-Abliterated-Multimodal; pre-built vLLM container + patches included. Archived; kept public for reference and reproducibility. 🤗 weights |
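
The 3.5× figure for the 31B DFlash container can be checked from the quoted throughputs. The short calculation below also derives a per-stream rate under load, assuming "c=32" means 32 concurrent request streams, which is a reading of the notation and not something the table states.

```python
# Throughput figures quoted for Gemma-4-31B-Uncensored-NVFP4-DFlash on DGX Spark.
baseline_tps = 11.0    # single-stream tok/s without DFlash
dflash_tps = 39.0      # single-stream tok/s with DFlash
aggregate_tps = 427.0  # peak aggregate tok/s
concurrency = 32       # assumption: "c=32" is 32 concurrent streams

speedup = dflash_tps / baseline_tps
per_stream_under_load = aggregate_tps / concurrency

print(f"single-stream speedup: {speedup:.2f}x")
print(f"per-stream rate at c={concurrency}: {per_stream_under_load:.1f} tok/s")
```

Under that assumption each stream gets about a third of the single-stream rate at full concurrency, while total throughput is roughly eleven times higher.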

🍎 Apple Silicon MLX

The AEON catalog comes to the Mac. Metal-accelerated MLX builds of abliterated Gemma 4: fully multimodal (text + image + audio), OpenAI-compatible, running host-native on any M-series machine. No CUDA required, no Docker GPU passthrough games.

| Repo | What it does |
|---|---|
| gemma4-aeon-abliterated-mlx-toolkit | Apple-Silicon toolkit + OpenAI-compatible server for the Gemma-4-12B AEON Abliterated MLX quant grid: near-lossless MLX-8bit (13.4 GB) flagship and compact MLXFP4 (9.3 GB) for 16 GB Macs. One-paste uv quickstart boots a multimodal mlx_vlm.server on a fresh Mac, with verified image description and speech transcription through the API. Optional MTP speculative decoding (~1.1–1.2× faster, output-identical). Benchmarked on M4 Pro 48 GB. |

Grab the weights: 🤗 MLX-8bit · 🤗 MLXFP4 · 🤗 K4-BF16 source


🐉 Qwen 3.6 Models

The flagship line. Lossless abliteration of Qwen 3.6 with hardware NVFP4 quantization (dense 27B and 35B MoE), combined with DFlash speculative decoding for serious single-stream throughput on DGX Spark, plus an open research track pushing speculative decoding for hybrid-attention models forward.

| Repo | Model | Architecture | Description |
|---|---|---|---|
| Qwen3.6-27B-AEON-Ultimate-Uncensored-DFlash | Qwen 3.6 27B AEON Ultimate Uncensored | Dense | The most-starred release in the catalog. Lossless abliteration with NVFP4 hardware quantization: BF16 (51 GB) + NVFP4 (26 GB) deployment guide, docker-compose, and QuickStart. The production serving path for Qwen 3.6 on Spark. 🤗 weights |
| Qwen3.6-NVFP4-DFlash | Qwen 3.6 35B-A3B-heretic | MoE | NVFP4 + DFlash speculative decoding on DGX Spark (GB10 / sm_121a). Source-built vLLM image + 7 patches + comprehensive deployment guide. 🤗 weights |
| Qwen3.6-27B-AEON-Ultimate-Uncensored-DDTree | Qwen 3.6 27B AEON Ultimate Uncensored | 🔬 Experimental research track | DDTree-on-vLLM for hybrid-attention Qwen 3.6: tree verification, branch-state replay, Gated DeltaNet state handling, fused branch attention. Intentionally candid lab notes: what's been tried, what works, what still breaks, and where the next breakthrough likely lives. Use the DFlash repo above for production. |

🌌 Nemotron Models

NVIDIA Nemotron deployments for Blackwell-class hardware.

| Repo | Model | Architecture | Description |
|---|---|---|---|
| Nemotron-3-Nano-Omni-AEON-Ultimate-Uncensored | Nemotron 3 Nano Omni | 12-D abliterated multimodal | BF16 + NVFP4 multimodal reasoning model for DGX Spark / Blackwell. Source-built vLLM v0.20.0 image + 4 patches + benchmark + deployment guide. 🤗 weights |

🔧 Inference and Optimization Tools

The engine room: the unified serving image that runs the whole catalog, plus the speculative-decoding, KV-cache-compression, and quantization building blocks underneath it.

| Repo | What it does |
|---|---|
| vllm-ultimate-dgx-spark | ⭐ AEON vLLM Ultimate: the current flagship serving image (ghcr.io/aeon-7/aeon-vllm-ultimate). vLLM 0.22.1 + Triton NVFP4 KV cache (PR #44389 cherry-pick, up to 3× KV capacity) + TurboQuant K8V4 4-bit KV compression + native DFlash / EAGLE3 via --speculative-config + 4 idempotent sm_121a runtime patches. One image serves the entire AEON model catalog on DGX Spark and RTX 50-series Blackwell. |
| vllm-dflash | The original DFlash vLLM image for DGX Spark: Plug & Play Block-Diffusion Speculative Decoding with NVFP4, sm_121a kernels, and Qwen-targeted optimizations. Start with AEON vLLM Ultimate (above) for new deployments. |
| turboquant | Near-optimal KV-cache quantization for LLM inference (3-bit keys, 2-bit values) with Triton kernels + vLLM integration. This fork carries the CUDA-graph-safe QJL `_POWERS` fix that lets TurboQuant boot under CUDA graph capture; bundled into AEON vLLM Ultimate as --kv-cache-dtype tq_k8v4. |
| Model-Optimizer | Tracking fork of NVIDIA's unified model-optimization library (quantization, pruning, distillation, speculative decoding) for TensorRT-LLM / TensorRT / vLLM deployment. The quantization workhorse behind every NVFP4 release on this page. |
| modelopt-fast-moe | MoE-targeted quantization + AWQ calibration tooling. NVFP4 routing, expert-aware modelopt. |

📦 Pre-built Docker Images

Every public container on ghcr.io/aeon-7, built, validated, and mapped to the repo that documents it. All images: docker pull ghcr.io/aeon-7/<image>.

| Image | What it serves | Docs / source |
|---|---|---|
| aeon-vllm-ultimate | ⭐ The unified flagship: vLLM 0.22.1 + NVFP4 KV + TurboQuant + DFlash; serves the entire AEON catalog | vllm-ultimate-dgx-spark |
| vllm-aeon-ultimate-dflash | Qwen 3.6 27B AEON Ultimate, production DFlash serving | Qwen3.6-27B-AEON-Ultimate-Uncensored-DFlash |
| vllm-aeon-ultimate | Qwen 3.6 27B AEON Ultimate, base serving image | Qwen3.6-27B-AEON-Ultimate-Uncensored-DFlash |
| vllm-aeon-ultimate-ddtree | 🔬 DDTree experimental research image (vLLM build) | Qwen3.6-27B-AEON-Ultimate-Uncensored-DDTree |
| qwen3.6-27b-aeon-ultimate-uncensored-ddtree | 🔬 DDTree experimental research image (full container) | Qwen3.6-27B-AEON-Ultimate-Uncensored-DDTree |
| vllm-spark-omni-q36 | Qwen 3.6 35B-A3B-heretic NVFP4 + DFlash (source-built vLLM + 7 patches) | Qwen3.6-NVFP4-DFlash |
| gemma-4-31b-uncensored-nvfp4-dflash | Gemma 4 31B Deckard Heretic + z-lab DFlash (3.5× single-stream) | Gemma-4-31B-Uncensored-NVFP4-DFlash |
| vllm-spark-gemma4-nvfp4 | Gemma 4 31B DECKARD NVFP4 serving | Gemma-4-31B-DECKARD-HERETIC-Uncensored-NVFP4 |
| vllm-spark-gemma4-nvfp4-awq | Gemma 4 31B DECKARD NVFP4 serving, AWQ_FULL variant | Gemma-4-31B-DECKARD-HERETIC-Uncensored-NVFP4 |
| aeon-gemma-4-26b-a4b-dflash | Gemma 4 26B A4B MoE + DFlash | Gemma-4-26B-A4B-it-Uncensored-NVFP4 |
| vllm-nemotron-omni-aeon-ultimate | Nemotron 3 Nano Omni, source-built vLLM v0.20.0 + 4 patches | Nemotron-3-Nano-Omni-AEON-Ultimate-Uncensored |
| vllm-dflash | The original DFlash vLLM image | vllm-dflash |
| comfyui-aeon-spark | Full media-production ComfyUI stack for DGX Spark | comfyui-aeon-spark |
| qwen3-tts-server | OpenAI-compatible TTS sidecar (Qwen3-TTS) | qwen3-tts-server |
| qwen3-asr-server | OpenAI-compatible ASR sidecar (Qwen3-ASR) | qwen3-asr-server |
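
Every image follows the same ghcr.io/aeon-7/<image> naming pattern, so pull commands can be generated from the image names. A small sketch follows, assuming a hypothetical helper `pull_command`; the optional `tag` argument is illustrative only, since this page lists no tags.

```python
from typing import Optional

REGISTRY = "ghcr.io/aeon-7"


def pull_command(image: str, tag: Optional[str] = None) -> list[str]:
    """Return the argv for `docker pull ghcr.io/aeon-7/<image>[:tag]`."""
    ref = f"{REGISTRY}/{image}"
    if tag:
        # Tags are not listed on this page; pass one only if the repo docs name it.
        ref = f"{ref}:{tag}"
    return ["docker", "pull", ref]


if __name__ == "__main__":
    for name in ("aeon-vllm-ultimate", "qwen3-tts-server", "qwen3-asr-server"):
        print(" ".join(pull_command(name)))
```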

🧪 Apps and Utilities

Side projects, tools, and infrastructure that aren't model deployments but might be useful.

| Repo | What it does |
|---|---|
| unifi-ai-network-management | Agent-ready UniFi / Ubiquiti network management skill and tooling: safe API key setup, backup/restore helpers, OpenClaw + Hermes install paths, and operational playbooks for diagnostics, security events, clients, APs, switches, VLANs, and Wi-Fi automation. |
| cosmic-mind | Security-and-resiliency-focused deployment of the Quartz web app. A place to build your second mind and share it. |
| regex-builder | Simple and elegant RegEx builder. |
| quartz | Fast batteries-included static-site generator that transforms Markdown into fully functional websites. Fork; the upstream base for cosmic-mind. |

📊 Stats

[Images: GitHub stats card and top-languages card]


☕ Support the work

If any of these releases have been useful to you, tips are deeply appreciated; they go directly toward more compute, more models, and more open releases. Scan a QR with your wallet, or click any address below to copy.

₿ Bitcoin (BTC)
BTC QR
bc1q09xmzn00q4z3c5raene0f3pzn9d9pvawfm0py4
Ξ Ethereum (ETH)
ETH QR
0x1512667F6D61454ad531d2E45C0a5d1fd82D0500
◎ Solana (SOL)
SOL QR
DgQsjHdAnT5PNLQTNpJdpLS3tYGpVcsHQCkpoiAKsw8t
ⓜ Monero (XMR)
XMR QR
836XrSKw4R76vNi3QPJ5Fa9ugcyvE2cWmKSPv3AhpTNNKvqP8v5ba9JRL4Vh7UnFNjDz3E2GXZDVVenu3rkZaNdUFhjAvgd

Ethereum L2s (Base, Arbitrum, Optimism, Polygon, etc.) and EVM-compatible tokens can be sent to the same Ethereum address.


🤝 Get in touch

  • 🌐 Open an issue on any repo for questions, bug reports, or feature requests
  • 🤗 Model weights, quant grids, and drafters live on Hugging Face → AEON-7
  • 📜 Most releases include a deployment guide + benchmark numbers, so start there

Built for the open source community on NVIDIA DGX Spark, RTX 5090, and Blackwell-class GPUs, and now Apple Silicon.

Popular repositories

  1. Qwen3.6-27B-AEON-Ultimate-Uncensored-DFlash (Public)

    Lossless abliteration of Qwen3.6-27B with NVFP4 hardware quantization for DGX Spark / Blackwell. BF16 (51 GB) + NVFP4 (26 GB) deployment guide, docker-compose, and QuickStart.

    Python · 274 stars · 28 forks

  2. Qwen3.6-NVFP4-DFlash (Public)

    Qwen3.6-35B-A3B-heretic NVFP4 + DFlash speculative decoding on DGX Spark (GB10/sm_121a). Source-built vLLM image + 7 patches + comprehensive deployment guide.

    Python · 84 stars · 10 forks

  3. vllm-dflash (Public)

    DFlash vLLM for DGX Spark: Plug & Play Block-Diffusion Speculative Decoding

    Python · 48 stars · 9 forks

  4. comfyui-aeon-spark (Public)

    Bleeding-edge ComfyUI for NVIDIA DGX Spark (GB10/Blackwell/sm_121a). CUDA 13 + SageAttention v3 (sm_121a) + NVFP4 + 14 custom-node packs + Flux 2 Dev / LTX 2.3 22B / ACE-Step v1.5 XL Turbo pre-bund…

    Shell · 42 stars · 13 forks

  5. Gemma-4-31B-Uncensored-NVFP4-DFlash (Public)

    DGX Spark / GB10 vLLM image for Gemma 4 31B Deckard Heretic Uncensored NVFP4 with z-lab DFlash speculative decoding.

    Python · 29 stars · 3 forks

  6. Gemma-4-26B-A4B-it-Uncensored-NVFP4 (Public)

    NVFP4 Gemma-4 26B-A4B MoE for DGX Spark. Optimal recipe: DFlash n=10 (flex) on AEON vLLM Ultimate. 144 tok/s single / 1,724 peak (Coding), up to 158 single (Extraction); beats prior v2 by 20-50%.

    25 stars · 2 forks