humanLLM

A personal memory system for LLMs that stores facts about you in model weights — not context files.

Inspired by how human memory works: short-term compression during conversation, long-term consolidation during sleep.

How it works

Phase 1 — Hierarchical context compression

During conversation, when context grows too large, it's compressed into a tiered hierarchy of abstract facts. Facts are stored in a ledger with importance scores that increase with repetition — just like human memory strengthens through reinforcement.

Phase 2 — Weight consolidation

At the end of each conversation, the system generates synthetic training data from the ledger facts, filters out hallucinations, and annotates per-example importance weights. After stopping inference, a LoRA adapter is trained on this data. The model now knows your facts — no context injection, no retrieval, no tokens spent.

Architecture

conversation.py   — main chat loop, triggers compression on context overflow
compressor.py     — hierarchical fact extraction + contradiction detection
memory.py         — importance ledger with reinforcement and weighted replay
trainer.py        — synthetic data generation, hallucination filtering, LoRA training
train.py          — standalone training script (run after stopping vLLM)

Workflow

# 1. Start vLLM with your adapter (first run: omit --enable-lora flags)
python -m vllm.entrypoints.openai.api_server \
  --model models/Qwen2.5-7B-Instruct-AWQ \
  --port 8000 --quantization awq --max-model-len 8192 --enforce-eager \
  --enable-lora --max-lora-rank 64 \
  --lora-modules personal=/path/to/humanLLM/adapter

# 2. Have a conversation
python app/conversation.py
# On exit: facts are compressed, ledger updated, synthetic data generated

# 3. Stop vLLM, then train
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True python app/train.py
# Adapter saved to adapter/

# 4. Restart vLLM with updated adapter and repeat

Requirements

ROCm 7.x (AMD GPU) or CUDA
vLLM built from source
HuggingFace peft, transformers, datasets
openai Python package

Models

Tested with Qwen/Qwen2.5-7B-Instruct-AWQ for inference (fits in 16GB VRAM) and Qwen/Qwen2.5-7B-Instruct (fp16) for LoRA training.

Prior art

Closest academic work: Language Models Need Sleep and SCM: Sleep-Consolidated Memory with Algorithmic Forgetting. This project independently arrived at the same core idea and implements it as a practical local system.

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
app		app
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

humanLLM

How it works

Architecture

Workflow

Requirements

Models

Prior art

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

humanLLM

How it works

Architecture

Workflow

Requirements

Models

Prior art

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages