Universal Context Extension Framework
Breaking the Context Barrier: Model-Agnostic Infinite Context with Quality Preservation
UCEF enables any large language model — from 4K context to 1M+ tokens — to handle unlimited input while preserving output quality through principled mathematical frameworks drawn from hyperbolic geometry, quantum probability theory, and information theory.
Why UCEF?
Modern LLMs are constrained by finite context windows. When a document exceeds the native window, naive truncation or simple RAG can lose critical information. UCEF addresses this with a mathematically grounded pipeline that retrieves, selects, compresses, and quality-controls context in a closed feedback loop.
| Approach | Context Limit | Quality Retention | Hierarchical Awareness |
|---|---|---|---|
| Naive Truncation | Native window | ~50-70% | None |
| Standard RAG | Unlimited | ~60-75% | None |
| LongLLMLingua | 4x window | ~79-89% | Partial |
| UCEF | Unlimited | 89-94% | Full |
Quick Install
# Core installation
pip install ucef
# With all optional dependencies (Redis, ChromaDB, Pydantic)
# Quote the extras so shells such as zsh do not treat the brackets as a glob
pip install "ucef[all]"
30-Second Example
import asyncio

from ucef import UniversalContextSystem, UCEFConfig


async def main():
    # 1. Initialize with your model client.
    #    `my_model_client` is a placeholder: create your own client object
    #    before running this example.
    config = UCEFConfig()
    system = UniversalContextSystem(
        model_client=my_model_client,
        model_name="gpt-4o",
        config=config,
    )
    await system.initialize()

    # 2. Store documents (unlimited)
    await system.store_text(
        "The Eiffel Tower was completed in 1889 for the World's Fair...",
        doc_id="eiffel_001",
    )

    # 3. Query with automatic context extension
    result = await system.query("When was the Eiffel Tower built?")
    print(f"Quality: {result.overall_quality:.2f}")
    print(f"Context blocks: {len(result.context_blocks)}")
    print(f"Tokens used: {result.total_tokens}")


asyncio.run(main())
Architecture Overview
UCEF processes queries through a five-stage pipeline:
graph LR
Q[Query] --> R[Hyperbolic<br/>Retrieval]
R --> S[Quantum<br/>Selection]
S --> C[Adaptive<br/>Compression]
C --> E[Quality<br/>Evaluation]
E -->|Quality < threshold| F[Feedback<br/>Refinement]
F -->|Re-retrieve| R
E -->|Quality OK| OUT[QueryResult]
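The control flow in the diagram can be illustrated with a minimal sketch. This is not UCEF's implementation: `run_feedback_loop`, `LoopResult`, the four stage callables, the 0.8 threshold, and the "double `k` on failure" refinement rule are all assumptions made for illustration, and the toy stages use plain word overlap instead of hyperbolic retrieval or quantum-inspired selection.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class LoopResult:
    context: List[str]
    quality: float
    iteration: int  # iteration that produced this context


def run_feedback_loop(
    query: str,
    retrieve: Callable[[str, int], List[str]],
    select: Callable[[str, List[str]], List[str]],
    compress: Callable[[List[str]], List[str]],
    evaluate: Callable[[str, List[str]], float],
    threshold: float = 0.8,
    max_iterations: int = 3,
    initial_k: int = 2,
) -> LoopResult:
    """Retrieve, select, compress, evaluate; widen retrieval while quality is low."""
    k = initial_k
    best = LoopResult(context=[], quality=float("-inf"), iteration=0)
    for iteration in range(1, max_iterations + 1):
        candidates = retrieve(query, k)
        context = compress(select(query, candidates))
        quality = evaluate(query, context)
        if quality > best.quality:
            best = LoopResult(context, quality, iteration)
        if quality >= threshold:
            break
        k *= 2  # assumed refinement step: re-retrieve with a wider net
    return best


# Toy stages (hypothetical stand-ins for the real modules).
CORPUS = [
    "Paris is the capital of France.",
    "The Louvre is a museum in Paris.",
    "Gustave Eiffel's company designed the tower.",
    "The Eiffel Tower was completed in 1889.",
]


def _words(text: str) -> set:
    return set(text.lower().replace(".", "").replace("?", "").split())


def retrieve(query: str, k: int) -> List[str]:
    return CORPUS[:k]  # arbitrary order, so widening k matters


def select(query: str, candidates: List[str]) -> List[str]:
    return [c for c in candidates if _words(query) & _words(c)]


def compress(blocks: List[str]) -> List[str]:
    return [" ".join(b.split()[:8]) for b in blocks]  # keep first 8 words


def evaluate(query: str, context: List[str]) -> float:
    covered = set().union(*(_words(c) for c in context))
    q = _words(query)
    return len(q & covered) / len(q)  # fraction of query words covered


result = run_feedback_loop(
    "When was the Eiffel Tower completed?", retrieve, select, compress, evaluate
)
print(result.iteration, round(result.quality, 3))
```

In this toy run the first pass sees only two documents and falls below the threshold, so the loop re-retrieves with a larger `k` and stops once the evaluation passes. Keeping the best result seen so far means the caller still gets a usable context if the iteration cap is reached first.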
Core Modules
| Module | Function | Mathematical Basis |
|---|---|---|
| Hyperbolic Retriever | Semantic nearest-neighbor search in the Poincaré ball | Hyperbolic geometry (Riemannian metric) |
| Quantum Selector | Context selection via superposition and measurement | Quantum probability (Born rule, density matrices) |
| Adaptive Compressor | Task-aware compression respecting token budgets | MDL principle, maximum entropy |
| Quality Feedback Loop | Closed-loop quality refinement | Multi-dimensional evaluation (4 metrics) |
| Three-Layer Memory | Hot/Warm/Cold document storage | Tiered caching with promotion/demotion |
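Two entries in the "Mathematical Basis" column have compact closed forms that are easy to show in code. The sketch below is a textbook illustration, not UCEF's code: `poincare_distance` is the standard geodesic distance in the Poincaré ball, and `born_probabilities` applies the Born rule (probability equals squared amplitude) under the assumption that each candidate's amplitude is its cosine similarity with the query. Both function names and that choice of amplitude are hypothetical.

```python
import math
from typing import List, Sequence


def poincare_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Geodesic distance between two points strictly inside the unit ball."""
    norm_u = sum(c * c for c in u)
    norm_v = sum(c * c for c in v)
    if norm_u >= 1.0 or norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * diff / ((1.0 - norm_u) * (1.0 - norm_v)))


def _unit(x: Sequence[float]) -> List[float]:
    # Assumes a non-zero vector.
    norm = math.sqrt(sum(c * c for c in x))
    return [c / norm for c in x]


def born_probabilities(
    query: Sequence[float], candidates: Sequence[Sequence[float]]
) -> List[float]:
    """Selection probabilities proportional to squared query/candidate overlap."""
    q = _unit(query)
    amplitudes = [sum(a * b for a, b in zip(q, _unit(c))) for c in candidates]
    total = sum(a * a for a in amplitudes)
    if total == 0.0:
        raise ValueError("query is orthogonal to every candidate")
    return [a * a / total for a in amplitudes]


# Distances grow quickly near the boundary of the ball.
print(poincare_distance([0.0, 0.0], [0.5, 0.0]))
print(poincare_distance([0.0, 0.0], [0.99, 0.0]))

# A candidate aligned with the query gets the largest probability.
print(born_probabilities([1.0, 0.0], [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]))
```

The distance from the origin to a point of Euclidean norm r works out to 2·artanh(r), which is why points near the boundary are far apart even when they look close in Euclidean terms. That extra room near the boundary is what lets tree-like hierarchies embed compactly.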
Key Features
Model Agnostic: Works with any LLM, including GPT-4, Claude, LLaMA, Qwen, GLM, DeepSeek, and Mistral. UCEF automatically profiles each model's capabilities and selects optimal strategies.
Hyperbolic Retrieval: Documents are embedded in the Poincaré ball, which captures hierarchical relationships exponentially more efficiently than Euclidean space and enables O(log n) semantic search.
Quantum-Inspired Selection: Candidate contexts are modeled as a superposition. The query acts as a measurement operator that collapses the state to the most relevant subset, and entanglement captures inter-document correlations.
Adaptive Compression: Three compression strategies (MDL, entropy, task-aware) adapt automatically to the model's context window size. Small models get aggressive compression; large models get a light touch.
Quality Preservation: A four-dimensional quality metric (relevance, completeness, coherence, accuracy) drives a feedback loop that iteratively refines the context until quality meets the threshold.
Three-Layer Memory: Redis hot cache (<10 ms), ChromaDB warm storage (<100 ms), and filesystem cold archive (<500 ms), with automatic document promotion and demotion.
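The promotion and demotion behavior of the three-layer memory can be sketched with plain dictionaries standing in for Redis, ChromaDB, and the filesystem. Everything here is an assumption for illustration: the `TieredStore` class, the rule that new documents start cold, the rule that any access promotes straight to hot, and least-recently-used demotion from hot to warm. UCEF's actual policies may differ.

```python
from collections import OrderedDict
from typing import Dict, Optional


class TieredStore:
    """Toy hot/warm/cold store with promote-on-access and LRU demotion."""

    def __init__(self, hot_capacity: int = 2) -> None:
        self.hot: "OrderedDict[str, str]" = OrderedDict()  # stand-in for Redis
        self.warm: Dict[str, str] = {}  # stand-in for ChromaDB
        self.cold: Dict[str, str] = {}  # stand-in for the filesystem
        self.hot_capacity = hot_capacity

    def put(self, doc_id: str, text: str) -> None:
        # Assumed policy: new documents start in the cold tier.
        self.cold[doc_id] = text

    def get(self, doc_id: str) -> Optional[str]:
        if doc_id in self.hot:
            self.hot.move_to_end(doc_id)  # mark as most recently used
            return self.hot[doc_id]
        for tier in (self.warm, self.cold):
            if doc_id in tier:
                text = tier.pop(doc_id)
                self._promote(doc_id, text)
                return text
        return None

    def _promote(self, doc_id: str, text: str) -> None:
        self.hot[doc_id] = text
        while len(self.hot) > self.hot_capacity:
            # Demote the least recently used hot document to warm.
            old_id, old_text = self.hot.popitem(last=False)
            self.warm[old_id] = old_text

    def tier_of(self, doc_id: str) -> Optional[str]:
        for name in ("hot", "warm", "cold"):
            if doc_id in getattr(self, name):
                return name
        return None


store = TieredStore(hot_capacity=2)
for doc_id in ("a", "b", "c"):
    store.put(doc_id, f"text of {doc_id}")
for doc_id in ("a", "b", "c"):
    store.get(doc_id)
print({d: store.tier_of(d) for d in ("a", "b", "c")})
```

After the three reads, the two most recently used documents sit in the hot tier and the oldest one has been demoted to warm. A fuller version would also age warm documents back to cold, which is omitted here to keep the sketch short.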
Performance
Benchmarked on LongBench with real LLM API calls (higher is better for both metrics):
| Model | Method | ROUGE-L | Token F1 |
|---|---|---|---|
| GLM-4-flash | Truncate | 0.1433 | 0.1563 |
| GLM-4-flash | RAG | 0.1340 | 0.1458 |
| GLM-4-flash | UCEF | 0.1479 | 0.1631 |
| DeepSeek-v3 | Truncate | 0.1889 | 0.1988 |
| DeepSeek-v3 | RAG | 0.1800 | 0.1882 |
| DeepSeek-v3 | UCEF | 0.2146 | 0.2315 |
See Experiments for full per-task breakdowns.
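For readers unfamiliar with the two columns, the following is a simplified sketch of how token-level F1 and ROUGE-L are commonly computed. It assumes lowercase whitespace tokenization and the F1 form of ROUGE-L. The benchmark's own scoring scripts apply additional normalization, so this will not reproduce the numbers in the table. The function names are hypothetical.

```python
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    # Simplifying assumption: lowercase and split on whitespace.
    return text.lower().split()


def _f1(matches: int, pred_len: int, ref_len: int) -> float:
    if matches == 0:
        return 0.0
    precision = matches / pred_len
    recall = matches / ref_len
    return 2 * precision * recall / (precision + recall)


def token_f1(prediction: str, reference: str) -> float:
    """F1 over the bag of tokens shared by prediction and reference."""
    pred, ref = tokenize(prediction), tokenize(reference)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    return _f1(overlap, len(pred), len(ref))


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence (row-by-row dynamic program)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(prediction: str, reference: str) -> float:
    """ROUGE-L as an F1 score over the longest common subsequence."""
    pred, ref = tokenize(prediction), tokenize(reference)
    return _f1(lcs_length(pred, ref), len(pred), len(ref))


# Same tokens in a different order: token F1 ignores order, ROUGE-L does not.
print(token_f1("in 1889 completed", "completed in 1889"))
print(rouge_l("in 1889 completed", "completed in 1889"))
```

The two metrics agree when the prediction preserves word order and diverge when it does not, which is why the table reports both.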
Next Steps
- Installation Guide — Set up UCEF in your environment
- Quickstart Tutorial — Working code examples
- Configuration Reference — All config options explained
- Architecture Deep-Dive — How the system works internally
Citation
@article{he2026ucef,
  title={UCEF: Universal Context Extension Framework},
  author={He, Honglin},
  journal={arXiv preprint},
  year={2026}
}
License
UCEF is released under the MIT License.
Built by Honglin He. Source code available on GitHub.