Adaptive Compression¶
UCEF provides three complementary compression strategies that work together through the adaptive compressor. Each strategy is grounded in information theory and optimized for different aspects of the context selection problem.
The Compression Problem¶
Given a set of context blocks \(\{b_1, \ldots, b_n\}\) with total tokens \(T = \sum_i t_i\) and a token budget \(B < T\), select a subset that maximizes information content within the budget.
This is a constrained optimization problem:
UCEF solves this with three strategies, each optimizing a different objective.
Strategy 1: MDL Compression¶
Module: ucef.compression.mdl.MDLCompressor
The Minimum Description Length (MDL) principle selects blocks that minimize the total description cost of the context given the query.
Mathematical Formulation¶
For each block \(b_i\), compute:
where:
\(L(b_i)\): Self-description length¶
The cost of describing block \(b_i\)'s word distribution:
where \(P(w) = \frac{\text{count}(w)}{\text{total\_words}}\) is the global word frequency.
Lower \(L(b_i)\) means the block contains common, well-compressed words. Higher \(L(b_i)\) means the block has rare, specific vocabulary.
\(L(q \mid b_i)\): Conditional description length¶
The cost of describing the query \(q\) given that block \(b_i\) is known:
where \(\text{coverage}(q, b_i) = \frac{|q \cap b_i|}{|q|}\) is the fraction of query terms found in the block.
Lower \(L(q \mid b_i)\) means the block is highly relevant to the query (query terms appear in the block).
MDL Per Token¶
The final score is normalized by token count to prefer compact information:
Blocks are selected in ascending order of \(\text{MDL}_{\text{per\_token}}\) (lower is better).
Sentence-Level Compression¶
For individual block compression, UCEF uses sentence-level selection:
- Split block into sentences
- Score each sentence: \(\text{score} = 0.4 \cdot \text{overlap} + 0.3 \cdot \text{position} + 0.3 \cdot H(s)\)
- Select top-\(k\) sentences where \(k = \lceil|S| \cdot r\rceil\) and \(r\) is the target ratio
Self-Information¶
The self-information of a sentence measures its information density:
Higher entropy sentences contain more diverse, information-rich vocabulary.
Strategy 2: Entropy Compression¶
Module: ucef.compression.entropy.EntropyCompressor
The entropy compressor maximizes information diversity in the selected context, inspired by Maximal Marginal Relevance (Carbonell & Goldstein, 1998).
Objective¶
Greedy Selection¶
At each step, select the block that maximizes:
where:
- \(\text{relevance}(b_i) = \frac{|W(b_i) \cap W(q)|}{|W(q)|}\) is the query relevance (word overlap)
- \(\text{diversity}(b_i) = 1 - \max_{j \in S} J(b_i, b_j)\) is the marginal diversity
- \(\lambda = 0.3\) is the diversity weight (configurable)
- \(J(b_i, b_j) = \frac{|W(b_i) \cap W(b_j)|}{|W(b_i) \cup W(b_j)|}\) is Jaccard similarity
Properties¶
- The first selected block has maximum diversity (score = 1.0)
- Subsequent blocks are penalized for redundancy with already-selected blocks
- Greedy selection has \(O(n \cdot k)\) complexity where \(k\) = number of selected blocks
- The result is a diverse, non-redundant subset
Block Entropy¶
The entropy compressor also provides a per-block entropy measure:
This measures the information content of each block in bits per word.
Strategy 3: Task-Aware Compression¶
Module: ucef.compression.task_aware.TaskAwareCompressor
Task-aware compression extracts the most query-relevant sentences from each block, providing fine-grained compression that preserves task-critical information.
Sentence Scoring¶
Each sentence is scored on three dimensions:
| Dimension | Computation | Rationale |
|---|---|---|
| Word overlap | $\frac{ | W(s) \cap W(q) |
| Position | First sentence: 1.2, Last: 1.1, Middle: 1.0 | Topic sentences are more important |
| Length | Medium (5-30 words): 1.0, Short (<5): 0.5, Long (>30): 0.8 | Prefer concise, complete sentences |
Pipeline¶
- Split each block into sentences
- Score each sentence for query relevance
- Select top-\(k\) sentences per block (\(k = \lceil|S| \times 0.6\rceil\), configurable)
- Reorder selected sentences by original position for coherence
- Budget - Select complete blocks by information density within token budget
LLM-Based Summarization¶
When a model client is available, the task-aware compressor can optionally use the LLM to generate concise summaries for maximum compression. This provides an additional 20-30% compression beyond sentence extraction.
Adaptive Compressor¶
Module: ucef.compression.adaptive.AdaptiveCompressor
The adaptive compressor selects and combines strategies based on the model's context category:
flowchart TD
INPUT[Context Blocks + Budget + Query] --> CHECK{Token Overflow?}
CHECK -->|No| PASS[Return As-Is]
CHECK -->|Yes| STRATEGY{Model Category?}
STRATEGY -->|SMALL<br/>4K-32K| AGG[Aggressive<br/>10% retention<br/>MDL + Task-Aware]
STRATEGY -->|MEDIUM<br/>32K-128K| MOD[Moderate<br/>30% retention<br/>MDL + Entropy]
STRATEGY -->|LARGE<br/>128K+| LIGHT[Light<br/>50% retention<br/>Entropy + Task-Aware]
STRATEGY -->|ADAPTIVE| AUTO[Auto-select<br/>Based on budget ratio]
AGG --> OUTPUT[Compressed Blocks]
MOD --> OUTPUT
LIGHT --> OUTPUT
AUTO --> OUTPUT
Strategy Mapping¶
| Context Category | Native Window | Primary Strategy | Retention | Compressors Used |
|---|---|---|---|---|
| SMALL | 4K - 32K | Aggressive | 10% | MDL + Task-Aware |
| MEDIUM | 32K - 128K | Moderate | 30% | MDL + Entropy |
| LARGE | 128K - 200K | Light | 50% | Entropy + Task-Aware |
| XLARGE | 200K+ | Light | 50% | Entropy + Task-Aware |
| (any) | (any) | Adaptive | Dynamic | All three |
Compression Ratios¶
| Strategy | Retention | Typical Quality Retention |
|---|---|---|
| Aggressive | 10% | ~64% of original |
| Moderate | 30% | ~66-72% of original |
| Light | 50% | ~73% of original |
| Adaptive | Dynamic | Up to ~96% of original |
Complexity Comparison¶
| Compressor | Time Complexity | Space Complexity |
|---|---|---|
| MDL | \(O(n \cdot m + n \log n)\) | \(O(m)\) |
| Entropy | \(O(n \cdot k)\) | \(O(n^2)\) |
| Task-Aware | \(O(n \cdot s)\) | \(O(s)\) |
| Adaptive | Worst of selected | Sum of selected |
Where: \(n\) = number of blocks, \(m\) = average words per block, \(k\) = selected blocks, \(s\) = sentences per block.
Benchmark Latency¶
From simulated experiments:
| Stage | Mean | P95 |
|---|---|---|
| Retrieval | 0.02ms | 0.03ms |
| Selection | 8.79ms | 8.96ms |
| Compression | 0.42ms | 0.44ms |
| Total Pipeline | 9.21ms | 9.47ms |
Compression adds negligible overhead (~0.42ms mean) compared to selection (~8.79ms).
Configuration¶
from ucef.core.config import CompressionConfig
config = CompressionConfig(
default_strategy="adaptive", # Let UCEF choose
aggressive_ratio=0.10, # 10% retention for small models
moderate_ratio=0.30, # 30% for medium models
light_ratio=0.50, # 50% for large models
use_mdl=True, # Enable MDL compression
description_length_weight=0.5, # Balance L(block) vs L(query|block)
use_entropy=True, # Enable entropy compression
entropy_threshold=0.8, # Minimum entropy for block retention
)
References¶
- Grunwald, P. (2007). "The Minimum Description Length Principle." MIT Press.
- Jaynes, E.T. (1957). "Information Theory and Statistical Mechanics." Physical Review.
- Carbonell, J. & Goldstein, J. (1998). "The Use of MMR, Diversity-Based Reranking for Reordering Documents and Producing Summaries." SIGIR 1998.
- Jiang, H. et al. (2023). "LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models." EMNLP 2023.
Previous: Quantum Selection | Next: Quality Assurance