GQI
GQI predicts a number from text.
You give it examples — a support ticket and the priority you'd assign it (1–10), an essay and its grade, a SQL query and how long it took to run — and GQI learns the pattern. Then, for new text you haven't labeled, it predicts the number for you. Every prediction comes with a confidence range, so you know which answers to trust and which to double-check.
If you've used scikit-learn, it's the same fit / predict you already know. If
you haven't: fit teaches GQI from your examples, predict uses it on new text —
two function calls.
Playground — copy, paste, run
This is the live demo, as ~40 lines of plain Python you
can run right now (only requests needed). It loads a real dataset, scores the
base model on held-out rows, fine-tunes on your rows with an explicit config,
and scores again — so you see the lift end to end:
# GQI playground — fit + eval on real data, end to end.
import requests, time
BASE = "https://gqilabs.com/api"
KEY = "gqi_sk_test_..." # create one at gqilabs.com/account ($30/mo free, no card)
H = {"Authorization": f"Bearer {KEY}"}
# -- 1. data: the demo's live sample (x86 assembly -> measured CPU cycles), or bring your own --
d = requests.get(f"{BASE}/v1/examples", params={"n_train": 256}, headers=H).json()
X_train, y_train = d["X_train"], d["y_train"]
X_test, y_test = d["X_test"], d["y_test"]
print(f"{d['name']}: {len(X_train)} train / {len(X_test)} test — {d['target_meaning']}")
def wait(job_id):
    # Poll the job every 3 seconds until it finishes or fails.
    while True:
        j = requests.get(f"{BASE}/v1/jobs/{job_id}", headers=H).json()
        if j["status"] == "done": return j["result"]
        if j["status"] == "error": raise RuntimeError(j["error"])
        time.sleep(3)
def evaluate(adapter_id=None):
    # adapter_id=None scores the base model; pass an adapter id to score a fine-tune.
    job = requests.post(f"{BASE}/v1/eval_async", headers=H, json={
        "X": X_test, "y_true": y_test, "adapter_id": adapter_id, "num_samples": 16}).json()
    m = wait(job["job_id"])["median"]
    return m["spearman"], m["mae"]
# -- 2. score the base model, zero-shot --
rho0, mae0 = evaluate()
print(f"base spearman {rho0:.3f} mae {mae0:.2f}")
# -- 3. fine-tune on the train rows (async, a few minutes), with an explicit config --
fit = requests.post(f"{BASE}/v1/fit", headers=H, json={
    "X": X_train, "y": y_train,
    "config": {"max_steps": 400, "lr": 5e-5, "lora_r": 8}}).json()
wait(fit["job_id"])
# -- 4. score the fine-tuned model on the same held-out rows --
rho1, mae1 = evaluate(fit["adapter_id"])
print(f"fine-tuned spearman {rho1:.3f} mae {mae1:.2f} (lift {rho1-rho0:+.3f})")
Real output from this exact script (results vary a little run to run):
x86-basicblock-throughput-cycles: 256 train / 128 test — throughput in CPU cycles (lower = faster); ranges from <1 to ~16,000
base spearman 0.808 mae 1.27
fine-tuned spearman 0.927 mae 0.58 (lift +0.119)
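The two numbers on each line are presumably Spearman rank correlation (how well the predicted ordering matches the true ordering, where 1.0 is a perfect match) and mean absolute error. As a rough sketch of what those metrics measure, here is a plain-Python version. The hosted eval may differ in details such as tie handling or how it aggregates over num_samples, so treat this as illustrative and not as GQI's implementation. The sample values are made up.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def spearman(y_true, y_pred):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(y_true), ranks(y_pred))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up values, for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 1.8, 3.9, 3.2]
print(f"spearman {spearman(y_true, y_pred):.3f}  mae {mae(y_true, y_pred):.2f}")
```

Spearman only looks at ordering, so a model can rank rows well and still be off in absolute terms. That is why the script reports both numbers.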
To use your own data instead, replace step 1 with any text column + number column (a pandas DataFrame works directly):
import pandas as pd
df = pd.read_csv("your_data.csv") # columns: text, target
split = int(len(df) * 0.8)
X_train, y_train = df.text[:split].tolist(), df.target[:split].astype(float).tolist()
X_test, y_test = df.text[split:].tolist(), df.target[split:].astype(float).tolist()
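The slice above takes the first 80% of rows for training, which is fine when row order is random. If the CSV is sorted (by date, by target, by source), a shuffled split is usually safer. A minimal stdlib sketch follows. The helper name shuffled_split and the fixed seed are illustrative choices, not part of GQI.

```python
import random

def shuffled_split(texts, targets, train_frac=0.8, seed=0):
    """Shuffle paired rows with a fixed seed, then split into train/test lists."""
    if len(texts) != len(targets):
        raise ValueError("texts and targets must have the same length")
    idx = list(range(len(texts)))
    random.Random(seed).shuffle(idx)  # seeded, so the split is reproducible
    cut = int(len(idx) * train_frac)
    train, test = idx[:cut], idx[cut:]
    return ([texts[i] for i in train], [float(targets[i]) for i in train],
            [texts[i] for i in test], [float(targets[i]) for i in test])

# Toy rows standing in for df.text.tolist() and df.target.tolist().
texts = [f"row {i}" for i in range(10)]
targets = list(range(10))
X_train, y_train, X_test, y_test = shuffled_split(texts, targets)
print(len(X_train), len(X_test))
```

With pandas, pass df.text.tolist() and df.target.tolist() as the two arguments. For time-ordered data where you want to predict the future from the past, the original unshuffled slice may be the better choice.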
What to expect
The first call after idle pays a GPU cold start (~1 min); the whole script is a
few minutes end to end. config keys and defaults are in the
API reference — everything is optional. Prefer a browser? The same
flow with progress bars is the live demo.
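Because a cold start can add about a minute, the unbounded polling loop in wait will block indefinitely if a job never reaches a final state. One possible variation adds a deadline. In this sketch, fetch_status is a stand-in for the requests.get(...).json() call from the playground, the timeout_s and poll_s defaults are arbitrary, and the "running" status string is an assumption. Only "done" and "error" appear in the script above.

```python
import time

def wait_with_timeout(fetch_status, timeout_s=600.0, poll_s=3.0, sleep=time.sleep):
    """Poll fetch_status() until the job is done, fails, or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        j = fetch_status()
        if j["status"] == "done":
            return j["result"]
        if j["status"] == "error":
            raise RuntimeError(j["error"])
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {j['status']!r} after {timeout_s:.0f}s")
        sleep(poll_s)

# Stand-in for the real HTTP call: reports "running" twice, then "done".
# The status name "running" and the metric values are made up.
_responses = iter([
    {"status": "running"},
    {"status": "running"},
    {"status": "done", "result": {"median": {"spearman": 0.9, "mae": 0.5}}},
])
result = wait_with_timeout(lambda: next(_responses), poll_s=0.0)
print(result["median"])
```

In the playground script this could be called as wait_with_timeout(lambda: requests.get(f"{BASE}/v1/jobs/{job_id}", headers=H).json()).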
Quickstart with the gqi package
The Python package wraps the same engine in a scikit-learn estimator:
from gqi import GQI
reg = GQI(method="lora", model="small") # local + free, on your hardware
reg.fit(
["cannot log in, urgent!", "typo on the about page", "server down in prod!!"],
[9.0, 2.0, 10.0], # the priority you'd give each (1–10)
)
y_hat, y_std = reg.predict(["payment failing for all users"], return_std=True)
print(y_hat, y_std) # -> [8.6] [1.1] priority ≈ 8.6, give or take ~1.1
Not on PyPI yet
The gqi package is pre-release — it isn't publicly installable yet. Until
it ships, the playground above and the hosted API are the supported
ways in; the package guide previews the interface you'll get.
Why reach for GQI (instead of asking an LLM for a number)
- You can trust the confidence. predict(return_std=True) returns the number and a meaningful estimate of how sure it is — so you can auto-accept confident answers and route the unsure ones to a human. A general LLM hands you a number with no honest error bar.
- A familiar interface. A clean .fit() / .predict() surface — the same contract you already use with scikit-learn.
- Fast, cheap tuning. Each dataset becomes its own lightweight add-on on top of a shared base model, so adapting GQI to your data is quick and inexpensive.
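The auto-accept / route-to-human idea in the first bullet can be sketched as a simple threshold on the standard deviation. The lists below are made-up stand-ins for what predict(..., return_std=True) would return, and the max_std cutoff of 1.5 is arbitrary. In practice you would pick it from held-out data.

```python
def route_by_confidence(texts, y_hat, y_std, max_std=1.5):
    """Split predictions into auto-accepted and needs-review by their std."""
    accepted, review = [], []
    for text, pred, std in zip(texts, y_hat, y_std):
        (accepted if std <= max_std else review).append((text, pred, std))
    return accepted, review

# Made-up values standing in for reg.predict(texts, return_std=True).
texts = ["payment failing for all users", "logo looks slightly off", "weird thing happened"]
y_hat = [8.6, 2.1, 5.0]
y_std = [1.1, 0.7, 3.2]

accepted, review = route_by_confidence(texts, y_hat, y_std)
print("auto-accept:", [t for t, _, _ in accepted])
print("human review:", [t for t, _, _ in review])
```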
Plain-English glossary
| Term | What it means |
|---|---|
| Calibrated | The confidence range is trustworthy — when GQI says it's unsure, it really is. That's what lets you threshold on confidence. |
| fit / predict | fit teaches GQI from your labeled examples; predict uses it on new text. (Same as scikit-learn.) |
| Tuned model (adapter) | What fit produces — a small add-on that remembers your data, on top of a shared base model. Also called a LoRA adapter. |
| Fine-tune | Training that small add-on on your examples — what fit does. |
| Base model | The shared, pre-trained model you can use with no fitting at all — a good zero-setup baseline. |
| y_std / uncertainty | The ± confidence range on a prediction (a standard deviation). Bigger = less sure. |
| Token | Roughly a word-piece of input text (~4 characters). Hosted billing counts input tokens; the output is one number, so it's free. |
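One rough way to check the "Calibrated" entry on your own held-out rows is to measure coverage. If the y_std values behave like Gaussian standard deviations (an assumption, since this page doesn't state the distribution), about 68% of true values should land within one standard deviation of the prediction and about 95% within two. A sketch with made-up numbers:

```python
def coverage(y_true, y_hat, y_std, k=1.0):
    """Fraction of true values within k standard deviations of the prediction."""
    hits = sum(abs(t - p) <= k * s for t, p, s in zip(y_true, y_hat, y_std))
    return hits / len(y_true)

# Made-up held-out labels and predictions, for illustration only.
y_true = [9.0, 2.0, 10.0, 5.0, 7.0]
y_hat = [8.6, 2.5, 7.0, 5.2, 6.1]
y_std = [1.1, 0.6, 1.6, 0.5, 1.0]

print(f"within 1 std: {coverage(y_true, y_hat, y_std, 1.0):.0%}")
print(f"within 2 std: {coverage(y_true, y_hat, y_std, 2.0):.0%}")
```

Coverage well below the nominal rate suggests the ranges are too narrow (overconfident); coverage well above it suggests they are wider than needed. With only a handful of rows the estimate is noisy, so use as many held-out rows as you can spare.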
Where to go next
- Overview — what GQI is for, and why calibrated uncertainty is the differentiator.
- How it works — the mental model: one shared base model + lightweight per-dataset adapters.
- Python package gqi — the sklearn-style estimator; free and local, hosted with an API key.
- Hosted API — REST endpoints for keys, fitting, and predicting at scale.