
GQI

GQI predicts a number from text.

You give it examples — a support ticket and the priority you'd assign it (1–10), an essay and its grade, a SQL query and how long it took to run — and GQI learns the pattern. Then, for new text you haven't labeled, it predicts the number for you. Every prediction comes with a confidence range, so you know which answers to trust and which to double-check.

If you've used scikit-learn, it's the same fit / predict you already know. If you haven't: fit teaches GQI from your examples, predict uses it on new text — two function calls.

Playground — copy, paste, run

This is the live demo as ~40 lines of plain Python you can run right now (the only dependency is requests). It loads a real dataset, scores the base model on held-out rows, fine-tunes on the training rows with an explicit config, and scores again, so you can see the lift end to end:

```python
# GQI playground — fit + eval on real data, end to end.
import requests, time

BASE = "https://gqilabs.com/api"
KEY  = "gqi_sk_test_..."   # create one at gqilabs.com/account ($30/mo free, no card)
H    = {"Authorization": f"Bearer {KEY}"}

# -- 1. data: the demo's live sample (x86 assembly -> measured CPU cycles), or bring your own --
d = requests.get(f"{BASE}/v1/examples", params={"n_train": 256}, headers=H).json()
X_train, y_train = d["X_train"], d["y_train"]
X_test,  y_test  = d["X_test"],  d["y_test"]
print(f"{d['name']}: {len(X_train)} train / {len(X_test)} test — {d['target_meaning']}")

def wait(job_id):
    while True:
        j = requests.get(f"{BASE}/v1/jobs/{job_id}", headers=H).json()
        if j["status"] == "done":  return j["result"]
        if j["status"] == "error": raise RuntimeError(j["error"])
        time.sleep(3)

def evaluate(adapter_id=None):
    job = requests.post(f"{BASE}/v1/eval_async", headers=H, json={
        "X": X_test, "y_true": y_test, "adapter_id": adapter_id, "num_samples": 16}).json()
    m = wait(job["job_id"])["median"]
    return m["spearman"], m["mae"]

# -- 2. score the base model, zero-shot --
rho0, mae0 = evaluate()
print(f"base       spearman {rho0:.3f}   mae {mae0:.2f}")

# -- 3. fine-tune on the train rows (async, a few minutes), with an explicit config --
fit = requests.post(f"{BASE}/v1/fit", headers=H, json={
    "X": X_train, "y": y_train,
    "config": {"max_steps": 400, "lr": 5e-5, "lora_r": 8}}).json()
wait(fit["job_id"])

# -- 4. score the fine-tuned model on the same held-out rows --
rho1, mae1 = evaluate(fit["adapter_id"])
print(f"fine-tuned spearman {rho1:.3f}   mae {mae1:.2f}   (lift {rho1-rho0:+.3f})")
```

Real output from this exact script (results vary a little run to run):

```text
x86-basicblock-throughput-cycles: 256 train / 128 test — throughput in CPU cycles (lower = faster); ranges from <1 to ~16,000
base       spearman 0.808   mae 1.27
fine-tuned spearman 0.927   mae 0.58   (lift +0.119)
```

To use your own data instead, replace step 1 with any text column and numeric target column. For example, with pandas:

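```python
import pandas as pd

df = pd.read_csv("your_data.csv")                      # columns: text, target
df = df.sample(frac=1, random_state=0).reset_index(drop=True)  # shuffle so an ordered file doesn't bias the split
split = int(len(df) * 0.8)                             # 80% train / 20% test
X_train, y_train = df.text[:split].tolist(), df.target[:split].astype(float).tolist()
X_test,  y_test  = df.text[split:].tolist(), df.target[split:].astype(float).tolist()
```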
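If you'd rather not install pandas, the standard-library csv module can produce the same four lists. This sketch assumes the same `text` and `target` column names; it shuffles with a fixed seed before splitting so a file sorted by date or by target doesn't put all of one kind of row in the training set. `load_split` is a hypothetical helper, not part of GQI.

```python
import csv
import io
import random

def load_split(f, train_frac=0.8, seed=0):
    """Read text,target rows from a file-like object; return a shuffled train/test split."""
    rows = [(r["text"], float(r["target"])) for r in csv.DictReader(f)]
    random.Random(seed).shuffle(rows)          # reproducible shuffle, keeps text/target pairs together
    split = int(len(rows) * train_frac)
    X = [t for t, _ in rows]
    y = [v for _, v in rows]
    return X[:split], y[:split], X[split:], y[split:]

# With a real file:
# with open("your_data.csv", newline="") as f:
#     X_train, y_train, X_test, y_test = load_split(f)

demo = io.StringIO("text,target\nslow query,12.5\nfast query,0.4\nmedium query,3\n"
                   "another one,7\nlast row,1\n")
X_train, y_train, X_test, y_test = load_split(demo)
print(len(X_train), len(X_test))   # 4 1
```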

What to expect

The first call after an idle period pays a GPU cold start (~1 min); the whole script takes a few minutes end to end. config keys and their defaults are listed in the API reference, and every key is optional. Prefer a browser? The live demo runs the same flow with progress bars.
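The playground's `wait()` polls every 3 seconds with no upper bound. For longer scripts, a slightly more defensive poller with a deadline and gentle backoff may be worth having. This is a sketch, not part of the API: it reuses the playground's `status` / `result` / `error` fields, and `fetch` is any callable returning the job JSON, so you can exercise it offline. The `"queued"` and `"running"` statuses in the demo are placeholder values; the playground only documents `"done"` and `"error"`.

```python
import time

def wait_for_job(fetch, timeout_s=900, first_delay_s=1.0, max_delay_s=15.0, sleep=time.sleep):
    """Poll fetch() until the job is done or errors; raise TimeoutError after timeout_s seconds."""
    deadline = time.monotonic() + timeout_s
    delay = first_delay_s
    while True:
        j = fetch()
        if j["status"] == "done":
            return j["result"]
        if j["status"] == "error":
            raise RuntimeError(j["error"])
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"job still {j['status']!r} after {timeout_s}s")
        sleep(delay)
        delay = min(delay * 2, max_delay_s)   # back off, but keep polling reasonably often

# Against the hosted API (same endpoint the playground's wait() uses):
# result = wait_for_job(lambda: requests.get(f"{BASE}/v1/jobs/{job_id}", headers=H).json())

statuses = iter([{"status": "queued"}, {"status": "running"},
                 {"status": "done", "result": {"ok": True}}])
print(wait_for_job(lambda: next(statuses), sleep=lambda s: None))   # {'ok': True}
```

Passing `sleep` in as a parameter is only there so the offline demo doesn't actually wait; in real use, leave the default.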

Quickstart with the gqi package

The Python package wraps the same engine in a scikit-learn estimator:

```python
from gqi import GQI

reg = GQI(method="lora", model="small")      # local + free, on your hardware
reg.fit(
    ["cannot log in, urgent!", "typo on the about page", "server down in prod!!"],
    [9.0, 2.0, 10.0],                         # the priority you'd give each (1–10)
)
y_hat, y_std = reg.predict(["payment failing for all users"], return_std=True)
print(y_hat, y_std)   # -> [8.6] [1.1]   priority ≈ 8.6, give or take ~1.1
```
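`return_std=True` is what makes a threshold workflow possible: accept confident predictions automatically and queue the rest for a person. A minimal sketch, assuming you already have `y_hat` and `y_std` as sequences from `predict`; `route` is a hypothetical helper, the second ticket and its numbers are made-up toy values, and the `max_std=1.5` cutoff is an arbitrary placeholder you would tune on held-out data.

```python
def route(texts, y_hat, y_std, max_std=1.5):
    """Split predictions into auto-accepted and needs-review by their uncertainty."""
    accepted, review = [], []
    for text, mu, sd in zip(texts, y_hat, y_std):
        # float() also converts NumPy scalars, if predict returns arrays
        (accepted if sd <= max_std else review).append((text, float(mu), float(sd)))
    return accepted, review

texts = ["payment failing for all users", "button color slightly off"]
accepted, review = route(texts, [8.6, 3.1], [1.1, 2.4])
print(accepted)  # [('payment failing for all users', 8.6, 1.1)]
print(review)    # [('button color slightly off', 3.1, 2.4)]
```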

Not on PyPI yet

The gqi package is pre-release — it isn't publicly installable yet. Until it ships, the playground above and the hosted API are the supported ways in; the package guide previews the interface you'll get.

Why reach for GQI (instead of asking an LLM for a number)

  1. You can trust the confidence. predict(return_std=True) returns the number and a meaningful estimate of how sure it is — so you can auto-accept confident answers and route the unsure ones to a human. A general LLM hands you a number with no honest error bar.
  2. A familiar interface. A clean .fit() / .predict() surface — the same contract you already use with scikit-learn.
  3. Fast, cheap tuning. Each dataset becomes its own lightweight add-on on top of a shared base model, so adapting GQI to your data is quick and inexpensive.
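The "lightweight add-on" is a LoRA adapter, and the playground's `lora_r` config key presumably sets its rank r. In the standard LoRA formulation, a frozen weight matrix W of shape d×k is left untouched and a low-rank update B·A is learned, with B of shape d×r and A of shape r×k. Each adapted matrix therefore costs r·(d+k) trainable parameters instead of d·k. The arithmetic below uses a hypothetical 1024×1024 matrix; GQI's actual base-model dimensions aren't stated here.

```python
def lora_params(d, k, r):
    """Trainable parameters LoRA adds to one d x k matrix at rank r (B: d x r, A: r x k)."""
    return r * (d + k)

def full_params(d, k):
    """Parameters a full fine-tune would update in the same matrix."""
    return d * k

# Hypothetical sizes for illustration: rank 8 matches the playground's lora_r,
# but the 1024 x 1024 shape is made up.
d, k, r = 1024, 1024, 8
added, full = lora_params(d, k, r), full_params(d, k)
print(added, full, full // added)   # 16384 1048576 64
```

Because only the small A and B matrices are stored per dataset, many adapters can share one copy of the base model.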

Plain-English glossary

| Term | What it means |
|---|---|
| Calibrated | The confidence range is honest: wide when GQI is likely to be wrong, narrow when it is likely to be right. That's what lets you threshold on confidence. |
| fit / predict | fit teaches GQI from your labeled examples; predict uses it on new text. (Same as scikit-learn.) |
| Tuned model (adapter) | What fit produces: a small add-on that remembers your data, on top of a shared base model. Also called a LoRA adapter. |
| Fine-tune | Training that small add-on on your examples; this is what fit does. |
| Base model | The shared, pre-trained model you can use with no fitting at all; a good zero-setup baseline. |
| y_std / uncertainty | The ± confidence range on a prediction (a standard deviation). Bigger = less sure. |
| Token | Roughly a word-piece of input text (~4 characters). Hosted billing counts input tokens; the output is a single number, so it isn't billed. |
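You can check calibration on your own held-out rows. If y_std is read as the standard deviation of a roughly Gaussian error (an assumption; the glossary only says it is a standard deviation), then the interval y_hat ± 1.96·y_std should contain the true value about 95% of the time. A minimal standard-library sketch; `coverage` is a hypothetical helper and the numbers are toy data, not GQI output.

```python
from statistics import NormalDist

def coverage(y_true, y_hat, y_std, level=0.95):
    """Fraction of true values inside the central `level` interval y_hat ± z * y_std."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # ≈ 1.96 for level=0.95 (Gaussian assumption)
    hits = sum(abs(t - m) <= z * s for t, m, s in zip(y_true, y_hat, y_std))
    return hits / len(y_true)

# Toy data: 4 of the 5 true values fall inside their 95% intervals.
y_true = [9.0, 2.0, 10.0, 5.0, 7.0]
y_hat  = [8.6, 2.5, 9.1, 5.2, 3.0]
y_std  = [1.1, 0.4, 0.6, 0.3, 1.0]
print(coverage(y_true, y_hat, y_std))   # 0.8
```

Coverage well below the nominal level means the ranges are too narrow (overconfident); well above means they are wider than necessary. With only a handful of rows, expect the estimate itself to be noisy.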

Where to go next

  • Overview — what GQI is for, and why calibrated uncertainty is the differentiator.
  • How it works — the mental model: one shared base model + lightweight per-dataset adapters.
  • Python package gqi — the sklearn-style estimator; free and local, hosted with an API key.
  • Hosted API — REST endpoints for keys, fitting, and predicting at scale.