How it works
GQI is built around one idea: a single, general base model that reads unstructured input, plus a lightweight adapter that specializes it to your data. You never train a model from scratch, and you never manage a separate model per use case. Instead, you fit a small adapter on top of a shared base, and the base does the heavy lifting.
From input to calibrated number
Every prediction follows the same path. You call fit() on examples of your input paired with the numbers you want, and GQI produces an adapter tuned to your data. At predict time, predict() applies that adapter on top of the shared base model to turn new input into a calibrated number with an uncertainty estimate.
examples: X (your text) + y (numbers)    new input: X_new
              │                                  │
              ▼                                  ▼
         ┌─────────┐        adapter        ┌───────────┐
         │  fit()  │ ────────────────────► │ predict() │
         └─────────┘                       └───────────┘
              │                                  │
              └──────► shared base model ◄───────┤
                                                 ▼
                                  calibrated number + uncertainty
                                          (y_hat, y_std)
The base model is general: it already knows how to read text and emit numbers. Fitting learns only what is specific to your data, which is why adaptation is fast and inexpensive.
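To make the split between a frozen base and a small trainable adapter concrete, here is a toy sketch of the fit/predict path. Everything in it is hypothetical and is not GQI's implementation or API: `base_features` (a hashed bag-of-words) stands in for the shared base model, and `ToyAdapterRegressor` uses Bayesian linear regression as a stand-in adapter because it naturally returns both a mean and a predictive standard deviation.

```python
import zlib

import numpy as np


def base_features(texts, dim=64):
    """Hypothetical stand-in for the shared base model.

    Hashes lower-cased whitespace tokens into a fixed-size bag-of-words vector
    and appends a constant bias feature. It has no trainable parameters, so
    fit() never changes it, mirroring a frozen base.
    """
    feats = np.zeros((len(texts), dim + 1))
    feats[:, dim] = 1.0  # bias column
    for i, text in enumerate(texts):
        for token in text.lower().split():
            feats[i, zlib.crc32(token.encode()) % dim] += 1.0
    return feats


class ToyAdapterRegressor:
    """Hypothetical fit/predict wrapper: frozen features plus a small adapter.

    The "adapter" is Bayesian linear regression on the base features, which
    yields a predictive mean and a predictive standard deviation.
    """

    def __init__(self, alpha=1.0, noise_std=1.0, dim=64):
        self.alpha = alpha                # prior precision of adapter weights
        self.beta = 1.0 / noise_std**2    # precision of observation noise
        self.dim = dim

    def fit(self, X, y):
        phi = base_features(X, self.dim)
        y = np.asarray(y, dtype=float)
        precision = self.alpha * np.eye(phi.shape[1]) + self.beta * phi.T @ phi
        self.cov_ = np.linalg.inv(precision)            # posterior covariance
        self.mean_ = self.beta * self.cov_ @ phi.T @ y  # posterior mean weights
        return self

    def predict(self, X, return_std=False):
        phi = base_features(X, self.dim)
        y_hat = phi @ self.mean_
        if not return_std:
            return y_hat
        # Predictive variance = observation noise + uncertainty about the weights.
        var = 1.0 / self.beta + np.einsum("ij,jk,ik->i", phi, self.cov_, phi)
        return y_hat, np.sqrt(var)


X = ["great product", "terrible product", "great service", "terrible service"]
y = [5.0, 1.0, 5.0, 1.0]
model = ToyAdapterRegressor().fit(X, y)
y_hat, y_std = model.predict(["great value", "completely unseen words"], return_std=True)
```

Only `mean_` and `cov_` are learned; the featurizer is untouched. Because the posterior covariance shrinks only along feature directions seen in training, inputs made of unseen tokens tend to get a larger `y_std`.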
One base model, many lightweight adapters
Instead of a dedicated model per customer or per dataset, GQI keeps one shared base and attaches a small adapter for each dataset you fit. Adapters are a few megabytes, train in minutes, and load on top of the same base.
This is what makes adaptation cheap:
- Fast to create: fitting tunes only the adapter, not the whole model (the full method below is the exception).
- Cheap to store: an adapter is small, so you can keep many of them.
- Cheap to serve: many adapters share one base, so they don't each need a dedicated machine.
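The storage and serving savings can be estimated with simple arithmetic. The sketch below assumes a low-rank (LoRA-style) adapter applied to a single square weight matrix; the dimension 4096, rank 8, and 100 datasets are illustrative numbers, not GQI's actual sizes, and the function names are hypothetical.

```python
def dense_update_params(d_out: int, d_in: int) -> int:
    """Parameters in a full d_out x d_in weight (what full fine-tuning changes per matrix)."""
    return d_out * d_in


def lora_update_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in a rank-r update B @ A, with B: d_out x r and A: r x d_in."""
    return rank * (d_out + d_in)


def serving_footprint(n_datasets: int, base_params: int, adapter_params: int) -> tuple[int, int]:
    """Total parameters for (one dedicated model per dataset, one shared base + adapters)."""
    dedicated = n_datasets * base_params
    shared = base_params + n_datasets * adapter_params
    return dedicated, shared


d, r = 4096, 8
print(dense_update_params(d, d))    # 16777216
print(lora_update_params(d, d, r))  # 65536
print(serving_footprint(100, dense_update_params(d, d), lora_update_params(d, d, r)))
# (1677721600, 23330816)
```

Per matrix, the rank-8 adapter is 256 times smaller than the dense weight, and serving 100 datasets from one shared base costs about 1.4% of what 100 dedicated copies would.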
The method you choose controls how the adapter is built:
| method | What it does | Best for |
|---|---|---|
| few_shot | Stores your examples and conditions on them at predict time; no training step | Quick experiments, very small datasets |
| lora | Trains a small low-rank adapter on top of the base | The usual choice: fast, light, accurate |
| full | Fine-tunes the full model for the dataset | Maximum accuracy when you have enough data |
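The lora row refers to low-rank adaptation in general: the base weight stays frozen and only two thin matrices are trained. The NumPy sketch below shows that general technique for one linear layer; `LoRALinear`, its initialization, and the `alpha / rank` scaling follow common LoRA practice and are assumptions, not a description of GQI's internals.

```python
import numpy as np


class LoRALinear:
    """Minimal LoRA-style linear layer: frozen W plus a trainable low-rank update.

    Computes x @ (W + (alpha / rank) * B @ A).T. B starts at zero, so before any
    training the layer reproduces the frozen base exactly.
    """

    def __init__(self, W, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = W.shape
        self.W = W                                           # frozen base weight
        self.A = rng.normal(scale=0.01, size=(rank, d_in))   # trainable
        self.B = np.zeros((d_out, rank))                     # trainable, zero-initialized
        self.scale = alpha / rank

    def __call__(self, x):
        base = x @ self.W.T
        update = (x @ self.A.T) @ self.B.T  # never materializes the full d_out x d_in update
        return base + self.scale * update

    def trainable_params(self):
        return self.A.size + self.B.size


rng = np.random.default_rng(1)
W = rng.normal(size=(16, 32))
layer = LoRALinear(W, rank=4)
x = rng.normal(size=(3, 32))
print(np.allclose(layer(x), x @ W.T))   # True: identical to the base before training
print(layer.trainable_params(), W.size)  # 192 512
```

Training would update only `A` and `B`, which is why a lora adapter is small and can be loaded on top of a shared base.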
Calibrated probabilistic outputs
GQI does not just emit a point estimate. It produces a distribution over the
answer, which is what lets predict(return_std=True) return a meaningful
uncertainty alongside the number. Calibration means those error bars match the
errors GQI actually makes: predictions reported with a larger uncertainty really
do miss by more, on average. That makes the output safe to act on. You can
threshold on confidence, escalate uncertain cases, or carry the uncertainty into
whatever you do next.
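One way to check calibration on held-out data is to measure how often the true value falls inside the reported interval. The sketch below assumes roughly Gaussian errors, so that `y_hat ± 1.96 * y_std` should contain about 95% of outcomes; the synthetic data and the `route` escalation policy with its `max_std` threshold are hypothetical illustrations, not GQI features.

```python
import numpy as np


def interval_coverage(y_true, y_hat, y_std, z=1.96):
    """Fraction of true values inside y_hat ± z * y_std (about 0.95 if calibrated, z=1.96)."""
    y_true, y_hat, y_std = map(np.asarray, (y_true, y_hat, y_std))
    return float(np.mean(np.abs(y_true - y_hat) <= z * y_std))


def route(y_std, max_std):
    """Hypothetical policy: act on confident predictions, escalate the rest."""
    return ["act" if s <= max_std else "escalate" for s in np.asarray(y_std)]


rng = np.random.default_rng(0)
n = 10_000
y_hat = rng.normal(size=n)
y_std = rng.uniform(0.5, 2.0, size=n)
y_true = y_hat + rng.normal(size=n) * y_std  # errors really have scale y_std

calibrated = interval_coverage(y_true, y_hat, y_std)         # close to 0.95
overconfident = interval_coverage(y_true, y_hat, y_std / 2)  # well below 0.95
print(calibrated, overconfident)
print(route([0.3, 1.5], max_std=1.0))  # ['act', 'escalate']
```

Halving the reported uncertainty makes the intervals too narrow, and coverage drops to roughly two thirds; that gap is what a miscalibrated model looks like in practice.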
The same model, local or hosted
The .fit() / .predict() contract is identical whether you run GQI on your own
hardware or in GQI Cloud; what differs is the size of the base model and where it runs:
- Local runs a compact open-weights model in-process on your CPU or GPU. It's free, fully under your control, and ideal for prototyping.
- Hosted runs a larger managed model in GQI Cloud, with adapters created and served for you, so you can scale without operating any infrastructure.
Because both paths share the same interface, moving from local prototyping to
hosted production is a one-line import change. The way you call GQI stays the
same; what changes is where fit and predict execute and which base model they use.
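The one-line switch works because calling code depends only on the fit/predict contract, not on a concrete backend. The sketch below expresses that design with a `typing.Protocol`; `Regressor`, `MeanBaseline`, and `run_pipeline` are hypothetical names, and GQI's actual local and hosted classes and import paths are not shown because this section does not name them.

```python
from typing import Protocol, Sequence

import numpy as np


class Regressor(Protocol):
    """The shared contract: any object with these two methods can be swapped in."""

    def fit(self, X: Sequence[str], y: Sequence[float]) -> "Regressor": ...

    def predict(self, X: Sequence[str], return_std: bool = False): ...


class MeanBaseline:
    """Hypothetical stand-in backend: predicts the training mean and spread."""

    def fit(self, X, y):
        y = np.asarray(y, dtype=float)
        self.mu_, self.sigma_ = y.mean(), y.std()
        return self

    def predict(self, X, return_std=False):
        y_hat = np.full(len(X), self.mu_)
        if return_std:
            return y_hat, np.full(len(X), self.sigma_)
        return y_hat


def run_pipeline(model: Regressor, X_train, y_train, X_new):
    """Calling code sees only the contract, not where the model runs."""
    return model.fit(X_train, y_train).predict(X_new, return_std=True)


y_hat, y_std = run_pipeline(MeanBaseline(), ["a", "b"], [1.0, 3.0], ["c"])
print(y_hat, y_std)  # [2.] [1.]
```

Swapping `MeanBaseline()` for any other object that satisfies `Regressor` leaves `run_pipeline` unchanged, which is the property the local-to-hosted move relies on.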