How it works

GQI is built around one idea: a single, general base model that reads unstructured input, plus a lightweight adapter that specializes it to your data. You never train a model from scratch and you never manage a separate model per use case — you fit a small adapter on top of a shared base, and the base does the heavy lifting.

From input to calibrated number

Every prediction follows the same path. You fit on examples of your input paired with the numbers you want, GQI produces an adapter tuned to your data, and at predict time that adapter turns new input into a calibrated number with an uncertainty estimate.

   examples:  X (your text) + y (numbers)          new input:  X_new
                      │                                  │
                      ▼                                  ▼
                 ┌─────────┐         adapter        ┌───────────┐
                 │  fit()  │ ──────────────────────►│ predict() │
                 └─────────┘                        └───────────┘
                      │                                  │
              shared base model ◄───────────────────────┘
                                                         ▼
                                  calibrated number + uncertainty
                                          (y_hat, y_std)

The base model is general: it already knows how to read text and emit numbers. Fitting learns only the small amount that is specific to your data, which is why adaptation is fast and inexpensive.
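As a rough illustration of this flow, the sketch below reproduces the `fit` / `predict(return_std=True)` call shape with a toy stand-in. `ToyFewShotEstimator`, its word-overlap similarity, and the sample data are all hypothetical and are not part of GQI. The spread it reports is only the disagreement among the nearest stored examples, not a calibrated uncertainty.

```python
from statistics import mean, pstdev


def _jaccard(a, b):
    """Word-overlap similarity between two sets of words."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class ToyFewShotEstimator:
    """Hypothetical stand-in for the fit/predict contract (not GQI code)."""

    def __init__(self, k=2):
        self.k = k
        self.examples = []

    def fit(self, X, y):
        # "Fitting" here only stores each example as (word set, number).
        self.examples = [(set(t.lower().split()), float(v)) for t, v in zip(X, y)]
        return self

    def predict(self, X_new, return_std=False):
        y_hat, y_std = [], []
        for text in X_new:
            words = set(text.lower().split())
            # Rank stored examples by similarity to the new input.
            ranked = sorted(self.examples, key=lambda ex: -_jaccard(words, ex[0]))
            nearest = [value for _, value in ranked[: self.k]]
            y_hat.append(mean(nearest))    # point estimate
            y_std.append(pstdev(nearest))  # spread among the neighbours
        return (y_hat, y_std) if return_std else y_hat


X = ["small red apple", "large red apple", "small green pear", "large green pear"]
y = [1.0, 3.0, 2.0, 4.0]

model = ToyFewShotEstimator(k=2).fit(X, y)
y_hat, y_std = model.predict(["small red apple pie"], return_std=True)
print(y_hat, y_std)
```

The point is only the shape of the calls: fit once on paired examples, then ask for a number and its uncertainty on new input.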

One base model, many lightweight adapters

Instead of a dedicated model per customer or per dataset, GQI keeps one shared base and attaches a small adapter for each dataset you fit. Adapters are a few megabytes, train in minutes, and load on top of the same base.

This is what makes adaptation cheap:

- Fast to create: with `lora`, fitting tunes only the adapter, not the whole model, and `few_shot` skips training entirely.
- Cheap to store: an adapter is only a few megabytes, so you can keep many of them.
- Cheap to serve: many adapters share one base, so each one does not need its own dedicated machine.
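The sharing pattern can be pictured with a toy sketch, assuming a base that is loaded once and adapters that are small per-dataset parameter sets. Everything here (`base_features`, `Adapter`, the adapter names and weights) is hypothetical and chosen for illustration; it does not describe how GQI stores or serves adapters internally.

```python
from dataclasses import dataclass


def base_features(text):
    """Stand-in for the shared base: maps text to a small feature vector."""
    words = text.split()
    return [len(words), sum(len(w) for w in words)]


@dataclass(frozen=True)
class Adapter:
    """A tiny per-dataset head applied on top of the shared features."""
    weights: tuple
    bias: float

    def apply(self, features):
        return self.bias + sum(w * f for w, f in zip(self.weights, features))


# Many small adapters, one base: adding a dataset adds only a few numbers.
adapters = {
    "tickets": Adapter(weights=(0.5, 0.0), bias=1.0),
    "reviews": Adapter(weights=(0.0, 0.25), bias=0.0),
}


def predict_with(adapter_name, text):
    features = base_features(text)  # the same base call for every adapter
    return adapters[adapter_name].apply(features)


tickets_score = predict_with("tickets", "printer is on fire")
reviews_score = predict_with("reviews", "printer is on fire")
print(tickets_score, reviews_score)
```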

The method you choose controls how the adapter is built:

| `method` | What it does | Best for |
| --- | --- | --- |
| `few_shot` | Stores your examples and conditions on them at predict time, with no training step | Quick experiments, very small datasets |
| `lora` | Trains a small low-rank adapter on top of the base | The usual choice: fast, light, accurate |
| `full` | Fine-tunes the full model for the dataset | Maximum accuracy when you have enough data |
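To see why a low-rank adapter is small, consider the standard LoRA formulation: a frozen weight matrix of shape d × k is augmented with two trainable factors of shape d × r and r × k, so the trainable parameter count is r(d + k) instead of d·k. The arithmetic below uses illustrative sizes (d = k = 4096, r = 8, 16-bit weights, 48 adapted matrices); these are assumptions, not GQI's actual dimensions.

```python
def lora_param_count(d, k, r):
    """Trainable parameters for one rank-r adapter on a d x k matrix."""
    return r * (d + k)


def full_param_count(d, k):
    """Parameters updated when the whole d x k matrix is fine-tuned."""
    return d * k


d, k, r = 4096, 4096, 8           # illustrative sizes only
lora = lora_param_count(d, k, r)  # 8 * (4096 + 4096)
full = full_param_count(d, k)     # 4096 * 4096
ratio = lora / full               # fraction of the matrix that is trained

bytes_per_matrix = lora * 2       # 2 bytes per parameter at 16-bit precision
adapter_bytes = 48 * bytes_per_matrix  # assuming 48 adapted matrices

print(lora, full, ratio, adapter_bytes / 2**20)
```

Under these assumed sizes the adapter trains 1/256 of each matrix and occupies about 6 MiB, which is in line with adapters that are a few megabytes.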

Calibrated probabilistic outputs

GQI does not just emit a point estimate. It produces a distribution over the answer, which is what lets `predict(return_std=True)` return a meaningful uncertainty alongside the number. Calibration means those error bars match the errors you actually observe: when GQI reports a larger `y_std`, its predictions are, on average, further from the truth. That lets you act on the output with a known level of risk: you can threshold on confidence, escalate uncertain cases, or carry the uncertainty into whatever you do next.
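One way to check calibration on held-out data, and to act on the uncertainty, is sketched below. It assumes you already have arrays of true values, predictions, and standard deviations; the helper names `interval_coverage` and `route`, the threshold, and the numbers are hypothetical. For a well-calibrated, roughly Gaussian predictive distribution, about 68% of true values should fall within one standard deviation of the prediction.

```python
def interval_coverage(y_true, y_hat, y_std, z=1.0):
    """Fraction of true values within z standard deviations of the prediction."""
    hits = sum(abs(t - m) <= z * s for t, m, s in zip(y_true, y_hat, y_std))
    return hits / len(y_true)


def route(y_hat, y_std, max_std):
    """Split predictions into auto-accepted and escalated by uncertainty."""
    accepted = [(m, s) for m, s in zip(y_hat, y_std) if s <= max_std]
    escalated = [(m, s) for m, s in zip(y_hat, y_std) if s > max_std]
    return accepted, escalated


# Made-up held-out values, for illustration only.
y_true = [10.0, 12.0, 9.0, 20.0]
y_hat = [10.5, 11.0, 9.2, 15.0]
y_std = [1.0, 0.5, 0.5, 4.0]

coverage = interval_coverage(y_true, y_hat, y_std)
accepted, escalated = route(y_hat, y_std, max_std=1.0)
print(coverage, accepted, escalated)
```

With only four points the coverage figure is not statistically meaningful; on a real held-out set, coverage well below the nominal level suggests the error bars are too narrow, and coverage well above it suggests they are too wide.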

The same model, local or hosted

The `.fit()` / `.predict()` contract is identical whether you run GQI on your own hardware or in GQI Cloud. What differs is the size of the base model and where it runs:

- Local runs a compact open-weights model in-process on your CPU or GPU. It's free, fully under your control, and ideal for prototyping.
- Hosted runs a larger managed model in GQI Cloud, with adapters created and served for you, so you can scale without operating any infrastructure.

Because both paths share the same interface, moving from local prototyping to hosted production is a one-line import change: your calls to `fit` and `predict` stay the same, and only where they execute changes.