Hosted API

The GQI hosted API serves the managed models and runs per-dataset fine-tuning for you. It's what the gqi package talks to in hosted mode, and what you call directly from any language.

Base URL: https://gqilabs.com/api

Test mode

The API is currently in test mode — payments use Stripe test cards and no real charges are made. Sign in at gqilabs.com to get a key and start experimenting.

Principles

A few decisions shape the API — worth knowing up front:

Your data stays yours. Each fine-tune trains on the rows you send in that request and keeps only the resulting adapter — we don't store your training data.
Models are private to your key. A tuned model is (your key, its name); different keys are fully isolated from each other.
Predictable, not magic. Re-fitting a name replaces it (no hidden versioning), and a fine-tuned model is live the instant its job finishes — no deploy step.
You pay for input, not output. Billing is on input tokens; the output is a single calibrated number, so it's free.

Authentication

Every request carries a Bearer API key. Create and manage keys in your account console — key management is tied to your signed-in browser session, so keys can't be minted with a bare API call. The full key is shown once, at creation:

gqi_sk_test_1a2b3c...   # copy it when it's shown — only the prefix is stored

Send it on every call:

curl https://gqilabs.com/api/v1/models -H "Authorization: Bearer gqi_sk_test_..."

Your key identifies your account and scopes every tuned model and your credit balance to you.
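
For callers not using curl, the same header can be set from Python's standard library. This is a minimal sketch, not the gqi package's own client: `build_request` and `send` are hypothetical helper names, and only the base URL, the `/v1/models` path, and the Bearer header come from this page.

```python
import json
import urllib.request

BASE_URL = "https://gqilabs.com/api"  # base URL stated on this page


def build_request(path, api_key, payload=None, method=None):
    """Build (but do not send) an authenticated request (hypothetical helper)."""
    headers = {"Authorization": f"Bearer {api_key}"}
    data = None
    if payload is not None:
        # JSON bodies need an explicit content type, as in the curl examples.
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def send(request):
    """Send the request and decode the JSON body. This performs a network call."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Equivalent of the curl call above; pass it to send() to actually list models.
req = build_request("/v1/models", "gqi_sk_test_...")
```

Keeping request construction separate from sending makes the header logic easy to check without touching the network.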

Two kinds of model

Base models: shared, pre-trained bases you can use directly or fine-tune on top of. List them with GET /v1/models:

tag	notes
`t270m_general_stageB`	GQI Lite: fast (the default)
`GQI_1B_v0`	GQI Pro: most accurate
`GQI_Max_v0`	GQI Max: our most capable model (preview)

Tuned models: your own fine-tuned models. Each is a lightweight adapter over a base, named by you and private to your key. The name is your task's identity: reuse it to keep predicting, or to retrain.

The core loop: fit → poll → predict

`POST /v1/fit`

Starts an asynchronous fine-tune. Pass a model name, your examples X (text) and targets y (numbers), and optionally a base_model. Returns a job handle.

curl -X POST https://gqilabs.com/api/v1/fit \
  -H "Authorization: Bearer gqi_sk_test_..." -H "Content-Type: application/json" \
  -d '{
        "model": "ticket-priority",
        "base_model": "GQI_1B_v0",
        "X": ["cannot log in, urgent!", "typo on the about page"],
        "y": [9.0, 2.0]
      }'
# -> { "job_id": "job_abc123", "model": "ticket-priority", "status": "queued" }
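
Because a fit request pairs each text in X with one number in y, it can help to check the two lists before sending. The sketch below builds the same JSON body as the curl example; `build_fit_payload` is a hypothetical helper, and its checks are assumptions about a sensible request, not the service's documented validation rules.

```python
import math


def build_fit_payload(model, X, y, base_model=None):
    """Return the JSON body for POST /v1/fit after basic client-side checks.

    The checks below are assumptions; the service may validate differently.
    """
    if not model:
        raise ValueError("model name must be non-empty")
    if len(X) != len(y):
        raise ValueError(f"X has {len(X)} rows but y has {len(y)}")
    if not all(isinstance(text, str) for text in X):
        raise ValueError("every example in X must be a string")
    targets = [float(value) for value in y]
    if not all(math.isfinite(value) for value in targets):
        raise ValueError("every target in y must be a finite number")
    payload = {"model": model, "X": list(X), "y": targets}
    if base_model is not None:
        payload["base_model"] = base_model  # optional, as in the curl example
    return payload


payload = build_fit_payload(
    "ticket-priority",
    ["cannot log in, urgent!", "typo on the about page"],
    [9, 2],
    base_model="GQI_1B_v0",
)
```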

`GET /v1/jobs/{id}`

Poll until the fit finishes (status goes queued → running → done).

curl https://gqilabs.com/api/v1/jobs/job_abc123 \
  -H "Authorization: Bearer gqi_sk_test_..."
# -> { "status": "running", "history": [ { "step": 8, "val_loss": 0.74 }, ... ] }
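
The polling step can be sketched as a small loop. `wait_for_job` is a hypothetical helper that takes any callable returning the job JSON, so it works with whatever HTTP client you use. The interval and timeout defaults are arbitrary, and treating any status other than queued, running, or done as an error is an assumption, since this page documents only those three.

```python
import time


def wait_for_job(fetch_status, job_id, interval=2.0, timeout=600.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status(job_id) until the job reports status "done"."""
    deadline = clock() + timeout
    while True:
        job = fetch_status(job_id)
        status = job.get("status")
        if status == "done":
            return job
        if status not in ("queued", "running"):
            # Assumption: anything outside the documented statuses is a failure.
            raise RuntimeError(f"job {job_id} has unexpected status {status!r}")
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} not done after {timeout} seconds")
        sleep(interval)


# Simulated run: canned responses stand in for GET /v1/jobs/{id}.
responses = iter([{"status": "queued"}, {"status": "running"}, {"status": "done"}])
job = wait_for_job(lambda job_id: next(responses), "job_abc123", sleep=lambda s: None)
print(job["status"])  # prints "done"
```

In real use, `fetch_status` would issue the authenticated GET shown above and return the decoded JSON.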

`POST /v1/predict`

Predict with your tuned model by name, or with a base_model directly. Every prediction returns y (the calibrated number — a median over samples) and y_std (its calibrated uncertainty — a robust spread, with sampler outliers clipped). num_samples sets how many samples back the estimate (default 32).

# your tuned model
curl -X POST https://gqilabs.com/api/v1/predict \
  -H "Authorization: Bearer gqi_sk_test_..." -H "Content-Type: application/json" \
  -d '{ "model": "ticket-priority", "X": ["server down in prod!!"] }'
# -> { "y": [8.7], "y_std": [1.1], "served": "adapter" }

# a base model, no fine-tune
curl -X POST https://gqilabs.com/api/v1/predict \
  -H "Authorization: Bearer gqi_sk_test_..." -H "Content-Type: application/json" \
  -d '{ "base_model": "GQI_1B_v0", "X": ["..."] }'
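
One way to use y_std is to route uncertain predictions to a human. The sketch below pairs each input with its y and y_std and flags rows above a threshold. `pair_predictions` and `review_above` are hypothetical names, the threshold value is arbitrary, and the code assumes the response lists follow the order of X, which the example suggests but this page does not state.

```python
def pair_predictions(X, response, review_above=None):
    """Zip inputs with their y and y_std, optionally flagging uncertain rows."""
    ys, stds = response["y"], response["y_std"]
    if not (len(X) == len(ys) == len(stds)):
        raise ValueError("X, y and y_std must have the same length")
    rows = []
    for text, y, std in zip(X, ys, stds):
        row = {"text": text, "y": y, "y_std": std}
        if review_above is not None:
            # Flag predictions whose spread exceeds the caller's threshold.
            row["needs_review"] = std > review_above
        rows.append(row)
    return rows


# Response shape copied from the tuned-model example above.
response = {"y": [8.7], "y_std": [1.1], "served": "adapter"}
rows = pair_predictions(["server down in prod!!"], response, review_above=1.0)
```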

`GET /v1/tuned` · `DELETE /v1/tuned/{name}`

List or delete your tuned models.

curl https://gqilabs.com/api/v1/tuned -H "Authorization: Bearer gqi_sk_test_..."
# -> { "models": [ { "name": "ticket-priority", "base_model": "GQI_1B_v0",
#                    "status": "ready", "n_examples": 340, "updated": 1719... } ] }
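
Since the model name goes into the URL path for DELETE, a client may want to percent-encode it. The sketch below shows that plus a filter over the listing. `tuned_url` and `ready_names` are hypothetical helpers; whether the service accepts names containing spaces or slashes is not stated here, and the "ready" status is taken from the example response only.

```python
from urllib.parse import quote

BASE_URL = "https://gqilabs.com/api"  # base URL stated on this page


def tuned_url(name=None):
    """URL for listing tuned models, or for one model when a name is given."""
    if name is None:
        return f"{BASE_URL}/v1/tuned"
    # Percent-encode so characters like "/" or spaces stay inside one path segment.
    return f"{BASE_URL}/v1/tuned/{quote(name, safe='')}"


def ready_names(listing):
    """Names of models whose status is "ready" in a GET /v1/tuned response."""
    return [m["name"] for m in listing.get("models", []) if m.get("status") == "ready"]


# Made-up listing for illustration; the second entry and its status are invented.
listing = {"models": [
    {"name": "ticket-priority", "base_model": "GQI_1B_v0", "status": "ready"},
    {"name": "draft", "base_model": "GQI_1B_v0", "status": "training"},
]}
names = ready_names(listing)
```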

Updating a model with new data

Re-run POST /v1/fit with the same model name to retrain it. GQI trains a fresh adapter on exactly the data you send — so to train on more data, send the full (cumulative) set.

We don't store your training data

Each fit trains on the rows in that request and keeps only the resulting adapter (a few MB). You own your data — send it when you want to (re)train. A future continue mode will warm-start from your existing adapter for cheap incremental updates; it isn't available yet.
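
Because each fit sees only the rows in its own request, the cumulative set has to live on your side. Here is a minimal sketch of a client-side store that accumulates rows and emits the full payload for every re-fit. `TrainingSet` is a hypothetical class, not part of the gqi package, and it keeps data in memory only.

```python
class TrainingSet:
    """Client-side store of every labelled row collected for one tuned model."""

    def __init__(self, model, base_model=None):
        self.model = model
        self.base_model = base_model
        self.X = []
        self.y = []

    def add(self, texts, targets):
        """Append a new batch of examples to the cumulative set."""
        if len(texts) != len(targets):
            raise ValueError("texts and targets must have the same length")
        self.X.extend(texts)
        self.y.extend(float(t) for t in targets)

    def fit_payload(self):
        """Body for POST /v1/fit containing all rows collected so far."""
        payload = {"model": self.model, "X": list(self.X), "y": list(self.y)}
        if self.base_model is not None:
            payload["base_model"] = self.base_model
        return payload


data = TrainingSet("ticket-priority", base_model="GQI_1B_v0")
data.add(["cannot log in, urgent!", "typo on the about page"], [9.0, 2.0])
first_fit = data.fit_payload()   # two rows
data.add(["server down in prod!!"], [10.0])
second_fit = data.fit_payload()  # all three rows, not just the new one
```

Persisting the rows (for example to a file or database) is left out; the point is only that the re-fit body carries the old rows along with the new ones.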

Design intent, so it's predictable:

A tuned model is (your key, name). Same key + same name → the same model. Different keys are fully isolated.
Re-fit replaces. Fitting a name again trains a new adapter and points the name at it (no silent versioning).
Live immediately. Once a job is done, predict against the name right away — there is no separate deploy step.

Credits & usage

Requests meter against prepaid credits ($30 free each month, no card required). Billing is on input tokens: the output is one number, so it's free. See pricing. If you run out, calls return HTTP 402 (Payment Required); top up in the console.

Method & path	Purpose
`GET /v1/billing/balance`	Your credit balance
`POST /v1/billing/checkout`	Add credits (Stripe) → checkout URL
`GET /v1/usage`	Spend this month + recent activity
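
A client can treat the 402 response as its own error type so callers know to top up rather than retry. The sketch below assumes requests are sent with Python's `urllib`, which raises `urllib.error.HTTPError` for 4xx responses; `OutOfCredits` and `call_with_credit_check` are hypothetical names.

```python
import urllib.error


class OutOfCredits(Exception):
    """Raised when the API answers 402, meaning the credit balance is exhausted."""


def call_with_credit_check(send, request):
    """Run send(request), translating an HTTP 402 into OutOfCredits."""
    try:
        return send(request)
    except urllib.error.HTTPError as err:
        if err.code == 402:
            raise OutOfCredits("credit balance exhausted; top up in the console") from err
        raise  # any other HTTP error is left for the caller to handle
```

Here `send` is any callable that performs the request, such as a thin wrapper around `urllib.request.urlopen`.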

Endpoint summary

Method & path	Purpose
`POST /v1/keys`	Create an API key (shown once; console session only)
`GET /v1/models`	List base models
`POST /v1/fit`	Start a fine-tune (async) → job handle
`GET /v1/jobs/{id}`	Poll a fit job
`POST /v1/predict`	Predict with a tuned `model` or a `base_model`
`GET /v1/tuned` · `DELETE /v1/tuned/{name}`	List / delete your tuned models
`GET /v1/billing/balance` · `GET /v1/usage` · `POST /v1/billing/checkout`	Credits & usage
