Change Standardization

Author

Changkai MAI

Published

December 12, 2025

Theory

Idea

Current Problem: the truncation error at the boundary is large.

Proposed Solution: map the data samples back to \(\mathbb{R}\)

Proposed workflow: original data samples \(\to [0, 1] \to \mathbb{R}\)

Essentials

Say \(g: x \to y\) is a one-to-one, monotonically increasing map. Let \(p_X\) be the probability density of the variable \(x\), and \(p_Y\) that of \(y\).

We have:

\[ F_X(x) = \mathbb{P}\{X \leq x\} = \mathbb{P}\{g(X) \leq g(x)\} = \mathbb{P}\{Y \leq g(x)\} = F_Y(g(x)) \]

Hence:

\[ p_X(x) = \frac{d}{dx} F_X(x) = \frac{d}{dx}F_Y(g(x)) = p_Y(g(x)) \cdot g'(x) \]

Also:

\[ \int_a^b p_X(x) dx = \int_a^b p_Y(g(x)) \cdot g'(x) dx = \int_{g(a)}^{g(b)} p_Y(y) dy \]

Extend to higher dimension:

\[ \int_A p_X(x) dx = \int_{g(A)} p_Y(y) dy \]

The conclusion suggests that any monotonic (order-preserving, invertible) map can be used as a standardization method.
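As a quick numerical sanity check (a sketch; NumPy and the Beta-distributed sample are my choices, not part of the original), a monotonic map preserves the probability mass of any interval:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.beta(2.0, 5.0, size=100_000)  # arbitrary samples on (0, 1)

def logit(t):
    return np.log(t / (1.0 - t))  # monotonic, invertible map to R

y = logit(x)

# The mass X puts in [a, b] equals the mass Y = g(X) puts in [g(a), g(b)]:
# the very same samples land in both intervals, so the two counts agree.
a, b = 0.2, 0.6
mass_x = np.mean((x >= a) & (x <= b))
mass_y = np.mean((y >= logit(a)) & (y <= logit(b)))
assert abs(mass_x - mass_y) < 1e-3
```

This is exactly the statement \(\int_a^b p_X(x)\,dx = \int_{g(a)}^{g(b)} p_Y(y)\,dy\), checked empirically.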

Application

I choose the logit function as \(g\):

\[ g(x) = \log \frac{x}{1 - x} \]

Discussion on implementation

For numerical stability, I cannot use the full range \([0, 1]\); inputs are therefore clamped to \([\epsilon, 1 - \epsilon]\).

This keeps the output in a reasonable range: \(\text{logit}(10^{-4}) \approx -9.2\). Thus I use \(\epsilon = 10^{-4}\).
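A minimal sketch of the clamped logit and its inverse (the function names and the NumPy implementation are my assumptions, not the original code):

```python
import numpy as np

EPS = 1e-4  # clamp bound; logit(1e-4) ≈ -9.21

def logit(x, eps=EPS):
    """Clamped logit: maps [0, 1] into roughly [-9.21, 9.21]."""
    x = np.clip(x, eps, 1.0 - eps)
    return np.log(x / (1.0 - x))

def sigmoid(y):
    """Inverse map from R back to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-y))

x = np.array([0.0, 0.3, 0.5, 0.7, 1.0])
y = logit(x)
assert abs(y[0] + 9.21) < 0.01 and abs(y[-1] - 9.21) < 0.01  # endpoints saturate
assert np.allclose(sigmoid(y)[1:-1], x[1:-1])  # interior points round-trip
```

Only the clamped endpoints lose information; every interior point maps back exactly through the sigmoid.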

Also, I don’t want interval lengths to be compressed too much, so I add a temperature \(\tau\):

\[ g(x) = \tau \log \frac{x}{1 - x} \]

And I want (where \(\rho\) denotes a range \([a, b]\) and \(|\cdot|\) its length):

\[ \frac{|g(\rho)|}{|\rho|} = \frac{g(b) - g(a)}{b - a} \ge \text{Constant} \]

A sufficient condition is (using Lagrange’s Mean Value Theorem):

\[ \forall x \in (0, 1), \,\, g'(x) \ge \text{Constant} \]

Taking the constant to be 1, we have:

\[ \forall x \in (0, 1), \,\, \frac{\tau}{x(1 - x)} \ge 1 \]

Since \(x(1 - x)\) attains its maximum \(1/4\) at \(x = 1/2\), this holds for all \(x\) if and only if \(\tau \ge 0.25\). I choose \(\tau = 2.5\) for better resolution.
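Putting the pieces together, a minimal sketch of the full workflow (raw column \(\to [0, 1] \to \mathbb{R}\)) with these values of \(\epsilon\) and \(\tau\); the min-max scaling step and the function name `logit_standardize` are my assumptions, not the original implementation:

```python
import numpy as np

EPS, TAU = 1e-4, 2.5  # clamp bound and temperature chosen above

def logit_standardize(col, eps=EPS, tau=TAU):
    """Hypothetical pipeline: min-max scale to [0, 1], then tempered, clamped logit."""
    u = (col - col.min()) / (col.max() - col.min())
    u = np.clip(u, eps, 1.0 - eps)
    return tau * np.log(u / (1.0 - u))

# Derivative bound: g'(x) = tau / (x(1-x)) is smallest at x = 0.5, where it
# equals 4*tau, so tau >= 0.25 guarantees g'(x) >= 1 everywhere.
grid = np.linspace(EPS, 1.0 - EPS, 10_001)
assert (TAU / (grid * (1.0 - grid))).min() >= 1.0

# Outputs are bounded by tau * logit(1 - eps) ≈ 23.03.
rng = np.random.default_rng(0)
z = logit_standardize(rng.exponential(size=10_000))
assert np.abs(z).max() <= TAU * np.log((1.0 - EPS) / EPS) + 1e-9
```

With \(\tau = 2.5\), the standardized values live in roughly \([-23, 23]\), and no interval of \([0, 1]\) is shortened by the map.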

Experiment

Training a model with Z-score Standardization:

config = {
    "train": {
        "epochs": 500,
        "lr": 0.01,
        "samples": None,
        "naive_dequantization": True,
        "standardization": "z-score", # Alt: "logit"
    },
    "finetune": {
        "activate": False,
        "samples": 1000,
        "epochs": 50,
        "lr": 0.01,
    },
    "eval": {
        "activate": True,
        "samples": 1000,
    },
    "partitions": 5,
}
Generating evaluation queries...
Preprocessing data...
Using naive dequantization...
Applying z-score standardization...
Training model...

Evaluating model...
GMQ: 3.067032402719737;
50%: 2.2408485495027604;
90%: 2.6734368967932394;
95%: 253.07973394462869;
MAX: 564.9195758452238

Training a model with Logit Standardization:

config = {
    "train": {
        "epochs": 500,
        "lr": 0.01,
        "samples": None,
        "naive_dequantization": True,
        "standardization": "logit", # Alt: "z-score"
    },
    "finetune": {
        "activate": False,
        "samples": 1000,
        "epochs": 50,
        "lr": 0.01,
    },
    "eval": {
        "activate": True,
        "samples": 1000,
    },
    "partitions": 5,
}
Generating evaluation queries...
Preprocessing data...
Using naive dequantization...
Applying logit transformation...
Training model...

Evaluating model...
GMQ: 1.1305918793921608;
50%: 1.1187701929634826;
90%: 1.2575532381515169;
95%: 1.3295269505105132;
MAX: 1.7030936231298184

Visualization (marginal)

[Marginal distribution plots for Column 0 through Column 6 omitted.]