Applies one online gradient step to the globally learned weights.
Δweight = learningRate × reward × contribution
Scorers with high contribution to a good outcome get heavier; scorers
that pushed a bad decision get lighter. The result is clamped from
below: a series of failures will keep driving a weight toward
WEIGHT_MIN but can never push it below that floor.
NaN and zero rewards are ignored (no-ops).
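
The update rule described above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the function name `applyGradientStep`, the `Map`-based storage, and the numeric values of `WEIGHT_MIN` and `LEARNING_RATE` are all assumptions made for the example. Only the formula, the lower floor, and the no-op behaviour for NaN and zero rewards come from the description.

```typescript
// Hypothetical values; the description names WEIGHT_MIN but not its magnitude.
const WEIGHT_MIN = 0.05;
const LEARNING_RATE = 0.1;

/**
 * Applies one online update to `weights` in place.
 * `contributions` maps a scorer name to its contribution to the decision
 * (assumed shape; scorers without a stored weight are skipped).
 */
function applyGradientStep(
  weights: Map<string, number>,
  contributions: Map<string, number>,
  reward: number,
  learningRate: number = LEARNING_RATE,
): void {
  // NaN and zero rewards are no-ops.
  if (Number.isNaN(reward) || reward === 0) return;

  for (const [scorer, contribution] of contributions) {
    const current = weights.get(scorer);
    if (current === undefined) continue;

    // Δweight = learningRate × reward × contribution
    const delta = learningRate * reward * contribution;

    // Clamp so repeated failures can never push the weight below the floor.
    weights.set(scorer, Math.max(WEIGHT_MIN, current + delta));
  }
}
```

In this sketch a positive reward raises the weight of every contributing scorer in proportion to its contribution, while a strongly negative reward pins the weight at `WEIGHT_MIN` instead of letting it go lower. No upper bound is applied, because the description does not mention one.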