Gradients

< Mujpy._add_multirun_ | Index | Mujpy.grad >


Why [only] in global fits

Typical global fits, at the time of writing, are multi-run fits (say 10 runs) with a couple of global parameters and, say, 5 local parameters per run. An example is a longitudinal-geometry GPS fit in a transverse field of 200 mT, with one grouping and 5000 bins (20000 raw bins rebinned by a factor of 4), using three Gaussian components as a proxy for the asymmetric static flux-line-lattice broadening of a superconductor, plus cryostat muons.

This fit requires 60 Minuit parameters and is performed by mujpy in about 3 minutes on an 8th-generation i7-8550U (with numpy using a single core) when starting from a rough guess, and in 30 s when starting from sequential best fits. It is therefore convenient to run a sequential fit first, extract the guess from there, and then produce the global fit.

In the end the following analysis proves wrong. It is left here for the record.

Profiling shows that 6% of the time is spent calculating the plain fit functions (mg from class mumodel in mujpy.mucomponent.mucomponent), 8% calculating the iminuit hesse covariance matrix at the minimum, and 86% recalculating mg for the numerical evaluation of the gradient. Since Minuit's quasi-Newton descent calculates gradients and functions with (very roughly) the same frequency, and the numerical assessment of the former requires as many function calls as twice the number of parameters, the time overhead of the numerical gradient is not a surprise.

Therefore the strategy of computing analytic gradients should provide a gain of roughly a factor of 10 (it does not!).
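The call-count argument can be sketched as follows: a central-difference gradient costs two extra function evaluations per parameter, so 60 parameters imply 120 function calls per gradient. The helper name and the toy quadratic cost below are illustrative, not mujpy code.

```python
import numpy as np

def num_grad(fcn, p, eps=1e-8):
    """Central-difference gradient: 2*len(p) extra calls to fcn per domain point."""
    p = np.asarray(p, dtype=float)
    g = np.empty_like(p)
    calls = 0
    for i in range(p.size):
        step = np.zeros_like(p)
        step[i] = eps
        g[i] = (fcn(p + step) - fcn(p - step)) / (2 * eps)
        calls += 2
    return g, calls

# toy quadratic cost with 60 parameters, as in the fit above
fcn = lambda p: np.sum(p**2)
g, calls = num_grad(fcn, np.ones(60))
print(calls)  # 120 function calls for a single gradient evaluation
```

This is why, if the minimizer requests gradients about as often as function values, the numerical gradient dominates the run time.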

The analytic gradient strategy described below actually takes slightly longer and is much more sensitive to the distance of the guess from the minimum. This statement comes after extensive debugging, once the two strategies finally converge to exactly the same result.

How is the analytic gradient implemented?

The next page describes the analytic calculation. In short, all components of the present library allow analytic calculation by means of the component itself and of its derivative with respect to its argument (see the next page for examples). Minuit passes an array of parameter values (say 52), representing a point in the domain of the fit cost function. Since asymmetries and errors are stored as 2d numpy arrays of shape (runs, time), both functions (components and their derivatives) can be stored for the entire runs × time set of points (calculated only once per domain point). Then it is just a matter of book-keeping to reassemble the gradient of the cost function as a sum over runs and time bins.
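The reassembly step can be sketched for a least-squares cost. The array names (asymm, errors, f, dfdp) and the random placeholder values are illustrative, not mujpy's actual identifiers; only the shapes follow the text.

```python
import numpy as np

# illustrative shapes: 10 runs, 5000 time bins, 60 Minuit parameters
runs, nbins, npar = 10, 5000, 60
rng = np.random.default_rng(0)
asymm  = rng.normal(size=(runs, nbins))        # data, shape (runs, time)
errors = np.ones((runs, nbins))                # data uncertainties
f      = rng.normal(size=(runs, nbins))        # model, evaluated once per domain point
dfdp   = rng.normal(size=(npar, runs, nbins))  # analytic df/dp_k, same evaluation

# chi2 = sum over runs and bins of ((asymm - f) / errors)**2
resid = (asymm - f) / errors
chi2 = np.sum(resid**2)

# dchi2/dp_k = -2 * sum over runs and bins of resid * (df/dp_k) / errors
grad = -2.0 * np.einsum('rt,krt->k', resid / errors, dfdp)
print(grad.shape)  # (60,)
```

The whole gradient is thus one einsum over precomputed arrays, with no extra evaluations of the components themselves.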

One detail is however crucial: the user can provide user functions func, written as strings, that define a component parameter in terms of either other component parameters (shared parameters) or a global parameter (or both). Functions of up to three Minuit parameters are envisaged. These functions contribute a factor to the gradient. For standard best fits the func calculation must be encoded in a Python method just before the iminuit invocation. Then a book-keeping algorithm calculates par = func(p) prior to the component calculation, e.g. mg(t, *par), in order to provide the correct parameter set par for that component.
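A minimal sketch of how such a func string might be turned into a Python method is the following; the p[...] syntax follows the text, while the helper name and the restricted namespace are assumptions of this sketch, not mujpy's actual implementation.

```python
import math

def compile_func(func_string):
    """Turn a user string such as 'p[0]*abs(p[5])' into a callable of the
    Minuit parameter array p. Illustrative sketch, not mujpy code."""
    # restrict eval's namespace to a few elementary functions
    namespace = {'abs': abs, 'arctan': math.atan, 'exp': math.exp,
                 '__builtins__': {}}
    return eval('lambda p: ' + func_string, namespace)

shared = compile_func('p[0]*abs(p[5])')
print(shared([2.0, 0, 0, 0, 0, -3.0]))  # 6.0
```

The compiled callable can then be invoked as par = func(p) by the book-keeping algorithm before each component call.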

Likewise, for the gradient we must compute the partial derivatives of the same user functions with respect to some Minuit parameters. Thus the partial derivatives of func must be encoded into Python methods and passed to the gradient method by a [modified] book-keeping algorithm. Partial derivatives of a user-written function are calculated by the following strategy, based on sympy:

  • the func string is translated by func.replace('this','that'), substituting symbols x,y,z for e.g. p[0],p[5],p[12]
  • the names of elementary functions are also translated to their sympy spellings (e.g. abs to Abs and arctan to atan)
  • the sympy symbols are defined
  • the string is converted to an expression by funct = sympify(func)
  • the partial y-derivative is obtained as dfunct = diff(funct, y), etc.
  • the derivative is converted back to a string by str(dfunct)
  • the string is translated back, e.g. x,y,z to p[0],p[5],p[12] and Abs to abs, atan to arctan
  • from here the algorithm is the same as for the original func
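The steps above can be sketched end to end with sympy; the example func string and the particular p[...] indices are illustrative.

```python
import sympy as sp

# user func string, in the Minuit-parameter syntax of the text
func = 'p[0]*arctan(p[5])'

# steps 1-2: substitute symbols for p[i] and rename to sympy spellings
s = func.replace('p[0]', 'x').replace('p[5]', 'y')
s = s.replace('abs', 'Abs').replace('arctan', 'atan')

# steps 3-4: define the sympy symbols and convert the string to an expression
x, y = sp.symbols('x y')
expr = sp.sympify(s)

# step 5: partial derivative with respect to y
dexpr = sp.diff(expr, y)

# steps 6-7: back to a string, translating names back to the func conventions
ds = str(dexpr).replace('Abs', 'abs').replace('atan', 'arctan')
ds = ds.replace('x', 'p[0]').replace('y', 'p[5]')
print(ds)  # the partial derivative of func with respect to p[5]
```

The resulting string can then be compiled into a Python method exactly like the original func.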

iminuit examples are described here; they are a bit obsolete (the Minuit keyword now seems to be grad=..., not grad_fcn=...). The description is:

    grad_fcn: Optional. Provide a function that calculates the
              gradient analytically and returns an iterable object with one
              element for each dimension. If None is given minuit will
              calculate the gradient numerically. (Default None)

Info in the Minuit User Guide:

    4.1.3 FCN function with gradient
          By default first derivatives are calculated numerically by MINUIT. In case the user
          wants to supply his own gradient calculator (e.g. analytical derivatives), he needs to
          implement the FCNGradientBase interface.
          The size of the output vector is the same as of the input one. The same is true for
          the position of the elements (the first derivative of the function with respect to the n-th
          variable has index n in the output vector).


Page last modified on January 15, 2023, at 12:15 PM