

Three strategies to speed up mujpy's minuit fit

  • Show how many cores your machine has, in bash: cat /proc/cpuinfo | grep processor | wc -l. Multi-core numpy should be automatic when np.show_config() indicates that OpenBLAS is in use. Check by running top on a multirun global fit: it may indicate 100%, meaning that only 1 CPU is used. This link says that half the cores should be left for the OS, otherwise runtime gets much longer; maybe this is implemented in the new OpenBLAS, since everything seems to be handled automatically (see the sketch after this list).
  • Use jax.grad (?)
  • Add analytic gradients (?)
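A minimal sketch of the first check, to run before launching a fit; the thread-count environment variables are standard OpenBLAS/OpenMP knobs, not mujpy settings:

    import multiprocessing
    import numpy as np

    print(multiprocessing.cpu_count())  # same count as: cat /proc/cpuinfo | grep processor | wc -l
    np.show_config()                    # look for "openblas" in the BLAS/LAPACK sections

    # OpenBLAS reads these variables at import time, so export them before
    # starting python, e.g. to leave half of an 8-core machine to the OS:
    #   export OPENBLAS_NUM_THREADS=4
    #   export OMP_NUM_THREADS=4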

Minuit presently finds the Best Fit Functions (BFF) {$y_k(t,p)$}, one per run {$k$}, by chi-square minimization over one data set per run

{$$\chi^2 = \sum_{k\in \mathrm{run},i} \left(\frac{y_k(t_i,p)-y_{e,k,i}}{e_{k,i}}\right)^2$$} Each BFF itself is a sum of components

{$$y_k(t,p) = \sum_n y_{k,n}(t,p)$$}

The gradient is

{$$\frac{\partial \chi^2}{\partial p_m} = \sum_{k \in \mathrm{runs}}\sum_i \frac {2\left[\sum_n y_{k,n}(t_i,p)-y_{e,k,i}\right]}{e_{k,i}^2}\, \sum_n \frac {\partial y_{k,n}(t_i,p)}{\partial p_m}$$}
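As a toy illustration of these two formulas (one run, one exponential component, synthetic data; not mujpy code):

    import numpy as np

    t = np.linspace(0.0, 10.0, 1000)                                 # time bins
    rng = np.random.default_rng(0)
    y_exp = 0.25 * np.exp(-0.3 * t) + rng.normal(0.0, 0.01, t.size)  # "measured" data
    e = np.full_like(t, 0.01)                                        # their errors

    def chi2(A, lam):
        return np.sum(((A * np.exp(-lam * t) - y_exp) / e) ** 2)

    def chi2_grad(A, lam):
        y = A * np.exp(-lam * t)
        w = 2.0 * (y - y_exp) / e**2                 # common factor of the gradient formula
        return np.array([np.sum(w * y / A),          # d chi2/dA,   since dy/dA = y/A
                         np.sum(w * (-t) * y)])      # d chi2/dlam, since dy/dlam = -t y

iminuit (version 2 and later) accepts such a callable through the grad argument of Minuit, e.g. Minuit(chi2, A=0.2, lam=0.2, grad=chi2_grad), sparing its internal finite differences.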

In turn, the gradient of the BFF is the sum of the gradients of its components. Component parameters may directly be minuit parameters. Component parameters can also be functions of other parameters, e.g. user parameters, as defined by the "function" key of the corresponding pardict dictionary. In the latter case let {$P$} be the component parameters and {$P_\alpha$} one that depends on the minuit parameter {$p_m$}

{$$\frac {\partial y_k}{\partial p_m} = \frac {\partial y_k(t,P)}{\partial P_\alpha} \frac {dP_\alpha}{dp_m}$$}

The first factor on the right is straightforward and is computed in the table below as {$$\frac {\partial y_k(t,P)}{\partial P_\alpha}$$} The second factor depends on the user-defined "function" {$P_\alpha(p_m)$}, and its derivative is implemented with sympy (see the sketch below).

The gradient calculation in Minuit requires that mumodel knows, for each minuit parameter {$p_m$}, which component parameters {$P_\alpha$} refer to it. Each of the two factors depends on four indices, {$k,n,j,i$} for the first and {$k,n,j,m$} for the second, i.e. each is a rank-4 tensor. Fortunately the second tensor is sparse, and it is convenient to calculate just its non-zero components.

Let's compute the advantage of sparse calculations. Leaving initially aside the time bins {$i$} (typically 25000), let's assume 11 runs, with 3 components each with 4 parameters. The non-global fit requires 11×12 = 132 parameters. With 5 global parameters and 5 local parameters per run the global fit requires only 5 + 5×11 = 60 parameters, a great statistical reduction. A dense computation of gradients would then require 60 grad components {$m$}, each with 132 parameter coordinates {$k,n,j$}, for a total of 7920×25000 ≈ {$2\times 10^8$} calculations. However there are only 155 non-vanishing terms {$\frac {dP_j}{dp_m}$}, leading to just 155×25000 ≈ {$3.9\times 10^6$} calculations, a gain of a factor ≈ 50.
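The same counting in a few lines (the 155 non-vanishing terms are taken from the example above):

    runs, bins = 11, 25000
    per_run = 3 * 4                        # 3 components x 4 parameters
    nonglobal = runs * per_run             # 132 parameters in 11 separate fits
    nminuit = 5 + 5 * runs                 # 60 minuit parameters in the global fit
    dense = nminuit * nonglobal * bins     # every (m; k,n,j) pair, per time bin
    sparse = 155 * bins                    # only the non-vanishing dP_j/dp_m terms
    print(dense, sparse, dense // sparse)  # 198000000 3875000 51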

For this reason the aux method int2_multirun_grad_method_key identifies each and every set of indices {$(k,n,j)$} for which {$\frac {dP_j}{dp_m}\ne 0$}. This relies on the same aux method that calculates the derivative of the user func, namely diffunc, which also extracts the indices (e.g. p[1]*p[6] produces the methods for computing the two derivatives, together with the two indices {$m$} = 1 and 6).
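A minimal sketch of the idea with sympy (the actual diffunc interface in mujpy may differ): parse the user func, record the minuit indices it contains, and lambdify the partial derivatives:

    import re
    import sympy as sp

    def diffunc_sketch(expr_str):
        """For a user func like 'p[1]*p[6]' return the minuit indices it
        depends on and numpy callables for the partial derivatives."""
        idx = sorted({int(n) for n in re.findall(r'p\[(\d+)\]', expr_str)})  # [1, 6]
        syms = {n: sp.Symbol('p%d' % n) for n in idx}
        expr = sp.sympify(re.sub(r'p\[(\d+)\]', r'p\1', expr_str))
        args = [syms[n] for n in idx]
        derivs = {n: sp.lambdify(args, sp.diff(expr, syms[n]), 'numpy') for n in idx}
        return idx, derivs

    idx, d = diffunc_sketch('p[1]*p[6]')   # idx == [1, 6]
    print(d[1](2.0, 3.0))                  # d/dp[1] (p[1]*p[6]) = p[6] -> 3.0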

Mind that the model is defined on a single run and the user funcs refer to parameters by their dashboard indices. Each run therefore requires a translation of these indices into those of the full minuit internal parameter list: global parameters retain their original indices and local parameters increment theirs at every run. This is performed by translate_multirun.
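A minimal sketch of such a translation (hypothetical helper, assuming the global parameters come first in the minuit list; the real translate_multirun works on the dashboard structures):

    def translate_index_sketch(j, k, nglobal, nlocal):
        """Map dashboard index j of run k (0-based) to the index in the
        full minuit parameter list."""
        if j < nglobal:           # global parameter: same index for every run
            return j
        return j + k * nlocal     # local parameter: shifted by nlocal per run

    # 5 global + 5 local parameters: local index 7 of run 3 -> 7 + 3*5 = 22
    assert translate_index_sketch(7, 3, 5, 5) == 22
    assert translate_index_sketch(2, 3, 5, 5) == 2    # global index unchanged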

Each component method in the mucomponents class mumodel has a two-letter name, e.g. ba. Each defines as many gradient methods as it has parameters, e.g. 3 for ba: _grad_ba_0_, _grad_ba_1_ and _grad_ba_2_, each corresponding to the derivative with respect to one parameter (see the sketch after the table). The calculations are performed according to the following table

Component

Function

Grad

al*

{$$ y_{e,j} = \frac {N_{F,j}-\alpha N_{B,j}}{N_{F,j}+\alpha N_{B,j}}$$}

{$$ \frac{\partial y_{e,j}}{\partial \alpha} = \frac {-2N_{B,j}N_{F,j}}{(N_{F,j}+\alpha N_{B,j})^2}$$}

bl

{$$ y(t,A,\lambda) = A \exp(-\lambda t)$$}

{$$ \frac {\partial y(t,A,\lambda)}{\partial A} = \frac {y(t,A,\lambda)} A $$}

{$$ \frac {\partial y(t,A,\lambda)}{\partial\lambda} = - t y(t,A,\lambda)$$}

bg

{$$ y(t,A,\sigma) = A \exp(-\frac 1 2 \sigma^2 t^2)$$}

{$$ \frac {\partial y(t,A,\sigma)}{\partial A} = \frac {y(t,A,\sigma)} A $$}

{$$ \frac {\partial y(t,A,\sigma)}{\partial\sigma} = - \sigma t^2 y(t,A,\sigma)$$}

bs

{$$ y(t,A,\Lambda,\beta) = A \exp(-(\Lambda t)^\beta)$$}

{$$ \frac {\partial y(t,A,\Lambda,\beta)}{\partial A} = \frac {y(t,A,\Lambda,\beta)} A $$}

{$$ \frac {\partial y(t,A,\Lambda,\beta)}{\partial\Lambda} = -\frac \beta \Lambda (\Lambda t)^\beta y(t,A,\Lambda,\beta)$$}

{$$ \frac {\partial y(t,A,\Lambda,\beta)}{\partial\beta} = -\ln(\Lambda t) (\Lambda t)^{\beta} y(t,A,\Lambda,\beta)$$}

ba

{$$ y(t,A,\lambda,\sigma) = A \exp(-\frac 1 2 \sigma^2 t^2)\exp(-\lambda t)$$}

{$$ \frac {\partial y(t,A,\lambda,\sigma)}{\partial A} = \frac {y(t,A,\lambda,\sigma)} A $$}

{$$ \frac {\partial y(t,A,\lambda,\sigma)}{\partial\lambda} = - t y(t,A,\lambda,\sigma)$$}

{$$ \frac {\partial y(t,A,\lambda,\sigma)}{\partial\sigma} = - \sigma t^2 y(t,A,\lambda,\sigma)$$}

ml**

{$$ y(t,A,B,\phi,\lambda) = A \cos(2\pi\gamma_\mu B t + \phi)\exp(-\lambda t)$$}

{$\pi,\gamma_\mu$} predefined constants

{$$ \frac {\partial y(t,A,B,\phi,\lambda)}{\partial A} = \frac {y(t,A,B,\phi,\lambda)} A $$}

{$$ \frac {\partial y(t,A,B,\phi,\lambda)}{\partial B} = - A\,2\pi\gamma_\mu t \sin(2\pi\gamma_\mu B t + \phi)\exp(-\lambda t) $$}

{$$ \frac {\partial y(t,A,B,\phi,\lambda)}{\partial\phi} = - A\sin(2\pi\gamma_\mu B t + \phi)\exp(-\lambda t)$$}

{$$ \frac {\partial y(t,A,B,\phi,\lambda)}{\partial\lambda} = - t y(t,A,B,\phi,\lambda)$$}

mg**

{$$ y(t,A,B,\phi,\sigma) = A \cos(2\pi\gamma_\mu B t + \phi)\exp(-\frac 1 2 \sigma^2 t^2)$$}

{$\pi,\gamma_\mu$} predefined constants

{$$ \frac {\partial y(t,A,B,\phi,\sigma)}{\partial A} = \frac {y(t,A,B,\phi,\sigma)} A $$}

{$$ \frac {\partial y(t,A,B,\phi,\sigma)}{\partial B} = - A\,2\pi\gamma_\mu t \sin(2\pi\gamma_\mu B t + \phi)\exp(-\frac 1 2 \sigma^2 t^2) $$}

{$$ \frac {\partial y(t,A,B,\phi,\sigma)}{\partial\phi} = - A\sin(2\pi\gamma_\mu B t + \phi)\exp(-\frac 1 2 \sigma^2 t^2)$$}

{$$ \frac {\partial y(t,A,B,\phi,\sigma)}{\partial\sigma} = - \sigma t^2 y(t,A,B,\phi,\sigma)$$}

ms**

{$$ y(t,A,B,\phi,\Lambda,\beta) = A \cos(2\pi\gamma_\mu B t + \phi)\exp(- (\Lambda t)^\beta)$$}

{$\pi,\gamma_\mu$} predefined constants

{$$ \frac {\partial y(t,A,B,\phi,\Lambda,\beta)}{\partial A} = \frac {y(t,A,B,\phi,\Lambda,\beta)} A $$}

{$$ \frac {\partial y(t,A,B,\phi,\Lambda,\beta)}{\partial B} = - A\,2\pi\gamma_\mu t \sin(2\pi\gamma_\mu B t + \phi)\exp(-(\Lambda t)^\beta) $$}

{$$ \frac {\partial y(t,A,B,\phi,\Lambda,\beta)}{\partial\phi} = - A\sin(2\pi\gamma_\mu B t + \phi)\exp(-(\Lambda t)^\beta)$$}

{$$ \frac {\partial y(t,A,B,\phi,\Lambda,\beta)}{\partial\Lambda} = - \frac \beta \Lambda (\Lambda t)^\beta y(t,A,B,\phi,\Lambda,\beta)$$}

{$$ \frac {\partial y(t,A,B,\phi,\Lambda,\beta)}{\partial\beta} = - \ln(\Lambda t) (\Lambda t)^\beta y(t,A,B,\phi,\Lambda,\beta)$$}

mu**

{$$ y(t,A,B,\phi,\lambda,\sigma) = A \cos(2\pi\gamma_\mu B t + \phi)\exp(-\lambda t)\exp(-\frac 1 2 \sigma^2 t^2)$$}

{$\pi,\gamma_\mu$} predefined constants

{$$ \frac {\partial y(t,A,B,\phi,\lambda,\sigma)}{\partial A} = \frac {y(t,A,B,\phi,\lambda,\sigma)} A $$}

{$$ \frac {\partial y(t,A,B,\phi,\lambda,\sigma)}{\partial B} = - A\,2\pi\gamma_\mu t \sin(2\pi\gamma_\mu B t + \phi)\exp(-\lambda t) \exp(-\frac 1 2 \sigma^2 t^2)$$}

{$$ \frac {\partial y(t,A,B,\phi,\lambda,\sigma)}{\partial\phi} = - A\sin(2\pi\gamma_\mu B t + \phi)\exp(-\lambda t)\exp(-\frac 1 2 \sigma^2 t^2)$$}

{$$ \frac {\partial y(t,A,B,\phi,\lambda,\sigma)}{\partial\lambda} = - t y(t,A,B,\phi,\lambda,\sigma)$$}

{$$ \frac {\partial y(t,A,B,\phi,\lambda,\sigma)}{\partial\sigma} = - \sigma t^2 y(t,A,B,\phi,\lambda,\sigma)$$}

jl**

{$$ y(t,A,B,\phi,\lambda) = A J_0(2\pi\gamma_\mu B t + \phi)\exp(-\lambda t)$$}

{$\pi,\gamma_\mu$} predefined constants, {$J_\nu$} Bessel functions of the first kind

{$$\frac{dJ_0(z)}{dz} = -J_1(z)$$} see here

{$$ \frac {\partial y(t,A,B,\phi,\lambda)}{\partial A} = \frac {y(t,A,B,\phi,\lambda)} A $$}

{$$ \frac {\partial y(t,A,B,\phi,\lambda)}{\partial B} = - A\,2\pi\gamma_\mu t J_1(2\pi\gamma_\mu B t + \phi)\exp(-\lambda t) $$}

{$$ \frac {\partial y(t,A,B,\phi,\lambda)}{\partial\phi} = - A J_1(2\pi\gamma_\mu B t + \phi)\exp(-\lambda t)$$}

{$$ \frac {\partial y(t,A,B,\phi,\lambda)}{\partial\lambda} = - t y(t,A,B,\phi,\lambda)$$}

jg**

{$$ y(t,A,B,\phi,\sigma) = A J_0(2\pi\gamma_\mu B t + \phi)\exp(-\frac 1 2 \sigma^2 t^2)$$}

{$\pi,\gamma_\mu$} predefined constants

{$$ \frac {\partial y(t,A,B,\phi,\sigma)}{\partial A} = \frac {y(t,A,B,\phi,\sigma)} A $$}

{$$ \frac {\partial y(t,A,B,\phi,\sigma)}{\partial B} = - A\,2\pi\gamma_\mu t J_1(2\pi\gamma_\mu B t + \phi)\exp(-\frac 1 2 \sigma^2 t^2) $$}

{$$ \frac {\partial y(t,A,B,\phi,\sigma)}{\partial\phi} = - A J_1(2\pi\gamma_\mu B t + \phi)\exp(-\frac 1 2 \sigma^2 t^2)$$}

{$$ \frac {\partial y(t,A,B,\phi,\sigma)}{\partial\sigma} = - \sigma t^2 y(t,A,B,\phi,\sigma)$$}

js**

{$$ y(t,A,B,\phi,\lambda,\beta) = A J_0(2\pi\gamma_\mu B t + \phi)\exp(- (\lambda t)^\beta)$$}

{$\pi,\gamma_\mu$} predefined constants

{$$ \frac {\partial y(t,A,B,\phi,\lambda,\beta)}{\partial A} = \frac {y(t,A,B,\phi,\lambda,\beta)} A $$}

{$$ \frac {\partial y(t,A,B,\phi,\lambda,\beta)}{\partial B} = - A\,2\pi\gamma_\mu t J_1(2\pi\gamma_\mu B t + \phi)\exp(-(\lambda t)^\beta) $$}

{$$ \frac {\partial y(t,A,B,\phi,\lambda,\beta)}{\partial\phi} = - A J_1(2\pi\gamma_\mu B t + \phi)\exp(-(\lambda t)^\beta)$$}

{$$ \frac {\partial y(t,A,B,\phi,\lambda,\beta)}{\partial\lambda} = - \frac \beta \lambda (\lambda t)^\beta y(t,A,B,\phi,\lambda,\beta)$$}

{$$ \frac {\partial y(t,A,B,\phi,\lambda,\beta)}{\partial\beta} = - \ln(\lambda t) (\lambda t)^\beta y(t,A,B,\phi,\lambda,\beta)$$}

fm**

{$$\begin{align*} y(t,A,B,\lambda) &= \frac A 6 \left[ 1 + \cos(2\omega_d t)+ 2\cos(\omega_d t)\right.\\ & + \left. 2\cos(3\omega_d t) \right]\exp(-\lambda t)\end{align*}$$} {$$\omega_d = \pi\gamma_\mu B$$}

{$$ \frac {\partial y(t,A,B,\lambda)}{\partial A} = \frac {y(t,A,B,\lambda)} A $$}

{$$ \frac {\partial y(t,A,B,\lambda)}{\partial B} = - \frac A 3\,\pi\gamma_\mu t \left[\sin(2\omega_d t) + \sin(\omega_d t) + 3\sin(3\omega_d t)\right]\exp(-\lambda t) $$}

{$$ \frac {\partial y(t,A,B,\lambda)}{\partial\lambda} = - t\, y(t,A,B,\lambda)$$}

kg

see here for the derivative of the Faddeeva function

kl

kd

ks

al* can be removed (never consider a global alpha fit!)


Components marked ** require two grad functions each: y(t,par) itself and its derivative
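A minimal toy sketch of this naming scheme for the bl row of the table, together with a finite-difference check of the analytic gradient (a stand-alone class, not the actual mumodel):

    import numpy as np

    class ToyModel:
        def bl(self, t, A, lam):              # the component itself
            return A * np.exp(-lam * t)
        def _grad_bl_0_(self, t, A, lam):     # d bl/dA = y/A
            return np.exp(-lam * t)
        def _grad_bl_1_(self, t, A, lam):     # d bl/dlam = -t y
            return -t * A * np.exp(-lam * t)

    m, t, A, lam, h = ToyModel(), np.linspace(0, 10, 5), 0.25, 0.3, 1e-6
    num = (m.bl(t, A, lam + h) - m.bl(t, A, lam - h)) / (2 * h)   # central difference
    assert np.allclose(num, m._grad_bl_1_(t, A, lam))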

Why is it convenient to use analytic gradients? Assume 8 detectors, 10 runs, 50000 bins and 2 components each, for HAL. Such a typical global fit has 57 parameters.

With the analytic option one stores intermediate results (each component in the model needs to store 8×10×50000×8 bytes = 32 MB). Two components, storing both functions and derivatives, need 4×32 MB, i.e. 128 MB. This corresponds to a certain amount of calculations that can then be retrieved for much faster sums and multiplications.

The alternative numerical strategy consists in calculations equivalent to 2×32 MB of data per numerical parameter derivative (the component value at {$p\pm dp$}). With 60 parameters one presently performs calculations equivalent to 60×2×32 MB, i.e. of order 60 times as many calculations.
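Back-of-the-envelope, reproducing the numbers above (60 parameters and two evaluations at {$p\pm dp$} per numerical derivative are the assumptions):

    MB = 1e6
    one_component = 8 * 10 * 50000 * 8 / MB   # detectors x runs x bins x 8 bytes = 32 MB
    analytic = 2 * 2 * one_component          # 2 components, function + derivative: 128 MB
    numerical = 60 * 2 * one_component        # p+dp and p-dp for each of 60 parameters
    print(one_component, analytic, numerical) # 32.0 128.0 3840.0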


