scalib.modeling.RLDAClassifier

class scalib.modeling.RLDAClassifier(nb, p)

Regression-based Linear Discriminant Analysis.

Models the leakage using a regression-based linear discriminant analysis (RLDA) classifier [1], which can efficiently handle long traces and a large number of classes.

In a nutshell, this model performs LDA with the class means modelled as a linear regression on the \(n_b\) bits of the class value. Compared to scalib.modeling.LDAClassifier, this model typically performs better when the number of classes is large and/or there are few profiling traces.
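One way to see why fewer traces can suffice is to count the mean-model parameters for one variable with \(n_s\) samples per trace. This count assumes (as in standard LDA) one independently estimated mean per class for the non-regression model, and leaves aside the noise covariance, which both models estimate. The regression needs \(n_b+1\) coefficients per sample (the bits plus a constant term, consistent with the nb+1 dimension returned by get_proj_coefs()), so for \(n_b = 8\) and \(n_s = 10\):

\[(n_b+1)\,n_s = 9 \cdot 10 = 90 \qquad\text{versus}\qquad 2^{n_b}\,n_s = 256 \cdot 10 = 2560.\]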

Internally, it first estimates the coefficients of the linear regression, then computes a projection matrix that reduces the dimensionality of the Gaussian template to \(p\) dimensions and makes the covariance matrix the identity in the projected space.
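The two internal steps can be sketched with plain NumPy. This is a textbook construction, not SCALib's actual solver: the helper names (bit_basis, fit_rlda_sketch), the basis layout (constant term first, then bits from least significant), and the use of the sample covariances of the fitted means and of the residuals as between- and within-class scatter are all assumptions made for illustration. The projection keeps the \(p\) leading generalized eigenvectors and is scaled so that the projected noise covariance is the identity.

```python
import numpy as np

def bit_basis(x, nb):
    # Hypothetical basis: each row is [1, bit_0(x), ..., bit_{nb-1}(x)].
    # The ordering is an assumption, not necessarily SCALib's.
    x = np.asarray(x, dtype=np.uint64)
    bits = (x[:, None] >> np.arange(nb, dtype=np.uint64)) & np.uint64(1)
    return np.hstack([np.ones((x.shape[0], 1)), bits.astype(np.float64)])

def fit_rlda_sketch(traces, x, nb, p):
    """Regression for the class means, then a whitening LDA projection.

    Returns W (ns, p) with W.T @ S_w @ W == I, the projected
    coefficients A = W.T @ B (p, nb + 1), and the noise covariance S_w.
    """
    L = np.asarray(traces, dtype=np.float64)          # (n, ns)
    X = bit_basis(x, nb)                              # (n, nb + 1)
    # 1) Least-squares regression coefficients: L ~= X @ B.T
    B = np.linalg.lstsq(X, L, rcond=None)[0].T        # (ns, nb + 1)
    fitted = X @ B.T                                  # modelled mean of each trace
    # 2) Within-class (residual) and between-class (fitted-mean) scatter
    S_w = np.cov(L - fitted, rowvar=False)
    S_b = np.cov(fitted, rowvar=False)
    # 3) Whiten S_w with its Cholesky factor, then keep the p leading
    #    eigenvectors of the whitened between-class scatter.
    C_inv = np.linalg.inv(np.linalg.cholesky(S_w))    # S_w = C @ C.T
    _, V = np.linalg.eigh(C_inv @ S_b @ C_inv.T)      # ascending eigenvalues
    W = C_inv.T @ V[:, ::-1][:, :p]                   # (ns, p)
    return W, W.T @ B, S_w

# Synthetic leakage that depends linearly on the bits, plus unit noise.
rng = np.random.default_rng(0)
nb, ns, n = 4, 6, 2000
x = rng.integers(0, 2**nb, n)
true_B = rng.normal(size=(ns, nb + 1))
traces = bit_basis(x, nb) @ true_B.T + rng.normal(size=(n, ns))
W, A, S_w = fit_rlda_sketch(traces, x, nb, p=2)
print(np.allclose(W.T @ S_w @ W, np.eye(2)))  # True
```

Because \(\mathbf{W}^T\hat\Sigma\mathbf{W} = \mathbf{I}\) by construction, distances in the projected space are plain Euclidean distances, which is what the likelihood formula relies on.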

The model can then predict the leakage likelihood

\[\hat{\mathsf{f}}[\mathbf{l}|X=x] = \alpha \exp\left( -\frac{1}{2} \lVert\mathbf{W}^T\mathbf{l} - \mathbf{A}\mathbf\beta(x)\rVert^2 \right).\]

where \(\mathbf{W}\) is the projection matrix, \(\mathbf{A}\) the projected regression coefficients, and \(\mathbf{\beta}(x)\) the regression basis vector derived from the \(n_b\) bits of \(x\). The normalization constant \(\alpha = 1/\sqrt{(2\pi)^p\lvert\hat\Sigma_\mathbf{W}\rvert}\) does not need to be computed: it is the same for every class, so it cancels out when applying Bayes' law.
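Since \(\alpha\) is shared by all classes, the posterior under a uniform prior is a softmax of \(-\frac{1}{2}\lVert\mathbf{W}^T\mathbf{l} - \mathbf{A}\mathbf\beta(x)\rVert^2\) over the \(2^{n_b}\) classes. The hypothetical function below sketches this in NumPy. It takes W with shape (ns, p), as in the formula, and the layout of the basis rows (constant term first, then the bits of x) is an assumption, not necessarily SCALib's.

```python
import numpy as np

def posterior_from_projection(W, A, basis, traces):
    """Likelihood from the formula above, normalized with Bayes' law.

    W: (ns, p) projection, A: (p, nb + 1) projected coefficients,
    basis: (nc, nb + 1) rows beta(x) for every class x, traces: (n, ns).
    Returns (n, nc) class probabilities (uniform prior assumed).
    """
    proj = traces @ W                    # rows are W^T l, shape (n, p)
    centers = basis @ A.T                # rows are A beta(x), shape (nc, p)
    # Squared distances ||W^T l - A beta(x)||^2 for every (trace, class) pair
    d2 = ((proj[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    log_lik = -0.5 * d2                  # log f, up to the shared log(alpha)
    # Subtracting the row maximum only improves numerical stability;
    # like alpha, it cancels in the normalization below.
    log_lik -= log_lik.max(axis=1, keepdims=True)
    probs = np.exp(log_lik)
    return probs / probs.sum(axis=1, keepdims=True)

rng = np.random.default_rng(1)
nb, ns, p = 3, 5, 2
x_all = np.arange(2**nb)
# Hypothetical basis rows [1, bit_0, ..., bit_{nb-1}] for all 2**nb classes
basis = np.hstack([np.ones((2**nb, 1)), (x_all[:, None] >> np.arange(nb)) & 1])
W = rng.normal(size=(ns, p))
A = rng.normal(size=(p, nb + 1))
traces = rng.normal(size=(4, ns))
probs = posterior_from_projection(W, A, basis, traces)
print(probs.shape)  # (4, 8)
```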

RLDAClassifier provides the probability for each of the \(2^{n_b}\) classes with predict_proba().

Examples

>>> from scalib.modeling import RLDAClassifier
>>> import numpy as np
>>> traces_model = np.random.randint(0,256,(5000,10),dtype=np.int16)
>>> labels_model = np.random.randint(0,256,(5000,1),dtype=np.uint64)
>>> rlda = RLDAClassifier(8, 3)
>>> rlda.fit_u(traces_model, labels_model)
>>> rlda.solve()
>>> traces_test = np.random.randint(0,256,(5000,10),dtype=np.int16)
>>> prs = rlda.predict_proba(traces_test, 0)

References

Parameters:

nb (int) – Number of bits of the profiled variables.
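nv – Number of variables to profile. This is not a constructor argument: it is given by the second dimension of the labels x passed to fit_u().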
p (int) – Number of dimensions in the linear subspace.

Methods

`fit_u`(traces, x[, gemm_mode])	Update statistical model estimates with additional data.
`get_clustered_model`(var, t[, max_clusters, ...])	Generate a simplified model for faster estimation of the information content in this model.
`get_proj`()	Returns the projection matrix.
`get_proj_coefs`()	The projected regression coefficients.
`predict_proba`(traces, var)	Computes the probability for each of the classes for the requested variables.
`solve`()	Solve the RLDA equations.

fit_u(traces, x, gemm_mode=1)

Update statistical model estimates with additional data.

This can be called multiple times; the state is accumulated across calls.

Parameters:

traces (array_like, int16) – Array that contains the traces. Shape (n,ns).
x (array_like, uint64) – Labels for each trace. Shape (n,nv).
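Accumulating state across calls works naturally for a least-squares regression, because its sufficient statistics are sums over traces. The sketch below illustrates this with a hypothetical RegressionAccumulator class. It is not SCALib's internal state (in particular it ignores gemm_mode), and it takes a precomputed regression basis instead of raw labels. Splitting the data into batches gives the same estimates as a single update.

```python
import numpy as np

class RegressionAccumulator:
    """Hypothetical accumulator of least-squares sufficient statistics."""

    def __init__(self, k, ns):
        self.n = 0
        self.xtx = np.zeros((k, k))    # sum of beta(x) beta(x)^T
        self.xtl = np.zeros((k, ns))   # sum of beta(x) l^T
        self.ltl = np.zeros((ns, ns))  # sum of l l^T

    def update(self, basis, traces):
        L = np.asarray(traces, dtype=np.float64)
        self.n += L.shape[0]
        self.xtx += basis.T @ basis
        self.xtl += basis.T @ L
        self.ltl += L.T @ L

    def estimate(self):
        # Normal equations of the regression L ~= basis @ B.T
        B = np.linalg.solve(self.xtx, self.xtl).T   # (ns, k)
        # Residual covariance from the same sums, using xtx @ B.T == xtl:
        # sum (l - B b)(l - B b)^T = ltl - B @ xtl
        S_w = (self.ltl - B @ self.xtl) / self.n
        return B, S_w

rng = np.random.default_rng(2)
basis = np.hstack([np.ones((1000, 1)), rng.integers(0, 2, (1000, 4))])
traces = rng.normal(size=(1000, 3)) + basis @ rng.normal(size=(5, 3))
acc_batched, acc_once = RegressionAccumulator(5, 3), RegressionAccumulator(5, 3)
for start in range(0, 1000, 250):      # four updates, state accumulates
    acc_batched.update(basis[start:start + 250], traces[start:start + 250])
acc_once.update(basis, traces)
B1, S1 = acc_batched.estimate()
B2, S2 = acc_once.estimate()
print(np.allclose(B1, B2))  # True
```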

solve()

Solve the RLDA equations.

Notes

Once this has been called, predictions can be performed.

get_proj()

Returns the projection matrix.

Returns: Shape (nv,p,ns).
Return type: array_like, float64

get_proj_coefs()

The projected regression coefficients.

Returns: Shape (nv,p,nb+1).
Return type: array_like, float64
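The documented shapes can be combined as follows. The arrays below are random placeholders with the shapes returned by get_proj() and get_proj_coefs(), not outputs of a fitted model, and the bit_basis helper (constant term first, then bits from least significant) is a hypothetical stand-in for however SCALib orders \(\mathbf\beta(x)\). Note that proj[var] has shape (p, ns), so it directly plays the role of \(\mathbf{W}^T\) in the likelihood formula.

```python
import numpy as np

rng = np.random.default_rng(3)
nv, p, ns, nb, n = 2, 3, 10, 8, 5

# Placeholders with the documented shapes (not real model outputs)
proj = rng.normal(size=(nv, p, ns))        # like get_proj()
coefs = rng.normal(size=(nv, p, nb + 1))   # like get_proj_coefs()
traces = rng.normal(size=(n, ns))
x = rng.integers(0, 2**nb, size=(n, nv), dtype=np.uint64)

def bit_basis(v, nb):
    # Hypothetical basis [1, bit_0, ..., bit_{nb-1}] along a new last axis.
    bits = (v[..., None] >> np.arange(nb, dtype=np.uint64)) & np.uint64(1)
    ones = np.ones(v.shape + (1,))
    return np.concatenate([ones, bits.astype(np.float64)], axis=-1)

# W^T l for every variable and trace: (nv, n, p)
projected = np.einsum("vps,ns->vnp", proj, traces)
# Projected mean A beta(x) of each trace's label, per variable: (nv, n, p)
means = np.einsum("vpk,nvk->vnp", coefs, bit_basis(x, nb))
# Squared distance that enters the likelihood: (nv, n)
d2 = ((projected - means) ** 2).sum(axis=-1)
```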

predict_proba(traces, var)

Computes the probability for each of the classes for the requested variables.

Parameters:

traces (numpy.typing.NDArray[numpy.int16]) – Array that contains the traces. Shape (n,ns).
var (int) – Id (position in the x array) of the variable for which the probabilities are computed.

Returns:

Probabilities. Shape (n, nc), where nc = \(2^{n_b}\) is the number of classes.

Return type:

array_like, float64

class ClusteredModel: Clustered RLDA model; see RLDAClassifier.get_clustered_model().

get_clustered_model(var, t, max_clusters=10000000, store_associated_classes=True)

Generate a simplified model for faster estimation of the information content in this model.

This generates a model with clustered means that can be used to estimate the perceived or training information of the model. It clusters the classes, grouping the closest ones up to a threshold distance \(t\). Internally, it uses a k-d tree to find the nearest cluster efficiently. Details on the clustering algorithm can be found in [1].

The resulting model can be used with scalib.metrics.RLDAInformationEstimator (see its documentation for a usage example).

Parameters:

var (int) – Id (position in the x array) of the variable for which the clustered model is generated.
t (float) – Maximum distance between two cluster centers. This trades off the tightness of the information bounds (lower values of t) against computational efficiency in time and memory (higher values of t).
max_clusters (int) – The maximum number of clusters that can be generated. If this limit is exceeded during generation, an exception is raised.
store_associated_classes (bool) – If True, the generated model stores the classes associated with each cluster. This allows the information bounds to be refined by using the exact class means (rather than the centroid they are associated with) for the clusters that contribute most to a loose bound. Note that this option requires significantly more RAM for high values of \(n_b\).
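To give an intuition for the roles of t and max_clusters, here is a naive threshold ("leader") clustering of class means. It is a simplified stand-in, not the algorithm of [1]: the function name is hypothetical, a linear scan replaces the k-d tree search, t is used here as the maximum distance between a class mean and the center of the cluster it joins, and RuntimeError is an arbitrary choice of exception. The returned assignment plays the role of the associated classes kept when store_associated_classes=True.

```python
import numpy as np

def leader_clustering(class_means, t, max_clusters=10_000_000):
    """Hypothetical threshold clustering of class means (rows).

    Each mean joins the nearest existing cluster if that cluster's center
    is within distance t; otherwise it starts a new cluster centered on it.
    """
    cluster_centers = []
    assignment = np.empty(len(class_means), dtype=np.int64)
    for i, m in enumerate(class_means):
        if cluster_centers:
            # Linear scan for the nearest center (a k-d tree would be faster)
            d = np.linalg.norm(np.asarray(cluster_centers) - m, axis=1)
            j = int(np.argmin(d))
            if d[j] <= t:
                assignment[i] = j
                continue
        if len(cluster_centers) >= max_clusters:
            raise RuntimeError("max_clusters exceeded")
        cluster_centers.append(m)
        assignment[i] = len(cluster_centers) - 1
    return np.asarray(cluster_centers), assignment

# Projected means of 8 classes in p = 2 dimensions, forming three groups
means = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.0, 5.1],
                  [0.0, 0.2], [10.0, 0.0], [10.1, 0.1], [5.2, 5.0]])
centers, assignment = leader_clustering(means, t=0.5)
print(assignment.tolist())  # [0, 0, 1, 1, 0, 2, 2, 1]
```

A smaller t yields more clusters (tighter bounds, more work); with max_clusters=2 the same input raises, since a third cluster is needed.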

Returns:

A clustered model to be used with scalib.metrics.RLDAInformationEstimator.

Return type:

ClusteredModel