For the last two weeks, I mostly spent my time learning ROOT tools and exploring ideas for showcasing the VarTransform method through plots. The idea that appealed to me the most is a histogram: display the variance of each variable and mark which variables are selected and which are rejected. Below is the kind of plot I had in mind:

To draw this variance histogram, one needs the variance of each variable, which is not directly accessible in TMVA because it is calculated internally in the `VarTransform()` method. Hence, to give the user more freedom, I tweaked TMVA a bit.
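
To make the selection idea concrete, here is a minimal standalone sketch, not the TMVA code itself. `Column`, `Variance` and `SelectByVariance` are hypothetical names invented for this illustration. Dividing by N (population variance) and keeping only variables strictly above the threshold are both assumptions.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical container: one variable's name and its sample values.
struct Column {
    std::string name;
    std::vector<double> values;
};

// Population variance (divides by N, an assumption); 0 for an empty column.
double Variance(const std::vector<double>& v) {
    if (v.empty()) return 0.0;
    double mean = 0.0;
    for (double x : v) mean += x;
    mean /= static_cast<double>(v.size());
    double sumSq = 0.0;
    for (double x : v) sumSq += (x - mean) * (x - mean);
    return sumSq / static_cast<double>(v.size());
}

// Names of the variables whose variance is strictly above the threshold.
std::vector<std::string> SelectByVariance(const std::vector<Column>& cols,
                                          double threshold) {
    assert(threshold >= 0.0);  // a variance threshold cannot be negative
    std::vector<std::string> selected;
    for (const Column& c : cols) {
        if (Variance(c.values) > threshold) selected.push_back(c.name);
    }
    return selected;
}
```

A histogram of the values returned by `Variance`, with a vertical line at the threshold, would separate the selected variables from the rejected ones.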

This is how my thought process went:

Currently, the variance of each variable is only calculated in the `VarTransform` method and is not stored anywhere. In TMVA, there is a class `VariableInfo` which stores all the necessary information about variables. One can set the min, max, mean and RMS of each variable and get them from this class wherever needed. It seemed like the perfect place to add a new pair of methods, `SetVariance()` and `GetVariance()`, since it already has such Set and Get methods for the other norm parameters. After adding these methods, I changed my `VarTransform` method to set the variance of each variable after calculating it. But I was still not able to access the variance of each variable, because `DefaultDataSetInfo()` (a method in the DataLoader class) is *private*. Since the user should be able to get all the necessary details about the dataset that TMVA calculates internally, I added a method `GetDataSetInfo()` to the DataLoader class which returns a `DataSetInfo` object. After making these two changes, I was able to access the variance of each variable.
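
The shape of these two changes can be sketched with small stand-in classes. This is only an illustration of the pattern (a setter/getter pair on the per-variable record, plus a public getter that exposes the privately held dataset description). `VarRecord`, `DataDescription` and `Loader` are hypothetical names, not the real TMVA classes, and the real signatures may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a per-variable record such as VariableInfo.
class VarRecord {
public:
    explicit VarRecord(std::string name) : fName(std::move(name)) {}
    const std::string& GetName() const { return fName; }
    void SetVariance(double v) { assert(v >= 0.0); fVariance = v; }
    double GetVariance() const { return fVariance; }
private:
    std::string fName;
    double fVariance = 0.0;  // stays 0 until something computes and sets it
};

// Hypothetical stand-in for the dataset description that owns the records.
class DataDescription {
public:
    void AddVariable(const std::string& name) { fVars.emplace_back(name); }
    std::size_t GetNVariables() const { return fVars.size(); }
    VarRecord& GetVariable(std::size_t i) { return fVars.at(i); }
private:
    std::vector<VarRecord> fVars;
};

// Hypothetical loader: the description is a private member, and the public
// getter is what makes the stored variances reachable from user code.
class Loader {
public:
    DataDescription& GetDataSetInfo() { return fInfo; }
private:
    DataDescription fInfo;
};
```

With this layout, user code reads a variance with something like `loader.GetDataSetInfo().GetVariable(i).GetVariance()` once the transform has stored it.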

But there is still an issue that needs to be handled. The variance of each variable is only set when the `VarTransform` method is called. Ideally, the user would first like to know the variance of each variable, and might want to analyse the dataset with plots, before specifying the threshold for selecting variables. Hence, to calculate and set the norm parameters of the variables (mean, variance, etc.), I created a new method `CalcNorm()` and called it from `VarTransform()` whenever necessary. Now the user can call `CalcNorm()` directly to get an idea of the mean, variance, etc. of each variable. `VarTransform()` can be applied later if some variables need to be selected. I also added functionality to calculate the variance of variables and targets in the `TMVA::VariableTransformBase::CalcNorm()` method.
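
The separation between "compute the statistics" and "apply the selection" can be sketched as follows. This is a standalone illustration, not the TMVA implementation: `NormStats`, `CalcNormSketch` and `PassesThreshold` are hypothetical names, and the one-pass Welford update and the population variance (dividing by N) are choices made for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical summary of one variable; the field names are illustrative.
struct NormStats {
    double min = 0.0;
    double max = 0.0;
    double mean = 0.0;
    double variance = 0.0;
};

// Step 1: compute min, max, mean and population variance in a single pass
// using Welford's running update.
NormStats CalcNormSketch(const std::vector<double>& values) {
    assert(!values.empty());
    NormStats s;
    s.min = s.max = values.front();
    double m2 = 0.0;  // running sum of squared deviations from the mean
    std::size_t n = 0;
    for (double x : values) {
        ++n;
        const double delta = x - s.mean;
        s.mean += delta / static_cast<double>(n);
        m2 += delta * (x - s.mean);
        s.min = std::min(s.min, x);
        s.max = std::max(s.max, x);
    }
    s.variance = m2 / static_cast<double>(n);
    return s;
}

// Step 2, possibly much later: decide from the stored statistics alone
// whether a variable survives a user-chosen threshold.
bool PassesThreshold(const NormStats& s, double threshold) {
    return s.variance > threshold;
}
```

Because the second step only reads stored numbers, the threshold can be chosen after inspecting the statistics or plotting them.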

In addition to the above, I wanted to print the output of `CalcNorm()` in tabular form so that it looks neat and readable. Because variable expressions have different lengths, the output printed to `std::cout` came out misaligned. To align the table columns, I needed the maximum length of the variable and target names, for which I added two new methods to the DataSetInfo class, `GetVariableNameMaxLength()` and `GetTargetNameMaxLength()`. The class already had a similar method for the maximum length of a class name (`GetClassNameMaxLength()`).
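
The alignment trick itself is simple and can be sketched without TMVA: find the longest name, then pad every name to that width. `Row`, `MaxNameLength` and `FormatTable` are hypothetical names for this illustration, and the column layout is an assumption rather than the actual `CalcNorm()` output.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical row of the summary table.
struct Row {
    std::string name;
    double mean;
    double variance;
};

// Length of the longest variable name, used as the width of the name column.
std::size_t MaxNameLength(const std::vector<Row>& rows) {
    std::size_t maxLen = 0;
    for (const Row& r : rows) maxLen = std::max(maxLen, r.name.size());
    return maxLen;
}

// Build an aligned table: names left-justified, numbers right-justified.
std::string FormatTable(const std::vector<Row>& rows, int valueWidth = 12) {
    assert(valueWidth > 0);
    const std::string header = "Variable";
    const int nameWidth =
        static_cast<int>(std::max(MaxNameLength(rows), header.size()));
    std::ostringstream out;
    out << std::left << std::setw(nameWidth) << header << " | "
        << std::right << std::setw(valueWidth) << "Mean" << " | "
        << std::setw(valueWidth) << "Variance" << '\n';
    out << std::fixed << std::setprecision(4);
    for (const Row& r : rows) {
        out << std::left << std::setw(nameWidth) << r.name << " | "
            << std::right << std::setw(valueWidth) << r.mean << " | "
            << std::setw(valueWidth) << r.variance << '\n';
    }
    return out.str();
}
```

Printing is then just `std::cout << FormatTable(rows);`, and every line has the same width as long as the numbers fit in `valueWidth` characters.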

The following diagram sums up the changes:

All these changes can be viewed in the commit history of this new branch I created: get-variance. This notebook demonstrates the above updates.