Because this concerns ongoing research, I cannot reveal the exact use case for these statistical methods, but put simply: the dataset has an important dimension that follows a power-law distribution (a short peak at the left and a long tail to the right). Since I am still learning statistics, and this will likely be useful more than once, I want to write down what I learned.
Regression analysis is a statistical tool for the investigation of relationships between variables. Usually, the investigator seeks to ascertain the causal effect of one variable upon another—the effect of a price increase upon demand, for example, or the effect of changes in the money supply upon the inflation rate. To explore such issues, the investigator assembles data on the underlying variables of interest and employs regression to estimate the quantitative effect of the causal variables upon the variable that they influence. The investigator also typically assesses the “statistical significance” of the estimated relationships, that is, the degree of confidence that the true relationship is close to the estimated relationship.
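As a concrete sketch of the above, here is a tiny regression on synthetic price/demand data (the variable names and numbers are illustrative, not from the actual research dataset), showing both the estimated effect and its statistical significance:

```python
# Estimate the effect of price on demand and assess its significance.
# Synthetic data: true slope is -5, plus Gaussian noise.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
price = rng.uniform(1, 10, 200)
demand = 100 - 5 * price + rng.normal(0, 3, 200)

result = stats.linregress(price, demand)
print(f"estimated effect of price on demand: {result.slope:.2f}")
print(f"p-value (statistical significance): {result.pvalue:.2e}")
```

The p-value here quantifies the "degree of confidence" mentioned above: a very small value means it is very unlikely we would see such a relationship if the true effect were zero.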
Double logarithmic transformation – despite how the name reads, this is not log(log(x)); it means taking the logarithm of both variables (a log-log transformation). For a power law y = c·x^k this is exactly what you want, because log y = log c + k·log x is a straight line. You can read here: http://stats.stackexchange.com/questions/298/in-linear-regression-when-is-it-appropriate-to-use-the-log-of-an-independent-va – to find out when it is appropriate to use a logarithmic transformation instead of the actual values.
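A short sketch of this trick on synthetic data (the constants 3.0 and -1.5 are made up for illustration): after taking logs of both variables, an ordinary linear fit recovers the power-law exponent as the slope.

```python
# Fit a power law y = c * x^k by linear regression in log-log space.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(1, 100, 500)
# Noisy power law with multiplicative (log-normal) noise.
y = 3.0 * x ** -1.5 * np.exp(rng.normal(0, 0.05, x.size))

# log y = log c + k * log x, so a degree-1 polyfit gives k and log c.
slope, intercept = np.polyfit(np.log(x), np.log(y), 1)
print(f"estimated exponent k: {slope:.3f}")
print(f"estimated constant c: {np.exp(intercept):.3f}")
```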
Normal distribution – the normal distribution is immensely useful because of the central limit theorem, which states that, under mild conditions, the mean of many random variables independently drawn from the same distribution is distributed approximately normally, irrespective of the form of the original distribution. Physical quantities that are expected to be the sum of many independent processes (such as measurement errors) often have a distribution very close to the normal. Moreover, many results and methods (such as propagation of uncertainty and least squares parameter fitting) can be derived analytically in explicit form when the relevant variables are normally distributed.
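The central limit theorem is easy to see numerically: means of samples drawn from a decidedly non-normal (here, exponential) distribution are themselves approximately normal, centered at the population mean, with spread shrinking as sigma/sqrt(n).

```python
# Central limit theorem demo: sample means of a skewed distribution.
import numpy as np

rng = np.random.default_rng(1)
n, trials = 50, 10_000
# Exponential distribution: heavily right-skewed, mean 2.0.
samples = rng.exponential(scale=2.0, size=(trials, n))
means = samples.mean(axis=1)

print(f"mean of sample means: {means.mean():.3f} (population mean: 2.0)")
print(f"std of sample means:  {means.std():.3f} (theory: {2.0 / np.sqrt(n):.3f})")
```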
Linear regression fit
Nonlinear LOESS regression fit – loess stands for locally estimated scatter-plot smoothing (lowess stands for locally weighted scatter-plot smoothing) and is one of many non-parametric regression techniques, but arguably the most flexible.
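To make the idea concrete, here is a simplified numpy sketch of what LOESS does at each point: fit a weighted linear regression to the nearest neighbours, with tricube weights that fall off with distance. Real implementations (such as statsmodels' lowess) add robustness iterations and other refinements; this is illustrative only.

```python
# Simplified LOESS: locally weighted linear fits with tricube weights.
import numpy as np

def loess_point(x0, x, y, frac=0.5):
    """Smoothed value at x0 from a locally weighted linear fit."""
    k = max(2, int(frac * len(x)))          # number of neighbours to use
    dist = np.abs(x - x0)
    idx = np.argsort(dist)[:k]              # k nearest points to x0
    d = dist[idx] / dist[idx].max()         # scaled distances in [0, 1]
    w = (1 - d ** 3) ** 3                   # tricube weight function
    # Weighted least squares for a local line a + b * (x - x0).
    X = np.column_stack([np.ones(k), x[idx] - x0])
    W = np.diag(w)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y[idx])
    return beta[0]                          # intercept = fitted value at x0

rng = np.random.default_rng(3)
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x) + rng.normal(0, 0.2, x.size)  # noisy sine wave
smoothed = np.array([loess_point(xi, x, y, frac=0.3) for xi in x])
print(f"max error vs true curve: {np.max(np.abs(smoothed - np.sin(x))):.2f}")
```

Being non-parametric, the fit follows the local shape of the data (here, a sine wave) without us ever specifying a functional form, which is what makes LOESS so flexible.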