TRobustEstimator.cxx
1 // @(#)root/physics:$Id$
2 // Author: Anna Kreshuk 08/10/2004
3 
4 /*************************************************************************
5  * Copyright (C) 1995-2004, Rene Brun and Fons Rademakers. *
6  * All rights reserved. *
7  * *
8  * For the licensing terms see $ROOTSYS/LICENSE. *
9  * For the list of contributors see $ROOTSYS/README/CREDITS. *
10  *************************************************************************/
11 
12 /** \class TRobustEstimator
13  \ingroup Physics
14 Minimum Covariance Determinant Estimator - a Fast Algorithm
15 invented by Peter J. Rousseeuw and Katrien Van Driessen:
16 "A Fast Algorithm for the Minimum Covariance Determinant Estimator",
17 Technometrics, August 1999, Vol. 41, No. 3
18 
19 What are robust estimators?
20 "An important property of an estimator is its robustness. An estimator
21 is called robust if it is insensitive to measurements that deviate
22 from the expected behaviour. There are 2 ways to treat such deviating
23 measurements: one may either try to recognise them and then remove
24 them from the data sample; or one may leave them in the sample, taking
25 care that they do not influence the estimate unduly. In both cases robust
26 estimators are needed...Robust procedures compensate for systematic errors
27 as much as possible, and indicate any situation in which a danger of not being
28 able to operate reliably is detected."
29 R. Fruhwirth, M. Regler, R. K. Bock, H. Grote, D. Notz,
30 "Data Analysis Techniques for High-Energy Physics", 2nd edition
31 
32 What does this algorithm do?
33 It computes a highly robust estimator of multivariate location and scatter.
34 Then it uses those estimates to compute robust distances for all the
35 data vectors. Those with large robust distances are considered outliers.
36 Robust distances can then be plotted for better visualization of the data.
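
Concretely, if T denotes the robust location estimate and S the robust scatter
estimate described below, the robust distance of a data vector \f$x_i\f$ is
\f[ d_i = \sqrt{(x_i - T)^{\mathrm{T}}\, S^{-1}\, (x_i - T)} \f]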
37 
38 How does this algorithm do it?
39 The MCD objective is to find the h observations (out of n) whose classical
40 covariance matrix has the lowest determinant. The MCD estimate of location
41 is then the average of those h points and the MCD estimate of scatter
42 is their covariance matrix. The minimum (and default) h = (n+nvariables+1)/2,
43 so the algorithm remains effective even when almost half of the observations are outliers.
44 The algorithm also allows for exact fit situations - that is, when h or more
45 observations lie on a hyperplane. Then the algorithm still yields the MCD location T
46 and scatter matrix S, the latter being singular as it should be. From (T,S) the
47 program then computes the equation of the hyperplane.
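
In symbols, writing cov(H) for the classical covariance matrix of a subset H of
observations, the MCD searches for
\f[ H^{*} = \arg\min_{|H| = h} \det\big(\mathrm{cov}(H)\big), \qquad
T_{\mathrm{MCD}} = \mathrm{ave}(H^{*}), \qquad S_{\mathrm{MCD}} = \mathrm{cov}(H^{*}). \f]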
48 
49 How can this algorithm be used?
50 Whenever contamination of the data is suspected that might influence
51 the classical estimates.
52 Robust estimation of location and scatter is also a tool to robustify
53 other multivariate techniques, such as principal-component analysis
54 and discriminant analysis.
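
A minimal multivariate usage sketch (the sample size, the number of variables and
the Gaussian test data below are purely illustrative):

~~~ {.cpp}
   const Int_t nvec = 1000;              // number of observations
   const Int_t nvar = 3;                 // number of variables per observation
   TRobustEstimator re(nvec, nvar, 0);   // hh=0 -> default h = (nvec+nvar+1)/2
   Double_t row[nvar];
   for (Int_t i = 0; i < nvec; i++) {
      for (Int_t j = 0; j < nvar; j++)
         row[j] = gRandom->Gaus(0., 1.); // illustrative data, replace with real measurements
      re.AddRow(row);
   }
   re.Evaluate();                        // run the fast-MCD algorithm
   TVectorD mean(nvar);
   TMatrixDSym cov(nvar);
   re.GetMean(mean);                     // robust estimate of location
   re.GetCovariance(cov);                // robust estimate of scatter
   Int_t nout = re.GetNOut();            // number of points flagged as outliers
~~~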
55 
56 Technical details of the algorithm:
57 
58 1. The default h = (n+nvariables+1)/2, but the user may choose any integer h with
59  (n+nvariables+1)/2 <= h <= n. The program then reports the MCD's breakdown value
60  (n-h+1)/n. If you are sure that the dataset contains less than 25% contamination,
61  which is usually the case, a good compromise between breakdown value and
62  efficiency is obtained by putting h=[0.75*n].
63 2. If h=n, the MCD location estimate is the average of the whole dataset, and
64  the MCD scatter estimate is its covariance matrix. Report this and stop.
65 3. If nvariables=1 (univariate data), compute the MCD estimate by the exact
66  algorithm of Rousseeuw and Leroy (1987, pp.171-172) in O(n log n) time and stop (a usage sketch for this univariate mode is given after this list).
67 4. From here on, h<n and nvariables>=2.
68  1. If n is small:
69   - repeat (say) 500 times:
70     - construct an initial h-subset, starting from a random (nvar+1)-subset
71     - carry out 2 C-steps (described in the comments of the CStep function)
72   - for the 10 results with lowest det(S):
73     - carry out C-steps until convergence
74   - report the solution (T, S) with the lowest det(S)
75  2. If n is larger (say, n>600), then
76   - construct up to 5 disjoint random subsets of size nsub (say, nsub=300)
77   - inside each subset repeat 500/5 times:
78     - construct an initial subset of size hsub=[nsub*h/n]
79     - carry out 2 C-steps
80     - keep the best 10 results (Tsub, Ssub)
81   - pool the subsets, yielding the merged set (say, of size nmerged=1500)
82   - in the merged set, repeat for each of the 50 solutions (Tsub, Ssub):
83     - carry out 2 C-steps
84     - keep the 10 best results
85   - in the full dataset, repeat for those best results:
86     - take several C-steps, using n and h
87   - report the best final result (T, S)
88 5. To obtain consistency when the data come from a multivariate normal
89  distribution, the covariance matrix is multiplied by a correction factor.
90 6. Robust distances for all elements are calculated using the final (T, S).
91  Then the very final mean and covariance estimates are calculated only for the
92  values whose robust distances are less than a cutoff value (the 0.975 quantile
93  of the chi2 distribution with nvariables degrees of freedom).
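
A corresponding univariate sketch for point 3 above, using the default constructor
together with EvaluateUni() (again with illustrative test data):

~~~ {.cpp}
   const Int_t n = 1000;
   Double_t data[n];
   for (Int_t i = 0; i < n; i++)
      data[i] = gRandom->Gaus(10., 2.);  // illustrative data, replace with real measurements
   TRobustEstimator re;                  // default constructor for the univariate case
   Double_t mean = 0, sigma = 0;
   re.EvaluateUni(n, data, mean, sigma); // robust location and scatter of the sample
~~~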
94 */
95 
96 #include "TRobustEstimator.h"
97 #include "TMatrixDSymEigen.h"
98 #include "TRandom.h"
99 #include "TMath.h"
100 #include "TDecompChol.h"
101 
102 ClassImp(TRobustEstimator);
103 
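// medians of the chi-squared distribution with 1..50 degrees of freedom,
// used in RDist() to rescale the covariance matrix for consistency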
104 const Double_t kChiMedian[50]= {
105  0.454937, 1.38629, 2.36597, 3.35670, 4.35146, 5.34812, 6.34581, 7.34412, 8.34283,
106  9.34182, 10.34, 11.34, 12.34, 13.34, 14.34, 15.34, 16.34, 17.34, 18.34, 19.34,
107  20.34, 21.34, 22.34, 23.34, 24.34, 25.34, 26.34, 27.34, 28.34, 29.34, 30.34,
108  31.34, 32.34, 33.34, 34.34, 35.34, 36.34, 37.34, 38.34, 39.34, 40.34,
109  41.34, 42.34, 43.34, 44.34, 45.34, 46.34, 47.34, 48.34, 49.33};
110 
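// 0.975 quantiles of the chi-squared distribution with 1..50 degrees of freedom,
// used as the cutoff on the squared robust distances when flagging outliers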
111 const Double_t kChiQuant[50]={
112  5.02389, 7.3776,9.34840,11.1433,12.8325,
113  14.4494,16.0128,17.5346,19.0228,20.4831,21.920,23.337,
114  24.736,26.119,27.488,28.845,30.191,31.526,32.852,34.170,
115  35.479,36.781,38.076,39.364,40.646,41.923,43.194,44.461,
116  45.722,46.979,48.232,49.481,50.725,51.966,53.203,54.437,
117  55.668,56.896,58.120,59.342,60.561,61.777,62.990,64.201,
118  65.410,66.617,67.821,69.022,70.222,71.420};
119 
120 ////////////////////////////////////////////////////////////////////////////////
121 ///this constructor should be used in the univariate case:
122 ///first call this constructor, then the EvaluateUni() function
123 
124 TRobustEstimator::TRobustEstimator() {
125 }
126 
127 ////////////////////////////////////////////////////////////////////////////////
128 ///constructor
129 
130 TRobustEstimator::TRobustEstimator(Int_t nvectors, Int_t nvariables, Int_t hh)
131  :fMean(nvariables),
132  fCovariance(nvariables),
133  fInvcovariance(nvariables),
134  fCorrelation(nvariables),
135  fRd(nvectors),
136  fSd(nvectors),
137  fOut(1),
138  fHyperplane(nvariables),
139  fData(nvectors, nvariables)
140 {
141  if ((nvectors<=1)||(nvariables<=0)){
142  Error("TRobustEstimator","Not enough vectors or variables");
143  return;
144  }
145  if (nvariables==1){
146  Error("TRobustEstimator","For the univariate case, use the default constructor and EvaluateUni() function");
147  return;
148  }
149 
150  fN=nvectors;
151  fNvar=nvariables;
152  if (hh<(fN+fNvar+1)/2){
153  if (hh>0)
154  Warning("TRobustEstimator","chosen h is too small, default h is taken instead");
155  fH=(fN+fNvar+1)/2;
156  } else
157  fH=hh;
158 
159  fVarTemp=0;
160  fVecTemp=0;
161  fExact=0;
162 }
163 
164 ////////////////////////////////////////////////////////////////////////////////
165 ///adds a column to the data matrix
166 ///it is assumed that the column has size fN
167 ///variable fVarTemp keeps the number of columns
168 ///already added
169 
170 void TRobustEstimator::AddColumn(Double_t *col)
171 {
172  if (fVarTemp==fNvar) {
173  fNvar++;
174  fMean.ResizeTo(fNvar);
175  fCovariance.ResizeTo(fNvar, fNvar);
176  fInvcovariance.ResizeTo(fNvar, fNvar);
177  fCorrelation.ResizeTo(fNvar, fNvar);
178  fHyperplane.ResizeTo(fNvar);
179  fData.ResizeTo(fN, fNvar);
180  }
181  for (Int_t i=0; i<fN; i++) {
182  fData(i, fVarTemp)=col[i];
183  }
184  fVarTemp++;
185 }
186 
187 ////////////////////////////////////////////////////////////////////////////////
188 ///adds a vector to the data matrix
189 ///it is assumed that the vector has size fNvar
190 
191 void TRobustEstimator::AddRow(Double_t *row)
192 {
193  if(fVecTemp==fN) {
194  fN++;
195  fRd.ResizeTo(fN);
196  fSd.ResizeTo(fN);
197  fData.ResizeTo(fN, fNvar);
198  }
199  for (Int_t i=0; i<fNvar; i++)
200  fData(fVecTemp, i)=row[i];
201 
202  fVecTemp++;
203 }
204 
205 ////////////////////////////////////////////////////////////////////////////////
206 ///Finds the estimate of the multivariate mean and covariance
207 
208 void TRobustEstimator::Evaluate()
209 {
210  Double_t kEps=1e-14;
211 
212  if (fH==fN){
213  Warning("Evaluate","Chosen h = #observations, so classic estimates of location and scatter will be calculated");
214  Classic();
215  return;
216  }
217 
218  Int_t i, j, k;
219  Int_t ii, jj;
220  Int_t nmini = 300;
221  Int_t k1=500;
222  Int_t nbest=10;
223  TMatrixD sscp(fNvar+1, fNvar+1);
224  TVectorD vec(fNvar);
225 
226  Int_t *index = new Int_t[fN];
227  Double_t *ndist = new Double_t[fN];
228  Double_t det;
229  Double_t *deti=new Double_t[nbest];
230  for (i=0; i<nbest; i++)
231  deti[i]=1e16;
232 
233  for (i=0; i<fN; i++)
234  fRd(i)=0;
235  ////////////////////////////
236  //for small n
237  ////////////////////////////
238  if (fN<nmini*2) {
239  //for storing the best fMeans and covariances
240 
241  TMatrixD mstock(nbest, fNvar);
242  TMatrixD cstock(fNvar, fNvar*nbest);
243 
244  for (k=0; k<k1; k++) {
245  CreateSubset(fN, fH, fNvar, index, fData, sscp, ndist);
246  //calculate the mean and covariance of the created subset
247  ClearSscp(sscp);
248  for (i=0; i<fH; i++) {
249  for(j=0; j<fNvar; j++)
250  vec(j)=fData[index[i]][j];
251  AddToSscp(sscp, vec);
252  }
253  Covar(sscp, fMean, fCovariance, fSd, fH);
254  det = fCovariance.Determinant();
255  if (det < kEps) {
256  fExact = Exact(ndist);
257  delete [] index;
258  delete [] ndist;
259  delete [] deti;
260  return;
261  }
262  //make 2 CSteps
263  det = CStep(fN, fH, index, fData, sscp, ndist);
264  if (det < kEps) {
265  fExact = Exact(ndist);
266  delete [] index;
267  delete [] ndist;
268  delete [] deti;
269  return;
270  }
271  det = CStep(fN, fH, index, fData, sscp, ndist);
272  if (det < kEps) {
273  fExact = Exact(ndist);
274  delete [] index;
275  delete [] ndist;
276  delete [] deti;
277  return;
278  } else {
279  Int_t maxind=TMath::LocMax(nbest, deti);
280  if(det<deti[maxind]) {
281  deti[maxind]=det;
282  for(ii=0; ii<fNvar; ii++) {
283  mstock(maxind, ii)=fMean(ii);
284  for(jj=0; jj<fNvar; jj++)
285  cstock(ii, jj+maxind*fNvar)=fCovariance(ii, jj);
286  }
287  }
288  }
289  }
290 
291  //now for nbest best results perform CSteps until convergence
292 
293  for (i=0; i<nbest; i++) {
294  for(ii=0; ii<fNvar; ii++) {
295  fMean(ii)=mstock(i, ii);
296  for (jj=0; jj<fNvar; jj++)
297  fCovariance(ii, jj)=cstock(ii, jj+i*fNvar);
298  }
299 
300  det=1;
301  while (det>kEps) {
302  det=CStep(fN, fH, index, fData, sscp, ndist);
303  if(TMath::Abs(det-deti[i])<kEps)
304  break;
305  else
306  deti[i]=det;
307  }
308  for(ii=0; ii<fNvar; ii++) {
309  mstock(i,ii)=fMean(ii);
310  for (jj=0; jj<fNvar; jj++)
311  cstock(ii,jj+i*fNvar)=fCovariance(ii, jj);
312  }
313  }
314 
315  Int_t detind=TMath::LocMin(nbest, deti);
316  for(ii=0; ii<fNvar; ii++) {
317  fMean(ii)=mstock(detind,ii);
318 
319  for(jj=0; jj<fNvar; jj++)
320  fCovariance(ii, jj)=cstock(ii,jj+detind*fNvar);
321  }
322 
323  if (deti[detind]!=0) {
324  //calculate robust distances and throw out the bad points
325  Int_t nout = RDist(sscp);
326  Double_t cutoff=kChiQuant[fNvar-1];
327 
328  fOut.Set(nout);
329 
330  j=0;
331  for (i=0; i<fN; i++) {
332  if(fRd(i)>cutoff) {
333  fOut[j]=i;
334  j++;
335  }
336  }
337 
338  } else {
339  fExact=Exact(ndist);
340  }
341  delete [] index;
342  delete [] ndist;
343  delete [] deti;
344  return;
345 
346  }
347  /////////////////////////////////////////////////
348  //if n>nmini, the dataset should be partitioned
349  //partitioning
350  ////////////////////////////////////////////////
351  Int_t indsubdat[5];
352  Int_t nsub;
353  for (ii=0; ii<5; ii++)
354  indsubdat[ii]=0;
355 
356  nsub = Partition(nmini, indsubdat);
357 
358  Int_t sum=0;
359  for (ii=0; ii<5; ii++)
360  sum+=indsubdat[ii];
361  Int_t *subdat=new Int_t[sum];
362  //printf("allocates subdat[ %d ]\n", sum);
363  // init the subdat matrix
364  for (int iii = 0; iii < sum; ++iii) subdat[iii] = -999;
365  RDraw(subdat, nsub, indsubdat);
366  for (int iii = 0; iii < sum; ++iii) {
367  if (subdat[iii] < 0 || subdat[iii] >= fN ) {
368  Error("Evaluate","subdat index is invalid subdat[%d] = %d",iii, subdat[iii] );
369  R__ASSERT(0);
370  }
371  }
372  //now the indexes of selected cases are in the array subdat
373  //matrices to store best means and covariances
374  Int_t nbestsub=nbest*nsub;
375  TMatrixD mstockbig(nbestsub, fNvar);
376  TMatrixD cstockbig(fNvar, fNvar*nbestsub);
377  TMatrixD hyperplane(nbestsub, fNvar);
378  for (i=0; i<nbestsub; i++) {
379  for(j=0; j<fNvar; j++)
380  hyperplane(i,j)=0;
381  }
382  Double_t *detibig = new Double_t[nbestsub];
383  Int_t maxind;
384  maxind=TMath::LocMax(5, indsubdat);
385  TMatrixD dattemp(indsubdat[maxind], fNvar);
386 
387  Int_t k2=Int_t(k1/nsub);
388  //construct h-subsets and perform 2 CSteps in subgroups
389 
390  for (Int_t kgroup=0; kgroup<nsub; kgroup++) {
391  //printf("group #%d\n", kgroup);
392  Int_t ntemp=indsubdat[kgroup];
393  Int_t temp=0;
394  for (i=0; i<kgroup; i++)
395  temp+=indsubdat[i];
396  Int_t par;
397 
398 
399  for(i=0; i<ntemp; i++) {
400  for (j=0; j<fNvar; j++) {
401  dattemp(i,j)=fData[subdat[temp+i]][j];
402  }
403  }
404  Int_t htemp=Int_t(fH*ntemp/fN);
405 
406  for (i=0; i<nbest; i++)
407  deti[i]=1e16;
408 
409  for(k=0; k<k2; k++) {
410  CreateSubset(ntemp, htemp, fNvar, index, dattemp, sscp, ndist);
411  ClearSscp(sscp);
412  for (i=0; i<htemp; i++) {
413  for(j=0; j<fNvar; j++) {
414  vec(j)=dattemp(index[i],j);
415  }
416  AddToSscp(sscp, vec);
417  }
418  Covar(sscp, fMean, fCovariance, fSd, htemp);
419  det = fCovariance.Determinant();
420  if (det<kEps) {
421  par =Exact2(mstockbig, cstockbig, hyperplane, deti, nbest, kgroup, sscp,ndist);
422  if(par==nbest+1) {
423 
424  delete [] detibig;
425  delete [] deti;
426  delete [] subdat;
427  delete [] ndist;
428  delete [] index;
429  return;
430  } else
431  deti[par]=det;
432  } else {
433  det = CStep(ntemp, htemp, index, dattemp, sscp, ndist);
434  if (det<kEps) {
435  par=Exact2(mstockbig, cstockbig, hyperplane, deti, nbest, kgroup, sscp, ndist);
436  if(par==nbest+1) {
437 
438  delete [] detibig;
439  delete [] deti;
440  delete [] subdat;
441  delete [] ndist;
442  delete [] index;
443  return;
444  } else
445  deti[par]=det;
446  } else {
447  det=CStep(ntemp,htemp, index, dattemp, sscp, ndist);
448  if(det<kEps){
449  par=Exact2(mstockbig, cstockbig, hyperplane, deti, nbest, kgroup, sscp,ndist);
450  if(par==nbest+1) {
451 
452  delete [] detibig;
453  delete [] deti;
454  delete [] subdat;
455  delete [] ndist;
456  delete [] index;
457  return;
458  } else {
459  deti[par]=det;
460  }
461  } else {
462  maxind=TMath::LocMax(nbest, deti);
463  if(det<deti[maxind]) {
464  deti[maxind]=det;
465  for(i=0; i<fNvar; i++) {
466  mstockbig(nbest*kgroup+maxind,i)=fMean(i);
467  for(j=0; j<fNvar; j++) {
468  cstockbig(i,nbest*kgroup*fNvar+maxind*fNvar+j)=fCovariance(i,j);
469 
470  }
471  }
472  }
473 
474  }
475  }
476  }
477 
478  maxind=TMath::LocMax(nbest, deti);
479  if (deti[maxind]<kEps)
480  break;
481  }
482 
483 
484  for(i=0; i<nbest; i++) {
485  detibig[kgroup*nbest + i]=deti[i];
486 
487  }
488 
489  }
490 
491  //now the arrays mstockbig and cstockbig store nbest*nsub best means and covariances
492  //detibig stores their nbest*nsub determinants
493  //merge the subsets and carry out 2 CSteps on the merged set for all 50 best solutions
494 
495  TMatrixD datmerged(sum, fNvar);
496  for(i=0; i<sum; i++) {
497  for (j=0; j<fNvar; j++)
498  datmerged(i,j)=fData[subdat[i]][j];
499  }
500  // printf("performing calculations for merged set\n");
501  Int_t hmerged=Int_t(sum*fH/fN);
502 
503  Int_t nh;
504  for(k=0; k<nbestsub; k++) {
505  //for all best solutions perform 2 CSteps and then choose the very best
506  for(ii=0; ii<fNvar; ii++) {
507  fMean(ii)=mstockbig(k,ii);
508  for(jj=0; jj<fNvar; jj++)
509  fCovariance(ii, jj)=cstockbig(ii,k*fNvar+jj);
510  }
511  if(detibig[k]==0) {
512  for(i=0; i<fNvar; i++)
513  fHyperplane(i)=hyperplane(k,i);
514  CreateOrtSubset(datmerged,index, hmerged, sum, sscp, ndist);
515 
516  }
517  det=CStep(sum, hmerged, index, datmerged, sscp, ndist);
518  if (det<kEps) {
519  nh= Exact(ndist);
520  if (nh>=fH) {
521  fExact = nh;
522 
523  delete [] detibig;
524  delete [] deti;
525  delete [] subdat;
526  delete [] ndist;
527  delete [] index;
528  return;
529  } else {
530  CreateOrtSubset(datmerged, index, hmerged, sum, sscp, ndist);
531  }
532  }
533 
534  det=CStep(sum, hmerged, index, datmerged, sscp, ndist);
535  if (det<kEps) {
536  nh=Exact(ndist);
537  if (nh>=fH) {
538  fExact = nh;
539  delete [] detibig;
540  delete [] deti;
541  delete [] subdat;
542  delete [] ndist;
543  delete [] index;
544  return;
545  }
546  }
547  detibig[k]=det;
548  for(i=0; i<fNvar; i++) {
549  mstockbig(k,i)=fMean(i);
550  for(j=0; j<fNvar; j++) {
551  cstockbig(i,k*fNvar+j)=fCovariance(i, j);
552  }
553  }
554  }
555  //now for the subset with the smallest determinant
556  //repeat CSteps until convergence
557  Int_t minind=TMath::LocMin(nbestsub, detibig);
558  det=detibig[minind];
559  for(i=0; i<fNvar; i++) {
560  fMean(i)=mstockbig(minind,i);
561  fHyperplane(i)=hyperplane(minind,i);
562  for(j=0; j<fNvar; j++)
563  fCovariance(i, j)=cstockbig(i,minind*fNvar + j);
564  }
565  if(det<kEps)
566  CreateOrtSubset(fData, index, fH, fN, sscp, ndist);
567  det=1;
568  while (det>kEps) {
569  det=CStep(fN, fH, index, fData, sscp, ndist);
570  if(TMath::Abs(det-detibig[minind])<kEps) {
571  break;
572  } else {
573  detibig[minind]=det;
574  }
575  }
576  if(det<kEps) {
577  Exact(ndist);
578  fExact=kTRUE;
579  }
580  Int_t nout = RDist(sscp);
581  Double_t cutoff=kChiQuant[fNvar-1];
582 
583  fOut.Set(nout);
584 
585  j=0;
586  for (i=0; i<fN; i++) {
587  if(fRd(i)>cutoff) {
588  fOut[j]=i;
589  j++;
590  }
591  }
592 
593  delete [] detibig;
594  delete [] deti;
595  delete [] subdat;
596  delete [] ndist;
597  delete [] index;
598  return;
599 }
600 
601 ////////////////////////////////////////////////////////////////////////////////
602 ///for the univariate case
603 ///estimates of location and scatter are returned in the mean and sigma parameters
604 ///the algorithm works on the same principle as in the multivariate case -
605 ///it finds a subset of size hh with the smallest sigma, and then returns the mean and
606 ///sigma of this subset
607 
608 void TRobustEstimator::EvaluateUni(Int_t nvectors, Double_t *data, Double_t &mean, Double_t &sigma, Int_t hh)
609 {
610  if (hh==0)
611  hh=(nvectors+2)/2;
612  Double_t faclts[]={2.6477,2.5092,2.3826,2.2662,2.1587,2.0589,1.9660,1.879,1.7973,1.7203,1.6473};
613  Int_t *index=new Int_t[nvectors];
614  TMath::Sort(nvectors, data, index, kFALSE);
615 
616  Int_t nquant;
617  nquant=TMath::Min(Int_t(Double_t(((hh*1./nvectors)-0.5)*40))+1, 11);
618  Double_t factor=faclts[nquant-1];
619 
620  Double_t *aw=new Double_t[nvectors];
621  Double_t *aw2=new Double_t[nvectors];
622  Double_t sq=0;
623  Double_t sqmin=0;
624  Int_t ndup=0;
625  Int_t len=nvectors-hh;
626  Double_t *slutn=new Double_t[len];
627  for(Int_t i=0; i<len; i++)
628  slutn[i]=0;
629  for(Int_t jint=0; jint<len; jint++) {
630  aw[jint]=0;
631  for (Int_t j=0; j<hh; j++) {
632  aw[jint]+=data[index[j+jint]];
633  if(jint==0)
634  sq+=data[index[j]]*data[index[j]];
635  }
636  aw2[jint]=aw[jint]*aw[jint]/hh;
637 
638  if(jint==0) {
639  sq=sq-aw2[jint];
640  sqmin=sq;
641  slutn[ndup]=aw[jint];
642 
643  } else {
644  sq=sq - data[index[jint-1]]*data[index[jint-1]]+
645  data[index[jint+hh]]*data[index[jint+hh]]-
646  aw2[jint]+aw2[jint-1];
647  if(sq<sqmin) {
648  ndup=0;
649  sqmin=sq;
650  slutn[ndup]=aw[jint];
651 
652  } else {
653  if(sq==sqmin) {
654  ndup++;
655  slutn[ndup]=aw[jint];
656  }
657  }
658  }
659  }
660 
661  slutn[0]=slutn[Int_t((ndup)/2)]/hh;
662  Double_t bstd=factor*TMath::Sqrt(sqmin/hh);
663  mean=slutn[0];
664  sigma=bstd;
665  delete [] aw;
666  delete [] aw2;
667  delete [] slutn;
668  delete [] index;
669 }
670 
671 ////////////////////////////////////////////////////////////////////////////////
672 ///returns the breakdown point of the algorithm
673 
674 Int_t TRobustEstimator::GetBDPoint()
675 {
676  Int_t n;
677  n=(fN-fH+1)/fN;
678  return n;
679 }
680 
681 ////////////////////////////////////////////////////////////////////////////////
682 ///returns the chi2 quantiles
683 
684 Double_t TRobustEstimator::GetChiQuant(Int_t i) const
685 {
686  if (i < 0 || i >= 50) return 0;
687  return kChiQuant[i];
688 }
689 
690 ////////////////////////////////////////////////////////////////////////////////
691 ///returns the covariance matrix
692 
693 void TRobustEstimator::GetCovariance(TMatrixDSym &matr)
694 {
695  if (matr.GetNrows()!=fNvar || matr.GetNcols()!=fNvar){
696  Warning("GetCovariance","provided matrix is of the wrong size, it will be resized");
697  matr.ResizeTo(fNvar, fNvar);
698  }
699  matr=fCovariance;
700 }
701 
702 ////////////////////////////////////////////////////////////////////////////////
703 ///returns the correlation matrix
704 
705 void TRobustEstimator::GetCorrelation(TMatrixDSym &matr)
706 {
707  if (matr.GetNrows()!=fNvar || matr.GetNcols()!=fNvar) {
708  Warning("GetCorrelation","provided matrix is of the wrong size, it will be resized");
709  matr.ResizeTo(fNvar, fNvar);
710  }
711  matr=fCorrelation;
712 }
713 
714 ////////////////////////////////////////////////////////////////////////////////
715 ///if the points are on a hyperplane, returns this hyperplane
716 
717 const TVectorD* TRobustEstimator::GetHyperplane() const
718 {
719  if (fExact==0) {
720  Error("GetHyperplane","the data doesn't lie on a hyperplane!\n");
721  return 0;
722  } else {
723  return &fHyperplane;
724  }
725 }
726 
727 ////////////////////////////////////////////////////////////////////////////////
728 ///if the points are on a hyperplane, returns this hyperplane
729 
730 void TRobustEstimator::GetHyperplane(TVectorD &vec)
731 {
732  if (fExact==0){
733  Error("GetHyperplane","the data doesn't lie on a hyperplane!\n");
734  return;
735  }
736  if (vec.GetNoElements()!=fNvar) {
737  Warning("GetHyperPlane","provided vector is of the wrong size, it will be resized");
738  vec.ResizeTo(fNvar);
739  }
740  vec=fHyperplane;
741 }
742 
743 ////////////////////////////////////////////////////////////////////////////////
744 ///return the estimate of the mean
745 
746 void TRobustEstimator::GetMean(TVectorD &means)
747 {
748  if (means.GetNoElements()!=fNvar) {
749  Warning("GetMean","provided vector is of the wrong size, it will be resized");
750  means.ResizeTo(fNvar);
751  }
752  means=fMean;
753 }
754 
755 ////////////////////////////////////////////////////////////////////////////////
756 ///returns the robust distances (helps to find outliers)
757 
758 void TRobustEstimator::GetRDistances(TVectorD &rdist)
759 {
760  if (rdist.GetNoElements()!=fN) {
761  Warning("GetRDistances","provided vector is of the wrong size, it will be resized");
762  rdist.ResizeTo(fN);
763  }
764  rdist=fRd;
765 }
766 
767 ////////////////////////////////////////////////////////////////////////////////
768 ///returns the number of outliers
769 
770 Int_t TRobustEstimator::GetNOut()
771 {
772  return fOut.GetSize();
773 }
774 
775 ////////////////////////////////////////////////////////////////////////////////
776 ///update the sscp matrix with vector vec
777 
778 void TRobustEstimator::AddToSscp(TMatrixD &sscp, TVectorD &vec)
779 {
780  Int_t i, j;
781  for (j=1; j<fNvar+1; j++) {
782  sscp(0, j) +=vec(j-1);
783  sscp(j, 0) = sscp(0, j);
784  }
785  for (i=1; i<fNvar+1; i++) {
786  for (j=1; j<fNvar+1; j++) {
787  sscp(i, j) += vec(i-1)*vec(j-1);
788  }
789  }
790 }
791 
792 ////////////////////////////////////////////////////////////////////////////////
793 ///clear the sscp matrix, used for covariance and mean calculation
794 
795 void TRobustEstimator::ClearSscp(TMatrixD &sscp)
796 {
797  for (Int_t i=0; i<fNvar+1; i++) {
798  for (Int_t j=0; j<fNvar+1; j++) {
799  sscp(i, j)=0;
800  }
801  }
802 }
803 
804 ////////////////////////////////////////////////////////////////////////////////
805 ///called when h=n. Returns classic covariance matrix
806 ///and mean
807 
808 void TRobustEstimator::Classic()
809 {
810  TMatrixD sscp(fNvar+1, fNvar+1);
811  TVectorD temp(fNvar);
812  ClearSscp(sscp);
813  for (Int_t i=0; i<fN; i++) {
814  for (Int_t j=0; j<fNvar; j++)
815  temp(j)=fData(i, j);
816  AddToSscp(sscp, temp);
817  }
818  Covar(sscp, fMean, fCovariance, fSd, fN);
819  Correl();
820 
821 }
822 
823 ////////////////////////////////////////////////////////////////////////////////
824 ///calculates mean and covariance
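///from the accumulated sscp matrix: sscp(0,i+1) holds the sum of variable i over the
///nvec vectors and sscp(i+1,j+1) the sum of products of variables i and j (filled by AddToSscp)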
825 
826 void TRobustEstimator::Covar(TMatrixD &sscp, TVectorD &m, TMatrixDSym &cov, TVectorD &sd, Int_t nvec)
827 {
828  Int_t i, j;
829  Double_t f;
830  for (i=0; i<fNvar; i++) {
831  m(i)=sscp(0, i+1);
832  sd[i]=sscp(i+1, i+1);
833  f=(sd[i]-m(i)*m(i)/nvec)/(nvec-1);
834  if (f>1e-14) sd[i]=TMath::Sqrt(f);
835  else sd[i]=0;
836  m(i)/=nvec;
837  }
838  for (i=0; i<fNvar; i++) {
839  for (j=0; j<fNvar; j++) {
840  cov(i, j)=sscp(i+1, j+1)-nvec*m(i)*m(j);
841  cov(i, j)/=nvec-1;
842  }
843  }
844 }
845 
846 ////////////////////////////////////////////////////////////////////////////////
847 ///transforms covariance matrix into correlation matrix
848 
849 void TRobustEstimator::Correl()
850 {
851  Int_t i, j;
852  Double_t *sd=new Double_t[fNvar];
853  for(j=0; j<fNvar; j++)
854  sd[j]=1./TMath::Sqrt(fCovariance(j, j));
855  for(i=0; i<fNvar; i++) {
856  for (j=0; j<fNvar; j++) {
857  if (i==j)
858  fCorrelation(i, j)=1.;
859  else
860  fCorrelation(i, j)=fCovariance(i, j)*sd[i]*sd[j];
861  }
862  }
863  delete [] sd;
864 }
865 
866 ////////////////////////////////////////////////////////////////////////////////
867 ///creates a subset of htotal elements from ntotal elements
868 ///first, p+1 elements are drawn randomly (without repetition)
869 ///if their covariance matrix is singular, more elements are
870 ///added one by one, until their covariance matrix becomes regular
871 ///or it becomes clear that htotal observations lie on a hyperplane
872 ///If the covariance matrix determinant != 0, distances of all ntotal elements
873 ///are calculated using the formula d_i = Sqrt((x_i-M)*S_inv*(x_i-M)), where
874 ///M is the mean and S_inv is the inverse of the covariance matrix
875 ///The htotal points with smallest distances are included in the returned subset.
876 
877 void TRobustEstimator::CreateSubset(Int_t ntotal, Int_t htotal, Int_t p, Int_t *index, TMatrixD &data, TMatrixD &sscp, Double_t *ndist)
878 {
879  Double_t kEps = 1e-14;
880  Int_t i, j;
881  Bool_t repeat=kFALSE;
882  Int_t nindex=0;
883  Int_t num;
884  for(i=0; i<ntotal; i++)
885  index[i]=ntotal+1;
886 
887  for (i=0; i<p+1; i++) {
888  num=Int_t(gRandom->Uniform(0, 1)*(ntotal-1));
889  if (i>0){
890  for(j=0; j<=i-1; j++) {
891  if(index[j]==num)
892  repeat=kTRUE;
893  }
894  }
895  if(repeat==kTRUE) {
896  i--;
897  repeat=kFALSE;
898  } else {
899  index[i]=num;
900  nindex++;
901  }
902  }
903 
904  ClearSscp(sscp);
905 
906  TVectorD vec(fNvar);
907  Double_t det;
908  for (i=0; i<p+1; i++) {
909  for (j=0; j<fNvar; j++) {
910  vec[j]=data[index[i]][j];
911 
912  }
913  AddToSscp(sscp, vec);
914  }
915 
916  Covar(sscp, fMean, fCovariance, fSd, p+1);
917  det=fCovariance.Determinant();
918  while((det<kEps)&&(nindex < htotal)) {
919  //if covariance matrix is singular, another vector is added until
920  //the matrix becomes regular or it becomes clear that all
921  //vectors of the group lie on a hyperplane
922  repeat=kFALSE;
923  do{
924  num=Int_t(gRandom->Uniform(0,1)*(ntotal-1));
925  repeat=kFALSE;
926  for(i=0; i<nindex; i++) {
927  if(index[i]==num) {
928  repeat=kTRUE;
929  break;
930  }
931  }
932  }while(repeat==kTRUE);
933 
934  index[nindex]=num;
935  nindex++;
936  //check if covariance matrix is singular
937  for(j=0; j<fNvar; j++)
938  vec[j]=data[index[nindex-1]][j];
939  AddToSscp(sscp, vec);
940  Covar(sscp, fMean, fCovariance, fSd, nindex);
941  det=fCovariance.Determinant();
942  }
943 
944  if(nindex!=htotal) {
945  TDecompChol chol(fCovariance);
946  fInvcovariance = chol.Invert();
947 
948  TVectorD temp(fNvar);
949  for(j=0; j<ntotal; j++) {
950  ndist[j]=0;
951  for(i=0; i<fNvar; i++)
952  temp[i]=data[j][i] - fMean(i);
953  temp*=fInvcovariance;
954  for(i=0; i<fNvar; i++)
955  ndist[j]+=(data[j][i]-fMean(i))*temp[i];
956  }
957  KOrdStat(ntotal, ndist, htotal-1,index);
958  }
959 
960 }
961 
962 ////////////////////////////////////////////////////////////////////////////////
963 ///creates a subset of hmerged vectors with smallest orthogonal distances to the hyperplane
964 ///hyp[1]*(x1-mean[1])+...+hyp[nvar]*(xnvar-mean[nvar])=0
965 ///This function is called in the case when fewer than fH samples lie on a hyperplane.
966 
967 void TRobustEstimator::CreateOrtSubset(TMatrixD &dat,Int_t *index, Int_t hmerged, Int_t nmerged, TMatrixD &sscp, Double_t *ndist)
968 {
969  Int_t i, j;
970  TVectorD vec(fNvar);
971  for (i=0; i<nmerged; i++) {
972  ndist[i]=0;
973  for(j=0; j<fNvar; j++) {
974  ndist[i]+=fHyperplane[j]*(dat[i][j]-fMean[j]);
975  ndist[i]=TMath::Abs(ndist[i]);
976  }
977  }
978  KOrdStat(nmerged, ndist, hmerged-1, index);
979  ClearSscp(sscp);
980  for (i=0; i<hmerged; i++) {
981  for(j=0; j<fNvar; j++)
982  vec[j]=dat[index[i]][j];
983  AddToSscp(sscp, vec);
984  }
985  Covar(sscp, fMean, fCovariance, fSd, hmerged);
986 }
987 
988 ////////////////////////////////////////////////////////////////////////////////
989 ///from the input htotal-subset constructs another htotal subset with lower determinant
990 ///
991 ///As proven by Peter J. Rousseeuw and Katrien Van Driessen, if distances for all elements
992 ///are calculated using the formula d_i = Sqrt((x_i-M)*S_inv*(x_i-M)), where M is the mean
993 ///of the input htotal-subset and S_inv is the inverse of its covariance matrix, then the
994 ///htotal elements with smallest distances will have a covariance matrix with determinant
995 ///less than or equal to the determinant of the input subset covariance matrix.
996 ///
997 ///The determinant of this htotal-subset with smallest distances is returned.
998 
999 Double_t TRobustEstimator::CStep(Int_t ntotal, Int_t htotal, Int_t *index, TMatrixD &data, TMatrixD &sscp, Double_t *ndist)
1000 {
1001  Int_t i, j;
1002  TVectorD vec(fNvar);
1003  Double_t det;
1004 
1005  TDecompChol chol(fCovariance);
1006  fInvcovariance = chol.Invert();
1007 
1008  TVectorD temp(fNvar);
1009  for(j=0; j<ntotal; j++) {
1010  ndist[j]=0;
1011  for(i=0; i<fNvar; i++)
1012  temp[i]=data[j][i]-fMean[i];
1013  temp*=fInvcovariance;
1014  for(i=0; i<fNvar; i++)
1015  ndist[j]+=(data[j][i]-fMean[i])*temp[i];
1016  }
1017 
1018  //taking h smallest
1019  KOrdStat(ntotal, ndist, htotal-1, index);
1020  //writing their mean and covariance
1021  ClearSscp(sscp);
1022  for (i=0; i<htotal; i++) {
1023  for (j=0; j<fNvar; j++)
1024  temp[j]=data[index[i]][j];
1025  AddToSscp(sscp, temp);
1026  }
1027  Covar(sscp, fMean, fCovariance, fSd, htotal);
1028  det = fCovariance.Determinant();
1029  return det;
1030 }
1031 
1032 ////////////////////////////////////////////////////////////////////////////////
1033 ///for the exact fit situations
1034 ///returns number of observations on the hyperplane
1035 
1036 Int_t TRobustEstimator::Exact(Double_t *ndist)
1037 {
1038  Int_t i, j;
1039 
1040  TMatrixDSymEigen eigen(fCovariance);
1041  TVectorD eigenValues=eigen.GetEigenValues();
1042  TMatrixD eigenMatrix=eigen.GetEigenVectors();
1043 
1044  for (j=0; j<fNvar; j++) {
1045  fHyperplane[j]=eigenMatrix(j,fNvar-1);
1046  }
1047  //calculate and return how many observations lie on the hyperplane
1048  for (i=0; i<fN; i++) {
1049  ndist[i]=0;
1050  for(j=0; j<fNvar; j++) {
1051  ndist[i]+=fHyperplane[j]*(fData[i][j]-fMean[j]);
1052  ndist[i]=TMath::Abs(ndist[i]);
1053  }
1054  }
1055  Int_t nhyp=0;
1056 
1057  for (i=0; i<fN; i++) {
1058  if(ndist[i] < 1e-14) nhyp++;
1059  }
1060  return nhyp;
1061 
1062 }
1063 
1064 ////////////////////////////////////////////////////////////////////////////////
1065 ///This function is called if the determinant of the covariance matrix of a subset is 0.
1066 ///
1067 ///If there are fH or more vectors on a hyperplane,
1068 ///it returns this hyperplane and stops;
1069 ///otherwise it stores the hyperplane coordinates in the hyperplane matrix
1070 
1071 Int_t TRobustEstimator::Exact2(TMatrixD &mstockbig, TMatrixD &cstockbig, TMatrixD &hyperplane,
1072  Double_t *deti, Int_t nbest, Int_t kgroup,
1073  TMatrixD &sscp, Double_t *ndist)
1074 {
1075  Int_t i, j;
1076 
1077  TVectorD vec(fNvar);
1078  Int_t maxind = TMath::LocMax(nbest, deti);
1079  Int_t nh=Exact(ndist);
1080  //now nh is the number of observations on the hyperplane
1081  //ndist stores the distances of the observations from this hyperplane
1082  if(nh>=fH) {
1083  ClearSscp(sscp);
1084  for (i=0; i<fN; i++) {
1085  if(ndist[i]<1e-14) {
1086  for (j=0; j<fNvar; j++)
1087  vec[j]=fData[i][j];
1088  AddToSscp(sscp, vec);
1089  }
1090  }
1091  Covar(sscp, fMean, fCovariance, fSd, nh);
1092 
1093  fExact=nh;
1094  return nbest+1;
1095 
1096  } else {
1097  //if less than fH observations lie on a hyperplane,
1098  //mean and covariance matrix are stored in mstockbig
1099  //and cstockbig in place of the previous maximum determinant
1100  //mean and covariance
1101  for(i=0; i<fNvar; i++) {
1102  mstockbig(nbest*kgroup+maxind,i)=fMean(i);
1103  hyperplane(nbest*kgroup+maxind,i)=fHyperplane(i);
1104  for(j=0; j<fNvar; j++) {
1105  cstockbig(i,nbest*kgroup*fNvar+maxind*fNvar+j)=fCovariance(i,j);
1106  }
1107  }
1108  return maxind;
1109  }
1110 }
1111 
1112 
1113 ////////////////////////////////////////////////////////////////////////////////
1114 ///divides the elements into approximately equal subgroups
1115 ///number of elements in each subgroup is stored in indsubdat
1116 ///number of subgroups is returned
1117 
1118 Int_t TRobustEstimator::Partition(Int_t nmini, Int_t *indsubdat)
1119 {
1120  Int_t nsub;
1121  if ((fN>=2*nmini) && (fN<=(3*nmini-1))) {
1122  if (fN%2==1){
1123  indsubdat[0]=Int_t(fN*0.5);
1124  indsubdat[1]=Int_t(fN*0.5)+1;
1125  } else
1126  indsubdat[0]=indsubdat[1]=Int_t(fN/2);
1127  nsub=2;
1128  }
1129  else{
1130  if((fN>=3*nmini) && (fN<(4*nmini -1))) {
1131  if(fN%3==0){
1132  indsubdat[0]=indsubdat[1]=indsubdat[2]=Int_t(fN/3);
1133  } else {
1134  indsubdat[0]=Int_t(fN/3);
1135  indsubdat[1]=Int_t(fN/3)+1;
1136  if (fN%3==1) indsubdat[2]=Int_t(fN/3);
1137  else indsubdat[2]=Int_t(fN/3)+1;
1138  }
1139  nsub=3;
1140  }
1141  else{
1142  if((fN>=4*nmini)&&(fN<=(5*nmini-1))){
1143  if (fN%4==0) indsubdat[0]=indsubdat[1]=indsubdat[2]=indsubdat[3]=Int_t(fN/4);
1144  else {
1145  indsubdat[0]=Int_t(fN/4);
1146  indsubdat[1]=Int_t(fN/4)+1;
1147  if(fN%4==1) indsubdat[2]=indsubdat[3]=Int_t(fN/4);
1148  if(fN%4==2) {
1149  indsubdat[2]=Int_t(fN/4)+1;
1150  indsubdat[3]=Int_t(fN/4);
1151  }
1152  if(fN%4==3) indsubdat[2]=indsubdat[3]=Int_t(fN/4)+1;
1153  }
1154  nsub=4;
1155  } else {
1156  for(Int_t i=0; i<5; i++)
1157  indsubdat[i]=nmini;
1158  nsub=5;
1159  }
1160  }
1161  }
1162  return nsub;
1163 }
1164 
1165 ////////////////////////////////////////////////////////////////////////////////
1166 ///Calculates robust distances. Then the samples with robust distances
1167 ///greater than a cutoff value (the 0.975 quantile of the chi2 distribution with
1168 ///fNvar degrees of freedom, multiplied by a correction factor) are given
1169 ///weight 0, and new, reweighted estimates of location and scatter are calculated.
1170 ///The function returns the number of outliers.
1171 
1172 Int_t TRobustEstimator::RDist(TMatrixD &sscp)
1173 {
1174  Int_t i, j;
1175  Int_t nout=0;
1176 
1177  TVectorD temp(fNvar);
1178  TDecompChol chol(fCovariance);
1179  fInvcovariance = chol.Invert();
1180 
1181 
1182  for (i=0; i<fN; i++) {
1183  fRd[i]=0;
1184  for(j=0; j<fNvar; j++) {
1185  temp[j]=fData[i][j]-fMean[j];
1186  }
1187  temp*=fInvcovariance;
1188  for(j=0; j<fNvar; j++) {
1189  fRd[i]+=(fData[i][j]-fMean[j])*temp[j];
1190  }
1191  }
1192 
1193  Double_t med;
1194  Double_t chi = kChiMedian[fNvar-1];
1195 
1196  med=TMath::Median(fN, fRd.GetMatrixArray());
1197  med/=chi;
1198  fCovariance*=med;
1199  TDecompChol chol2(fCovariance);
1200  fInvcovariance = chol2.Invert();
1201 
1202  for (i=0; i<fN; i++) {
1203  fRd[i]=0;
1204  for(j=0; j<fNvar; j++) {
1205  temp[j]=fData[i][j]-fMean[j];
1206  }
1207 
1208  temp*=fInvcovariance;
1209  for(j=0; j<fNvar; j++) {
1210  fRd[i]+=(fData[i][j]-fMean[j])*temp[j];
1211  }
1212  }
1213 
1214  Double_t cutoff = kChiQuant[fNvar-1];
1215 
1216  ClearSscp(sscp);
1217  for(i=0; i<fN; i++) {
1218  if (fRd[i]<=cutoff) {
1219  for(j=0; j<fNvar; j++)
1220  temp[j]=fData[i][j];
1221  AddToSscp(sscp,temp);
1222  } else {
1223  nout++;
1224  }
1225  }
1226 
1227  Covar(sscp, fMean, fCovariance, fSd, fN-nout);
1228  return nout;
1229 }
1230 
1231 ////////////////////////////////////////////////////////////////////////////////
1232 ///Draws ngroup nonoverlapping subdatasets out of a dataset of size n
1233 ///such that the selected case numbers are uniformly distributed from 1 to n
1234 
1235 void TRobustEstimator::RDraw(Int_t *subdat, Int_t ngroup, Int_t *indsubdat)
1236 {
1237  Int_t jndex = 0;
1238  Int_t nrand;
1239  Int_t i, k, m, j;
1240  for (k=1; k<=ngroup; k++) {
1241  for (m=1; m<=indsubdat[k-1]; m++) {
1242  nrand = Int_t(gRandom->Uniform(0, 1) * double(fN-jndex))+1;
1243  //printf("nrand = %d - jndex %d\n",nrand,jndex);
1244  jndex++;
1245  if (jndex==1) {
1246  subdat[0]=nrand-1; // in case nrand is equal to fN
1247  } else {
1248  subdat[jndex-1]=nrand+jndex-2;
1249  for (i=1; i<=jndex-1; i++) {
1250  if(subdat[i-1] > nrand+i-2) {
1251  for(j=jndex; j>=i+1; j--) {
1252  subdat[j-1]=subdat[j-2];
1253  }
1254  //printf("subdata[] i = %d - nrand %d\n",i,nrand);
1255  subdat[i-1]=nrand+i-2;
1256  break; //breaking the loop for(i=1...
1257  }
1258  }
1259  }
1260  }
1261  }
1262 }
1263 
1264 ////////////////////////////////////////////////////////////////////////////////
1265 ///returns the k-th smallest element of the array a (a private copy is kept here because an Int_t work array is needed)
1266 
1267 Double_t TRobustEstimator::KOrdStat(Int_t ntotal, Double_t *a, Int_t k, Int_t *work) {
1268  Bool_t isAllocated = kFALSE;
1269  const Int_t kWorkMax=100;
1270  Int_t i, ir, j, l, mid;
1271  Int_t arr;
1272  Int_t *ind;
1273  Int_t workLocal[kWorkMax];
1274  Int_t temp;
1275 
1276 
1277  if (work) {
1278  ind = work;
1279  } else {
1280  ind = workLocal;
1281  if (ntotal > kWorkMax) {
1282  isAllocated = kTRUE;
1283  ind = new Int_t[ntotal];
1284  }
1285  }
1286 
1287  for (Int_t ii=0; ii<ntotal; ii++) {
1288  ind[ii]=ii;
1289  }
1290  Int_t rk = k;
1291  l=0;
1292  ir = ntotal-1;
1293  for(;;) {
1294  if (ir<=l+1) { //active partition contains 1 or 2 elements
1295  if (ir == l+1 && a[ind[ir]]<a[ind[l]])
1296  {temp = ind[l]; ind[l]=ind[ir]; ind[ir]=temp;}
1297  Double_t tmp = a[ind[rk]];
1298  if (isAllocated)
1299  delete [] ind;
1300  return tmp;
1301  } else {
1302  mid = (l+ir) >> 1; //choose median of left, center and right
1303  {temp = ind[mid]; ind[mid]=ind[l+1]; ind[l+1]=temp;}//elements as partitioning element arr.
1304  if (a[ind[l]]>a[ind[ir]]) //also rearrange so that a[l]<=a[l+1]
1305  {temp = ind[l]; ind[l]=ind[ir]; ind[ir]=temp;}
1306 
1307  if (a[ind[l+1]]>a[ind[ir]])
1308  {temp=ind[l+1]; ind[l+1]=ind[ir]; ind[ir]=temp;}
1309 
1310  if (a[ind[l]]>a[ind[l+1]])
1311  {temp = ind[l]; ind[l]=ind[l+1]; ind[l+1]=temp;}
1312 
1313  i=l+1; //initialize pointers for partitioning
1314  j=ir;
1315  arr = ind[l+1];
1316  for (;;) {
1317  do i++; while (a[ind[i]]<a[arr]);
1318  do j--; while (a[ind[j]]>a[arr]);
1319  if (j<i) break; //pointers crossed, partitioning complete
1320  {temp=ind[i]; ind[i]=ind[j]; ind[j]=temp;}
1321  }
1322  ind[l+1]=ind[j];
1323  ind[j]=arr;
1324  if (j>=rk) ir = j-1; //keep active the partition that
1325  if (j<=rk) l=i; //contains the k_th element
1326  }
1327  }
1328 }