Logo ROOT   6.08/07
Reference Guide
TRobustEstimator.cxx
Go to the documentation of this file.
1 // @(#)root/physics:$Id$
2 // Author: Anna Kreshuk 08/10/2004
3 
4 /*************************************************************************
5  * Copyright (C) 1995-2004, Rene Brun and Fons Rademakers. *
6  * All rights reserved. *
7  * *
8  * For the licensing terms see $ROOTSYS/LICENSE. *
9  * For the list of contributors see $ROOTSYS/README/CREDITS. *
10  *************************************************************************/
11 
12 /** \class TRobustEstimator
13  \ingroup Physics
14 Minimum Covariance Determinant Estimator - a Fast Algorithm
15 invented by Peter J.Rousseeuw and Katrien Van Dreissen
16 "A Fast Algorithm for the Minimum covariance Determinant Estimator"
17 Technometrics, August 1999, Vol.41, NO.3
18 
19 What are robust estimators?
20 "An important property of an estimator is its robustness. An estimator
21 is called robust if it is insensitive to measurements that deviate
22 from the expected behaviour. There are 2 ways to treat such deviating
23 measurements: one may either try to recognise them and then remove
24 them from the data sample; or one may leave them in the sample, taking
25 care that they do not influence the estimate unduly. In both cases robust
26 estimators are needed...Robust procedures compensate for systematic errors
27 as much as possible, and indicate any situation in which a danger of not being
28 able to operate reliably is detected."
29 R.Fruhwirth, M.Regler, R.K.Bock, H.Grote, D.Notz
30 "Data Analysis Techniques for High-Energy Physics", 2nd edition
31 
32 What does this algorithm do?
33 It computes a highly robust estimator of multivariate location and scatter.
34 Then, it takes those estimates to compute robust distances of all the
35 data vectors. Those with large robust distances are considered outliers.
36 Robust distances can then be plotted for better visualization of the data.
37 
38 How does this algorithm do it?
39 The MCD objective is to find h observations(out of n) whose classical
40 covariance matrix has the lowest determinant. The MCD estimator of location
41 is then the average of those h points and the MCD estimate of scatter
42 is their covariance matrix. The minimum(and default) h = (n+nvariables+1)/2
43 so the algorithm is effective when less than (n+nvar+1)/2 variables are outliers.
44 The algorithm also allows for exact fit situations - that is, when h or more
45 observations lie on a hyperplane. Then the algorithm still yields the MCD location T
46 and scatter matrix S, the latter being singular as it should be. From (T,S) the
47 program then computes the equation of the hyperplane.
48 
49 How can this algorithm be used?
50 In any case, when contamination of data is suspected, that might influence
51 the classical estimates.
52 Also, robust estimation of location and scatter is a tool to robustify
53 other multivariate techniques such as, for example, principal-component analysis
54 and discriminant analysis.
55 
56 Technical details of the algorithm:
57 
58 1. The default h = (n+nvariables+1)/2, but the user may choose any integer h with
59  (n+nvariables+1)/2<=h<=n. The program then reports the MCD's breakdown value
60  (n-h+1)/n. If you are sure that the dataset contains less than 25% contamination
61  which is usually the case, a good compromise between breakdown value and
62  efficiency is obtained by putting h=[.75*n].
63 2. If h=n,the MCD location estimate is the average of the whole dataset, and
64  the MCD scatter estimate is its covariance matrix. Report this and stop
65 3. If nvariables=1 (univariate data), compute the MCD estimate by the exact
66  algorithm of Rousseeuw and Leroy (1987, pp.171-172) in O(nlogn)time and stop
67 4. From here on, h<n and nvariables>=2.
68  1. If n is small:
69  - repeat (say) 500 times:
70  - construct an initial h-subset, starting from a random (nvar+1)-subset
71  - carry out 2 C-steps (described in the comments of CStep function)
72  - for the 10 results with lowest det(S):
73  - carry out C-steps until convergence
74  - report the solution (T, S) with the lowest det(S)
75  2. If n is larger (say, n>600), then
76  - construct up to 5 disjoint random subsets of size nsub (say, nsub=300)
77  - inside each subset repeat 500/5 times:
78  - construct an initial subset of size hsub=[nsub*h/n]
79  - carry out 2 C-steps
80  - keep the best 10 results (Tsub, Ssub)
81  - pool the subsets, yielding the merged set (say, of size nmerged=1500)
82  - in the merged set, repeat for each of the 50 solutions (Tsub, Ssub)
83  - carry out 2 C-steps
84  - keep the 10 best results
85  - in the full dataset, repeat for those best results:
86  - take several C-steps, using n and h
87  - report the best final result (T, S)
88 5. To obtain consistency when the data comes from a multivariate normal
89  distribution, covariance matrix is multiplied by a correction factor
90 6. Robust distances for all elements, using the final (T, S) are calculated
91  Then the very final mean and covariance estimates are calculated only for
92  values, whose robust distances are less than a cutoff value (0.975 quantile
93  of chi2 distribution with nvariables degrees of freedom)
94 */
95 
96 #include "TRobustEstimator.h"
97 #include "TRandom.h"
98 #include "TMath.h"
99 #include "TH1D.h"
100 #include "TPaveLabel.h"
101 #include "TDecompChol.h"
102 
104 
105 const Double_t kChiMedian[50]= {
106  0.454937, 1.38629, 2.36597, 3.35670, 4.35146, 5.34812, 6.34581, 7.34412, 8.34283,
107  9.34182, 10.34, 11.34, 12.34, 13.34, 14.34, 15.34, 16.34, 17.34, 18.34, 19.34,
108  20.34, 21.34, 22.34, 23.34, 24.34, 25.34, 26.34, 27.34, 28.34, 29.34, 30.34,
109  31.34, 32.34, 33.34, 34.34, 35.34, 36.34, 37.34, 38.34, 39.34, 40.34,
110  41.34, 42.34, 43.34, 44.34, 45.34, 46.34, 47.34, 48.34, 49.33};
111 
112 const Double_t kChiQuant[50]={
113  5.02389, 7.3776,9.34840,11.1433,12.8325,
114  14.4494,16.0128,17.5346,19.0228,20.4831,21.920,23.337,
115  24.736,26.119,27.488,28.845,30.191,31.526,32.852,34.170,
116  35.479,36.781,38.076,39.364,40.646,41.923,43.194,44.461,
117  45.722,46.979,48.232,49.481,50.725,51.966,53.203,54.437,
118  55.668,56.896,58.120,59.342,60.561,61.777,62.990,64.201,
119  65.410,66.617,67.821,69.022,70.222,71.420};
120 
121 ////////////////////////////////////////////////////////////////////////////////
122 ///this constructor should be used in a univariate case:
123 ///first call this constructor, then - the EvaluateUni(..) function
124 
126 }
127 
128 ////////////////////////////////////////////////////////////////////////////////
129 ///constructor
130 
132  :fMean(nvariables),
133  fCovariance(nvariables),
134  fInvcovariance(nvariables),
135  fCorrelation(nvariables),
136  fRd(nvectors),
137  fSd(nvectors),
138  fOut(1),
139  fHyperplane(nvariables),
140  fData(nvectors, nvariables)
141 {
142  if ((nvectors<=1)||(nvariables<=0)){
143  Error("TRobustEstimator","Not enough vectors or variables");
144  return;
145  }
146  if (nvariables==1){
147  Error("TRobustEstimator","For the univariate case, use the default constructor and EvaluateUni() function");
148  return;
149  }
150 
151  fN=nvectors;
152  fNvar=nvariables;
153  if (hh<(fN+fNvar+1)/2){
154  if (hh>0)
155  Warning("TRobustEstimator","chosen h is too small, default h is taken instead");
156  fH=(fN+fNvar+1)/2;
157  } else
158  fH=hh;
159 
160  fVarTemp=0;
161  fVecTemp=0;
162  fExact=0;
163 }
164 
165 ////////////////////////////////////////////////////////////////////////////////
166 ///adds a column to the data matrix
167 ///it is assumed that the column has size fN
168 ///variable fVarTemp keeps the number of columns l
169 ///already added
170 
172 {
173  if (fVarTemp==fNvar) {
174  fNvar++;
181  }
182  for (Int_t i=0; i<fN; i++) {
183  fData(i, fVarTemp)=col[i];
184  }
185  fVarTemp++;
186 }
187 
188 ////////////////////////////////////////////////////////////////////////////////
189 ///adds a vector to the data matrix
190 ///it is supposed that the vector is of size fNvar
191 
193 {
194  if(fVecTemp==fN) {
195  fN++;
196  fRd.ResizeTo(fN);
197  fSd.ResizeTo(fN);
199  }
200  for (Int_t i=0; i<fNvar; i++)
201  fData(fVecTemp, i)=row[i];
202 
203  fVecTemp++;
204 }
205 
206 ////////////////////////////////////////////////////////////////////////////////
207 ///Finds the estimate of multivariate mean and variance
208 
210 {
211  Double_t kEps=1e-14;
212 
213  if (fH==fN){
214  Warning("Evaluate","Chosen h = #observations, so classic estimates of location and scatter will be calculated");
215  Classic();
216  return;
217  }
218 
219  Int_t i, j, k;
220  Int_t ii, jj;
221  Int_t nmini = 300;
222  Int_t k1=500;
223  Int_t nbest=10;
224  TMatrixD sscp(fNvar+1, fNvar+1);
225  TVectorD vec(fNvar);
226 
227  Int_t *index = new Int_t[fN];
228  Double_t *ndist = new Double_t[fN];
229  Double_t det;
230  Double_t *deti=new Double_t[nbest];
231  for (i=0; i<nbest; i++)
232  deti[i]=1e16;
233 
234  for (i=0; i<fN; i++)
235  fRd(i)=0;
236  ////////////////////////////
237  //for small n
238  ////////////////////////////
239  if (fN<nmini*2) {
240  //for storing the best fMeans and covariances
241 
242  TMatrixD mstock(nbest, fNvar);
243  TMatrixD cstock(fNvar, fNvar*nbest);
244 
245  for (k=0; k<k1; k++) {
246  CreateSubset(fN, fH, fNvar, index, fData, sscp, ndist);
247  //calculate the mean and covariance of the created subset
248  ClearSscp(sscp);
249  for (i=0; i<fH; i++) {
250  for(j=0; j<fNvar; j++)
251  vec(j)=fData[index[i]][j];
252  AddToSscp(sscp, vec);
253  }
254  Covar(sscp, fMean, fCovariance, fSd, fH);
255  det = fCovariance.Determinant();
256  if (det < kEps) {
257  fExact = Exact(ndist);
258  delete [] index;
259  delete [] ndist;
260  delete [] deti;
261  return;
262  }
263  //make 2 CSteps
264  det = CStep(fN, fH, index, fData, sscp, ndist);
265  if (det < kEps) {
266  fExact = Exact(ndist);
267  delete [] index;
268  delete [] ndist;
269  delete [] deti;
270  return;
271  }
272  det = CStep(fN, fH, index, fData, sscp, ndist);
273  if (det < kEps) {
274  fExact = Exact(ndist);
275  delete [] index;
276  delete [] ndist;
277  delete [] deti;
278  return;
279  } else {
280  Int_t maxind=TMath::LocMax(nbest, deti);
281  if(det<deti[maxind]) {
282  deti[maxind]=det;
283  for(ii=0; ii<fNvar; ii++) {
284  mstock(maxind, ii)=fMean(ii);
285  for(jj=0; jj<fNvar; jj++)
286  cstock(ii, jj+maxind*fNvar)=fCovariance(ii, jj);
287  }
288  }
289  }
290  }
291 
292  //now for nbest best results perform CSteps until convergence
293 
294  for (i=0; i<nbest; i++) {
295  for(ii=0; ii<fNvar; ii++) {
296  fMean(ii)=mstock(i, ii);
297  for (jj=0; jj<fNvar; jj++)
298  fCovariance(ii, jj)=cstock(ii, jj+i*fNvar);
299  }
300 
301  det=1;
302  while (det>kEps) {
303  det=CStep(fN, fH, index, fData, sscp, ndist);
304  if(TMath::Abs(det-deti[i])<kEps)
305  break;
306  else
307  deti[i]=det;
308  }
309  for(ii=0; ii<fNvar; ii++) {
310  mstock(i,ii)=fMean(ii);
311  for (jj=0; jj<fNvar; jj++)
312  cstock(ii,jj+i*fNvar)=fCovariance(ii, jj);
313  }
314  }
315 
316  Int_t detind=TMath::LocMin(nbest, deti);
317  for(ii=0; ii<fNvar; ii++) {
318  fMean(ii)=mstock(detind,ii);
319 
320  for(jj=0; jj<fNvar; jj++)
321  fCovariance(ii, jj)=cstock(ii,jj+detind*fNvar);
322  }
323 
324  if (deti[detind]!=0) {
325  //calculate robust distances and throw out the bad points
326  Int_t nout = RDist(sscp);
327  Double_t cutoff=kChiQuant[fNvar-1];
328 
329  fOut.Set(nout);
330 
331  j=0;
332  for (i=0; i<fN; i++) {
333  if(fRd(i)>cutoff) {
334  fOut[j]=i;
335  j++;
336  }
337  }
338 
339  } else {
340  fExact=Exact(ndist);
341  }
342  delete [] index;
343  delete [] ndist;
344  delete [] deti;
345  return;
346 
347  }
348  /////////////////////////////////////////////////
349  //if n>nmini, the dataset should be partitioned
350  //partitioning
351  ////////////////////////////////////////////////
352  Int_t indsubdat[5];
353  Int_t nsub;
354  for (ii=0; ii<5; ii++)
355  indsubdat[ii]=0;
356 
357  nsub = Partition(nmini, indsubdat);
358 
359  Int_t sum=0;
360  for (ii=0; ii<5; ii++)
361  sum+=indsubdat[ii];
362  Int_t *subdat=new Int_t[sum];
363  //printf("allocates subdat[ %d ]\n", sum);
364  // init the subdat matrix
365  for (int iii = 0; iii < sum; ++iii) subdat[iii] = -999;
366  RDraw(subdat, nsub, indsubdat);
367  for (int iii = 0; iii < sum; ++iii) {
368  if (subdat[iii] < 0 || subdat[iii] >= fN ) {
369  Error("Evaluate","subdat index is invalid subdat[%d] = %d",iii, subdat[iii] );
370  R__ASSERT(0);
371  }
372  }
373  //now the indexes of selected cases are in the array subdat
374  //matrices to store best means and covariances
375  Int_t nbestsub=nbest*nsub;
376  TMatrixD mstockbig(nbestsub, fNvar);
377  TMatrixD cstockbig(fNvar, fNvar*nbestsub);
378  TMatrixD hyperplane(nbestsub, fNvar);
379  for (i=0; i<nbestsub; i++) {
380  for(j=0; j<fNvar; j++)
381  hyperplane(i,j)=0;
382  }
383  Double_t *detibig = new Double_t[nbestsub];
384  Int_t maxind;
385  maxind=TMath::LocMax(5, indsubdat);
386  TMatrixD dattemp(indsubdat[maxind], fNvar);
387 
388  Int_t k2=Int_t(k1/nsub);
389  //construct h-subsets and perform 2 CSteps in subgroups
390 
391  for (Int_t kgroup=0; kgroup<nsub; kgroup++) {
392  //printf("group #%d\n", kgroup);
393  Int_t ntemp=indsubdat[kgroup];
394  Int_t temp=0;
395  for (i=0; i<kgroup; i++)
396  temp+=indsubdat[i];
397  Int_t par;
398 
399 
400  for(i=0; i<ntemp; i++) {
401  for (j=0; j<fNvar; j++) {
402  dattemp(i,j)=fData[subdat[temp+i]][j];
403  }
404  }
405  Int_t htemp=Int_t(fH*ntemp/fN);
406 
407  for (i=0; i<nbest; i++)
408  deti[i]=1e16;
409 
410  for(k=0; k<k2; k++) {
411  CreateSubset(ntemp, htemp, fNvar, index, dattemp, sscp, ndist);
412  ClearSscp(sscp);
413  for (i=0; i<htemp; i++) {
414  for(j=0; j<fNvar; j++) {
415  vec(j)=dattemp(index[i],j);
416  }
417  AddToSscp(sscp, vec);
418  }
419  Covar(sscp, fMean, fCovariance, fSd, htemp);
420  det = fCovariance.Determinant();
421  if (det<kEps) {
422  par =Exact2(mstockbig, cstockbig, hyperplane, deti, nbest, kgroup, sscp,ndist);
423  if(par==nbest+1) {
424 
425  delete [] detibig;
426  delete [] deti;
427  delete [] subdat;
428  delete [] ndist;
429  delete [] index;
430  return;
431  } else
432  deti[par]=det;
433  } else {
434  det = CStep(ntemp, htemp, index, dattemp, sscp, ndist);
435  if (det<kEps) {
436  par=Exact2(mstockbig, cstockbig, hyperplane, deti, nbest, kgroup, sscp, ndist);
437  if(par==nbest+1) {
438 
439  delete [] detibig;
440  delete [] deti;
441  delete [] subdat;
442  delete [] ndist;
443  delete [] index;
444  return;
445  } else
446  deti[par]=det;
447  } else {
448  det=CStep(ntemp,htemp, index, dattemp, sscp, ndist);
449  if(det<kEps){
450  par=Exact2(mstockbig, cstockbig, hyperplane, deti, nbest, kgroup, sscp,ndist);
451  if(par==nbest+1) {
452 
453  delete [] detibig;
454  delete [] deti;
455  delete [] subdat;
456  delete [] ndist;
457  delete [] index;
458  return;
459  } else {
460  deti[par]=det;
461  }
462  } else {
463  maxind=TMath::LocMax(nbest, deti);
464  if(det<deti[maxind]) {
465  deti[maxind]=det;
466  for(i=0; i<fNvar; i++) {
467  mstockbig(nbest*kgroup+maxind,i)=fMean(i);
468  for(j=0; j<fNvar; j++) {
469  cstockbig(i,nbest*kgroup*fNvar+maxind*fNvar+j)=fCovariance(i,j);
470 
471  }
472  }
473  }
474 
475  }
476  }
477  }
478 
479  maxind=TMath::LocMax(nbest, deti);
480  if (deti[maxind]<kEps)
481  break;
482  }
483 
484 
485  for(i=0; i<nbest; i++) {
486  detibig[kgroup*nbest + i]=deti[i];
487 
488  }
489 
490  }
491 
492  //now the arrays mstockbig and cstockbig store nbest*nsub best means and covariances
493  //detibig stores nbest*nsub their determinants
494  //merge the subsets and carry out 2 CSteps on the merged set for all 50 best solutions
495 
496  TMatrixD datmerged(sum, fNvar);
497  for(i=0; i<sum; i++) {
498  for (j=0; j<fNvar; j++)
499  datmerged(i,j)=fData[subdat[i]][j];
500  }
501  // printf("performing calculations for merged set\n");
502  Int_t hmerged=Int_t(sum*fH/fN);
503 
504  Int_t nh;
505  for(k=0; k<nbestsub; k++) {
506  //for all best solutions perform 2 CSteps and then choose the very best
507  for(ii=0; ii<fNvar; ii++) {
508  fMean(ii)=mstockbig(k,ii);
509  for(jj=0; jj<fNvar; jj++)
510  fCovariance(ii, jj)=cstockbig(ii,k*fNvar+jj);
511  }
512  if(detibig[k]==0) {
513  for(i=0; i<fNvar; i++)
514  fHyperplane(i)=hyperplane(k,i);
515  CreateOrtSubset(datmerged,index, hmerged, sum, sscp, ndist);
516 
517  }
518  det=CStep(sum, hmerged, index, datmerged, sscp, ndist);
519  if (det<kEps) {
520  nh= Exact(ndist);
521  if (nh>=fH) {
522  fExact = nh;
523 
524  delete [] detibig;
525  delete [] deti;
526  delete [] subdat;
527  delete [] ndist;
528  delete [] index;
529  return;
530  } else {
531  CreateOrtSubset(datmerged, index, hmerged, sum, sscp, ndist);
532  }
533  }
534 
535  det=CStep(sum, hmerged, index, datmerged, sscp, ndist);
536  if (det<kEps) {
537  nh=Exact(ndist);
538  if (nh>=fH) {
539  fExact = nh;
540  delete [] detibig;
541  delete [] deti;
542  delete [] subdat;
543  delete [] ndist;
544  delete [] index;
545  return;
546  }
547  }
548  detibig[k]=det;
549  for(i=0; i<fNvar; i++) {
550  mstockbig(k,i)=fMean(i);
551  for(j=0; j<fNvar; j++) {
552  cstockbig(i,k*fNvar+j)=fCovariance(i, j);
553  }
554  }
555  }
556  //now for the subset with the smallest determinant
557  //repeat CSteps until convergence
558  Int_t minind=TMath::LocMin(nbestsub, detibig);
559  det=detibig[minind];
560  for(i=0; i<fNvar; i++) {
561  fMean(i)=mstockbig(minind,i);
562  fHyperplane(i)=hyperplane(minind,i);
563  for(j=0; j<fNvar; j++)
564  fCovariance(i, j)=cstockbig(i,minind*fNvar + j);
565  }
566  if(det<kEps)
567  CreateOrtSubset(fData, index, fH, fN, sscp, ndist);
568  det=1;
569  while (det>kEps) {
570  det=CStep(fN, fH, index, fData, sscp, ndist);
571  if(TMath::Abs(det-detibig[minind])<kEps) {
572  break;
573  } else {
574  detibig[minind]=det;
575  }
576  }
577  if(det<kEps) {
578  Exact(ndist);
579  fExact=kTRUE;
580  }
581  Int_t nout = RDist(sscp);
582  Double_t cutoff=kChiQuant[fNvar-1];
583 
584  fOut.Set(nout);
585 
586  j=0;
587  for (i=0; i<fN; i++) {
588  if(fRd(i)>cutoff) {
589  fOut[j]=i;
590  j++;
591  }
592  }
593 
594  delete [] detibig;
595  delete [] deti;
596  delete [] subdat;
597  delete [] ndist;
598  delete [] index;
599  return;
600 }
601 
602 ////////////////////////////////////////////////////////////////////////////////
603 ///for the univariate case
604 ///estimates of location and scatter are returned in mean and sigma parameters
605 ///the algorithm works on the same principle as in multivariate case -
606 ///it finds a subset of size hh with smallest sigma, and then returns mean and
607 ///sigma of this subset
608 
610 {
611  if (hh==0)
612  hh=(nvectors+2)/2;
613  Double_t faclts[]={2.6477,2.5092,2.3826,2.2662,2.1587,2.0589,1.9660,1.879,1.7973,1.7203,1.6473};
614  Int_t *index=new Int_t[nvectors];
615  TMath::Sort(nvectors, data, index, kFALSE);
616 
617  Int_t nquant;
618  nquant=TMath::Min(Int_t(Double_t(((hh*1./nvectors)-0.5)*40))+1, 11);
619  Double_t factor=faclts[nquant-1];
620 
621  Double_t *aw=new Double_t[nvectors];
622  Double_t *aw2=new Double_t[nvectors];
623  Double_t sq=0;
624  Double_t sqmin=0;
625  Int_t ndup=0;
626  Int_t len=nvectors-hh;
627  Double_t *slutn=new Double_t[len];
628  for(Int_t i=0; i<len; i++)
629  slutn[i]=0;
630  for(Int_t jint=0; jint<len; jint++) {
631  aw[jint]=0;
632  for (Int_t j=0; j<hh; j++) {
633  aw[jint]+=data[index[j+jint]];
634  if(jint==0)
635  sq+=data[index[j]]*data[index[j]];
636  }
637  aw2[jint]=aw[jint]*aw[jint]/hh;
638 
639  if(jint==0) {
640  sq=sq-aw2[jint];
641  sqmin=sq;
642  slutn[ndup]=aw[jint];
643 
644  } else {
645  sq=sq - data[index[jint-1]]*data[index[jint-1]]+
646  data[index[jint+hh]]*data[index[jint+hh]]-
647  aw2[jint]+aw2[jint-1];
648  if(sq<sqmin) {
649  ndup=0;
650  sqmin=sq;
651  slutn[ndup]=aw[jint];
652 
653  } else {
654  if(sq==sqmin) {
655  ndup++;
656  slutn[ndup]=aw[jint];
657  }
658  }
659  }
660  }
661 
662  slutn[0]=slutn[Int_t((ndup)/2)]/hh;
663  Double_t bstd=factor*TMath::Sqrt(sqmin/hh);
664  mean=slutn[0];
665  sigma=bstd;
666  delete [] aw;
667  delete [] aw2;
668  delete [] slutn;
669  delete [] index;
670 }
671 
672 ////////////////////////////////////////////////////////////////////////////////
673 ///returns the breakdown point of the algorithm
674 
676 {
677  Int_t n;
678  n=(fN-fH+1)/fN;
679  return n;
680 }
681 
682 ////////////////////////////////////////////////////////////////////////////////
683 ///returns the chi2 quantiles
684 
686 {
687  if (i < 0 || i >= 50) return 0;
688  return kChiQuant[i];
689 }
690 
691 ////////////////////////////////////////////////////////////////////////////////
692 ///returns the covariance matrix
693 
695 {
696  if (matr.GetNrows()!=fNvar || matr.GetNcols()!=fNvar){
697  Warning("GetCovariance","provided matrix is of the wrong size, it will be resized");
698  matr.ResizeTo(fNvar, fNvar);
699  }
700  matr=fCovariance;
701 }
702 
703 ////////////////////////////////////////////////////////////////////////////////
704 ///returns the correlation matrix
705 
707 {
708  if (matr.GetNrows()!=fNvar || matr.GetNcols()!=fNvar) {
709  Warning("GetCorrelation","provided matrix is of the wrong size, it will be resized");
710  matr.ResizeTo(fNvar, fNvar);
711  }
712  matr=fCorrelation;
713 }
714 
715 ////////////////////////////////////////////////////////////////////////////////
716 ///if the points are on a hyperplane, returns this hyperplane
717 
719 {
720  if (fExact==0) {
721  Error("GetHyperplane","the data doesn't lie on a hyperplane!\n");
722  return 0;
723  } else {
724  return &fHyperplane;
725  }
726 }
727 
728 ////////////////////////////////////////////////////////////////////////////////
729 ///if the points are on a hyperplane, returns this hyperplane
730 
732 {
733  if (fExact==0){
734  Error("GetHyperplane","the data doesn't lie on a hyperplane!\n");
735  return;
736  }
737  if (vec.GetNoElements()!=fNvar) {
738  Warning("GetHyperPlane","provided vector is of the wrong size, it will be resized");
739  vec.ResizeTo(fNvar);
740  }
741  vec=fHyperplane;
742 }
743 
744 ////////////////////////////////////////////////////////////////////////////////
745 ///return the estimate of the mean
746 
748 {
749  if (means.GetNoElements()!=fNvar) {
750  Warning("GetMean","provided vector is of the wrong size, it will be resized");
751  means.ResizeTo(fNvar);
752  }
753  means=fMean;
754 }
755 
756 ////////////////////////////////////////////////////////////////////////////////
757 ///returns the robust distances (helps to find outliers)
758 
760 {
761  if (rdist.GetNoElements()!=fN) {
762  Warning("GetRDistances","provided vector is of the wrong size, it will be resized");
763  rdist.ResizeTo(fN);
764  }
765  rdist=fRd;
766 }
767 
768 ////////////////////////////////////////////////////////////////////////////////
769 ///returns the number of outliers
770 
772 {
773  return fOut.GetSize();
774 }
775 
776 ////////////////////////////////////////////////////////////////////////////////
777 ///update the sscp matrix with vector vec
778 
780 {
781  Int_t i, j;
782  for (j=1; j<fNvar+1; j++) {
783  sscp(0, j) +=vec(j-1);
784  sscp(j, 0) = sscp(0, j);
785  }
786  for (i=1; i<fNvar+1; i++) {
787  for (j=1; j<fNvar+1; j++) {
788  sscp(i, j) += vec(i-1)*vec(j-1);
789  }
790  }
791 }
792 
793 ////////////////////////////////////////////////////////////////////////////////
794 ///clear the sscp matrix, used for covariance and mean calculation
795 
797 {
798  for (Int_t i=0; i<fNvar+1; i++) {
799  for (Int_t j=0; j<fNvar+1; j++) {
800  sscp(i, j)=0;
801  }
802  }
803 }
804 
805 ////////////////////////////////////////////////////////////////////////////////
806 ///called when h=n. Returns classic covariance matrix
807 ///and mean
808 
810 {
811  TMatrixD sscp(fNvar+1, fNvar+1);
812  TVectorD temp(fNvar);
813  ClearSscp(sscp);
814  for (Int_t i=0; i<fN; i++) {
815  for (Int_t j=0; j<fNvar; j++)
816  temp(j)=fData(i, j);
817  AddToSscp(sscp, temp);
818  }
819  Covar(sscp, fMean, fCovariance, fSd, fN);
820  Correl();
821 
822 }
823 
824 ////////////////////////////////////////////////////////////////////////////////
825 ///calculates mean and covariance
826 
828 {
829  Int_t i, j;
830  Double_t f;
831  for (i=0; i<fNvar; i++) {
832  m(i)=sscp(0, i+1);
833  sd[i]=sscp(i+1, i+1);
834  f=(sd[i]-m(i)*m(i)/nvec)/(nvec-1);
835  if (f>1e-14) sd[i]=TMath::Sqrt(f);
836  else sd[i]=0;
837  m(i)/=nvec;
838  }
839  for (i=0; i<fNvar; i++) {
840  for (j=0; j<fNvar; j++) {
841  cov(i, j)=sscp(i+1, j+1)-nvec*m(i)*m(j);
842  cov(i, j)/=nvec-1;
843  }
844  }
845 }
846 
847 ////////////////////////////////////////////////////////////////////////////////
848 ///transforms covariance matrix into correlation matrix
849 
851 {
852  Int_t i, j;
853  Double_t *sd=new Double_t[fNvar];
854  for(j=0; j<fNvar; j++)
855  sd[j]=1./TMath::Sqrt(fCovariance(j, j));
856  for(i=0; i<fNvar; i++) {
857  for (j=0; j<fNvar; j++) {
858  if (i==j)
859  fCorrelation(i, j)=1.;
860  else
861  fCorrelation(i, j)=fCovariance(i, j)*sd[i]*sd[j];
862  }
863  }
864  delete [] sd;
865 }
866 
867 ////////////////////////////////////////////////////////////////////////////////
868 ///creates a subset of htotal elements from ntotal elements
869 ///first, p+1 elements are drawn randomly(without repetitions)
870 ///if their covariance matrix is singular, more elements are
871 ///added one by one, until their covariance matrix becomes regular
872 ///or it becomes clear that htotal observations lie on a hyperplane
873 ///If covariance matrix determinant!=0, distances of all ntotal elements
874 ///are calculated, using formula d_i=Sqrt((x_i-M)*S_inv*(x_i-M)), where
875 ///M is mean and S_inv is the inverse of the covariance matrix
876 ///htotal points with smallest distances are included in the returned subset.
877 
878 void TRobustEstimator::CreateSubset(Int_t ntotal, Int_t htotal, Int_t p, Int_t *index, TMatrixD &data, TMatrixD &sscp, Double_t *ndist)
879 {
880  Double_t kEps = 1e-14;
881  Int_t i, j;
882  Bool_t repeat=kFALSE;
883  Int_t nindex=0;
884  Int_t num;
885  for(i=0; i<ntotal; i++)
886  index[i]=ntotal+1;
887 
888  for (i=0; i<p+1; i++) {
889  num=Int_t(gRandom->Uniform(0, 1)*(ntotal-1));
890  if (i>0){
891  for(j=0; j<=i-1; j++) {
892  if(index[j]==num)
893  repeat=kTRUE;
894  }
895  }
896  if(repeat==kTRUE) {
897  i--;
898  repeat=kFALSE;
899  } else {
900  index[i]=num;
901  nindex++;
902  }
903  }
904 
905  ClearSscp(sscp);
906 
907  TVectorD vec(fNvar);
908  Double_t det;
909  for (i=0; i<p+1; i++) {
910  for (j=0; j<fNvar; j++) {
911  vec[j]=data[index[i]][j];
912 
913  }
914  AddToSscp(sscp, vec);
915  }
916 
917  Covar(sscp, fMean, fCovariance, fSd, p+1);
918  det=fCovariance.Determinant();
919  while((det<kEps)&&(nindex < htotal)) {
920  //if covariance matrix is singular,another vector is added until
921  //the matrix becomes regular or it becomes clear that all
922  //vectors of the group lie on a hyperplane
923  repeat=kFALSE;
924  do{
925  num=Int_t(gRandom->Uniform(0,1)*(ntotal-1));
926  repeat=kFALSE;
927  for(i=0; i<nindex; i++) {
928  if(index[i]==num) {
929  repeat=kTRUE;
930  break;
931  }
932  }
933  }while(repeat==kTRUE);
934 
935  index[nindex]=num;
936  nindex++;
937  //check if covariance matrix is singular
938  for(j=0; j<fNvar; j++)
939  vec[j]=data[index[nindex-1]][j];
940  AddToSscp(sscp, vec);
941  Covar(sscp, fMean, fCovariance, fSd, nindex);
942  det=fCovariance.Determinant();
943  }
944 
945  if(nindex!=htotal) {
946  TDecompChol chol(fCovariance);
947  fInvcovariance = chol.Invert();
948 
949  TVectorD temp(fNvar);
950  for(j=0; j<ntotal; j++) {
951  ndist[j]=0;
952  for(i=0; i<fNvar; i++)
953  temp[i]=data[j][i] - fMean(i);
954  temp*=fInvcovariance;
955  for(i=0; i<fNvar; i++)
956  ndist[j]+=(data[j][i]-fMean(i))*temp[i];
957  }
958  KOrdStat(ntotal, ndist, htotal-1,index);
959  }
960 
961 }
962 
963 ////////////////////////////////////////////////////////////////////////////////
964 ///creates a subset of hmerged vectors with smallest orthogonal distances to the hyperplane
965 ///hyp[1]*(x1-mean[1])+...+hyp[nvar]*(xnvar-mean[nvar])=0
966 ///This function is called in case when less than fH samples lie on a hyperplane.
967 
968 void TRobustEstimator::CreateOrtSubset(TMatrixD &dat,Int_t *index, Int_t hmerged, Int_t nmerged, TMatrixD &sscp, Double_t *ndist)
969 {
970  Int_t i, j;
971  TVectorD vec(fNvar);
972  for (i=0; i<nmerged; i++) {
973  ndist[i]=0;
974  for(j=0; j<fNvar; j++) {
975  ndist[i]+=fHyperplane[j]*(dat[i][j]-fMean[j]);
976  ndist[i]=TMath::Abs(ndist[i]);
977  }
978  }
979  KOrdStat(nmerged, ndist, hmerged-1, index);
980  ClearSscp(sscp);
981  for (i=0; i<hmerged; i++) {
982  for(j=0; j<fNvar; j++)
983  vec[j]=dat[index[i]][j];
984  AddToSscp(sscp, vec);
985  }
986  Covar(sscp, fMean, fCovariance, fSd, hmerged);
987 }
988 
989 ////////////////////////////////////////////////////////////////////////////////
990 ///from the input htotal-subset constructs another htotal subset with lower determinant
991 ///
992 ///As proven by Peter J.Rousseeuw and Katrien Van Driessen, if distances for all elements
993 ///are calculated, using the formula:d_i=Sqrt((x_i-M)*S_inv*(x_i-M)), where M is the mean
994 ///of the input htotal-subset, and S_inv - the inverse of its covariance matrix, then
995 ///htotal elements with smallest distances will have covariance matrix with determinant
996 ///less or equal to the determinant of the input subset covariance matrix.
997 ///
998 ///determinant for this htotal-subset with smallest distances is returned
999 
1001 {
1002  Int_t i, j;
1003  TVectorD vec(fNvar);
1004  Double_t det;
1005 
1006  TDecompChol chol(fCovariance);
1007  fInvcovariance = chol.Invert();
1008 
1009  TVectorD temp(fNvar);
1010  for(j=0; j<ntotal; j++) {
1011  ndist[j]=0;
1012  for(i=0; i<fNvar; i++)
1013  temp[i]=data[j][i]-fMean[i];
1014  temp*=fInvcovariance;
1015  for(i=0; i<fNvar; i++)
1016  ndist[j]+=(data[j][i]-fMean[i])*temp[i];
1017  }
1018 
1019  //taking h smallest
1020  KOrdStat(ntotal, ndist, htotal-1, index);
1021  //writing their mean and covariance
1022  ClearSscp(sscp);
1023  for (i=0; i<htotal; i++) {
1024  for (j=0; j<fNvar; j++)
1025  temp[j]=data[index[i]][j];
1026  AddToSscp(sscp, temp);
1027  }
1028  Covar(sscp, fMean, fCovariance, fSd, htotal);
1029  det = fCovariance.Determinant();
1030  return det;
1031 }
1032 
1033 ////////////////////////////////////////////////////////////////////////////////
1034 ///for the exact fit situations
1035 ///returns number of observations on the hyperplane
1036 
1038 {
1039  Int_t i, j;
1040 
1042  TVectorD eigenValues=eigen.GetEigenValues();
1043  TMatrixD eigenMatrix=eigen.GetEigenVectors();
1044 
1045  for (j=0; j<fNvar; j++) {
1046  fHyperplane[j]=eigenMatrix(j,fNvar-1);
1047  }
1048  //calculate and return how many observations lie on the hyperplane
1049  for (i=0; i<fN; i++) {
1050  ndist[i]=0;
1051  for(j=0; j<fNvar; j++) {
1052  ndist[i]+=fHyperplane[j]*(fData[i][j]-fMean[j]);
1053  ndist[i]=TMath::Abs(ndist[i]);
1054  }
1055  }
1056  Int_t nhyp=0;
1057 
1058  for (i=0; i<fN; i++) {
1059  if(ndist[i] < 1e-14) nhyp++;
1060  }
1061  return nhyp;
1062 
1063 }
1064 
1065 ////////////////////////////////////////////////////////////////////////////////
1066 ///This function is called if determinant of the covariance matrix of a subset=0.
1067 ///
1068 ///If there are more then fH vectors on a hyperplane,
1069 ///returns this hyperplane and stops
1070 ///else stores the hyperplane coordinates in hyperplane matrix
1071 
1072 Int_t TRobustEstimator::Exact2(TMatrixD &mstockbig, TMatrixD &cstockbig, TMatrixD &hyperplane,
1073  Double_t *deti, Int_t nbest, Int_t kgroup,
1074  TMatrixD &sscp, Double_t *ndist)
1075 {
1076  Int_t i, j;
1077 
1078  TVectorD vec(fNvar);
1079  Int_t maxind = TMath::LocMax(nbest, deti);
1080  Int_t nh=Exact(ndist);
1081  //now nh is the number of observation on the hyperplane
1082  //ndist stores distances of observation from this hyperplane
1083  if(nh>=fH) {
1084  ClearSscp(sscp);
1085  for (i=0; i<fN; i++) {
1086  if(ndist[i]<1e-14) {
1087  for (j=0; j<fNvar; j++)
1088  vec[j]=fData[i][j];
1089  AddToSscp(sscp, vec);
1090  }
1091  }
1092  Covar(sscp, fMean, fCovariance, fSd, nh);
1093 
1094  fExact=nh;
1095  return nbest+1;
1096 
1097  } else {
1098  //if less than fH observations lie on a hyperplane,
1099  //mean and covariance matrix are stored in mstockbig
1100  //and cstockbig in place of the previous maximum determinant
1101  //mean and covariance
1102  for(i=0; i<fNvar; i++) {
1103  mstockbig(nbest*kgroup+maxind,i)=fMean(i);
1104  hyperplane(nbest*kgroup+maxind,i)=fHyperplane(i);
1105  for(j=0; j<fNvar; j++) {
1106  cstockbig(i,nbest*kgroup*fNvar+maxind*fNvar+j)=fCovariance(i,j);
1107  }
1108  }
1109  return maxind;
1110  }
1111 }
1112 
1113 
1114 ////////////////////////////////////////////////////////////////////////////////
1115 ///divides the elements into approximately equal subgroups
1116 ///number of elements in each subgroup is stored in indsubdat
1117 ///number of subgroups is returned
1118 
1120 {
1121  Int_t nsub;
1122  if ((fN>=2*nmini) && (fN<=(3*nmini-1))) {
1123  if (fN%2==1){
1124  indsubdat[0]=Int_t(fN*0.5);
1125  indsubdat[1]=Int_t(fN*0.5)+1;
1126  } else
1127  indsubdat[0]=indsubdat[1]=Int_t(fN/2);
1128  nsub=2;
1129  }
1130  else{
1131  if((fN>=3*nmini) && (fN<(4*nmini -1))) {
1132  if(fN%3==0){
1133  indsubdat[0]=indsubdat[1]=indsubdat[2]=Int_t(fN/3);
1134  } else {
1135  indsubdat[0]=Int_t(fN/3);
1136  indsubdat[1]=Int_t(fN/3)+1;
1137  if (fN%3==1) indsubdat[2]=Int_t(fN/3);
1138  else indsubdat[2]=Int_t(fN/3)+1;
1139  }
1140  nsub=3;
1141  }
1142  else{
1143  if((fN>=4*nmini)&&(fN<=(5*nmini-1))){
1144  if (fN%4==0) indsubdat[0]=indsubdat[1]=indsubdat[2]=indsubdat[3]=Int_t(fN/4);
1145  else {
1146  indsubdat[0]=Int_t(fN/4);
1147  indsubdat[1]=Int_t(fN/4)+1;
1148  if(fN%4==1) indsubdat[2]=indsubdat[3]=Int_t(fN/4);
1149  if(fN%4==2) {
1150  indsubdat[2]=Int_t(fN/4)+1;
1151  indsubdat[3]=Int_t(fN/4);
1152  }
1153  if(fN%4==3) indsubdat[2]=indsubdat[3]=Int_t(fN/4)+1;
1154  }
1155  nsub=4;
1156  } else {
1157  for(Int_t i=0; i<5; i++)
1158  indsubdat[i]=nmini;
1159  nsub=5;
1160  }
1161  }
1162  }
1163  return nsub;
1164 }
1165 
1166 ////////////////////////////////////////////////////////////////////////////////
1167 ///Calculates robust distances.Then the samples with robust distances
1168 ///greater than a cutoff value (0.975 quantile of chi2 distribution with
1169 ///fNvar degrees of freedom, multiplied by a correction factor), are given
1170 ///weiht=0, and new, reweighted estimates of location and scatter are calculated
1171 ///The function returns the number of outliers.
1172 
1174 {
1175  Int_t i, j;
1176  Int_t nout=0;
1177 
1178  TVectorD temp(fNvar);
1179  TDecompChol chol(fCovariance);
1180  fInvcovariance = chol.Invert();
1181 
1182 
1183  for (i=0; i<fN; i++) {
1184  fRd[i]=0;
1185  for(j=0; j<fNvar; j++) {
1186  temp[j]=fData[i][j]-fMean[j];
1187  }
1188  temp*=fInvcovariance;
1189  for(j=0; j<fNvar; j++) {
1190  fRd[i]+=(fData[i][j]-fMean[j])*temp[j];
1191  }
1192  }
1193 
1194  Double_t med;
1195  Double_t chi = kChiMedian[fNvar-1];
1196 
1197  med=TMath::Median(fN, fRd.GetMatrixArray());
1198  med/=chi;
1199  fCovariance*=med;
1200  TDecompChol chol2(fCovariance);
1201  fInvcovariance = chol2.Invert();
1202 
1203  for (i=0; i<fN; i++) {
1204  fRd[i]=0;
1205  for(j=0; j<fNvar; j++) {
1206  temp[j]=fData[i][j]-fMean[j];
1207  }
1208 
1209  temp*=fInvcovariance;
1210  for(j=0; j<fNvar; j++) {
1211  fRd[i]+=(fData[i][j]-fMean[j])*temp[j];
1212  }
1213  }
1214 
1215  Double_t cutoff = kChiQuant[fNvar-1];
1216 
1217  ClearSscp(sscp);
1218  for(i=0; i<fN; i++) {
1219  if (fRd[i]<=cutoff) {
1220  for(j=0; j<fNvar; j++)
1221  temp[j]=fData[i][j];
1222  AddToSscp(sscp,temp);
1223  } else {
1224  nout++;
1225  }
1226  }
1227 
1228  Covar(sscp, fMean, fCovariance, fSd, fN-nout);
1229  return nout;
1230 }
1231 
1232 ////////////////////////////////////////////////////////////////////////////////
1233 ///Draws ngroup nonoverlapping subdatasets out of a dataset of size n
1234 ///such that the selected case numbers are uniformly distributed from 1 to n
1235 
1236 void TRobustEstimator::RDraw(Int_t *subdat, Int_t ngroup, Int_t *indsubdat)
1237 {
1238  Int_t jndex = 0;
1239  Int_t nrand;
1240  Int_t i, k, m, j;
1241  for (k=1; k<=ngroup; k++) {
1242  for (m=1; m<=indsubdat[k-1]; m++) {
1243  nrand = Int_t(gRandom->Uniform(0, 1) * double(fN-jndex))+1;
1244  //printf("nrand = %d - jndex %d\n",nrand,jndex);
1245  jndex++;
1246  if (jndex==1) {
1247  subdat[0]=nrand-1; // in case nrand is equal to fN
1248  } else {
1249  subdat[jndex-1]=nrand+jndex-2;
1250  for (i=1; i<=jndex-1; i++) {
1251  if(subdat[i-1] > nrand+i-2) {
1252  for(j=jndex; j>=i+1; j--) {
1253  subdat[j-1]=subdat[j-2];
1254  }
1255  //printf("subdata[] i = %d - nrand %d\n",i,nrand);
1256  subdat[i-1]=nrand+i-2;
1257  break; //breaking the loop for(i=1...
1258  }
1259  }
1260  }
1261  }
1262  }
1263 }
1264 
1265 ////////////////////////////////////////////////////////////////////////////////
1266 ///because I need an Int_t work array
1267 
1269  Bool_t isAllocated = kFALSE;
1270  const Int_t kWorkMax=100;
1271  Int_t i, ir, j, l, mid;
1272  Int_t arr;
1273  Int_t *ind;
1274  Int_t workLocal[kWorkMax];
1275  Int_t temp;
1276 
1277 
1278  if (work) {
1279  ind = work;
1280  } else {
1281  ind = workLocal;
1282  if (ntotal > kWorkMax) {
1283  isAllocated = kTRUE;
1284  ind = new Int_t[ntotal];
1285  }
1286  }
1287 
1288  for (Int_t ii=0; ii<ntotal; ii++) {
1289  ind[ii]=ii;
1290  }
1291  Int_t rk = k;
1292  l=0;
1293  ir = ntotal-1;
1294  for(;;) {
1295  if (ir<=l+1) { //active partition contains 1 or 2 elements
1296  if (ir == l+1 && a[ind[ir]]<a[ind[l]])
1297  {temp = ind[l]; ind[l]=ind[ir]; ind[ir]=temp;}
1298  Double_t tmp = a[ind[rk]];
1299  if (isAllocated)
1300  delete [] ind;
1301  return tmp;
1302  } else {
1303  mid = (l+ir) >> 1; //choose median of left, center and right
1304  {temp = ind[mid]; ind[mid]=ind[l+1]; ind[l+1]=temp;}//elements as partitioning element arr.
1305  if (a[ind[l]]>a[ind[ir]]) //also rearrange so that a[l]<=a[l+1]
1306  {temp = ind[l]; ind[l]=ind[ir]; ind[ir]=temp;}
1307 
1308  if (a[ind[l+1]]>a[ind[ir]])
1309  {temp=ind[l+1]; ind[l+1]=ind[ir]; ind[ir]=temp;}
1310 
1311  if (a[ind[l]]>a[ind[l+1]])
1312  {temp = ind[l]; ind[l]=ind[l+1]; ind[l+1]=temp;}
1313 
1314  i=l+1; //initialize pointers for partitioning
1315  j=ir;
1316  arr = ind[l+1];
1317  for (;;) {
1318  do i++; while (a[ind[i]]<a[arr]);
1319  do j--; while (a[ind[j]]>a[arr]);
1320  if (j<i) break; //pointers crossed, partitioning complete
1321  {temp=ind[i]; ind[i]=ind[j]; ind[j]=temp;}
1322  }
1323  ind[l+1]=ind[j];
1324  ind[j]=arr;
1325  if (j>=rk) ir = j-1; //keep active the partition that
1326  if (j<=rk) l=i; //contains the k_th element
1327  }
1328  }
1329 }
void Classic()
called when h=n.
Int_t GetNOut()
returns the number of outliers
double par[1]
Definition: unuranDistr.cxx:38
static long int sum(long int i)
Definition: Factory.cxx:1786
TVectorT< Element > & ResizeTo(Int_t lwb, Int_t upb)
Resize the vector to [lwb:upb] .
Definition: TVectorT.cxx:292
virtual TMatrixTBase< Element > & ResizeTo(Int_t nrows, Int_t ncols, Int_t=-1)
Set size of the matrix to nrows x ncols New dynamic elements are created, the overlapping part of the...
void CreateOrtSubset(TMatrixD &dat, Int_t *index, Int_t hmerged, Int_t nmerged, TMatrixD &sscp, Double_t *ndist)
creates a subset of hmerged vectors with smallest orthogonal distances to the hyperplane hyp[1]*(x1-m...
you should not use this method at all Int_t Int_t Double_t Double_t Double_t Int_t mid
Definition: TRolke.cxx:630
Long64_t LocMax(Long64_t n, const T *a)
Definition: TMath.h:711
void CreateSubset(Int_t ntotal, Int_t htotal, Int_t p, Int_t *index, TMatrixD &data, TMatrixD &sscp, Double_t *ndist)
creates a subset of htotal elements from ntotal elements first, p+1 elements are drawn randomly(witho...
const Double_t kChiMedian[50]
void AddColumn(Double_t *col)
adds a column to the data matrix it is assumed that the column has size fN variable fVarTemp keeps th...
void Evaluate()
Finds the estimate of multivariate mean and variance.
Int_t GetNcols() const
Definition: TMatrixTBase.h:137
Int_t Exact2(TMatrixD &mstockbig, TMatrixD &cstockbig, TMatrixD &hyperplane, Double_t *deti, Int_t nbest, Int_t kgroup, TMatrixD &sscp, Double_t *ndist)
This function is called if determinant of the covariance matrix of a subset=0.
void ClearSscp(TMatrixD &sscp)
clear the sscp matrix, used for covariance and mean calculation
Double_t CStep(Int_t ntotal, Int_t htotal, Int_t *index, TMatrixD &data, TMatrixD &sscp, Double_t *ndist)
from the input htotal-subset constructs another htotal subset with lower determinant ...
#define R__ASSERT(e)
Definition: TError.h:98
Short_t Min(Short_t a, Short_t b)
Definition: TMathBase.h:170
int Int_t
Definition: RtypesCore.h:41
bool Bool_t
Definition: RtypesCore.h:59
TArc * a
Definition: textangle.C:12
const Bool_t kFALSE
Definition: Rtypes.h:92
Double_t KOrdStat(Int_t ntotal, Double_t *arr, Int_t k, Int_t *work)
because I need an Int_t work array
void AddRow(Double_t *row)
adds a vector to the data matrix it is supposed that the vector is of size fNvar
const TMatrixDSym * GetCovariance() const
virtual TMatrixTBase< Element > & ResizeTo(Int_t nrows, Int_t ncols, Int_t=-1)
Set size of the matrix to nrows x ncols New dynamic elements are created, the overlapping part of the...
Definition: TMatrixT.cxx:1210
Short_t Abs(Short_t d)
Definition: TMathBase.h:110
TMatrixDSymEigen.
const TVectorD * GetMean() const
TMatrixDSym fInvcovariance
Minimum Covariance Determinant Estimator - a Fast Algorithm invented by Peter J.Rousseeuw and Katrien...
Double_t GetChiQuant(Int_t i) const
returns the chi2 quantiles
const Double_t kChiQuant[50]
void Sort(Index n, const Element *a, Index *index, Bool_t down=kTRUE)
Definition: TMath.h:989
Bool_t Invert(TMatrixDSym &inv)
For a symmetric matrix A(m,m), its inverse A_inv(m,m) is returned .
const Double_t sigma
void Set(Int_t n)
Set size of this array to n ints.
Definition: TArrayI.cxx:105
Element * GetMatrixArray()
Definition: TVectorT.h:84
Cholesky Decomposition class.
Definition: TDecompChol.h:28
virtual Double_t Determinant() const
Int_t GetSize() const
Definition: TArray.h:49
void RDraw(Int_t *subdat, Int_t ngroup, Int_t *indsubdat)
Draws ngroup nonoverlapping subdatasets out of a dataset of size n such that the selected case number...
TMarker * m
Definition: textangle.C:8
virtual void Error(const char *method, const char *msgfmt,...) const
Issue error message.
Definition: TObject.cxx:925
void AddToSscp(TMatrixD &sscp, TVectorD &vec)
update the sscp matrix with vector vec
TLine * l
Definition: textangle.C:4
const TVectorD * GetRDistances() const
Int_t GetNoElements() const
Definition: TVectorT.h:82
Int_t Exact(Double_t *ndist)
for the exact fit situations returns number of observations on the hyperplane
Int_t Partition(Int_t nmini, Int_t *indsubdat)
divides the elements into approximately equal subgroups number of elements in each subgroup is stored...
TMatrixDSym fCovariance
R__EXTERN TRandom * gRandom
Definition: TRandom.h:66
const TMatrixD & GetEigenVectors() const
Int_t GetNrows() const
Definition: TMatrixTBase.h:134
const TVectorD * GetHyperplane() const
if the points are on a hyperplane, returns this hyperplane
const TVectorD & GetEigenValues() const
#define ClassImp(name)
Definition: Rtypes.h:279
double f(double x)
const TMatrixDSym * GetCorrelation() const
double Double_t
Definition: RtypesCore.h:55
Double_t Median(Long64_t n, const T *a, const Double_t *w=0, Long64_t *work=0)
Definition: TMath.h:1064
Int_t RDist(TMatrixD &sscp)
Calculates robust distances.Then the samples with robust distances greater than a cutoff value (0...
void EvaluateUni(Int_t nvectors, Double_t *data, Double_t &mean, Double_t &sigma, Int_t hh=0)
for the univariate case estimates of location and scatter are returned in mean and sigma parameters t...
you should not use this method at all Int_t Int_t Double_t Double_t Double_t e
Definition: TRolke.cxx:630
virtual Double_t Uniform(Double_t x1=1)
Returns a uniform deviate on the interval (0, x1).
Definition: TRandom.cxx:606
Int_t GetBDPoint()
returns the breakdown point of the algorithm
void Correl()
transforms covariance matrix into correlation matrix
TMatrixDSym fCorrelation
void Covar(TMatrixD &sscp, TVectorD &m, TMatrixDSym &cov, TVectorD &sd, Int_t nvec)
calculates mean and covariance
TRobustEstimator()
this constructor should be used in a univariate case: first call this constructor, then - the EvaluateUni(..) function
Double_t Sqrt(Double_t x)
Definition: TMath.h:464
Long64_t LocMin(Long64_t n, const T *a)
Definition: TMath.h:682
const Bool_t kTRUE
Definition: Rtypes.h:91
const Int_t n
Definition: legend1.C:16
virtual void Warning(const char *method, const char *msgfmt,...) const
Issue warning message.
Definition: TObject.cxx:911