ROOT Reference Guide
TRobustEstimator.cxx
1// @(#)root/physics:$Id$
2// Author: Anna Kreshuk 08/10/2004
3
4/*************************************************************************
5 * Copyright (C) 1995-2004, Rene Brun and Fons Rademakers. *
6 * All rights reserved. *
7 * *
8 * For the licensing terms see $ROOTSYS/LICENSE. *
9 * For the list of contributors see $ROOTSYS/README/CREDITS. *
10 *************************************************************************/
11
12/** \class TRobustEstimator
13 \ingroup Physics
14Minimum Covariance Determinant Estimator - a Fast Algorithm
15invented by Peter J. Rousseeuw and Katrien Van Driessen,
16"A Fast Algorithm for the Minimum Covariance Determinant Estimator",
17Technometrics, August 1999, Vol. 41, No. 3.
18
19What are robust estimators?
20"An important property of an estimator is its robustness. An estimator
21is called robust if it is insensitive to measurements that deviate
22from the expected behaviour. There are 2 ways to treat such deviating
23measurements: one may either try to recognise them and then remove
24them from the data sample; or one may leave them in the sample, taking
25care that they do not influence the estimate unduly. In both cases robust
26estimators are needed...Robust procedures compensate for systematic errors
27as much as possible, and indicate any situation in which a danger of not being
28able to operate reliably is detected."
29R.Fruhwirth, M.Regler, R.K.Bock, H.Grote, D.Notz
30"Data Analysis Techniques for High-Energy Physics", 2nd edition
31
32What does this algorithm do?
33It computes a highly robust estimator of multivariate location and scatter.
34Then, it takes those estimates to compute robust distances of all the
35data vectors. Those with large robust distances are considered outliers.
36Robust distances can then be plotted for better visualization of the data.
37
38How does this algorithm do it?
39The MCD objective is to find h observations (out of n) whose classical
40covariance matrix has the lowest determinant. The MCD estimate of location
41is then the average of those h points, and the MCD estimate of scatter
42is their covariance matrix. The minimum (and default) h is (n+nvariables+1)/2,
43so the algorithm remains reliable as long as fewer than (n-nvariables+1)/2 of the observations are outliers.
44The algorithm also allows for exact fit situations - that is, when h or more
45observations lie on a hyperplane. Then the algorithm still yields the MCD location T
46and scatter matrix S, the latter being singular as it should be. From (T,S) the
47program then computes the equation of the hyperplane.
48
49How can this algorithm be used?
50In any situation where contamination of the data is suspected and might
51influence the classical estimates (see the usage sketch after this comment block).
52Robust estimation of location and scatter is also a tool to robustify
53other multivariate techniques such as, for example, principal-component analysis
54and discriminant analysis.
55
56Technical details of the algorithm:
57
581. The default h = (n+nvariables+1)/2, but the user may choose any integer h with
59 (n+nvariables+1)/2<=h<=n. The program then reports the MCD's breakdown value
60 (n-h+1)/n. If you are sure that the dataset contains less than 25% contamination,
61 which is usually the case, a good compromise between breakdown value and
62 efficiency is obtained by setting h=[0.75*n].
632. If h=n, the MCD location estimate is the average of the whole dataset, and
64 the MCD scatter estimate is its covariance matrix. Report this and stop.
653. If nvariables=1 (univariate data), compute the MCD estimate by the exact
66 algorithm of Rousseeuw and Leroy (1987, pp.171-172) in O(n log n) time and stop.
674. From here on, h<n and nvariables>=2.
68 1. If n is small:
69 - repeat (say) 500 times:
70 - construct an initial h-subset, starting from a random (nvar+1)-subset
71 - carry out 2 C-steps (described in the comments of CStep function)
72 - for the 10 results with lowest det(S):
73 - carry out C-steps until convergence
74 - report the solution (T, S) with the lowest det(S)
75 2. If n is larger (say, n>600), then
76 - construct up to 5 disjoint random subsets of size nsub (say, nsub=300)
77 - inside each subset repeat 500/5 times:
78 - construct an initial subset of size hsub=[nsub*h/n]
79 - carry out 2 C-steps
80 - keep the best 10 results (Tsub, Ssub)
81 - pool the subsets, yielding the merged set (say, of size nmerged=1500)
82 - in the merged set, repeat for each of the 50 solutions (Tsub, Ssub)
83 - carry out 2 C-steps
84 - keep the 10 best results
85 - in the full dataset, repeat for those best results:
86 - take several C-steps, using n and h
87 - report the best final result (T, S)
885. To obtain consistency when the data come from a multivariate normal
89 distribution, the covariance matrix is multiplied by a correction factor.
906. Robust distances are calculated for all elements, using the final (T, S).
91 The very final mean and covariance estimates are then calculated only from the
92 observations whose robust distances are less than a cutoff value (the 0.975 quantile
93 of the chi2 distribution with nvariables degrees of freedom).
94*/
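// A minimal usage sketch for the multivariate case, assuming the data are available as
// nvectors rows of nvariables values each (`myData` is a placeholder, not part of this class):
//
//    const Int_t nvectors = 1000, nvariables = 3;
//    TRobustEstimator re(nvectors, nvariables);       // default h = (n+nvariables+1)/2
//    Double_t row[3];
//    for (Int_t i = 0; i < nvectors; i++) {
//       for (Int_t j = 0; j < nvariables; j++) row[j] = myData[i][j];
//       re.AddRow(row);
//    }
//    re.Evaluate();                                   // run the MCD fit
//    TVectorD mean(nvariables);   re.GetMean(mean);          // robust location
//    TMatrixDSym cov(nvariables); re.GetCovariance(cov);     // robust scatter
//    Int_t nout = re.GetNOut();                       // number of flagged outliers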
95
96#include "TRobustEstimator.h"
97#include "TMatrixDSymEigen.h"
98#include "TRandom.h"
99#include "TMath.h"
100#include "TDecompChol.h"
101
102ClassImp(TRobustEstimator);
103
104const Double_t kChiMedian[50]= {
105 0.454937, 1.38629, 2.36597, 3.35670, 4.35146, 5.34812, 6.34581, 7.34412, 8.34283,
106 9.34182, 10.34, 11.34, 12.34, 13.34, 14.34, 15.34, 16.34, 17.34, 18.34, 19.34,
107 20.34, 21.34, 22.34, 23.34, 24.34, 25.34, 26.34, 27.34, 28.34, 29.34, 30.34,
108 31.34, 32.34, 33.34, 34.34, 35.34, 36.34, 37.34, 38.34, 39.34, 40.34,
109 41.34, 42.34, 43.34, 44.34, 45.34, 46.34, 47.34, 48.34, 49.33};
110
111const Double_t kChiQuant[50]={
112 5.02389, 7.3776,9.34840,11.1433,12.8325,
113 14.4494,16.0128,17.5346,19.0228,20.4831,21.920,23.337,
114 24.736,26.119,27.488,28.845,30.191,31.526,32.852,34.170,
115 35.479,36.781,38.076,39.364,40.646,41.923,43.194,44.461,
116 45.722,46.979,48.232,49.481,50.725,51.966,53.203,54.437,
117 55.668,56.896,58.120,59.342,60.561,61.777,62.990,64.201,
118 65.410,66.617,67.821,69.022,70.222,71.420};
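// For reference: kChiMedian[i] and kChiQuant[i] tabulate the median and the 0.975
// quantile of the chi2 distribution with i+1 degrees of freedom (i = 0..49); the same
// values could be computed at run time with, e.g., TMath::ChisquareQuantile(0.5, ndf)
// and TMath::ChisquareQuantile(0.975, ndf).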
119
120////////////////////////////////////////////////////////////////////////////////
121///this constructor should be used in a univariate case:
122///first call this constructor, then the EvaluateUni(..) function
123
124TRobustEstimator::TRobustEstimator(){
125}
126
127////////////////////////////////////////////////////////////////////////////////
128///constructor
129
130TRobustEstimator::TRobustEstimator(Int_t nvectors, Int_t nvariables, Int_t hh)
131 :fMean(nvariables),
132 fCovariance(nvariables),
133 fInvcovariance(nvariables),
134 fCorrelation(nvariables),
135 fRd(nvectors),
136 fSd(nvectors),
137 fOut(1),
138 fHyperplane(nvariables),
139 fData(nvectors, nvariables)
140{
141 if ((nvectors<=1)||(nvariables<=0)){
142 Error("TRobustEstimator","Not enough vectors or variables");
143 return;
144 }
145 if (nvariables==1){
146 Error("TRobustEstimator","For the univariate case, use the default constructor and EvaluateUni() function");
147 return;
148 }
149
150 fN=nvectors;
151 fNvar=nvariables;
152 if (hh<(fN+fNvar+1)/2){
153 if (hh>0)
154 Warning("TRobustEstimator","chosen h is too small, default h is taken instead");
155 fH=(fN+fNvar+1)/2;
156 } else
157 fH=hh;
158
159 fVarTemp=0;
160 fVecTemp=0;
161 fExact=0;
162}
163
164////////////////////////////////////////////////////////////////////////////////
165///adds a column to the data matrix
166///it is assumed that the column has size fN
167///variable fVarTemp keeps the number of columns
168///already added
169
170void TRobustEstimator::AddColumn(Double_t *col)
171{
172 if (fVarTemp==fNvar) {
173 fNvar++;
174 fCovariance.ResizeTo(fNvar, fNvar);
175 fInvcovariance.ResizeTo(fNvar, fNvar);
176 fCorrelation.ResizeTo(fNvar, fNvar);
177 fMean.ResizeTo(fNvar);
178 fHyperplane.ResizeTo(fNvar);
179 fData.ResizeTo(fN, fNvar);
180 }
181 for (Int_t i=0; i<fN; i++) {
182 fData(i, fVarTemp)=col[i];
183 }
184 fVarTemp++;
185}
186
187////////////////////////////////////////////////////////////////////////////////
188///adds a vector to the data matrix
189///it is supposed that the vector is of size fNvar
190
191void TRobustEstimator::AddRow(Double_t *row)
192{
193 if(fVecTemp==fN) {
194 fN++;
195 fRd.ResizeTo(fN);
196 fSd.ResizeTo(fN);
197 fData.ResizeTo(fN, fNvar);
198 }
199 for (Int_t i=0; i<fNvar; i++)
200 fData(fVecTemp, i)=row[i];
201
202 fVecTemp++;
203}
204
205////////////////////////////////////////////////////////////////////////////////
206///Finds the estimate of multivariate mean and variance
207
208void TRobustEstimator::Evaluate()
209{
210 Double_t kEps=1e-14;
211
212 if (fH==fN){
213 Warning("Evaluate","Chosen h = #observations, so classic estimates of location and scatter will be calculated");
214 Classic();
215 return;
216 }
217
218 Int_t i, j, k;
219 Int_t ii, jj;
220 Int_t nmini = 300;
221 Int_t k1=500;
222 Int_t nbest=10;
223 TMatrixD sscp(fNvar+1, fNvar+1);
224 TVectorD vec(fNvar);
225
226 Int_t *index = new Int_t[fN];
227 Double_t *ndist = new Double_t[fN];
228 Double_t det;
229 Double_t *deti=new Double_t[nbest];
230 for (i=0; i<nbest; i++)
231 deti[i]=1e16;
232
233 for (i=0; i<fN; i++)
234 fRd(i)=0;
235 ////////////////////////////
236 //for small n
237 ////////////////////////////
238 if (fN<nmini*2) {
239 //for storing the best fMeans and covariances
240
241 TMatrixD mstock(nbest, fNvar);
242 TMatrixD cstock(fNvar, fNvar*nbest);
243
244 for (k=0; k<k1; k++) {
245 CreateSubset(fN, fH, fNvar, index, fData, sscp, ndist);
246 //calculate the mean and covariance of the created subset
247 ClearSscp(sscp);
248 for (i=0; i<fH; i++) {
249 for(j=0; j<fNvar; j++)
250 vec(j)=fData[index[i]][j];
251 AddToSscp(sscp, vec);
252 }
253 Covar(sscp, fMean, fCovariance, fSd, fH);
254 det = fCovariance.Determinant();
255 if (det < kEps) {
256 fExact = Exact(ndist);
257 delete [] index;
258 delete [] ndist;
259 delete [] deti;
260 return;
261 }
262 //make 2 CSteps
263 det = CStep(fN, fH, index, fData, sscp, ndist);
264 if (det < kEps) {
265 fExact = Exact(ndist);
266 delete [] index;
267 delete [] ndist;
268 delete [] deti;
269 return;
270 }
271 det = CStep(fN, fH, index, fData, sscp, ndist);
272 if (det < kEps) {
273 fExact = Exact(ndist);
274 delete [] index;
275 delete [] ndist;
276 delete [] deti;
277 return;
278 } else {
279 Int_t maxind=TMath::LocMax(nbest, deti);
280 if(det<deti[maxind]) {
281 deti[maxind]=det;
282 for(ii=0; ii<fNvar; ii++) {
283 mstock(maxind, ii)=fMean(ii);
284 for(jj=0; jj<fNvar; jj++)
285 cstock(ii, jj+maxind*fNvar)=fCovariance(ii, jj);
286 }
287 }
288 }
289 }
290
291 //now for nbest best results perform CSteps until convergence
292
293 for (i=0; i<nbest; i++) {
294 for(ii=0; ii<fNvar; ii++) {
295 fMean(ii)=mstock(i, ii);
296 for (jj=0; jj<fNvar; jj++)
297 fCovariance(ii, jj)=cstock(ii, jj+i*fNvar);
298 }
299
300 det=1;
301 while (det>kEps) {
302 det=CStep(fN, fH, index, fData, sscp, ndist);
303 if(TMath::Abs(det-deti[i])<kEps)
304 break;
305 else
306 deti[i]=det;
307 }
308 for(ii=0; ii<fNvar; ii++) {
309 mstock(i,ii)=fMean(ii);
310 for (jj=0; jj<fNvar; jj++)
311 cstock(ii,jj+i*fNvar)=fCovariance(ii, jj);
312 }
313 }
314
315 Int_t detind=TMath::LocMin(nbest, deti);
316 for(ii=0; ii<fNvar; ii++) {
317 fMean(ii)=mstock(detind,ii);
318
319 for(jj=0; jj<fNvar; jj++)
320 fCovariance(ii, jj)=cstock(ii,jj+detind*fNvar);
321 }
322
323 if (deti[detind]!=0) {
324 //calculate robust distances and throw out the bad points
325 Int_t nout = RDist(sscp);
326 Double_t cutoff=kChiQuant[fNvar-1];
327
328 fOut.Set(nout);
329
330 j=0;
331 for (i=0; i<fN; i++) {
332 if(fRd(i)>cutoff) {
333 fOut[j]=i;
334 j++;
335 }
336 }
337
338 } else {
339 fExact=Exact(ndist);
340 }
341 delete [] index;
342 delete [] ndist;
343 delete [] deti;
344 return;
345
346 }
347 /////////////////////////////////////////////////
348 //if n>nmini, the dataset should be partitioned
349 //partitioning
350 ////////////////////////////////////////////////
351 Int_t indsubdat[5];
352 Int_t nsub;
353 for (ii=0; ii<5; ii++)
354 indsubdat[ii]=0;
355
356 nsub = Partition(nmini, indsubdat);
357
358 Int_t sum=0;
359 for (ii=0; ii<5; ii++)
360 sum+=indsubdat[ii];
361 Int_t *subdat=new Int_t[sum];
362 //printf("allocates subdat[ %d ]\n", sum);
363 // init the subdat matrix
364 for (int iii = 0; iii < sum; ++iii) subdat[iii] = -999;
365 RDraw(subdat, nsub, indsubdat);
366 for (int iii = 0; iii < sum; ++iii) {
367 if (subdat[iii] < 0 || subdat[iii] >= fN ) {
368 Error("Evaluate","subdat index is invalid subdat[%d] = %d",iii, subdat[iii] );
369 R__ASSERT(0);
370 }
371 }
372 //now the indexes of selected cases are in the array subdat
373 //matrices to store best means and covariances
374 Int_t nbestsub=nbest*nsub;
375 TMatrixD mstockbig(nbestsub, fNvar);
376 TMatrixD cstockbig(fNvar, fNvar*nbestsub);
377 TMatrixD hyperplane(nbestsub, fNvar);
378 for (i=0; i<nbestsub; i++) {
379 for(j=0; j<fNvar; j++)
380 hyperplane(i,j)=0;
381 }
382 Double_t *detibig = new Double_t[nbestsub];
383 Int_t maxind;
384 maxind=TMath::LocMax(5, indsubdat);
385 TMatrixD dattemp(indsubdat[maxind], fNvar);
386
387 Int_t k2=Int_t(k1/nsub);
388 //construct h-subsets and perform 2 CSteps in subgroups
389
390 for (Int_t kgroup=0; kgroup<nsub; kgroup++) {
391 //printf("group #%d\n", kgroup);
392 Int_t ntemp=indsubdat[kgroup];
393 Int_t temp=0;
394 for (i=0; i<kgroup; i++)
395 temp+=indsubdat[i];
396 Int_t par;
397
398
399 for(i=0; i<ntemp; i++) {
400 for (j=0; j<fNvar; j++) {
401 dattemp(i,j)=fData[subdat[temp+i]][j];
402 }
403 }
404 Int_t htemp=Int_t(fH*ntemp/fN);
405
406 for (i=0; i<nbest; i++)
407 deti[i]=1e16;
408
409 for(k=0; k<k2; k++) {
410 CreateSubset(ntemp, htemp, fNvar, index, dattemp, sscp, ndist);
411 ClearSscp(sscp);
412 for (i=0; i<htemp; i++) {
413 for(j=0; j<fNvar; j++) {
414 vec(j)=dattemp(index[i],j);
415 }
416 AddToSscp(sscp, vec);
417 }
418 Covar(sscp, fMean, fCovariance, fSd, htemp);
419 det = fCovariance.Determinant();
420 if (det<kEps) {
421 par =Exact2(mstockbig, cstockbig, hyperplane, deti, nbest, kgroup, sscp,ndist);
422 if(par==nbest+1) {
423
424 delete [] detibig;
425 delete [] deti;
426 delete [] subdat;
427 delete [] ndist;
428 delete [] index;
429 return;
430 } else
431 deti[par]=det;
432 } else {
433 det = CStep(ntemp, htemp, index, dattemp, sscp, ndist);
434 if (det<kEps) {
435 par=Exact2(mstockbig, cstockbig, hyperplane, deti, nbest, kgroup, sscp, ndist);
436 if(par==nbest+1) {
437
438 delete [] detibig;
439 delete [] deti;
440 delete [] subdat;
441 delete [] ndist;
442 delete [] index;
443 return;
444 } else
445 deti[par]=det;
446 } else {
447 det=CStep(ntemp,htemp, index, dattemp, sscp, ndist);
448 if(det<kEps){
449 par=Exact2(mstockbig, cstockbig, hyperplane, deti, nbest, kgroup, sscp,ndist);
450 if(par==nbest+1) {
451
452 delete [] detibig;
453 delete [] deti;
454 delete [] subdat;
455 delete [] ndist;
456 delete [] index;
457 return;
458 } else {
459 deti[par]=det;
460 }
461 } else {
462 maxind=TMath::LocMax(nbest, deti);
463 if(det<deti[maxind]) {
464 deti[maxind]=det;
465 for(i=0; i<fNvar; i++) {
466 mstockbig(nbest*kgroup+maxind,i)=fMean(i);
467 for(j=0; j<fNvar; j++) {
468 cstockbig(i,nbest*kgroup*fNvar+maxind*fNvar+j)=fCovariance(i,j);
469
470 }
471 }
472 }
473
474 }
475 }
476 }
477
478 maxind=TMath::LocMax(nbest, deti);
479 if (deti[maxind]<kEps)
480 break;
481 }
482
483
484 for(i=0; i<nbest; i++) {
485 detibig[kgroup*nbest + i]=deti[i];
486
487 }
488
489 }
490
491 //now the arrays mstockbig and cstockbig store nbest*nsub best means and covariances
492 //detibig stores nbest*nsub their determinants
493 //merge the subsets and carry out 2 CSteps on the merged set for all 50 best solutions
494
495 TMatrixD datmerged(sum, fNvar);
496 for(i=0; i<sum; i++) {
497 for (j=0; j<fNvar; j++)
498 datmerged(i,j)=fData[subdat[i]][j];
499 }
500 // printf("performing calculations for merged set\n");
501 Int_t hmerged=Int_t(sum*fH/fN);
502
503 Int_t nh;
504 for(k=0; k<nbestsub; k++) {
505 //for all best solutions perform 2 CSteps and then choose the very best
506 for(ii=0; ii<fNvar; ii++) {
507 fMean(ii)=mstockbig(k,ii);
508 for(jj=0; jj<fNvar; jj++)
509 fCovariance(ii, jj)=cstockbig(ii,k*fNvar+jj);
510 }
511 if(detibig[k]==0) {
512 for(i=0; i<fNvar; i++)
513 fHyperplane(i)=hyperplane(k,i);
514 CreateOrtSubset(datmerged,index, hmerged, sum, sscp, ndist);
515
516 }
517 det=CStep(sum, hmerged, index, datmerged, sscp, ndist);
518 if (det<kEps) {
519 nh= Exact(ndist);
520 if (nh>=fH) {
521 fExact = nh;
522
523 delete [] detibig;
524 delete [] deti;
525 delete [] subdat;
526 delete [] ndist;
527 delete [] index;
528 return;
529 } else {
530 CreateOrtSubset(datmerged, index, hmerged, sum, sscp, ndist);
531 }
532 }
533
534 det=CStep(sum, hmerged, index, datmerged, sscp, ndist);
535 if (det<kEps) {
536 nh=Exact(ndist);
537 if (nh>=fH) {
538 fExact = nh;
539 delete [] detibig;
540 delete [] deti;
541 delete [] subdat;
542 delete [] ndist;
543 delete [] index;
544 return;
545 }
546 }
547 detibig[k]=det;
548 for(i=0; i<fNvar; i++) {
549 mstockbig(k,i)=fMean(i);
550 for(j=0; j<fNvar; j++) {
551 cstockbig(i,k*fNvar+j)=fCovariance(i, j);
552 }
553 }
554 }
555 //now for the subset with the smallest determinant
556 //repeat CSteps until convergence
557 Int_t minind=TMath::LocMin(nbestsub, detibig);
558 det=detibig[minind];
559 for(i=0; i<fNvar; i++) {
560 fMean(i)=mstockbig(minind,i);
561 fHyperplane(i)=hyperplane(minind,i);
562 for(j=0; j<fNvar; j++)
563 fCovariance(i, j)=cstockbig(i,minind*fNvar + j);
564 }
565 if(det<kEps)
566 CreateOrtSubset(fData, index, fH, fN, sscp, ndist);
567 det=1;
568 while (det>kEps) {
569 det=CStep(fN, fH, index, fData, sscp, ndist);
570 if(TMath::Abs(det-detibig[minind])<kEps) {
571 break;
572 } else {
573 detibig[minind]=det;
574 }
575 }
576 if(det<kEps) {
577 Exact(ndist);
579 }
580 Int_t nout = RDist(sscp);
581 Double_t cutoff=kChiQuant[fNvar-1];
582
583 fOut.Set(nout);
584
585 j=0;
586 for (i=0; i<fN; i++) {
587 if(fRd(i)>cutoff) {
588 fOut[j]=i;
589 j++;
590 }
591 }
592
593 delete [] detibig;
594 delete [] deti;
595 delete [] subdat;
596 delete [] ndist;
597 delete [] index;
598 return;
599}
600
601////////////////////////////////////////////////////////////////////////////////
602///for the univariate case
603///estimates of location and scatter are returned in mean and sigma parameters
604///the algorithm works on the same principle as in multivariate case -
605///it finds a subset of size hh with smallest sigma, and then returns mean and
606///sigma of this subset
607
608void TRobustEstimator::EvaluateUni(Int_t nvectors, Double_t *data, Double_t &mean, Double_t &sigma, Int_t hh)
609{
610 if (hh==0)
611 hh=(nvectors+2)/2;
612 Double_t faclts[]={2.6477,2.5092,2.3826,2.2662,2.1587,2.0589,1.9660,1.879,1.7973,1.7203,1.6473};
613 Int_t *index=new Int_t[nvectors];
614 TMath::Sort(nvectors, data, index, kFALSE);
615
616 Int_t nquant;
617 nquant=TMath::Min(Int_t(Double_t(((hh*1./nvectors)-0.5)*40))+1, 11);
618 Double_t factor=faclts[nquant-1];
619
620 Double_t *aw=new Double_t[nvectors];
621 Double_t *aw2=new Double_t[nvectors];
622 Double_t sq=0;
623 Double_t sqmin=0;
624 Int_t ndup=0;
625 Int_t len=nvectors-hh;
626 Double_t *slutn=new Double_t[len];
627 for(Int_t i=0; i<len; i++)
628 slutn[i]=0;
629 for(Int_t jint=0; jint<len; jint++) {
630 aw[jint]=0;
631 for (Int_t j=0; j<hh; j++) {
632 aw[jint]+=data[index[j+jint]];
633 if(jint==0)
634 sq+=data[index[j]]*data[index[j]];
635 }
636 aw2[jint]=aw[jint]*aw[jint]/hh;
637
638 if(jint==0) {
639 sq=sq-aw2[jint];
640 sqmin=sq;
641 slutn[ndup]=aw[jint];
642
643 } else {
644 sq=sq - data[index[jint-1]]*data[index[jint-1]]+
645 data[index[jint+hh]]*data[index[jint+hh]]-
646 aw2[jint]+aw2[jint-1];
647 if(sq<sqmin) {
648 ndup=0;
649 sqmin=sq;
650 slutn[ndup]=aw[jint];
651
652 } else {
653 if(sq==sqmin) {
654 ndup++;
655 slutn[ndup]=aw[jint];
656 }
657 }
658 }
659 }
660
661 slutn[0]=slutn[Int_t((ndup)/2)]/hh;
662 Double_t bstd=factor*TMath::Sqrt(sqmin/hh);
663 mean=slutn[0];
664 sigma=bstd;
665 delete [] aw;
666 delete [] aw2;
667 delete [] slutn;
668 delete [] index;
669}
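// A minimal sketch of the univariate interface described above (the data array is a placeholder):
//
//    const Int_t n = 500;
//    Double_t x[500];
//    // ... fill x ...
//    Double_t mean = 0, sigma = 0;
//    TRobustEstimator re;                 // default constructor for the univariate case
//    re.EvaluateUni(n, x, mean, sigma);   // default hh = (n+2)/2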
670
671////////////////////////////////////////////////////////////////////////////////
672///returns the breakdown point of the algorithm
673
674Int_t TRobustEstimator::GetBDPoint()
675{
676 Int_t n;
677 n=(fN-fH+1)/fN;
678 return n;
679}
680
681////////////////////////////////////////////////////////////////////////////////
682///returns the chi2 quantiles
683
684Double_t TRobustEstimator::GetChiQuant(Int_t i) const
685{
686 if (i < 0 || i >= 50) return 0;
687 return kChiQuant[i];
688}
689
690////////////////////////////////////////////////////////////////////////////////
691///returns the covariance matrix
692
693void TRobustEstimator::GetCovariance(TMatrixDSym &matr)
694{
695 if (matr.GetNrows()!=fNvar || matr.GetNcols()!=fNvar){
696 Warning("GetCovariance","provided matrix is of the wrong size, it will be resized");
697 matr.ResizeTo(fNvar, fNvar);
698 }
699 matr=fCovariance;
700}
701
702////////////////////////////////////////////////////////////////////////////////
703///returns the correlation matrix
704
705void TRobustEstimator::GetCorrelation(TMatrixDSym &matr)
706{
707 if (matr.GetNrows()!=fNvar || matr.GetNcols()!=fNvar) {
708 Warning("GetCorrelation","provided matrix is of the wrong size, it will be resized");
709 matr.ResizeTo(fNvar, fNvar);
710 }
711 matr=fCorrelation;
712}
713
714////////////////////////////////////////////////////////////////////////////////
715///if the points are on a hyperplane, returns this hyperplane
716
717const TVectorD* TRobustEstimator::GetHyperplane() const
718{
719 if (fExact==0) {
720 Error("GetHyperplane","the data doesn't lie on a hyperplane!\n");
721 return nullptr;
722 } else {
723 return &fHyperplane;
724 }
725}
726
727////////////////////////////////////////////////////////////////////////////////
728///if the points are on a hyperplane, returns this hyperplane
729
730void TRobustEstimator::GetHyperplane(TVectorD &vec)
731{
732 if (fExact==0){
733 Error("GetHyperplane","the data doesn't lie on a hyperplane!\n");
734 return;
735 }
736 if (vec.GetNoElements()!=fNvar) {
737 Warning("GetHyperPlane","provided vector is of the wrong size, it will be resized");
738 vec.ResizeTo(fNvar);
739 }
740 vec=fHyperplane;
741}
742
743////////////////////////////////////////////////////////////////////////////////
744///return the estimate of the mean
745
746void TRobustEstimator::GetMean(TVectorD &means)
747{
748 if (means.GetNoElements()!=fNvar) {
749 Warning("GetMean","provided vector is of the wrong size, it will be resized");
750 means.ResizeTo(fNvar);
751 }
752 means=fMean;
753}
754
755////////////////////////////////////////////////////////////////////////////////
756///returns the robust distances (helps to find outliers)
757
758void TRobustEstimator::GetRDistances(TVectorD &rdist)
759{
760 if (rdist.GetNoElements()!=fN) {
761 Warning("GetRDistances","provided vector is of the wrong size, it will be resized");
762 rdist.ResizeTo(fN);
763 }
764 rdist=fRd;
765}
766
767////////////////////////////////////////////////////////////////////////////////
768///returns the number of outliers
769
770Int_t TRobustEstimator::GetNOut()
771{
772 return fOut.GetSize();
773}
774
775////////////////////////////////////////////////////////////////////////////////
776///update the sscp matrix with vector vec
777
778void TRobustEstimator::AddToSscp(TMatrixD &sscp, TVectorD &vec)
779{
780 Int_t i, j;
781 for (j=1; j<fNvar+1; j++) {
782 sscp(0, j) +=vec(j-1);
783 sscp(j, 0) = sscp(0, j);
784 }
785 for (i=1; i<fNvar+1; i++) {
786 for (j=1; j<fNvar+1; j++) {
787 sscp(i, j) += vec(i-1)*vec(j-1);
788 }
789 }
790}
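// Layout of the sscp accumulator used by AddToSscp/ClearSscp/Covar: row and column 0
// hold the running sums of each variable, sscp(0,j) = sum_k x_k(j-1), while the inner
// fNvar x fNvar block holds the running cross products, sscp(i,j) = sum_k x_k(i-1)*x_k(j-1).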
791
792////////////////////////////////////////////////////////////////////////////////
793///clear the sscp matrix, used for covariance and mean calculation
794
795void TRobustEstimator::ClearSscp(TMatrixD &sscp)
796{
797 for (Int_t i=0; i<fNvar+1; i++) {
798 for (Int_t j=0; j<fNvar+1; j++) {
799 sscp(i, j)=0;
800 }
801 }
802}
803
804////////////////////////////////////////////////////////////////////////////////
805///called when h=n. Returns classic covariance matrix
806///and mean
807
808void TRobustEstimator::Classic()
809{
810 TMatrixD sscp(fNvar+1, fNvar+1);
811 TVectorD temp(fNvar);
812 ClearSscp(sscp);
813 for (Int_t i=0; i<fN; i++) {
814 for (Int_t j=0; j<fNvar; j++)
815 temp(j)=fData(i, j);
816 AddToSscp(sscp, temp);
817 }
818 Covar(sscp, fMean, fCovariance, fSd, fN);
819 Correl();
820
821}
822
823////////////////////////////////////////////////////////////////////////////////
824///calculates mean and covariance
825
826void TRobustEstimator::Covar(TMatrixD &sscp, TVectorD &m, TMatrixDSym &cov, TVectorD &sd, Int_t nvec)
827{
828 Int_t i, j;
829 Double_t f;
830 for (i=0; i<fNvar; i++) {
831 m(i)=sscp(0, i+1);
832 sd[i]=sscp(i+1, i+1);
833 f=(sd[i]-m(i)*m(i)/nvec)/(nvec-1);
834 if (f>1e-14) sd[i]=TMath::Sqrt(f);
835 else sd[i]=0;
836 m(i)/=nvec;
837 }
838 for (i=0; i<fNvar; i++) {
839 for (j=0; j<fNvar; j++) {
840 cov(i, j)=sscp(i+1, j+1)-nvec*m(i)*m(j);
841 cov(i, j)/=nvec-1;
842 }
843 }
844}
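// In other words, for nvec accumulated vectors Covar returns the usual sample estimates:
//    m(i)     = sscp(0,i+1) / nvec
//    cov(i,j) = (sscp(i+1,j+1) - nvec*m(i)*m(j)) / (nvec-1)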
845
846////////////////////////////////////////////////////////////////////////////////
847///transforms covariance matrix into correlation matrix
848
849void TRobustEstimator::Correl()
850{
851 Int_t i, j;
852 Double_t *sd=new Double_t[fNvar];
853 for(j=0; j<fNvar; j++)
854 sd[j]=1./TMath::Sqrt(fCovariance(j, j));
855 for(i=0; i<fNvar; i++) {
856 for (j=0; j<fNvar; j++) {
857 if (i==j)
858 fCorrelation(i, j)=1.;
859 else
860 fCorrelation(i, j)=fCovariance(i, j)*sd[i]*sd[j];
861 }
862 }
863 delete [] sd;
864}
865
866////////////////////////////////////////////////////////////////////////////////
867///creates a subset of htotal elements from ntotal elements
868///first, p+1 elements are drawn randomly(without repetitions)
869///if their covariance matrix is singular, more elements are
870///added one by one, until their covariance matrix becomes regular
871///or it becomes clear that htotal observations lie on a hyperplane
872///If covariance matrix determinant!=0, distances of all ntotal elements
873///are calculated, using formula d_i=Sqrt((x_i-M)*S_inv*(x_i-M)), where
874///M is mean and S_inv is the inverse of the covariance matrix
875///htotal points with smallest distances are included in the returned subset.
876
877void TRobustEstimator::CreateSubset(Int_t ntotal, Int_t htotal, Int_t p, Int_t *index, TMatrixD &data, TMatrixD &sscp, Double_t *ndist)
878{
879 Double_t kEps = 1e-14;
880 Int_t i, j;
881 Bool_t repeat=kFALSE;
882 Int_t nindex=0;
883 Int_t num;
884 for(i=0; i<ntotal; i++)
885 index[i]=ntotal+1;
886
887 for (i=0; i<p+1; i++) {
888 num=Int_t(gRandom->Uniform(0, 1)*(ntotal-1));
889 if (i>0){
890 for(j=0; j<=i-1; j++) {
891 if(index[j]==num)
892 repeat=kTRUE;
893 }
894 }
895 if(repeat==kTRUE) {
896 i--;
897 repeat=kFALSE;
898 } else {
899 index[i]=num;
900 nindex++;
901 }
902 }
903
904 ClearSscp(sscp);
905
905
906 TVectorD vec(fNvar);
907 Double_t det;
908 for (i=0; i<p+1; i++) {
909 for (j=0; j<fNvar; j++) {
910 vec[j]=data[index[i]][j];
911
912 }
913 AddToSscp(sscp, vec);
914 }
915
916 Covar(sscp, fMean, fCovariance, fSd, p+1);
917 det=fCovariance.Determinant();
918 while((det<kEps)&&(nindex < htotal)) {
919 //if covariance matrix is singular,another vector is added until
920 //the matrix becomes regular or it becomes clear that all
921 //vectors of the group lie on a hyperplane
922 repeat=kFALSE;
923 do{
924 num=Int_t(gRandom->Uniform(0,1)*(ntotal-1));
925 repeat=kFALSE;
926 for(i=0; i<nindex; i++) {
927 if(index[i]==num) {
928 repeat=kTRUE;
929 break;
930 }
931 }
932 }while(repeat==kTRUE);
933
934 index[nindex]=num;
935 nindex++;
936 //check if covariance matrix is singular
937 for(j=0; j<fNvar; j++)
938 vec[j]=data[index[nindex-1]][j];
939 AddToSscp(sscp, vec);
940 Covar(sscp, fMean, fCovariance, fSd, nindex);
941 det=fCovariance.Determinant();
942 }
943
944 if(nindex!=htotal) {
945 TDecompChol chol(fCovariance);
946 fInvcovariance = chol.Invert();
947
948 TVectorD temp(fNvar);
949 for(j=0; j<ntotal; j++) {
950 ndist[j]=0;
951 for(i=0; i<fNvar; i++)
952 temp[i]=data[j][i] - fMean(i);
953 temp*=fInvcovariance;
954 for(i=0; i<fNvar; i++)
955 ndist[j]+=(data[j][i]-fMean(i))*temp[i];
956 }
957 KOrdStat(ntotal, ndist, htotal-1,index);
958 }
959
960}
961
962////////////////////////////////////////////////////////////////////////////////
963///creates a subset of hmerged vectors with smallest orthogonal distances to the hyperplane
964///hyp[1]*(x1-mean[1])+...+hyp[nvar]*(xnvar-mean[nvar])=0
965///This function is called in the case when fewer than fH samples lie on a hyperplane.
966
967void TRobustEstimator::CreateOrtSubset(TMatrixD &dat, Int_t *index, Int_t hmerged, Int_t nmerged, TMatrixD &sscp, Double_t *ndist)
968{
969 Int_t i, j;
970 TVectorD vec(fNvar);
971 for (i=0; i<nmerged; i++) {
972 ndist[i]=0;
973 for(j=0; j<fNvar; j++) {
974 ndist[i]+=fHyperplane[j]*(dat[i][j]-fMean[j]);
975 ndist[i]=TMath::Abs(ndist[i]);
976 }
977 }
978 KOrdStat(nmerged, ndist, hmerged-1, index);
979 ClearSscp(sscp);
980 for (i=0; i<hmerged; i++) {
981 for(j=0; j<fNvar; j++)
982 vec[j]=dat[index[i]][j];
983 AddToSscp(sscp, vec);
984 }
985 Covar(sscp, fMean, fCovariance, fSd, hmerged);
986}
987
988////////////////////////////////////////////////////////////////////////////////
989///from the input htotal-subset constructs another htotal subset with lower determinant
990///
991///As proven by Peter J.Rousseeuw and Katrien Van Driessen, if distances for all elements
992///are calculated, using the formula:d_i=Sqrt((x_i-M)*S_inv*(x_i-M)), where M is the mean
993///of the input htotal-subset, and S_inv - the inverse of its covariance matrix, then
994///htotal elements with smallest distances will have covariance matrix with determinant
995///less or equal to the determinant of the input subset covariance matrix.
996///
997///determinant for this htotal-subset with smallest distances is returned
998
999Double_t TRobustEstimator::CStep(Int_t ntotal, Int_t htotal, Int_t *index, TMatrixD &data, TMatrixD &sscp, Double_t *ndist)
1000{
1001 Int_t i, j;
1003 Double_t det;
1004
1005 TDecompChol chol(fCovariance);
1006 fInvcovariance = chol.Invert();
1007
1008 TVectorD temp(fNvar);
1009 for(j=0; j<ntotal; j++) {
1010 ndist[j]=0;
1011 for(i=0; i<fNvar; i++)
1012 temp[i]=data[j][i]-fMean[i];
1013 temp*=fInvcovariance;
1014 for(i=0; i<fNvar; i++)
1015 ndist[j]+=(data[j][i]-fMean[i])*temp[i];
1016 }
1017
1018 //taking h smallest
1019 KOrdStat(ntotal, ndist, htotal-1, index);
1020 //writing their mean and covariance
1021 ClearSscp(sscp);
1022 for (i=0; i<htotal; i++) {
1023 for (j=0; j<fNvar; j++)
1024 temp[j]=data[index[i]][j];
1025 AddToSscp(sscp, temp);
1026 }
1027 Covar(sscp, fMean, fCovariance, fSd, htotal);
1028 det = fCovariance.Determinant();
1029 return det;
1030}
1031
1032////////////////////////////////////////////////////////////////////////////////
1033///for the exact fit situations
1034///returns number of observations on the hyperplane
1035
1036Int_t TRobustEstimator::Exact(Double_t *ndist)
1037{
1038 Int_t i, j;
1039
1040 TMatrixDSymEigen eigen(fCovariance);
1041 TMatrixD eigenMatrix=eigen.GetEigenVectors();
1042
1043 for (j=0; j<fNvar; j++) {
1044 fHyperplane[j]=eigenMatrix(j,fNvar-1);
1045 }
1046 //calculate and return how many observations lie on the hyperplane
1047 for (i=0; i<fN; i++) {
1048 ndist[i]=0;
1049 for(j=0; j<fNvar; j++) {
1050 ndist[i]+=fHyperplane[j]*(fData[i][j]-fMean[j]);
1051 ndist[i]=TMath::Abs(ndist[i]);
1052 }
1053 }
1054 Int_t nhyp=0;
1055
1056 for (i=0; i<fN; i++) {
1057 if(ndist[i] < 1e-14) nhyp++;
1058 }
1059 return nhyp;
1060
1061}
1062
1063////////////////////////////////////////////////////////////////////////////////
1064///This function is called if determinant of the covariance matrix of a subset=0.
1065///
1066///If there are more than fH vectors on a hyperplane,
1067///returns this hyperplane and stops
1068///else stores the hyperplane coordinates in hyperplane matrix
1069
1070Int_t TRobustEstimator::Exact2(TMatrixD &mstockbig, TMatrixD &cstockbig, TMatrixD &hyperplane,
1071 Double_t *deti, Int_t nbest, Int_t kgroup,
1072 TMatrixD &sscp, Double_t *ndist)
1073{
1074 Int_t i, j;
1075
1076 TVectorD vec(fNvar);
1077 Int_t maxind = TMath::LocMax(nbest, deti);
1078 Int_t nh=Exact(ndist);
1079 //now nh is the number of observation on the hyperplane
1080 //ndist stores distances of observation from this hyperplane
1081 if(nh>=fH) {
1082 ClearSscp(sscp);
1083 for (i=0; i<fN; i++) {
1084 if(ndist[i]<1e-14) {
1085 for (j=0; j<fNvar; j++)
1086 vec[j]=fData[i][j];
1087 AddToSscp(sscp, vec);
1088 }
1089 }
1090 Covar(sscp, fMean, fCovariance, fSd, nh);
1091
1092 fExact=nh;
1093 return nbest+1;
1094
1095 } else {
1096 //if less than fH observations lie on a hyperplane,
1097 //mean and covariance matrix are stored in mstockbig
1098 //and cstockbig in place of the previous maximum determinant
1099 //mean and covariance
1100 for(i=0; i<fNvar; i++) {
1101 mstockbig(nbest*kgroup+maxind,i)=fMean(i);
1102 hyperplane(nbest*kgroup+maxind,i)=fHyperplane(i);
1103 for(j=0; j<fNvar; j++) {
1104 cstockbig(i,nbest*kgroup*fNvar+maxind*fNvar+j)=fCovariance(i,j);
1105 }
1106 }
1107 return maxind;
1108 }
1109}
1110
1111
1112////////////////////////////////////////////////////////////////////////////////
1113///divides the elements into approximately equal subgroups
1114///number of elements in each subgroup is stored in indsubdat
1115///number of subgroups is returned
1116
1117Int_t TRobustEstimator::Partition(Int_t nmini, Int_t *indsubdat)
1118{
1119 Int_t nsub;
1120 if ((fN>=2*nmini) && (fN<=(3*nmini-1))) {
1121 if (fN%2==1){
1122 indsubdat[0]=Int_t(fN*0.5);
1123 indsubdat[1]=Int_t(fN*0.5)+1;
1124 } else
1125 indsubdat[0]=indsubdat[1]=Int_t(fN/2);
1126 nsub=2;
1127 }
1128 else{
1129 if((fN>=3*nmini) && (fN<(4*nmini -1))) {
1130 if(fN%3==0){
1131 indsubdat[0]=indsubdat[1]=indsubdat[2]=Int_t(fN/3);
1132 } else {
1133 indsubdat[0]=Int_t(fN/3);
1134 indsubdat[1]=Int_t(fN/3)+1;
1135 if (fN%3==1) indsubdat[2]=Int_t(fN/3);
1136 else indsubdat[2]=Int_t(fN/3)+1;
1137 }
1138 nsub=3;
1139 }
1140 else{
1141 if((fN>=4*nmini)&&(fN<=(5*nmini-1))){
1142 if (fN%4==0) indsubdat[0]=indsubdat[1]=indsubdat[2]=indsubdat[3]=Int_t(fN/4);
1143 else {
1144 indsubdat[0]=Int_t(fN/4);
1145 indsubdat[1]=Int_t(fN/4)+1;
1146 if(fN%4==1) indsubdat[2]=indsubdat[3]=Int_t(fN/4);
1147 if(fN%4==2) {
1148 indsubdat[2]=Int_t(fN/4)+1;
1149 indsubdat[3]=Int_t(fN/4);
1150 }
1151 if(fN%4==3) indsubdat[2]=indsubdat[3]=Int_t(fN/4)+1;
1152 }
1153 nsub=4;
1154 } else {
1155 for(Int_t i=0; i<5; i++)
1156 indsubdat[i]=nmini;
1157 nsub=5;
1158 }
1159 }
1160 }
1161 return nsub;
1162}
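// Worked example of the partitioning above (nmini=300, as used by Evaluate):
// for fN=650 the first branch applies and, fN being even, the data are split into two
// subgroups of 325; for fN=1800 none of the size windows applies, so five subgroups of
// nmini=300 are used and only 5*300=1500 of the observations enter the subgroup stage.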
1163
1164////////////////////////////////////////////////////////////////////////////////
1165///Calculates robust distances. Then the samples with robust distances
1166///greater than a cutoff value (the 0.975 quantile of the chi2 distribution with
1167///fNvar degrees of freedom, multiplied by a correction factor) are given
1168///weight=0, and new, reweighted estimates of location and scatter are calculated.
1169///The function returns the number of outliers.
1170
1171Int_t TRobustEstimator::RDist(TMatrixD &sscp)
1172{
1173 Int_t i, j;
1174 Int_t nout=0;
1175
1176 TVectorD temp(fNvar);
1177 TDecompChol chol(fCovariance);
1178 fInvcovariance = chol.Invert();
1179
1180
1181 for (i=0; i<fN; i++) {
1182 fRd[i]=0;
1183 for(j=0; j<fNvar; j++) {
1184 temp[j]=fData[i][j]-fMean[j];
1185 }
1186 temp*=fInvcovariance;
1187 for(j=0; j<fNvar; j++) {
1188 fRd[i]+=(fData[i][j]-fMean[j])*temp[j];
1189 }
1190 }
1191
1192 Double_t med;
1193 Double_t chi = kChiMedian[fNvar-1];
1194
1195 med=TMath::Median(fN, fRd.GetMatrixArray());
1196 med/=chi;
1197 fCovariance*=med;
1198 TDecompChol chol2(fCovariance);
1199 fInvcovariance = chol2.Invert();
1200
1201 for (i=0; i<fN; i++) {
1202 fRd[i]=0;
1203 for(j=0; j<fNvar; j++) {
1204 temp[j]=fData[i][j]-fMean[j];
1205 }
1206
1207 temp*=fInvcovariance;
1208 for(j=0; j<fNvar; j++) {
1209 fRd[i]+=(fData[i][j]-fMean[j])*temp[j];
1210 }
1211 }
1212
1213 Double_t cutoff = kChiQuant[fNvar-1];
1214
1215 ClearSscp(sscp);
1216 for(i=0; i<fN; i++) {
1217 if (fRd[i]<=cutoff) {
1218 for(j=0; j<fNvar; j++)
1219 temp[j]=fData[i][j];
1220 AddToSscp(sscp,temp);
1221 } else {
1222 nout++;
1223 }
1224 }
1225
1226 Covar(sscp, fMean, fCovariance, fSd, fN-nout);
1227 return nout;
1228}
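// Note: the cutoff used above is the tabulated 0.975 chi2 quantile for fNvar degrees of
// freedom (kChiQuant[fNvar-1]), i.e. the value TMath::ChisquareQuantile(0.975, fNvar)
// would give; observations whose quadratic form fRd exceeds it get weight 0 in the
// reweighted estimates, as described in step 6 of the class documentation.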
1229
1230////////////////////////////////////////////////////////////////////////////////
1231///Draws ngroup nonoverlapping subdatasets out of a dataset of size n
1232///such that the selected case numbers are uniformly distributed from 1 to n
1233
1234void TRobustEstimator::RDraw(Int_t *subdat, Int_t ngroup, Int_t *indsubdat)
1235{
1236 Int_t jndex = 0;
1237 Int_t nrand;
1238 Int_t i, k, m, j;
1239 for (k=1; k<=ngroup; k++) {
1240 for (m=1; m<=indsubdat[k-1]; m++) {
1241 nrand = Int_t(gRandom->Uniform(0, 1) * double(fN-jndex))+1;
1242 //printf("nrand = %d - jndex %d\n",nrand,jndex);
1243 jndex++;
1244 if (jndex==1) {
1245 subdat[0]=nrand-1; // in case nrand is equal to fN
1246 } else {
1247 subdat[jndex-1]=nrand+jndex-2;
1248 for (i=1; i<=jndex-1; i++) {
1249 if(subdat[i-1] > nrand+i-2) {
1250 for(j=jndex; j>=i+1; j--) {
1251 subdat[j-1]=subdat[j-2];
1252 }
1253 //printf("subdata[] i = %d - nrand %d\n",i,nrand);
1254 subdat[i-1]=nrand+i-2;
1255 break; //breaking the loop for(i=1...
1256 }
1257 }
1258 }
1259 }
1260 }
1261}
1262
1263////////////////////////////////////////////////////////////////////////////////
1264///same as TMath::KOrdStat, but uses an Int_t work array of element indices
1265
1266Double_t TRobustEstimator::KOrdStat(Int_t ntotal, Double_t *arr, Int_t k, Int_t *work) {
1267 Bool_t isAllocated = kFALSE;
1268 const Int_t kWorkMax=100;
1269 Int_t i, ir, j, l, mid;
1270 Int_t arr;
1271 Int_t *ind;
1272 Int_t workLocal[kWorkMax];
1273 Int_t temp;
1274
1275
1276 if (work) {
1277 ind = work;
1278 } else {
1279 ind = workLocal;
1280 if (ntotal > kWorkMax) {
1281 isAllocated = kTRUE;
1282 ind = new Int_t[ntotal];
1283 }
1284 }
1285
1286 for (Int_t ii=0; ii<ntotal; ii++) {
1287 ind[ii]=ii;
1288 }
1289 Int_t rk = k;
1290 l=0;
1291 ir = ntotal-1;
1292 for(;;) {
1293 if (ir<=l+1) { //active partition contains 1 or 2 elements
1294 if (ir == l+1 && a[ind[ir]]<a[ind[l]])
1295 {temp = ind[l]; ind[l]=ind[ir]; ind[ir]=temp;}
1296 Double_t tmp = a[ind[rk]];
1297 if (isAllocated)
1298 delete [] ind;
1299 return tmp;
1300 } else {
1301 mid = (l+ir) >> 1; //choose median of left, center and right
1302 {temp = ind[mid]; ind[mid]=ind[l+1]; ind[l+1]=temp;}//elements as partitioning element arr.
1303 if (a[ind[l]]>a[ind[ir]]) //also rearrange so that a[l]<=a[l+1]
1304 {temp = ind[l]; ind[l]=ind[ir]; ind[ir]=temp;}
1305
1306 if (a[ind[l+1]]>a[ind[ir]])
1307 {temp=ind[l+1]; ind[l+1]=ind[ir]; ind[ir]=temp;}
1308
1309 if (a[ind[l]]>a[ind[l+1]])
1310 {temp = ind[l]; ind[l]=ind[l+1]; ind[l+1]=temp;}
1311
1312 i=l+1; //initialize pointers for partitioning
1313 j=ir;
1314 arr = ind[l+1];
1315 for (;;) {
1316 do i++; while (a[ind[i]]<a[arr]);
1317 do j--; while (a[ind[j]]>a[arr]);
1318 if (j<i) break; //pointers crossed, partitioning complete
1319 {temp=ind[i]; ind[i]=ind[j]; ind[j]=temp;}
1320 }
1321 ind[l+1]=ind[j];
1322 ind[j]=arr;
1323 if (j>=rk) ir = j-1; //keep active the partition that
1324 if (j<=rk) l=i; //contains the k_th element
1325 }
1326 }
1327}
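// Note: this is a quickselect on the index array; on return the caller's work array is
// partially ordered so that its first k+1 entries point to the k+1 smallest values of a.
// CreateSubset, CreateOrtSubset and CStep rely on this when they keep index[0..htotal-1]
// as the new h-subset.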