{ "cells": [ { "cell_type": "markdown", "id": "8e485cf7", "metadata": {}, "source": [ "# principal\n", "Principal Components Analysis (PCA) example.\n", "\n", "Example of using TPrincipal as a stand-alone class.\n", "\n", "We create n-dimensional data points, where c = trunc(n / 5) + 1\n", "variables are correlated with the remaining n - c randomly distributed variables.\n", "\n", "\n", "\n", "\n", "**Author:** Rene Brun, Christian Holm Christensen \n", "This notebook tutorial was automatically generated with ROOTBOOK-izer from the macro found in the ROOT repository on Tuesday, May 19, 2026 at 08:27 PM." ] }, { "cell_type": "markdown", "id": "d09b6776", "metadata": {}, "source": [ " Arguments are defined. " ] }, { "cell_type": "code", "execution_count": 1, "id": "55f73072", "metadata": { "collapsed": false, "execution": { "iopub.execute_input": "2026-05-19T20:27:08.679976Z", "iopub.status.busy": "2026-05-19T20:27:08.679870Z", "iopub.status.idle": "2026-05-19T20:27:09.002034Z", "shell.execute_reply": "2026-05-19T20:27:09.001531Z" } }, "outputs": [], "source": [ "Int_t n=10;\n", "Int_t m=10000;" ] }, { "cell_type": "code", "execution_count": 2, "id": "5a77a8bd", "metadata": { "collapsed": false, "execution": { "iopub.execute_input": "2026-05-19T20:27:09.003584Z", "iopub.status.busy": "2026-05-19T20:27:09.003469Z", "iopub.status.idle": "2026-05-19T20:27:09.210592Z", "shell.execute_reply": "2026-05-19T20:27:09.210158Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "*************************************************\n", "* Principal Component Analysis *\n", "* *\n", "* Number of variables: 10 *\n", "* Number of data points: 10000 *\n", "* Number of dependent variables: 3 *\n", "* *\n", "*************************************************\n" ] } ], "source": [ "Int_t c = n / 5 + 1;\n", "\n", "cout << \"*************************************************\" << endl;\n", "cout << \"* Principal Component Analysis *\" << endl;\n", "cout << \"* *\" << endl;\n", "cout << 
\"* Number of variables: \" << setw(4) << n\n", " << \" *\" << endl;\n", "cout << \"* Number of data points: \" << setw(8) << m\n", " << \" *\" << endl;\n", "cout << \"* Number of dependent variables: \" << setw(4) << c\n", " << \" *\" << endl;\n", "cout << \"* *\" << endl;\n", "cout << \"*************************************************\" << endl;" ] }, { "cell_type": "markdown", "id": "128bea61", "metadata": {}, "source": [ "Initialise the TPrincipal object. The option string \"ND\" asks TPrincipal to\n", "normalise the covariance matrix (N) and to store the input data (D). Use the\n", "empty string for the final argument if you want neither. Normalising the covariance\n", "matrix is a good idea if your variables have different orders of magnitude." ] }, { "cell_type": "code", "execution_count": 3, "id": "7c5a24bb", "metadata": { "collapsed": false, "execution": { "iopub.execute_input": "2026-05-19T20:27:09.211874Z", "iopub.status.busy": "2026-05-19T20:27:09.211764Z", "iopub.status.idle": "2026-05-19T20:27:09.416312Z", "shell.execute_reply": "2026-05-19T20:27:09.415816Z" } }, "outputs": [], "source": [ "TPrincipal* principal = new TPrincipal(n,\"ND\");" ] }, { "cell_type": "markdown", "id": "6c36b13e", "metadata": {}, "source": [ "Use a pseudo-random number generator" ] }, { "cell_type": "code", "execution_count": 4, "id": "ca5849c4", "metadata": { "collapsed": false, "execution": { "iopub.execute_input": "2026-05-19T20:27:09.417854Z", "iopub.status.busy": "2026-05-19T20:27:09.417741Z", "iopub.status.idle": "2026-05-19T20:27:09.624396Z", "shell.execute_reply": "2026-05-19T20:27:09.623912Z" } }, "outputs": [], "source": [ "TRandom* randomNum = new TRandom;" ] }, { "cell_type": "markdown", "id": "c2de4daa", "metadata": {}, "source": [ "Make the m data points. First allocate memory for a single data point;\n", "the same array is refilled and passed to TPrincipal for every row." ] }, { "cell_type": "code", "execution_count": 5, "id": "7afe0c48", "metadata": { "collapsed": false, "execution": { "iopub.execute_input": "2026-05-19T20:27:09.625906Z", "iopub.status.busy": "2026-05-19T20:27:09.625794Z", 
"iopub.status.idle": "2026-05-19T20:27:09.828066Z", "shell.execute_reply": "2026-05-19T20:27:09.827677Z" } }, "outputs": [], "source": [ "Double_t* dataPoint = new Double_t[n];\n", "for (Int_t i = 0; i < m; i++) {\n", "\n", " // First we create the uncorrelated, random variables, according\n", " // to one of three distributions\n", " for (Int_t j = 0; j < n - c; j++) {\n", " if (j % 3 == 0) dataPoint[j] = randomNum->Gaus(5,1);\n", " else if (j % 3 == 1) dataPoint[j] = randomNum->Poisson(8);\n", " else dataPoint[j] = randomNum->Exp(2);\n", " }\n", "\n", " // Then we create the correlated variables: variable n - c + j is\n", " // the sum of the first n - c - j uncorrelated variables\n", " for (Int_t j = 0 ; j < c; j++) {\n", " dataPoint[n - c + j] = 0;\n", " for (Int_t k = 0; k < n - c - j; k++) dataPoint[n - c + j] += dataPoint[k];\n", " }\n", "\n", " // Finally we're ready to add this data point to the PCA\n", " principal->AddRow(dataPoint);\n", "}" ] }, { "cell_type": "markdown", "id": "a6f74eb7", "metadata": {}, "source": [ "We delete the data point array after use, since TPrincipal has already processed every row." ] }, { "cell_type": "code", "execution_count": 6, "id": "afaaedfd", "metadata": { "collapsed": false, "execution": { "iopub.execute_input": "2026-05-19T20:27:09.829361Z", "iopub.status.busy": "2026-05-19T20:27:09.829246Z", "iopub.status.idle": "2026-05-19T20:27:10.037035Z", "shell.execute_reply": "2026-05-19T20:27:10.036470Z" } }, "outputs": [], "source": [ "delete [] dataPoint;" ] }, { "cell_type": "markdown", "id": "ac0874df", "metadata": {}, "source": [ "Do the actual analysis: diagonalise the covariance matrix to obtain its eigenvalues and eigenvectors (the principal components)." ] }, { "cell_type": "code", "execution_count": 7, "id": "92d2f716", "metadata": { "collapsed": false, "execution": { "iopub.execute_input": "2026-05-19T20:27:10.038279Z", "iopub.status.busy": "2026-05-19T20:27:10.038164Z", "iopub.status.idle": "2026-05-19T20:27:10.244960Z", "shell.execute_reply": "2026-05-19T20:27:10.244478Z" } }, "outputs": [], "source": [ "principal->MakePrincipals();" ] }, { 
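"cell_type": "markdown", "id": "3f9c2a71", "metadata": {}, "source": [ "To illustrate the idea behind the analysis, here is a minimal sketch in plain C++ on hypothetical toy data; it does not use TPrincipal and is not its implementation.\n", "Two variables are made fully correlated (y = 2x), their covariance matrix is computed, and its eigenvalues are found with the closed-form formula for a symmetric 2x2 matrix.\n", "Because y is an exact linear function of x, the covariance matrix is singular and one eigenvalue is zero.\n", "The same effect produces the three eigenvalues of order 1e-16 in the printout below, one for each dependent variable.\n", "The names toyX, toyY, sxx, etc. are arbitrary choices for this sketch." ] }, { "cell_type": "code", "execution_count": null, "id": "b4d81e06", "metadata": { "collapsed": false }, "outputs": [], "source": [ "#include <cmath>\n", "#include <iostream>\n", "#include <vector>\n", "\n", "// Hypothetical toy data: toyY is an exact linear function of toyX (y = 2x)\n", "std::vector<double> toyX = {1, 2, 3, 4, 5};\n", "std::vector<double> toyY;\n", "for (double x : toyX) toyY.push_back(2 * x);\n", "\n", "// Mean of each variable\n", "double meanX = 0, meanY = 0;\n", "for (size_t i = 0; i < toyX.size(); i++) { meanX += toyX[i]; meanY += toyY[i]; }\n", "meanX /= toyX.size();\n", "meanY /= toyY.size();\n", "\n", "// Population covariance matrix [[sxx, sxy], [sxy, syy]]\n", "double sxx = 0, sxy = 0, syy = 0;\n", "for (size_t i = 0; i < toyX.size(); i++) {\n", "  sxx += (toyX[i] - meanX) * (toyX[i] - meanX);\n", "  sxy += (toyX[i] - meanX) * (toyY[i] - meanY);\n", "  syy += (toyY[i] - meanY) * (toyY[i] - meanY);\n", "}\n", "sxx /= toyX.size();\n", "sxy /= toyX.size();\n", "syy /= toyX.size();\n", "\n", "// Eigenvalues of a symmetric 2x2 matrix: (sxx + syy)/2 +- sqrt(((sxx - syy)/2)^2 + sxy^2)\n", "double halfTrace = 0.5 * (sxx + syy);\n", "double halfDiff = 0.5 * (sxx - syy);\n", "double disc = std::sqrt(halfDiff * halfDiff + sxy * sxy);\n", "// prints: lambda1 = 10, lambda2 = 0\n", "std::cout << \"lambda1 = \" << halfTrace + disc << \", lambda2 = \" << halfTrace - disc << std::endl;" ] }, {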
"cell_type": "markdown", "id": "78ad59df", "metadata": {}, "source": [ "Print out the results: the mean value and sigma of each variable, and the eigenvalues." ] }, { "cell_type": "code", "execution_count": 8, "id": "8790819b", "metadata": { "collapsed": false, "execution": { "iopub.execute_input": "2026-05-19T20:27:10.246651Z", "iopub.status.busy": "2026-05-19T20:27:10.246512Z", "iopub.status.idle": "2026-05-19T20:27:10.453887Z", "shell.execute_reply": "2026-05-19T20:27:10.453360Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Variable # | Mean Value | Sigma | Eigenvalue\n", "-------------+------------+------------+------------\n", " 0 | 5.008 | 1.005 | 0.3851 \n", " 1 | 7.998 | 2.861 | 0.1107 \n", " 2 | 1.967 | 1.956 | 0.1036 \n", " 3 | 5.016 | 1.005 | 0.1015 \n", " 4 | 8.009 | 2.839 | 0.1008 \n", " 5 | 2.013 | 1.973 | 0.09962 \n", " 6 | 4.992 | 1.014 | 0.09864 \n", " 7 | 35 | 5.156 | 6.03e-16 \n", " 8 | 30.01 | 5.049 | 2.787e-16 \n", " 9 | 28 | 4.649 | 5.093e-16 \n", "\n" ] } ], "source": [ "principal->Print();" ] }, { "cell_type": "markdown", "id": "6caabe92", "metadata": {}, "source": [ "Test the PCA" ] }, { "cell_type": "code", "execution_count": 9, "id": "ab45e6f4", "metadata": { "collapsed": false, "execution": { "iopub.execute_input": "2026-05-19T20:27:10.455129Z", "iopub.status.busy": "2026-05-19T20:27:10.455011Z", "iopub.status.idle": "2026-05-19T20:27:10.665554Z", "shell.execute_reply": "2026-05-19T20:27:10.665053Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Info in : created default TCanvas with name c1\n" ] } ], "source": [ "principal->Test();" ] }, { "cell_type": "markdown", "id": "752c7ed2", "metadata": {}, "source": [ "Make histograms of the original data, the principal components, the residuals, etc." ] }, { "cell_type": "code", "execution_count": 10, "id": "c6011644", "metadata": { "collapsed": false, "execution": { "iopub.execute_input": "2026-05-19T20:27:10.666864Z", "iopub.status.busy": "2026-05-19T20:27:10.666748Z", "iopub.status.idle": 
"2026-05-19T20:27:10.877878Z", "shell.execute_reply": "2026-05-19T20:27:10.877360Z" } }, "outputs": [], "source": [ "principal->MakeHistograms();" ] }, { "cell_type": "markdown", "id": "0e6e7f0b", "metadata": {}, "source": [ "Write two functions that map between feature space and pattern space; TPrincipal saves them in the file pca.C" ] }, { "cell_type": "code", "execution_count": 11, "id": "dbabf724", "metadata": { "collapsed": false, "execution": { "iopub.execute_input": "2026-05-19T20:27:10.879367Z", "iopub.status.busy": "2026-05-19T20:27:10.879254Z", "iopub.status.idle": "2026-05-19T20:27:11.086141Z", "shell.execute_reply": "2026-05-19T20:27:11.085675Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Writing on file \"pca.C\" ... done\n" ] } ], "source": [ "principal->MakeCode();" ] }, { "cell_type": "markdown", "id": "cd83a31b", "metadata": {}, "source": [ "Start a browser so that we can browse the histograms generated\n", "above." ] }, { "cell_type": "code", "execution_count": 12, "id": "4c3d2588", "metadata": { "collapsed": false, "execution": { "iopub.execute_input": "2026-05-19T20:27:11.087417Z", "iopub.status.busy": "2026-05-19T20:27:11.087298Z", "iopub.status.idle": "2026-05-19T20:27:11.294163Z", "shell.execute_reply": "2026-05-19T20:27:11.293766Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Warning in : The ROOT browser cannot run in batch mode\n" ] } ], "source": [ "TBrowser* b = new TBrowser(\"principalBrowser\", principal);" ] }, { "cell_type": "markdown", "id": "e2935b3b", "metadata": {}, "source": [ "Draw all canvases " ] }, { "cell_type": "code", "execution_count": 13, "id": "a5fbf2c5", "metadata": { "collapsed": false, "execution": { "iopub.execute_input": "2026-05-19T20:27:11.295352Z", "iopub.status.busy": "2026-05-19T20:27:11.295244Z", "iopub.status.idle": "2026-05-19T20:27:11.501759Z", "shell.execute_reply": "2026-05-19T20:27:11.501258Z" } }, "outputs": [ { "data": { "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "gROOT->GetListOfCanvases()->Draw()" ] } ], "metadata": { "kernelspec": { "display_name": "ROOT C++", "language": "c++", "name": "root" }, "language_info": { "codemirror_mode": "text/x-c++src", "file_extension": ".C", "mimetype": " text/x-c++src", "name": "c++" } }, "nbformat": 4, "nbformat_minor": 5 }