{
"cells": [
{
"cell_type": "markdown",
"id": "8e485cf7",
"metadata": {},
"source": [
"# principal\n",
"Principal Components Analysis (PCA) example.\n",
"\n",
"Example of using TPrincipal as a stand-alone class.\n",
"\n",
"We create n-dimensional data points in which c = trunc(n / 5) + 1 variables\n",
"are sums of, and therefore correlated with, the remaining n - c randomly distributed variables.\n",
"\n",
"**Author:** Rene Brun, Christian Holm Christensen \n",
"This notebook tutorial was automatically generated with ROOTBOOK-izer from the macro found in the ROOT repository on Tuesday, May 19, 2026 at 08:27 PM."
]
},
{
"cell_type": "markdown",
"id": "d09b6776",
"metadata": {},
"source": [
"Define the arguments: `n` is the number of variables and `m` is the number of data points."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "55f73072",
"metadata": {
"collapsed": false,
"execution": {
"iopub.execute_input": "2026-05-19T20:27:08.679976Z",
"iopub.status.busy": "2026-05-19T20:27:08.679870Z",
"iopub.status.idle": "2026-05-19T20:27:09.002034Z",
"shell.execute_reply": "2026-05-19T20:27:09.001531Z"
}
},
"outputs": [],
"source": [
"Int_t n=10;\n",
"Int_t m=10000;"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "5a77a8bd",
"metadata": {
"collapsed": false,
"execution": {
"iopub.execute_input": "2026-05-19T20:27:09.003584Z",
"iopub.status.busy": "2026-05-19T20:27:09.003469Z",
"iopub.status.idle": "2026-05-19T20:27:09.210592Z",
"shell.execute_reply": "2026-05-19T20:27:09.210158Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"*************************************************\n",
"* Principal Component Analysis *\n",
"* *\n",
"* Number of variables: 10 *\n",
"* Number of data points: 10000 *\n",
"* Number of dependent variables: 3 *\n",
"* *\n",
"*************************************************\n"
]
}
],
"source": [
"Int_t c = n / 5 + 1;\n",
"\n",
"cout << \"*************************************************\" << endl;\n",
"cout << \"* Principal Component Analysis *\" << endl;\n",
"cout << \"* *\" << endl;\n",
"cout << \"* Number of variables: \" << setw(4) << n\n",
" << \" *\" << endl;\n",
"cout << \"* Number of data points: \" << setw(8) << m\n",
" << \" *\" << endl;\n",
"cout << \"* Number of dependent variables: \" << setw(4) << c\n",
" << \" *\" << endl;\n",
"cout << \"* *\" << endl;\n",
"cout << \"*************************************************\" << endl;"
]
},
{
"cell_type": "markdown",
"id": "128bea61",
"metadata": {},
"source": [
"Initialise the TPrincipal object. In the option string, `N` normalises the\n",
"covariance matrix and `D` stores the input data; pass the empty string as the\n",
"final argument if you want neither. Normalising the covariance matrix is a good idea if your\n",
"variables have different orders of magnitude."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "7c5a24bb",
"metadata": {
"collapsed": false,
"execution": {
"iopub.execute_input": "2026-05-19T20:27:09.211874Z",
"iopub.status.busy": "2026-05-19T20:27:09.211764Z",
"iopub.status.idle": "2026-05-19T20:27:09.416312Z",
"shell.execute_reply": "2026-05-19T20:27:09.415816Z"
}
},
"outputs": [],
"source": [
"TPrincipal* principal = new TPrincipal(n,\"ND\");"
]
},
{
"cell_type": "markdown",
"id": "6c36b13e",
"metadata": {},
"source": [
"Use a pseudo-random number generator"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "ca5849c4",
"metadata": {
"collapsed": false,
"execution": {
"iopub.execute_input": "2026-05-19T20:27:09.417854Z",
"iopub.status.busy": "2026-05-19T20:27:09.417741Z",
"iopub.status.idle": "2026-05-19T20:27:09.624396Z",
"shell.execute_reply": "2026-05-19T20:27:09.623912Z"
}
},
"outputs": [],
"source": [
"TRandom* randomNum = new TRandom;"
]
},
{
"cell_type": "markdown",
"id": "c2de4daa",
"metadata": {},
"source": [
"Make the m data points.\n",
"First allocate an array of n values that holds one data point at a time,\n",
"then fill it and add it to the PCA once per data point."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "7afe0c48",
"metadata": {
"collapsed": false,
"execution": {
"iopub.execute_input": "2026-05-19T20:27:09.625906Z",
"iopub.status.busy": "2026-05-19T20:27:09.625794Z",
"iopub.status.idle": "2026-05-19T20:27:09.828066Z",
"shell.execute_reply": "2026-05-19T20:27:09.827677Z"
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"input_line_56:2:2: warning: 'data' shadows a declaration with the same name in the 'std' namespace; use '::data' to reference this declaration\n",
" Double_t* data = new Double_t[n];\n",
" ^\n"
]
}
],
"source": [
"Double_t* data = new Double_t[n];\n",
"for (Int_t i = 0; i < m; i++) {\n",
"\n",
" // First we create the un-correlated, random variables, according\n",
" // to one of three distributions\n",
" for (Int_t j = 0; j < n - c; j++) {\n",
" if (j % 3 == 0) data[j] = randomNum->Gaus(5,1);\n",
" else if (j % 3 == 1) data[j] = randomNum->Poisson(8);\n",
" else data[j] = randomNum->Exp(2);\n",
" }\n",
"\n",
" // Then we create the correlated variables\n",
" for (Int_t j = 0 ; j < c; j++) {\n",
" data[n - c + j] = 0;\n",
" for (Int_t k = 0; k < n - c - j; k++) data[n - c + j] += data[k];\n",
" }\n",
"\n",
" // Finally we're ready to add this datapoint to the PCA\n",
" principal->AddRow(data);\n",
"}"
]
},
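{
"cell_type": "markdown",
"id": "b3f1a9e2",
"metadata": {},
"source": [
"As a worked reading of the loop above (a sketch, writing $x_k$ for `data[k]`), the correlated variables are partial sums of the uncorrelated ones:\n",
"\n",
"$$x_{n-c+j} = \\sum_{k=0}^{n-c-j-1} x_k, \\qquad j = 0, \\dots, c-1.$$\n",
"\n",
"With n = 10 and c = 3 this gives\n",
"\n",
"$$x_7 = x_0 + \\dots + x_6, \\qquad x_8 = x_0 + \\dots + x_5, \\qquad x_9 = x_0 + \\dots + x_4.$$\n",
"\n",
"Since `Gaus(5,1)`, `Poisson(8)` and `Exp(2)` have means 5, 8 and 2, the expected means of the three sums are\n",
"\n",
"$$E[x_7] = 5+8+2+5+8+2+5 = 35, \\qquad E[x_8] = 30, \\qquad E[x_9] = 28.$$"
]
},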
{
"cell_type": "markdown",
"id": "a6f74eb7",
"metadata": {},
"source": [
"Delete the data buffer after use; TPrincipal has already taken what it needs from each row."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "afaaedfd",
"metadata": {
"collapsed": false,
"execution": {
"iopub.execute_input": "2026-05-19T20:27:09.829361Z",
"iopub.status.busy": "2026-05-19T20:27:09.829246Z",
"iopub.status.idle": "2026-05-19T20:27:10.037035Z",
"shell.execute_reply": "2026-05-19T20:27:10.036470Z"
}
},
"outputs": [],
"source": [
"delete [] ::data;"
]
},
{
"cell_type": "markdown",
"id": "ac0874df",
"metadata": {},
"source": [
"Do the actual analysis"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "92d2f716",
"metadata": {
"collapsed": false,
"execution": {
"iopub.execute_input": "2026-05-19T20:27:10.038279Z",
"iopub.status.busy": "2026-05-19T20:27:10.038164Z",
"iopub.status.idle": "2026-05-19T20:27:10.244960Z",
"shell.execute_reply": "2026-05-19T20:27:10.244478Z"
}
},
"outputs": [],
"source": [
"principal->MakePrincipals();"
]
},
{
"cell_type": "markdown",
"id": "78ad59df",
"metadata": {},
"source": [
"Print out the result: the mean value and sigma of each variable, and the eigenvalues."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "8790819b",
"metadata": {
"collapsed": false,
"execution": {
"iopub.execute_input": "2026-05-19T20:27:10.246651Z",
"iopub.status.busy": "2026-05-19T20:27:10.246512Z",
"iopub.status.idle": "2026-05-19T20:27:10.453887Z",
"shell.execute_reply": "2026-05-19T20:27:10.453360Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" Variable # | Mean Value | Sigma | Eigenvalue\n",
"-------------+------------+------------+------------\n",
" 0 | 5.008 | 1.005 | 0.3851 \n",
" 1 | 7.998 | 2.861 | 0.1107 \n",
" 2 | 1.967 | 1.956 | 0.1036 \n",
" 3 | 5.016 | 1.005 | 0.1015 \n",
" 4 | 8.009 | 2.839 | 0.1008 \n",
" 5 | 2.013 | 1.973 | 0.09962 \n",
" 6 | 4.992 | 1.014 | 0.09864 \n",
" 7 | 35 | 5.156 | 6.03e-16 \n",
" 8 | 30.01 | 5.049 | 2.787e-16 \n",
" 9 | 28 | 4.649 | 5.093e-16 \n",
"\n"
]
}
],
"source": [
"principal->Print();"
]
},
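{
"cell_type": "markdown",
"id": "c7d20e54",
"metadata": {},
"source": [
"One hedged way to read the table above: variables 7, 8 and 9 are exact sums of the other variables, so the covariance matrix has rank at most n - c = 7. Three eigenvalues are therefore zero up to rounding, which is why they appear at the level of $10^{-16}$.\n",
"\n",
"The seven remaining eigenvalues add up to about one, which is consistent with the eigenvalues being normalised to unit sum:\n",
"\n",
"$$\\sum_{i=0}^{6} \\lambda_i = 0.3851 + 0.1107 + 0.1036 + 0.1015 + 0.1008 + 0.09962 + 0.09864 \\approx 1.000.$$\n",
"\n",
"Note that the eigenvalue in a given row does not appear to be tied to the variable in that row; the mean and sigma columns describe the input variables, while the eigenvalues belong to the principal components."
]
},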
{
"cell_type": "markdown",
"id": "6caabe92",
"metadata": {},
"source": [
"Test the PCA"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "ab45e6f4",
"metadata": {
"collapsed": false,
"execution": {
"iopub.execute_input": "2026-05-19T20:27:10.455129Z",
"iopub.status.busy": "2026-05-19T20:27:10.455011Z",
"iopub.status.idle": "2026-05-19T20:27:10.665554Z",
"shell.execute_reply": "2026-05-19T20:27:10.665053Z"
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Info in <TCanvas::MakeDefCanvas>:  created default TCanvas with name c1\n"
]
}
],
"source": [
"principal->Test();"
]
},
{
"cell_type": "markdown",
"id": "752c7ed2",
"metadata": {},
"source": [
"Make histograms of the original data, the principal components, the residues, etc."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "c6011644",
"metadata": {
"collapsed": false,
"execution": {
"iopub.execute_input": "2026-05-19T20:27:10.666864Z",
"iopub.status.busy": "2026-05-19T20:27:10.666748Z",
"iopub.status.idle": "2026-05-19T20:27:10.877878Z",
"shell.execute_reply": "2026-05-19T20:27:10.877360Z"
}
},
"outputs": [],
"source": [
"principal->MakeHistograms();"
]
},
{
"cell_type": "markdown",
"id": "0e6e7f0b",
"metadata": {},
"source": [
"Make two functions to map between feature and pattern space; the generated code is written to the file `pca.C`."
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "dbabf724",
"metadata": {
"collapsed": false,
"execution": {
"iopub.execute_input": "2026-05-19T20:27:10.879367Z",
"iopub.status.busy": "2026-05-19T20:27:10.879254Z",
"iopub.status.idle": "2026-05-19T20:27:11.086141Z",
"shell.execute_reply": "2026-05-19T20:27:11.085675Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Writing on file \"pca.C\" ... done\n"
]
}
],
"source": [
"principal->MakeCode();"
]
},
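{
"cell_type": "markdown",
"id": "d4e8b016",
"metadata": {},
"source": [
"As a minimal sketch of what mapping between the two spaces looks like, the cell below calls the `X2P` and `P2X` member functions of `TPrincipal` directly on the first stored row, rather than using the generated code in `pca.C`. It assumes the object was created with the `D` option, so that `GetRow` can return stored data; the names `xOrig`, `pComp` and `xBack` are illustrative only.\n",
"\n",
"With all n components kept, `xBack` should reproduce `xOrig` up to rounding; a smaller third argument to `P2X` keeps fewer components."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e5f9c127",
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"// Sketch only: map one data point to pattern space and back.\n",
"const Double_t* xOrig = principal->GetRow(0); // first stored data point (needs option \"D\")\n",
"Double_t* pComp = new Double_t[n];            // coordinates in pattern (principal component) space\n",
"Double_t* xBack = new Double_t[n];            // the point mapped back to feature space\n",
"\n",
"principal->X2P(xOrig, pComp);                 // feature space -> pattern space\n",
"principal->P2X(pComp, xBack, n);              // pattern space -> feature space, using all n components\n",
"\n",
"// Print original value, principal component and reconstructed value side by side\n",
"for (Int_t i = 0; i < n; i++) {\n",
"   cout << setw(3) << i << setw(14) << xOrig[i]\n",
"        << setw(14) << pComp[i] << setw(14) << xBack[i] << endl;\n",
"}\n",
"\n",
"delete [] pComp;\n",
"delete [] xBack;"
]
},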
{
"cell_type": "markdown",
"id": "cd83a31b",
"metadata": {},
"source": [
"Start a browser, so that we may browse the histograms generated\n",
"above"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "4c3d2588",
"metadata": {
"collapsed": false,
"execution": {
"iopub.execute_input": "2026-05-19T20:27:11.087417Z",
"iopub.status.busy": "2026-05-19T20:27:11.087298Z",
"iopub.status.idle": "2026-05-19T20:27:11.294163Z",
"shell.execute_reply": "2026-05-19T20:27:11.293766Z"
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Warning in <TBrowser::TBrowser>: The ROOT browser cannot run in batch mode\n"
]
}
],
"source": [
"TBrowser* b = new TBrowser(\"principalBrowser\", principal);"
]
},
{
"cell_type": "markdown",
"id": "e2935b3b",
"metadata": {},
"source": [
"Draw all canvases."
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "a5fbf2c5",
"metadata": {
"collapsed": false,
"execution": {
"iopub.execute_input": "2026-05-19T20:27:11.295352Z",
"iopub.status.busy": "2026-05-19T20:27:11.295244Z",
"iopub.status.idle": "2026-05-19T20:27:11.501759Z",
"shell.execute_reply": "2026-05-19T20:27:11.501258Z"
}
},
"outputs": [],
"source": [
"gROOT->GetListOfCanvases()->Draw()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "ROOT C++",
"language": "c++",
"name": "root"
},
"language_info": {
"codemirror_mode": "text/x-c++src",
"file_extension": ".C",
"mimetype": " text/x-c++src",
"name": "c++"
}
},
"nbformat": 4,
"nbformat_minor": 5
}