# class TKDTree<int,double>: public TObject

```
kd-tree and its implementation in TKDTree

Contents:
1. What is kd-tree
2. How to cosntruct kdtree - Pseudo code
3. Using TKDTree
a. Creating the kd-tree and setting the data
b. Navigating the kd-tree
4. TKDTree implementation - technical details
a. The order of nodes in internal arrays
b. Division algorithm
c. The order of nodes in boundary related arrays

1. What is kdtree ? ( http://en.wikipedia.org/wiki/Kd-tree )

In computer science, a kd-tree (short for k-dimensional tree) is a space-partitioning data structure
for organizing points in a k-dimensional space. kd-trees are a useful data structure for several
applications, such as searches involving a multidimensional search key (e.g. range searches and
nearest neighbour searches). kd-trees are a special case of BSP trees.

A kd-tree uses only splitting planes that are perpendicular to one of the coordinate system axes.
This differs from BSP trees, in which arbitrary splitting planes can be used.
In addition, in the typical definition every node of a kd-tree, from the root to the leaves, stores a point.
This differs from BSP trees, in which leaves are typically the only nodes that contain points
(or other geometric primitives). As a consequence, each splitting plane must go through one of
the points in the kd-tree. kd-trees are a variant that store data only in leaf nodes.

2. Constructing a classical kd-tree ( Pseudo code)

Since there are many possible ways to choose axis-aligned splitting planes, there are many different ways
to construct kd-trees. The canonical method of kd-tree construction has the following constraints:

* As one moves down the tree, one cycles through the axes used to select the splitting planes.
(For example, the root would have an x-aligned plane, the root's children would both have y-aligned
planes, the root's grandchildren would all have z-aligned planes, and so on.)
* At each step, the point selected to create the splitting plane is the median of the points being
put into the kd-tree, with respect to their coordinates in the axis being used. (Note the assumption
that we feed the entire set of points into the algorithm up-front.)

This method leads to a balanced kd-tree, in which each leaf node is about the same distance from the root.
However, balanced trees are not necessarily optimal for all applications.
The following pseudo-code illustrates this canonical construction procedure (NOTE, that the procedure used
by the TKDTree class is a bit different, the following pseudo-code is given as a simple illustration of the
concept):

function kdtree (list of points pointList, int depth)
{
if pointList is empty
return nil;
else
{
// Select axis based on depth so that axis cycles through all valid values
var int axis := depth mod k;

// Sort point list and choose median as pivot element
select median from pointList;

// Create node and construct subtrees
var tree_node node;
node.location := median;
node.leftChild := kdtree(points in pointList before median, depth+1);
node.rightChild := kdtree(points in pointList after median, depth+1);
return node;
}
}

Our construction method is optimized to save memory, and differs a bit from the constraints above.
In particular, the division axis is chosen as the one with the biggest spread, and the point to create the
splitting plane is chosen so, that one of the two subtrees contains exactly 2^k terminal nodes and is a
perfectly balanced binary tree, and, while at the same time, trying to keep the number of terminal nodes
in the 2 subtrees as close as possible. The following section gives more details about our implementation.

3. Using TKDTree

3a. Creating the tree and setting the data
The interface of the TKDTree, that allows to set input data, has been developped to simplify using it
together with TTree::Draw() functions. That's why the data has to be provided column-wise. For example:
{
TTree *datatree = ...

datatree->Draw("x:y:z", "selection", "goff");
//now make a kd-tree on the drawn variables
TKDTreeID *kdtree = new TKDTreeID(npoints, 3, 1);
kdtree->SetData(0, datatree->GetV1());
kdtree->SetData(1, datatree->GetV2());
kdtree->SetData(2, datatree->GetV3());
kdtree->Build();
}
NOTE, that this implementation of kd-tree doesn't support adding new points after the tree has been built
Of course, it's not necessary to use TTree::Draw(). What is important, is to have data columnwise.
An example with regular arrays:
{
Int_t npoints = 100000;
Int_t ndim = 3;
Int_t bsize = 1;
Double_t xmin = -0.5;
Double_t xmax = 0.5;
Double_t *data0 = new Double_t[npoints];
Double_t *data1 = new Double_t[npoints];
Double_t *data2 = new Double_t[npoints];
Double_t *y     = new Double_t[npoints];
for (Int_t i=0; i<npoints; i++){
data0[i]=gRandom->Uniform(xmin, xmax);
data1[i]=gRandom->Uniform(xmin, xmax);
data2[i]=gRandom->Uniform(xmin, xmax);
}
TKDTreeID *kdtree = new TKDTreeID(npoints, ndim, bsize);
kdtree->SetData(0, data0);
kdtree->SetData(1, data1);
kdtree->SetData(2, data2);
kdtree->Build();
}

By default, the kd-tree doesn't own the data and doesn't delete it with itself. If you want the
data to be deleted together with the kd-tree, call TKDTree::SetOwner(kTRUE).

Most functions of the kd-tree don't require the original data to be present after the tree
has been built. Check the functions documentation for more details.

3b. Navigating the kd-tree

Nodes of the tree are indexed top to bottom, left to right. The root node has index 0. Functions
TKDTree::GetLeft(Index inode), TKDTree::GetRight(Index inode) and TKDTree::GetParent(Index inode)
allow to find the children and the parent of a given node.

For a given node, one can find the indexes of the original points, contained in this node,
by calling the GetNodePointsIndexes(Index inode) function. Additionally, for terminal nodes,
there is a function GetPointsIndexes(Index inode) that returns a pointer to the relevant
part of the index array. To find the number of point in the node
(not only terminal), call TKDTree::GetNpointsNode(Index inode).

4.  TKDtree implementation details - internal information, not needed to use the kd-tree.
4a. Order of nodes in the node information arrays:

TKDtree is optimized to minimize memory consumption.
Nodes of the TKDTree do not store pointers to the left and right children or to the parent node,
but instead there are several 1-d arrays of size fNNodes with information about the nodes.
The order of the nodes information in the arrays is described below. It's important to understand
it, if one's class needs to store some kind of additional information on the per node basis, for
example, the fit function parameters.

Drawback:   Insertion to the TKDtree is not supported.

As noted above, the construction of the kd-tree involves choosing the axis and the point on
that axis to divide the remaining points approximately in half. The exact algorithm for choosing
the division point is described in the next section. The sequence of divisions is
recorded in the following arrays:
fAxix[fNNodes]  - Division axis (0,1,2,3 ...)
fValue[fNNodes] - Division value

Given the index of a node in those arrays, it's easy to find the indices, corresponding to
children nodes or the parent node:
Suppose, the parent node is stored under the index inode. Then:
Left child index  = inode*2+1
Right child index =  (inode+1)*2
Suppose, that the child node is stored under the index inode. Then:
Parent index = inode/2

Number of division nodes and number of terminals :
fNNodes = (fNPoints/fBucketSize)

The nodes are filled always from left side to the right side:
Let inode be the index of a node, and irow - the index of a row
The TKDTree looks the following way:
Ideal case:
Number of _terminal_ nodes = 2^N,  N=3

INode
irow 0     0                                                                   -  1 inode
irow 1     1                              2                                    -  2 inodes
irow 2     3              4               5               6                    -  4 inodes
irow 3     7       8      9      10       11     12       13      14           -  8 inodes

Non ideal case:
Number of _terminal_ nodes = 2^N+k,  N=3  k=1

INode
irow 0     0                                                                   - 1 inode
irow 1     1                              2                                    - 2 inodes
irow 2     3              4               5               6                    - 3 inodes
irow 3     7       8      9      10       11     12       13      14           - 8 inodes
irow 4     15  16                                                              - 2 inodes

3b. The division algorithm:

As described above, the kd-tree is built by repeatingly dividing the given set of points into
2 smaller sets. The cut is made on the axis with the biggest spread, and the value on the axis,
on which the cut is performed, is chosen based on the following formula:
Suppose, we want to divide n nodes into 2 groups, left and right. Then the left and right
will have the following number of nodes:

n=2^k+rest

Left  = 2^k-1 +  ((rest>2^k-2) ?  2^k-2      : rest)
Right = 2^k-1 +  ((rest>2^k-2) ?  rest-2^k-2 : 0)

For example, let n_nodes=67. Then, the closest 2^k=64, 2^k-1=32, 2^k-2=16.
Left node gets 32+3=35 sub-nodes, and the right node gets 32 sub-nodes

The division process continues until all the nodes contain not more than a predefined number
of points.

3c. The order of nodes in boundary-related arrays

Some kd-tree based algorithms need to know the boundaries of each node. This information can
be computed by calling the TKDTree::MakeBoundaries() function. It fills the following arrays:

fRange : array containing the boundaries of the domain:
| 1st dimension (min + max) | 2nd dimension (min + max) | ...
fBoundaries : nodes boundaries
| 1st node {1st dim * 2 elements | 2nd dim * 2 elements | ...} | 2nd node {...} | ...
The nodes are arranged in the order described in section 3a.

```

#### This class is also known as (typedefs to this class)

TKDTreeID, TKDTree<Int_t,Double_t>

## Function Members (Methods)

public:
 TKDTree() TKDTree(int npoints, int ndim, UInt_t bsize) TKDTree(int npoints, int ndim, UInt_t bsize, double** data) virtual ~TKDTree() void TObject::AbstractMethod(const char* method) const virtual void TObject::AppendPad(Option_t* option = "") virtual void TObject::Browse(TBrowser* b) void Build() static TClass* Class() virtual const char* TObject::ClassName() const virtual void TObject::Clear(Option_t* = "") virtual TObject* TObject::Clone(const char* newname = "") const virtual Int_t TObject::Compare(const TObject* obj) const virtual void TObject::Copy(TObject& object) const virtual void TObject::Delete(Option_t* option = "")MENU Double_t Distance(const double* point, int ind, Int_t type = 2) const void DistanceToNode(const double* point, int inode, double& min, double& max, Int_t type = 2) virtual Int_t TObject::DistancetoPrimitive(Int_t px, Int_t py) virtual void TObject::Draw(Option_t* option = "") virtual void TObject::DrawClass() constMENU virtual TObject* TObject::DrawClone(Option_t* option = "") constMENU virtual void TObject::Dump() constMENU virtual void TObject::Error(const char* method, const char* msgfmt) const virtual void TObject::Execute(const char* method, const char* params, Int_t* error = 0) virtual void TObject::Execute(TMethod* method, TObjArray* params, Int_t* error = 0) virtual void TObject::ExecuteEvent(Int_t event, Int_t px, Int_t py) virtual void TObject::Fatal(const char* method, const char* msgfmt) const void FindBNodeA(double* point, double* delta, Int_t& inode) void FindInRange(double* point, double range, vector& res) void FindNearestNeighbors(const double* point, Int_t k, int* ind, double* dist) int FindNode(const double* point) const virtual TObject* TObject::FindObject(const char* name) const virtual TObject* TObject::FindObject(const TObject* obj) const void FindPoint(double* point, int& index, Int_t& iter) double* GetBoundaries() double* GetBoundariesExact() double* GetBoundary(const Int_t node) double* GetBoundaryExact(const Int_t node) int GetBucketSize() Int_t GetCrossNode() virtual Option_t* TObject::GetDrawOption() const static Long_t TObject::GetDtorOnly() virtual const char* TObject::GetIconName() const int* GetIndPoints() Int_t GetLeft(Int_t inode) const virtual const char* TObject::GetName() const int GetNDim() Int_t GetNNodes() const UChar_t GetNodeAxis(Int_t id) const void GetNodePointsIndexes(Int_t node, Int_t& first1, Int_t& last1, Int_t& first2, Int_t& last2) const double GetNodeValue(Int_t id) const int GetNPoints() int GetNPointsNode(Int_t node) const virtual char* TObject::GetObjectInfo(Int_t px, Int_t py) const static Bool_t TObject::GetObjectStat() Int_t GetOffset() virtual Option_t* TObject::GetOption() const Int_t GetParent(Int_t inode) const int* GetPointsIndexes(Int_t node) const Int_t GetRight(Int_t inode) const Int_t GetRowT0() virtual const char* TObject::GetTitle() const Int_t GetTotalNodes() const virtual UInt_t TObject::GetUniqueID() const virtual Bool_t TObject::HandleTimer(TTimer* timer) virtual ULong_t TObject::Hash() const virtual void TObject::Info(const char* method, const char* msgfmt) const virtual Bool_t TObject::InheritsFrom(const char* classname) const virtual Bool_t TObject::InheritsFrom(const TClass* cl) const virtual void TObject::Inspect() constMENU void TObject::InvertBit(UInt_t f) virtual TClass* IsA() const virtual Bool_t TObject::IsEqual(const TObject* obj) const virtual Bool_t TObject::IsFolder() const Bool_t TObject::IsOnHeap() const Int_t IsOwner() virtual Bool_t TObject::IsSortable() const Bool_t IsTerminal(int inode) const Bool_t TObject::IsZombie() const double KOrdStat(int ntotal, double* a, int k, int* index) const virtual void TObject::ls(Option_t* option = "") const void MakeBoundaries(double* range = 0x0) void MakeBoundariesExact() void TObject::MayNotUse(const char* method) const virtual Bool_t TObject::Notify() void TObject::Obsolete(const char* method, const char* asOfVers, const char* removedFromVers) const static void TObject::operator delete(void* ptr) static void TObject::operator delete(void* ptr, void* vp) static void TObject::operator delete[](void* ptr) static void TObject::operator delete[](void* ptr, void* vp) void* TObject::operator new(size_t sz) void* TObject::operator new(size_t sz, void* vp) void* TObject::operator new[](size_t sz) void* TObject::operator new[](size_t sz, void* vp) virtual void TObject::Paint(Option_t* option = "") virtual void TObject::Pop() virtual void TObject::Print(Option_t* option = "") const virtual Int_t TObject::Read(const char* name) virtual void TObject::RecursiveRemove(TObject* obj) void TObject::ResetBit(UInt_t f) virtual void TObject::SaveAs(const char* filename = "", Option_t* option = "") constMENU virtual void TObject::SavePrimitive(ostream& out, Option_t* option = "") void TObject::SetBit(UInt_t f) void TObject::SetBit(UInt_t f, Bool_t set) Int_t SetData(int idim, double* data) void SetData(int npoints, int ndim, UInt_t bsize, double** data) virtual void TObject::SetDrawOption(Option_t* option = "")MENU static void TObject::SetDtorOnly(void* obj) static void TObject::SetObjectStat(Bool_t stat) void SetOwner(Int_t owner) virtual void TObject::SetUniqueID(UInt_t uid) virtual void ShowMembers(TMemberInspector& insp) void Spread(int ntotal, double* a, int* index, double& min, double& max) const virtual void Streamer(TBuffer& b) void StreamerNVirtual(TBuffer& b) virtual void TObject::SysError(const char* method, const char* msgfmt) const Bool_t TObject::TestBit(UInt_t f) const Int_t TObject::TestBits(UInt_t f) const virtual void TObject::UseCurrentStyle() virtual void TObject::Warning(const char* method, const char* msgfmt) const virtual Int_t TObject::Write(const char* name = 0, Int_t option = 0, Int_t bufsize = 0) virtual Int_t TObject::Write(const char* name = 0, Int_t option = 0, Int_t bufsize = 0) const
protected:
 virtual void TObject::DoError(int level, const char* location, const char* fmt, va_list va) const void TObject::MakeZombie()
private:
 TKDTree(const TKDTree&) void CookBoundaries(const Int_t node, Bool_t left) TKDTree& operator=(const TKDTree&) void UpdateNearestNeighbors(int inode, const double* point, Int_t kNN, int* ind, double* dist) void UpdateRange(int inode, double* point, double range, vector& res)

## Data Members

public:
 enum TObject::EStatusBits { kCanDelete kMustCleanup kObjInCanvas kIsReferenced kHasUUID kCannotPick kNoContextMenu kInvalidObject }; enum TObject::[unnamed] { kIsOnHeap kNotDeleted kZombie kBitMask kSingleKey kOverwrite kWriteDelete };
protected:
 UChar_t* fAxis [fNNodes] nodes cutting axis double* fBoundaries ! nodes boundaries int fBucketSize size of the terminal nodes Int_t fCrossNode ! cross node - node that begins the last row (with terminal nodes only) double** fData ! data points Int_t fDataOwner ! 0 - not owner, 2 - owner of the pointer array, 1 - owner of the whole 2-d array int* fIndPoints ! array of points indexes int fNDim number of dimensions int fNDimm dummy 2*fNDim Int_t fNNodes size of node array int fNPoints number of multidimensional points Int_t fOffset ! offset in fIndPoints - if there are 2 rows, that contain terminal nodes double* fRange [fNDimm] range of data for each dimension Int_t fRowT0 ! smallest terminal row - first row that contains terminal nodes Int_t fTotalNodes total number of nodes (fNNodes + terminal nodes) double* fValue [fNNodes] nodes cutting value

## Function documentation

void TKDTree<Index, Value> Build()
``` Build the kd-tree

1. calculate number of nodes
2. calculate first terminal row
3. initialize index array
4. non recursive building of the binary tree

The tree is divided recursively. See class description, section 4b for the details
of the division alogrithm
```
void TKDTree<Index, Value> FindNearestNeighbors(const double* point, Int_t k, int* ind, double* dist)
```Find kNN nearest neighbors to the point in the first argument
Returns 1 on success, 0 on failure
Arrays ind and dist are provided by the user and are assumed to be at least kNN elements long
```
void TKDTree<Index, Value> UpdateNearestNeighbors(int inode, const double* point, Int_t kNN, int* ind, double* dist)
```Update the nearest neighbors values by examining the node inode
```
Double_t TKDTree<Index, Value> Distance(const double* point, int ind, Int_t type = 2) const
```Find the distance between point of the first argument and the point at index value ind
Type argument specifies the metric: type=2 - L2 metric, type=1 - L1 metric
```
void TKDTree<Index, Value> DistanceToNode(const double* point, int inode, double& min, double& max, Int_t type = 2)
```Find the minimal and maximal distance from a given point to a given node.
Type argument specifies the metric: type=2 - L2 metric, type=1 - L1 metric
If the point is inside the node, both min and max are set to 0.
```
Index TKDTree<Index, Value> FindNode(const double* point) const
``` returns the index of the terminal node to which point belongs
(index in the fAxis, fValue, etc arrays)
returns -1 in case of failure
```
void TKDTree<Index, Value> FindPoint(double* point, int& index, Int_t& iter)
``` find the index of point
works only if we keep fData pointers
```
void TKDTree<Index, Value> FindInRange(double* point, double range, vector<int>& res)
```Find all points in the sphere of a given radius "range" around the given point
1st argument - the point
2nd argument - radius of the shere
3rd argument - a vector, in which the results will be returned
```
void TKDTree<Index, Value> UpdateRange(int inode, double* point, double range, vector<int>& res)
```Internal recursive function with the implementation of range searches
```
Index* TKDTree<Index, Value> GetPointsIndexes(Int_t node) const
```return the indices of the points in that terminal node
for all the nodes except last, the size is fBucketSize
for the last node it's fOffset%fBucketSize
```
void TKDTree<Index, Value> GetNodePointsIndexes(Int_t node, Int_t& first1, Int_t& last1, Int_t& first2, Int_t& last2) const
```Return the indices of points in that node
Indices are returned as the first and last value of the part of indices array, that belong to this node
Sometimes points are in 2 intervals, then the first and last value for the second one are returned in
third and fourth parameter, otherwise first2 is set to 0 and last2 is set to -1
To iterate over all the points of the node #inode, one can do, for example:
Index *indices = kdtree->GetPointsIndexes();
Int_t first1, last1, first2, last2;
kdtree->GetPointsIndexes(inode, first1, last1, first2, last2);
for (Int_t ipoint=first1; ipoint<=last1; ipoint++){
point = indices[ipoint];
//do something with point;
}
for (Int_t ipoint=first2; ipoint<=last2; ipoint++){
point = indices[ipoint];
//do something with point;
}
```
Index TKDTree<Index, Value> GetNPointsNode(Int_t node) const
```Get number of points in this node
for all the terminal nodes except last, the size is fBucketSize
for the last node it's fOffset%fBucketSize, or if fOffset%fBucketSize==0, it's also fBucketSize
```
void TKDTree<Index, Value> SetData(int npoints, int ndim, UInt_t bsize, double** data)
``` Set the data array. See the constructor function comments for details
```
Int_t TKDTree<Index, Value> SetData(int idim, double* data)
```Set the coordinate #ndim of all points (the column #ndim of the data matrix)
After setting all the data columns, proceed by calling Build() function
Note, that calling this function after Build() is not possible
Note also, that no checks on the array sizes is performed anywhere
```
void TKDTree<Index, Value> Spread(int ntotal, double* a, int* index, double& min, double& max) const
```Calculate spread of the array a
```
Value TKDTree<Index, Value> KOrdStat(int ntotal, double* a, int k, int* index) const
```copy of the TMath::KOrdStat because I need an Index work array
```
void TKDTree<Index, Value> MakeBoundaries(double* range = 0x0)
``` Build boundaries for each node. Note, that the boundaries here are built
based on the splitting planes of the kd-tree, and don't necessarily pass
through the points of the original dataset. For the latter functionality
see function MakeBoundariesExact()
Boundaries can be retrieved by calling GetBoundary(inode) function that would
return an array of boundaries for the specified node, or GetBoundaries() function
that would return the complete array.
```
void TKDTree<Index, Value> CookBoundaries(const Int_t node, Bool_t left)
``` define index of this terminal node
```
void TKDTree<Index, Value> MakeBoundariesExact()
``` Build boundaries for each node. Unlike MakeBoundaries() function
the boundaries built here always pass through a point of the original dataset
So, for example, for a terminal node with just one point minimum and maximum for each
dimension are the same.
Boundaries can be retrieved by calling GetBoundaryExact(inode) function that would
return an array of boundaries for the specified node, or GetBoundaries() function
that would return the complete array.
```
void TKDTree<Index, Value> FindBNodeA(double* point, double* delta, Int_t& inode)
``` find the smallest node covering the full range - start

```
Value* TKDTree<Index, Value> GetBoundaries()
``` Get the boundaries
```
Value* TKDTree<Index, Value> GetBoundariesExact()
``` Get the boundaries
```
Value* TKDTree<Index, Value> GetBoundary(const Int_t node)
``` Get a boundary
```
Value* TKDTree<Index, Value> GetBoundaryExact(const Int_t node)
``` Get a boundary
```
Int_t GetLeft(Int_t inode) const
``` Get indexes of left and right daughter nodes
```
`{return inode*2+1;}`
Int_t GetRight(Int_t inode) const
`{return (inode+1)*2;}`
Int_t GetParent(Int_t inode) const
``` Other getters
```
`{return (inode-1)/2;}`
UChar_t GetNodeAxis(Int_t id) const
`{return (id < 0 || id >= fNNodes) ? 0 : fAxis[id];}`
Value GetNodeValue(Int_t id) const
`{return (id < 0 || id >= fNNodes) ? 0 : fValue[id];}`
Int_t GetNNodes() const
`{return fNNodes;}`
Int_t GetTotalNodes() const
`{return fTotalNodes;}`
Index GetNPoints()
`{ return fNPoints; }`
Index GetNDim()
`{ return fNDim; }`

```Getters for internal variables.
```
`{return fRowT0;}`

`{return fCrossNode;}`

`{return fOffset;}`
Index* GetIndPoints()
`{return fIndPoints;}`
Index GetBucketSize()
`{return fBucketSize;}`
Bool_t IsTerminal(int inode) const
`{return (inode>=fNNodes);}`

`{ return fDataOwner; }`
void SetOwner(Int_t owner)
`{ fDataOwner = owner; }`
TKDTree<Index, Value>& operator=(const TKDTree<int,double>& )