Noteworthy Algorithms: Data structures

Binary heap

A binary heap is a heap data structure created using a binary tree. It can be seen as a binary tree with two additional constraints:

The shape property: the tree is a complete binary tree; that is, all levels of the tree, except possibly the last one (deepest) are fully filled, and, if the last level of the tree is not complete, the nodes of that level are filled from left to right.
The heap property: each node is greater than or equal to each of its children according to a comparison predicate defined for the data structure.

Heaps with a mathematical "greater than or equal to" comparison function are called max-heaps; those with a mathematical "less than or equal to" comparison function are called min-heaps. Min-heaps are often used to implement priority queues.

Insert

Add the element to the bottom level of the heap.
Compare the added element with its parent; if they are in the correct order, stop.
If not, swap the element with its parent and return to the previous step.

Delete

Replace the root of the heap with the last element on the last level.
Compare the new root with its children; if they are in the correct order, stop.
If not, swap the element with one of its children and return to the previous step. (Swap with its smaller child in a min-heap and its larger child in a max-heap.)

Operations complexity: O(log n)

http://en.wikipedia.org/wiki/Binary_heap

Also see Binomial Heaps: http://www.geeksforgeeks.org/binomial-heap-2/

Graph

	Matrix	List
Adjacency	V x V	V lists of neighbors
Incidence	V x E	V lists of edges

k-d trees

A k-d tree (short for k-dimensional tree) is a space-partitioning data structure for organizing points in a k-dimensional space. k-d trees are a useful data structure for several applications, such as searches involving a multidimensional search key (e.g. range searches and nearest neighbor searches).

The k-d tree is a binary tree in which every node is a k-dimensional point. Every non-leaf node can be thought of as implicitly generating a splitting hyperplane that divides the space into two parts, known as half-spaces. Points to the left of this hyperplane are represented by the left subtree of that node and points right of the hyperplane are represented by the right subtree. The hyperplane direction is chosen in the following way: every node in the tree is associated with one of the k-dimensions, with the hyperplane perpendicular to that dimension's axis. So, for example, if for a particular split the "x" axis is chosen, all points in the subtree with a smaller "x" value than the node will appear in the left subtree and all points with larger "x" value will be in the right subtree. In such a case, the hyperplane would be set by the x-value of the point, and its normal would be the unit x-axis.

Searching for a nearest neighbour in a k-d tree proceeds as follows:

Starting with the root node, the algorithm moves down the tree recursively, in the same way that it would if the search point were being inserted (i.e. it goes left or right depending on whether the point is less than or greater than the current node in the split dimension).
Once the algorithm reaches a leaf node, it saves that node point as the "current best"
The algorithm unwinds the recursion of the tree, performing the following steps at each node:
1. If the current node is closer than the current best, then it becomes the current best.
2. The algorithm checks whether there could be any points on the other side of the splitting plane that are closer to the search point than the current best. In concept, this is done by intersecting the splitting hyperplane with a hypersphere around the search point that has a radius equal to the current nearest distance. Since the hyperplanes are all axis-aligned this is implemented as a simple comparison to see whether the difference between the splitting coordinate of the search point and current node is less than the distance (overall coordinates) from the search point to the current best.
  1. If the hypersphere crosses the plane, there could be nearer points on the other side of the plane, so the algorithm must move down the other branch of the tree from the current node looking for closer points, following the same recursive process as the entire search.
  2. If the hypersphere doesn't intersect the splitting plane, then the algorithm continues walking up the tree, and the entire branch on the other side of that node is eliminated.
When the algorithm finishes this process for the root node, then the search is complete.

https://en.wikipedia.org/wiki/Kd-tree
http://www.geeksforgeeks.org/k-dimensional-tree/

Splay trees

A splay tree is a self-adjusting binary search tree with the additional property that recently accessed elements are quick to access again. It performs basic operations such as insertion, look-up and removal in O(log n) amortized time. For many sequences of non-random operations, splay trees perform better than other search trees, even when the specific pattern of the sequence is unknown.

All normal operations on a binary search tree are combined with one basic operation, called splaying. Splaying the tree for a certain element rearranges the tree so that the element is placed at the root of the tree. One way to do this is to first perform a standard binary tree search for the element in question, and then use tree rotations in a specific fashion to bring the element to the top. Alternatively, a top-down algorithm can combine the search and the tree reorganization into a single phase.

Splay trees can change even when they are accessed in a 'read-only' manner (i.e. by find operations). This complicates the use of such splay trees in a multi-threaded environment. Specifically, extra management is needed if multiple threads are allowed to perform find operations concurrently.

http://en.wikipedia.org/wiki/Splay_tree

Suffix tree

Suffix tree for the text BANANA. Each substring is terminated with special character $. The six paths from the root to a leaf (shown as boxes) correspond to the six suffixes A$, NA$,ANA$, NANA$, ANANA$ and BANANA$. The numbers in the leaves give the start position of the corresponding suffix. Suffix links, drawn dashed, are used during construction.

In computer science, a suffix tree (also called PAT tree or, in an earlier form, position tree) is a compressed trie containing all the suffixes of the given text as their keys and positions in the text as their values. Suffix tree allows a particularly fast implementation of many important string operations.

The construction of such a tree for the string $S$ takes time and space linear in the length of $S$ . Once constructed, several operations can be performed quickly, for instance locating a substring in $S$ , locating a substring if a certain number of mistakes are allowed, locating matches for a regular expression pattern etc. Suffix trees also provided one of the first linear-time solutions for the longest common substring problem. These speedups come at a cost: storing a string's suffix tree typically requires significantly more space than storing the string itself.

The suffix tree for the string $S$ of length $n$ is defined as a tree such that:^[2]

the paths from the root to the leaves have a one-to-one relationship with the suffixes of $S$ ,
edges spell non-empty strings,
and all internal nodes (except perhaps the root) have at least two children.

Since such a tree does not exist for all strings, $S$ is padded with a terminal symbol not seen in the string (usually denoted $). This ensures that no suffix is a prefix of another, and that there will be $n$ leaf nodes, one for each of the $n$ suffixes of $S$ . Since all internal non-root nodes are branching, there can be at most n − 1 such nodes, and n + (n − 1) + 1 = 2n nodes in total (n leaves, n − 1 internal non-root nodes, 1 root).

A generalized suffix tree is a suffix tree made for a set of words instead of only for a single word. It represents all suffixes from this set of words. Each word must be terminated by a different termination symbol or word.

Ukkonen algorithm allows one to build a suffix tree in O(n) time. It is an online algorithm (works for streams).

https://en.wikipedia.org/wiki/Suffix_tree
http://www.geeksforgeeks.org/pattern-searching-set-8-suffix-tree-introduction/
http://www.geeksforgeeks.org/suffix-array-set-1-introduction/
http://stackoverflow.com/questions/9452701/ukkonens-suffix-tree-algorithm-in-plain-english
http://www.geeksforgeeks.org/pattern-searching-using-trie-suffixes/

Interval Tree

Interval Tree: The idea is to augment a self-balancing Binary Search Tree (BST) like Red Black Tree, AVL Tree, etc to maintain set of intervals so that all operations can be done in O(Logn) time.

Every node of Interval Tree stores following information.

a) i: An interval which is represented as a pair [low, high]

b) max: Maximum high value in subtree rooted with this node.

The low value of an interval is used as key to maintain order in BST. The insert and delete operations are same as insert and delete in self-balancing BST used.

The main operation is to search for an overlapping interval. Following is algorithm for searching an overlapping interval x in an Interval tree rooted with root:

Interval overlappingIntervalSearch(root, x)
1) If x overlaps with root's interval, return the root's interval.
2) If left child of root is not empty and the max in left child
is greater than x's low value, recur for left child
3) Else recur for right child.

http://www.geeksforgeeks.org/interval-tree/

Noteworthy Algorithms

Pages

Data structures

No comments:

Post a Comment