Sorting algorithms


 German version  up

With its time complexity of O(n log(n)) heapsort is optimal. It utilizes a special data structure called heap. This data structure is explained in the following.



Definition:  Let T = (VE) an almost complete binary tree with a vertex labelling a : V arrow M that assigns to each vertex u a label a(u) from an ordered set (M<=).

A vertex u element V has the heap property if it has no direct descendant with a greater label, i.e.

for all v element V  :  (uvelement E   implies   a(u>= a(v)

T is a heap if all vertices have the heap property, i.e.

for all (uvelement E  :  a(u>= a(v)

We call T a semi-heap if all vertices except possibly the root r have the heap property, i.e.

for all (uvelement E,  unot equalr  :  a(u>= a(v)


Heap with n = 10 vertices
Figure 1:  Heap with n = 10 vertices

Observe that each leaf automatically has the heap property regardless of its label, since it has no descendants.



The data structure of the heapsort algorithm is a heap. The data sequence to be sorted is stored as the labels of the binary tree. As shown later, in the implementation no pointer structures are necessary to represent the tree, since an almost complete binary tree can be efficently stored in an array.

Heapsort algorithm

The following description of heapsort refers to Figure 2 (a) - (e).

extract maximum element delete last leaf and write its label to the root exchange label with the maximum label of its direct descendants
(a) (b) (c)
exchange label with the maximum label of its direct descendants heap restored
(d) (e)
Figure 2:  Retrieving the maximum element and restoring the heap

If the sequence to be sorted is arranged as a heap, the greatest element of the sequence can be retrieved immediately from the root (a). In order to get the next-greatest element, the rest of the elements have to be rearranged as a heap.

The rearrangement is done in the following way:

Let b be a leaf of maximum depth. Write the label of b to the root and delete leaf b (b). Now the tree is a semi-heap, since the root possibly has lost its heap property.

Making a heap from a semi-heap is simple:

Do nothing if the root already has the heap property, otherwise exchange its label with the maximum label of its direct descendants (c). Let this descendant be v. Now possibly v has lost its heap property. Proceed further with v, i.e. make a heap from the semi-heap rooted at v (d). This process stops when a vertex is reached that has the heap property (e). Eventually this is the case at a leaf.

Making a heap from a semi-heap can conceptionally be implemented by the following procedure downheap:


procedure downheap (v)
Input:semi-heap with root v
Output:heap (by rearranging the vertex labels)
  1. while v does not have the heap property do
    1. choose direct descendant w with maximum label a(w)

      exchange a(v) and a(w)

      set v := w


Procedure downheap can be used to build a heap from an arbitrarily labelled tree. By proceeding bottom-up, downheap is called for all subtrees rooted at inner vertices. Since leaves are already heaps they may be omitted.


procedure buildheap
Input:almost complete binary tree T of depth d(T) with vertex labelling a
Output:heap (by rearranging the vertex labels)
  1. for i := d(T) – 1 downto 0 do
    1. for all inner vertices v of depth d(v) = i do
      1. downheap (v)


A call of buildheap is the first step of procedure heapsort, which can now be written down as follows:


procedure heapsort
Input:almost complete binary tree with root r and vertex labelling a
Output:vertex labels in descending order
  1. buildheap

    while r is not a leaf do

    1. output a(r)

      choose leaf b of maximum depth

      write label a(b) to r

      delete leaf b

      downheap (r)

    output a(r)




This simulation illustrates the process of building a heap from an arbitrarily labelled tree.


(Java applet for simulation of function buildheap)



An almost complete binary tree with n vertices has a depth of at most log(n). Therefore, procedure downheap requires at most log(n) steps. Procedure buildheap calls downheap for each vertex, therefore it requires at most n·log(n) steps. Heapsort calls buildheap once; then it calls downheap for each vertex, together it requires at most 2·n·log(n) steps.

Thus, the time complexity of heapsort is T(nelement O(n·log(n)). The algorithm is optimal, since the lower bound of the sorting problem is attained.



An almost complete binary tree with n vertices and vertex labelling a can be stored most efficiently in an array a:

All vertices at positions 0, ..., n/2-1 are inner nodes, all vertices n/2, ..., n-1  are leaves (integer division / ). The heap with n = 10 vertices of Figure 1 is shown in Figure 3 as an example.

Array representation of a heap
Figure 3:  Array representation of the heap of Figure 1

Using this array representation of the heap, heapsort can be implemented as an in-place sorting algorithm. In each step of heapsort, the root label a(r) is not output but stored at the position of the leaf b that is deleted in the following. Deleting leaf b means to consider just the array elements left of b as the remaining heap.

In other words: the four steps

are replaced by an exchange of the root label and the label of b:



The following Java class HeapSorter encapsulates the functions downheap, buildheap and heapsort. In order to sort an array b, Heapsort is called with the statement HeapSorter.sort(b).

public class HeapSorter 
    private static int[] a;
    private static int n;

    public static void sort(int[] a0)

    private static void heapsort()
        while (n>1)
            exchange (0, n);
            downheap (0);

    private static void buildheap()
        for (int v=n/2-1; v>=0; v--)
            downheap (v);

    private static void downheap(int v)
        int w=2*v+1;    // first descendant of v
        while (w<n)
            if (w+1<n)    // is there a second descendant?
                if (a[w+1]>a[w]) w++;
            // w is the descendant of v with maximum label

            if (a[v]>=a[w]) return;  // v has heap property
            // otherwise
            exchange(v, w);  // exchange labels of v and w
            v=w;        // continue

    private static void exchange(int i, int j)
        int t=a[i];

}    // end class HeapSorter



With its time complexity of O(n log(n)) heapsort is optimal. Unlike mergesort, heapsort requires no extra space. On the other hand, heapsort is not stable. A sorting algorithm is stable, if it leaves the order of equal elements unchanged.

The heap data structure can also be used for an efficient implementation of a priority queue. A priority queue is an abstract list data structure with the operations insert and extractMax. With each element in the list a certain priority is associated. By insert an element together with its priority is inserted into the list. By extractMax the element with the highest priority is extracted from the list. Using a heap, both operations can be implemented in time O(log(n)).



Heapsort was originally published by J.W.J. Williams as Algorithm 232 in the journal Communications of the ACM [Wil 64].

[AHU 74]A.V. Aho, J.E. Hopcroft, J.D. Ullman: The Design and Analysis of Computer Algorithms. Addison-Wesley (1974)
[CLRS 01]T.H. Cormen, C.E. Leiserson, R.L. Rivest, C. Stein: Introduction to Algorithms. 2nd edition, The MIT Press (2001)
[Sed 88]R. Sedgewick: Algorithms. 2nd edition, Addison-Wesley (1988)
[Wil 64]J.W.J. Williams: Algorithm 232: Heapsort. Communications of the ACM, 7, 6, 347-348 (1964)



Next:   [Mergesort]   or   up Google Ma.gnolia Mister Wong StumbleUpon YahooMyWeb LinkARENA


homeH.W. Lang   FH Flensburg   Impressum   ©   Created: 25.02.1997   Updated: 18.05.2010
Valid HTML 4.01 Transitional