Quicksort Example
We use \(\sim\) to represent comparisons of algos more often.
Assumption
- If number of counting is \(C\), then the running time is \(\sim aC\).
- numbers are randomly distinctly in the array
Formula
To simplify that :
Apply symmetry
Multiply both sides by \(N\)
Substract same formula for \(N-1\),
Collect terms
To solve this, Divide by \(N(N+1)\):
Telescope:
Checking
- the math result
- model
Note that #comparison distrib. is not normal.
Tips on predicting:
- Hypothesis is \(\quad \operatorname{an} N \ln N\)
Experiment
- Run for size \(N\).
- could solve for a
- for \(10 N\) to increase a factor. $$ \begin{aligned} \frac{a(10 N) \ln (10 x)}{a x \ln N} & =10+\frac{\ln 10}{\ln N} \ & =10+\frac{1}{\log _{10} N} . \end{aligned} $$
Validate - refine - analyze cycle.
Exercise
Problem 1.14: Follow through the steps given for quicksort to solve the recurrence:
This is less than \(O(n)\) by the following:
Problem 1.15. Show that the average number of exchanges used during the first partitioning stage (before the pointers cross) is \((N-2)/6\).
Original wrong solution: Seen each one as a permutation, settling at the wrong place is of probability \(1/2\) (assume the pivot is at the center). So in all it's \(n/2\).
Summary:
- Suppose after choosing pivot \(p\), we have \(a = (p,\ \underbrace{a_1,\ a_2,\ \cdots,\ a_{p-1},}_{p-1\text{ numbers}}\ \ \underbrace{a_p,\ a_{p+1},\ \cdots\ a_{n-1}}_{n-p\text{ numbers}})\)
- the number of elements greater than \(p\)
in the front segment is always the same as the number of elements smaller than \(p\)
in the back segment.
- otherwise they are not balanced
- Exchange
- first element greater than \(p\) in the front segment is exchanged with the last element smaller than \(p\) in the back segment
- second element greater than \(p\) in the front segment is exchanged with the second last element smaller than \(p\) in the back segment.
Hence, $$ \text{total number of exchanges}=\sum_{p=1}^n\sum_{a_0=p}\text{number of exchanges on }a $$
total number of exchanges is
- \(\left(\begin{array}{c}n-p \\ x\end{array}\right)\): the number of ways to choose \(x\) elements out of all \(n-p\) numbers that are greater than \(p\) for the front segment
- \(\left(\begin{array}{c}p-1 \\ p-1-x\end{array}\right)\): the number of ways to chose \(p-1-x\) elements out of all \(p-1\) numbers that are smaller than \(p\) for the front segment
- \((p-1)!\): the number of ways to permute all \(p-1\) elements in the front segment
- \((n-p)!\): the number of ways to permute all \(n-p\) elements in the back segment
Hence,
where avg is then divided by \(n!\).
Problem 1.17 If we change the first line in the quicksort implementation given in the lecture to call insertion sort when the subfile size is not greater than \(M\) then the total number of comparisons to sort \(N\) elements is described by the recurrence
Problem 1.18 Ignoring small terms (those significantly less than \(N\) ) in the answer to the previous question, find a function \(f(M)\) so that the number of comparisons is approximately \(2 N \ln N+f(M) N\). Plot the function \(f(M)\), and find the value of \(M\) that minimizes the function.
Taking the derivative to find the minimal:
expr = 1/4 M (M - 1) + 2 (N - M) Log[N - M];
exprSimplified = 1/4 M (M - 1) + 2 N Log[N] - 2 M Log[N];
derivative = D[exprSimplified, M];
solutions = Solve[derivative == 0, M];
minM = M /. solutions[[1]]
yields \(\frac{1}{2} (8 \log (N)+1)\).