
Saturday, April 20, 2013

Complexity Analysis


This analysis is based on the amount of work done by the algorithm.

Objectives of time complexity analysis:
• To determine the feasibility of an algorithm by estimating an upper bound on the amount of work performed
• To compare different algorithms before deciding on which one to implement

Time complexity expresses the relationship between the size of the input and the run time for the algorithm.
It is usually expressed as a proportionality, rather than as an exact function.

To simplify analysis, we sometimes ignore work that takes a constant amount of time, independent of the problem input size. When comparing two algorithms that perform the same task, we often just concentrate on the differences between algorithms. Simplified analysis can be based on:
• Number of arithmetic operations performed
• Number of comparisons made
• Number of times through a critical loop
• Number of array elements accessed
• etc


Example (counting the number of arithmetic operations performed):
Brute force method:
p(x) = 4*x*x*x*x + 7*x*x*x - 2*x*x + 3*x + 6
Horner’s method:
p(x) = (((4*x + 7) * x - 2) * x + 3) * x + 6
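
To make the comparison concrete, here is a small Python sketch (added for this post; the function names eval_brute_force and eval_horner are made up for illustration) that evaluates the same polynomial both ways and counts the multiplications performed.

# A small sketch comparing the two evaluation strategies.
# Coefficients are listed lowest degree first: [a_0, a_1, ..., a_n].

def eval_brute_force(coeffs, x):
    """Evaluate sum(a_i * x^i) by repeated multiplication; return (value, multiplications)."""
    value, mults = coeffs[0], 0
    for i, a in enumerate(coeffs[1:], start=1):
        term = a
        for _ in range(i):           # i multiplications for the term a_i * x^i
            term *= x
            mults += 1
        value += term
    return value, mults

def eval_horner(coeffs, x):
    """Evaluate the polynomial with Horner's rule; return (value, multiplications)."""
    value, mults = coeffs[-1], 0
    for a in reversed(coeffs[:-1]):  # one multiplication per remaining coefficient
        value = value * x + a
        mults += 1
    return value, mults

# p(x) = 4x^4 + 7x^3 - 2x^2 + 3x + 6 evaluated at x = 2
coeffs = [6, 3, -2, 7, 4]
print(eval_brute_force(coeffs, 2))   # (124, 10): 10 = 4*5/2 multiplications
print(eval_horner(coeffs, 2))        # (124, 4):  4 multiplications

For this degree-4 polynomial the counters report 10 and 4 multiplications, matching the T(n) = n(n+1)/2 and T(n) = n results derived below.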


We’ll examine the general form of a polynomial of degree n, and express our result in terms of n.
We’ll look at the worst case (max number of multiplications) to get an upper bound on the work.
Analysis for Brute Force Method:
p(x) = a_n * x * x * … * x * x  +        n multiplications
       a_n-1 * x * x * … * x  +          n-1 multiplications
       a_n-2 * x * … * x  +              n-2 multiplications
       … +                               …
       a_2 * x * x  +                    2 multiplications
       a_1 * x  +                        1 multiplication
       a_0

Number of multiplications needed in the worst case is:
T(n) = n + n-1 + n-2 + … + 3 + 2 + 1
       = n(n + 1)/2        (the arithmetic-series formula from high school math)

       = n^2/2 + n/2

This is an exact formula for the maximum number of multiplications. 
In general though, analyses yield upper bounds rather than exact formulas.
We say that the number of multiplications is on the order of n^2, or O(n^2)
(Think of this as being proportional to n^2.)

Analysis for Horner’s Method:
p(x) = ( … ((( a_n * x  +        1 multiplication
               a_n-1) * x  +     1 multiplication
               a_n-2) * x  +     1 multiplication
               … +               (1 multiplication per line, n lines in total)
               a_2) * x  +       1 multiplication
               a_1) * x  +       1 multiplication
               a_0

T(n) = n, so the number of multiplications is O(n)



The running time of an algorithm depends on two things:
(i) the particular input it is given, and (ii) any random choices the algorithm itself makes while running.
There are multiple possible "time complexities", depending on whether we take the average or the worst case over each of these.

Time Complexity:
Worst case analysis is used to find an upper bound on algorithm performance for large problems (large n).
When we say that an algorithm runs in time T(n), we mean that T(n) is an upper bound on the running time that holds for all inputs of size n. This is called worst-case analysis.

Average Time Complexity addresses the pessimism of worst-case analysis (a single pathological input can dominate the bound), but potentially goes too far in the other direction. By averaging over all inputs equally, it makes a strong assumption about how the user will behave. Any statement about average complexity requires you to assume a distribution over inputs -- use the wrong distribution, and your bound is worthless.

Expected Time Complexity is the most commonly used metric, and it sits in between the other two. It avoids the problems with Average Time Complexity by using the worst-case input. It avoids the problems of Worst-Case Complexity by averaging over the algorithm's internal random choices. In other words, it assumes "typical" behavior only where we can actually be certain what "typical" means, and uses worst-case behavior where we can't.

Worst case Running Time: The behavior of the algorithm with respect to the worst possible case of the input instance. The worst-case running time of an algorithm is an upper bound on the running time for any input. Knowing it gives us a guarantee that the algorithm will never take any longer. There is no need to make an educated guess about the running time.

Average case Running Time: The expected behavior when the input is randomly drawn from a given distribution. The average-case running time of an algorithm is an estimate of the running time for an "average" input. Computing the average-case running time entails knowing all possible input sequences, the probability distribution of occurrence of these sequences, and the running times for the individual sequences. Often it is assumed that all inputs of a given size are equally likely. (Note that average case is not the same as expected case.)
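
To illustrate the "assumed distribution" caveat, the following sketch (added here; the helper name linear_search_comparisons is made up) estimates the average number of comparisons a linear search makes when we assume the target is equally likely to sit at any position.

import random

def linear_search_comparisons(items, target):
    """Return how many comparisons a linear search makes before it finds target."""
    for count, item in enumerate(items, start=1):
        if item == target:
            return count
    return len(items)

n = 1000
items = list(range(n))
# Assumed input distribution: the target is equally likely to be at any position.
trials = [linear_search_comparisons(items, random.randrange(n)) for _ in range(10_000)]
print(sum(trials) / len(trials))   # close to (n + 1) / 2 comparisons, about 500.5
print(max(trials))                 # the unlucky trials approach the worst case of n comparisons

With a different distribution (say, the target almost always near the front), the average would be completely different, which is exactly the weakness described above.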


Amortized Running Time: Here the time required to perform a sequence of (related) operations is averaged over all the operations performed. Amortized analysis can be used to show that the average cost of an operation is small, if one averages over a sequence of operations, even though a simple operation might be expensive. Amortized analysis guarantees the average performance of each operation in the worst case.
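
A standard illustration (added here, not from the original text) is appending to a dynamic array that doubles its capacity whenever it fills up: one unlucky append costs O(n) element copies, yet the total over m appends stays below 2m, so the amortized cost per append is O(1). A minimal counting sketch:

def copies_for_appends(m):
    """Count element copies done by a doubling dynamic array over m appends."""
    capacity, size, copies = 1, 0, 0
    for _ in range(m):
        if size == capacity:      # array is full: double the capacity and copy everything over
            copies += size
            capacity *= 2
        size += 1
    return copies

for m in (10, 1_000, 1_000_000):
    print(m, copies_for_appends(m), copies_for_appends(m) / m)
# The copies-per-append ratio stays below 2 for every m, even though a single
# unlucky append copies O(n) elements: the expensive resizes are rare enough
# that their cost, averaged over the whole sequence, is O(1) per operation.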

Constant Time Complexity: Algorithms whose running time is independent of the size of the input are said to have constant time complexity. Constant time complexity is denoted O(1).

Asymptotic Analysis

When analyzing the running time or space usage of programs, we usually try to estimate the time or space as a function of the input size. For example, when analyzing the worst-case running time of a function that sorts a list of numbers, we will be concerned with how long it takes as a function of the length of the input list. For example, we say the standard insertion sort takes time T(n) where T(n) = c*n^2 + k for some constants c and k. In contrast, merge sort takes time T′(n) = c*n*log_2(n) + k.
The asymptotic behavior of a function f(n) (such as f(n)=c*n or f(n)=c*n^2, etc.) refers to the growth of f(n) as n gets large. We typically ignore small values of n, since we are usually interested in estimating how slow the program will be on large inputs. A good rule of thumb is: the slower the asymptotic growth rate, the better the algorithm (although this is often not the whole story).
By this measure, a linear algorithm (i.e., f(n)=d*n+k) is always asymptotically better than a quadratic one (e.g., f(n)=c*n^2+q). That is because for any given (positive) c, k, d, and q there is always some n at which the magnitude of c*n^2+q overtakes d*n+k. For moderate values of n, the quadratic algorithm could very well take less time than the linear one, for example if c is significantly smaller than d and/or q is significantly smaller than k. However, the linear algorithm will always be better for sufficiently large inputs. Remember to THINK BIG when working with asymptotic rates of growth.
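
As a concrete (and entirely hypothetical) illustration of that crossover, the sketch below plugs in constants that favour the quadratic algorithm for small n and finds the input size at which the linear one takes over; the constants d, k, c, q are invented for this example.

# Hypothetical cost models, with constants invented for illustration:
# a linear algorithm with a large constant factor vs a quadratic one with tiny constants.
d, k = 50.0, 10_000.0          # linear:    d*n + k
c, q = 0.5, 1.0                # quadratic: c*n^2 + q

def linear_cost(n):
    return d * n + k

def quadratic_cost(n):
    return c * n * n + q

crossover = next(n for n in range(1, 10**6) if quadratic_cost(n) > linear_cost(n))
print(crossover)                                     # 200: below this, the quadratic model is cheaper
print(linear_cost(10**6) < quadratic_cost(10**6))    # True: for large n the linear algorithm always wins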

Worst-Case and Average-Case Analysis

When we say that an algorithm runs in time T(n), we mean that T(n) is an upper bound on the running time that holds for all inputs of size n. This is called worst-case analysis. The algorithm may very well take less time on some inputs of size n, but it doesn't matter. If an algorithm takes T(n) = c*n^2 + k steps on only a single input of each size n and only n steps on the rest, we still say that it is a quadratic algorithm.
A popular alternative to worst-case analysis is average-case analysis. Here we do not bound the worst case running time, but try to calculate the expected time spent on a randomly chosen input. This kind of analysis is generally harder, since it involves probabilistic arguments and often requires assumptions about the distribution of inputs that may be difficult to justify. On the other hand, it can be more useful because sometimes the worst-case behavior of an algorithm is misleadingly bad. A good example of this is the popular quicksort algorithm, whose worst-case running time on an input sequence of length n is proportional to n^2 but whose expected running time is proportional to n log n.
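
Here is a hedged sketch of that quicksort observation, using a simple list-based quicksort written for this post (not the implementation from the referenced notes): with a first-element pivot, an already-sorted input triggers the quadratic worst case, while choosing the pivot at random on the same input gives comparison counts proportional to n log n.

import random

def quicksort_comparisons(seq, pick_pivot):
    """Return (sorted copy, comparison count) for a simple list-based quicksort."""
    if len(seq) <= 1:
        return list(seq), 0
    pivot = seq[pick_pivot(len(seq))]
    smaller = [x for x in seq if x < pivot]
    equal   = [x for x in seq if x == pivot]
    larger  = [x for x in seq if x > pivot]
    comparisons = len(seq) - 1            # standard accounting: each non-pivot element vs the pivot
    left,  c_left  = quicksort_comparisons(smaller, pick_pivot)
    right, c_right = quicksort_comparisons(larger, pick_pivot)
    return left + equal + right, comparisons + c_left + c_right

n = 500
sorted_input = list(range(n))                          # worst case for a first-element pivot
first_pivot  = lambda length: 0
random_pivot = lambda length: random.randrange(length)

print(quicksort_comparisons(sorted_input, first_pivot)[1])   # 124750 = n(n-1)/2, proportional to n^2
print(quicksort_comparisons(sorted_input, random_pivot)[1])  # typically a few thousand, proportional to n log n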


Order of Growth and Big-O Notation

In estimating the running time of insert_sort (or any other program) we don't know what the constants c or k are. We know they are constants of moderate size, but other than that it is not important; we have enough evidence from the asymptotic analysis to know that merge_sort is faster than the quadratic insert_sort, even though the constants may differ somewhat. (This does not always hold; the constants can sometimes make a difference, but in general it is a very good rule of thumb.)
We may not even be able to measure the constant c directly. For example, we may know that a given expression of the language, such as if, takes a constant number of machine instructions, but we may not know exactly how many. Moreover, the same sequence of instructions executed on a Pentium IV will take less time than on a Pentium II (although the difference will be roughly a constant factor). So these estimates are usually only accurate up to a constant factor anyway. For these reasons, we usually ignore constant factors in comparing asymptotic running times.
Computer scientists have developed a convenient notation for hiding the constant factor. We write O(n) (read: "order n") instead of "cn for some constant c." Thus an algorithm is said to be O(n) or linear time if there is a fixed constant c such that for all sufficiently large n, the algorithm takes time at most cn on inputs of size n. An algorithm is said to be O(n^2) or quadratic time if there is a fixed constant c such that for all sufficiently large n, the algorithm takes time at most cn^2 on inputs of size n. O(1) means constant time.
Polynomial time means n^O(1), or n^c for some constant c. Thus any constant, linear, quadratic, or cubic (O(n^3)) time algorithm is a polynomial-time algorithm.
This is called big-O notation. It concisely captures the important differences in the asymptotic growth rates of functions.
One important advantage of big-O notation is that it makes algorithms much easier to analyze, since we can conveniently ignore low-order terms. For example, an algorithm that runs in time
10n^3 + 24n^2 + 3n log n + 144
is still a cubic algorithm, since
10n^3 + 24n^2 + 3n log n + 144
  <= 10n^3 + 24n^3 + 3n^3 + 144n^3     (for n >= 1)
   = (10 + 24 + 3 + 144)n^3
   = O(n^3).
Of course, since we are ignoring constant factors, any two linear algorithms will be considered equally good by this measure. There may even be some situations in which the constant is so huge in a linear algorithm that even an exponential algorithm with a small constant may be preferable in practice. This is a valid criticism of asymptotic analysis and big-O notation. However, as a rule of thumb it has served us well. Just be aware that it is only a rule of thumb--the asymptotically optimal algorithm is not necessarily the best one.

Some common orders of growth seen often in complexity analysis are
O(1)        constant
O(log n)    logarithmic
O(n)        linear
O(n log n)  "n log n"
O(n^2)      quadratic
O(n^3)      cubic

Here log means log_2, the logarithm base 2, although the logarithm base doesn't really matter since logarithms with different bases differ by a constant factor. Note also that 2^O(n) and O(2^n) are not the same!
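
For instance (a small example added here for concreteness): 4^n = 2^(2n) is 2^O(n), because the exponent 2n is O(n); but 4^n is not O(2^n), because 4^n / 2^n = 2^n is unbounded, so no constant c can make 4^n ≤ c*2^n for all large n.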

Comparing Orders of Growth

O
Let f and g be functions from positive integers to positive integers. We say f is O(g(n)) (read: ''f is order g'') if g is an upper bound on f:  there exists a fixed constant c and a fixed n0 such that for all n≥n0,
f(n) ≤ cg(n).
Equivalently, f is O(g(n)) if the function f(n)/g(n) is bounded above by some constant.
o
We say f is o(g(n)) (read: "f is little-o of g'') if for all arbitrarily small real c > 0, for all but perhaps finitely many n,
f(n) ≤ cg(n).
Equivalently, f is o(g) if the function f(n)/g(n) tends to 0 as n tends to infinity. That is, f is small compared to g. If f is o(g) then f is also O(g).
Ω
We say that f is Ω(g(n)) (read: "f is omega of g") if g is a lower bound on f for large n. Formally, f is Ω(g) if there is a fixed constant c and a fixed n0 such that for all n > n0,
cg(n) ≤ f(n).
For example, any polynomial whose highest-order term is n^k is Ω(n^k). If f(n) is Ω(g(n)) then g(n) is O(f(n)). If f(n) is o(g(n)) then f(n) is not Ω(g(n)).
Θ
We say that f is Θ(g(n)) (read: "f is theta of g") if g is an accurate characterization of f for large n: it can be scaled so that it is both an upper and a lower bound of f. That is, f is both O(g(n)) and Ω(g(n)). Expanding out the definitions of Ω and O: f is Θ(g(n)) if there are fixed constants c1 and c2 and a fixed n0 such that for all n > n0,
c1g(n) ≤ f(n) ≤ c2g(n).
For example, any polynomial whose highest-order term is n^k is Θ(n^k). If f is Θ(g), then it is O(g) but not o(g). Sometimes people use O(g(n)) a bit informally to mean the stronger property Θ(g(n)); however, the two are different.
Here are some examples:
  • n + log n is O(n) and Θ(n), because for all n > 1, n < n + log n < 2n.
  • n^1000 is o(2^n), because n^1000/2^n tends to 0 as n tends to infinity.
  • For any fixed but arbitrarily small real number c > 0, n log n is o(n^(1+c)), since n log n / n^(1+c) tends to 0. To see this, take the logarithm:
    log(n log n / n^(1+c))
    = log(n log n) - log(n^(1+c))
    = log n + log log n - (1+c)log n
    = log log n - c log n
    and observe that it tends to negative infinity.
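
These limits can also be checked numerically. The short sketch below (added for this post) works with logarithms so the huge numbers stay manageable, and uses c = 0.5 in the second example because for very small c the decrease only becomes visible at astronomically large n.

import math

# n^1000 is o(2^n): compare via logarithms.
# log2(n^1000 / 2^n) = 1000*log2(n) - n, which heads to negative infinity.
for n in (20_000, 40_000, 80_000):
    print(n, 1000 * math.log2(n) - n)    # roughly -5.7e3, -2.5e4, -6.4e4: the ratio tends to 0

# n log n is o(n^(1+c)); with c = 0.5 the ratio log2(n) / n^0.5 visibly shrinks.
c = 0.5
for n in (10**3, 10**6, 10**9):
    print(n, (n * math.log2(n)) / n ** (1 + c))   # about 0.32, 0.020, 0.00094
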
The meaning of an expression like O(n2) is really a set of functions: all the functions that are O(n2). When we say that f(n) is O(n2), we mean that f(n) is a member of this set. It is also common to write this as f(n) = O(g(n)) although it is not really an equality.
We now introduce some convenient rules for manipulating expressions involving order notation. These rules, which we state without proof, are useful for working with orders of growth. They are really statements about sets of functions. For example, we can read #2 as saying that the product of any two functions in O(f(n)) and O(g(n)) is in O(f(n)g(n)).
  1. cn^m = O(n^k) for any constant c and any m ≤ k.
  2. O(f(n)) + O(g(n)) = O(f(n) + g(n)).
  3. O(f(n))O(g(n)) = O(f(n)g(n)).
  4. O(cf(n)) = O(f(n)) for any constant c.
  5. c is O(1) for any constant c.
  6. log_b n = O(log n) for any base b.
All of these rules (except #1) hold for Θ as well.
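
As a small added example of rule #6: log_10 n = log_2 n / log_2 10 ≈ 0.301 * log_2 n, so changing the base of the logarithm only changes a constant factor, which rule #4 lets us drop. This is why the base of the logarithm is usually left unspecified inside O(...).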


Some examples:

1. Finding the minimum element in a list:
   Worst case = O(n)
   Average case = O(n)

2. Quick sort:
   Worst case = O(n^2)
   Average case = O(n log n)

3. Merge sort, heap sort:
   Worst case = O(n log n)
   Average case = O(n log n)

4. Bubble sort:
   Worst case = O(n^2)
   Average case = O(n^2)

5. Binary search tree, searching for an element:
   Worst case = O(n)
   Average case = O(log n)
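
To see example 5 in action, here is a hedged Python sketch (written for this post, not taken from the references) that builds a plain, unbalanced binary search tree and measures its height, which bounds the cost of a search: inserting the keys in sorted order produces a chain of height n, while a random insertion order typically gives height proportional to log n.

import random

class Node:
    __slots__ = ("key", "left", "right")
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def bst_insert(root, key):
    """Insert key into a plain (unbalanced) binary search tree and return the root."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key)
                return root
            node = node.right

def height(root):
    """Compute tree height level by level, so long chains don't hit recursion limits."""
    level, h = ([root] if root else []), 0
    while level:
        h += 1
        level = [child for node in level for child in (node.left, node.right) if child]
    return h

n = 2000
keys = list(range(n))

chain_tree = None
for key in keys:                     # keys arrive in sorted order: the worst case
    chain_tree = bst_insert(chain_tree, key)

random_tree = None
for key in random.sample(keys, n):   # keys arrive in a random order: the typical case
    random_tree = bst_insert(random_tree, key)

print(height(chain_tree))    # 2000: the tree is a chain, so a search can cost O(n)
print(height(random_tree))   # usually in the 20s or 30s, proportional to log n: searches cost O(log n)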

ref:
1. http://www.quora.com/What-is-the-difference-between-Average-Time-Complexity-and-Expected-Time-Complexity
2. http://www.csd.uwo.ca/courses/CS1037a/notes/topic13_AnalysisOfAlgs.pdf
3. http://www.cs.cornell.edu/courses/cs3110/2012sp/lectures/lec19-asymp/review.html
