2.9. Searching in an Array¶
2.9.1. Sequential Search¶
If you want to find the position in an unsorted array of \(n\) integers that stores a particular value, you cannot really do better than simply looking through the array from the beginning and move toward the end until you find what you are looking for. This algorithm is called sequential search. If you do find it, we call this a successful search. If the value is not in the array, eventually you will reach the end. We will call this an unsuccessful search. Here is a simple implementation for sequential search.
# Return the position of an element in a list.
# If the element is not found, return -1.
def sequentialSearch(elements, e):
for i in range(len(elements)): # For each element
if elements[i] == e: # if we found it
return i # return this position
return -1 # Otherwise, return -1
It is natural to ask how long a program or algorithm will take to run. But we do not really care exactly how long a particular program will run on a particular computer. We just want some sort of estimate that will let us compare one approach to solving a problem with another. This is the basic idea of algorithm analysis. In the case of sequential search, it is easy to see that if the value is in position \(i\) of the array, then sequential search will look at \(i\) values to find it. If the value is not in the array at all, then we must look at \(n\) values if the array holds \(n\) values. This would be called the worst case for sequential search. Since the amount of work is proportional to \(n\), we say that the worst case for sequential search has linear cost. For this reason, the sequential search algorithm is sometimes called linear search.
2.9.2. Binary Search¶
Sequential search is the best that we can do when trying to find a value in an unsorted array. [#]_ But if the array is sorted in increasing order by value, then we can do much better. We use a process called binary search.
Binary search begins by examining the value in the middle position of the array; call this position \(mid\) and the corresponding value \(k_{mid}\). If \(k_{mid} = K\), then processing can stop immediately. This is unlikely to be the case, however. Fortunately, knowing the middle value provides useful information that can help guide the search process. In particular, if \(k_{mid} > K\), then you know that the value \(K\) cannot appear in the array at any position greater than \(mid\). Thus, you can eliminate future search in the upper half of the array. Conversely, if \(k_{mid} < K\), then you know that you can ignore all positions in the array less than \(mid\). Either way, half of the positions are eliminated from further consideration. Binary search next looks at the middle position in that part of the array where value \(K\) may exist. The value at this position again allows us to eliminate half of the remaining positions from consideration. This process repeats until either the desired value is found, or there are no positions remaining in the array that might contain the value \(K\). Here is an illustration of the binary search method.
With the right math techniques, it is not too hard to show that the cost of binary search on an array of \(n\) values is at most \(\log n\). This is because we are repeatedly splitting the size of the subarray that we must look at in half. We stop (in the worst case) when we reach a subarray of size 1. And we can only cut the value of \(n\) in half \(\log n\) times before we reach 1. [#]_
2.9.3. Analyzing Binary Search¶
Function binarySearch
is designed to find the (single) occurrence of
\(K\) and return its position.
A special value is returned if \(K\) does not appear in the array.
This algorithm can be modified to implement variations
such as returning the position of the first
occurrence of \(K\) in the array if multiple occurrences are
allowed, and returning the position of the greatest value less than
\(K\) when \(K\) is not in the array.
Comparing sequential search to binary search, we see that as \(n\) grows, the \(\Theta(n)\) running time for sequential search in the average and worst cases quickly becomes much greater than the \(\Theta(\log n)\) running time for binary search. Taken in isolation, binary search appears to be much more efficient than sequential search. This is despite the fact that the constant factor for binary search is greater than that for sequential search, because the calculation for the next search position in binary search is more expensive than just incrementing the current position, as sequential search does.
Note however that the running time for sequential search will be roughly the same regardless of whether or not the array values are stored in order. In contrast, binary search requires that the array values be ordered from lowest to highest. Depending on the context in which binary search is to be used, this requirement for a sorted array could be detrimental to the running time of a complete program, because maintaining the values in sorted order requires a greater cost when inserting new elements into the array. This is an example of a tradeoff between the advantage of binary search during search and the disadvantage related to maintaining a sorted array. Only in the context of the complete problem to be solved can we know whether the advantage outweighs the disadvantage.