>_Programming podcast with Minko Gechev

Episode #2 Binary Search

Today we'll discuss the binary search algorithm. We'll go through several examples when the algorithm makes sense and discuss in detail how it works. By the end of the episode, you'll know when to apply a binary search and know why it's more efficient than a linear search.

In this episode we covered how binary search works and when we should consider it in practice. You can find links for further reading below:

Transcript

welcome to the programming podcast. Here you can learn about computers in the brief and accessible way I'm your host. I mean could get lawyer one in this episode of the podcast today. We are going to talk about searching. We are going to talk about looking for a particular element in a data structure. I bet This is something that we have today. Right we really have to find out whether a certain element exists in an array or a hash map or in a linked list the tree and whatever since the focus of the episode. Today's going to be buying the research. We are going to particularly talk about race. Why did I pick a race well? A race or a very common data structure we often have to traverse on the Ray in order to find out whether certain elements endure exists and in general race. They're unique in two ways first of all. We have order in the race in data structures such as sets vets or a hash map. We don't really have an order. Their inorder sets by default in the race. However we have an order of the elements we have the first second third and so on and so forth on the second of Oh arrays are represented as sequential memory cells and each element went the? IRA is usually the same type. For example in Javascript we may have an array with an element a reference number but in general they're represented as a reference to these individual objects so each element of the array is just a reference to somewhere in the memory. So I sat there ordered and on top of that we have direct access to the individual elements since they are represented as sequential sales in memory. This means that just by having having the index of the particular element that we're looking for and also knowing how big is the cell size or how big how much memory and individual element nineteen hours consumes. We can directly go to a particular element that her interested in so we have constant access or very fast access to a particular element and in the IRA so these special preconditions are crucial for binary search. which is the focus of the episode today by the research search relies on the fact that the elements in the array that we're looking at Searching at the elements are sort their asserted based on certain criteria that we have followed for example we can think of the IRA as sorted list of numbers. We we have one two three four and so on and so forth in disarray. We may start looking for a particular number. Let's say the number and there are two main strategies. This is that actually there are many different stretch strategies that we can approach this problem. For example we can start traversing the data structure from the beginning. We can look for the first element of how Ari and compared to. Its the adamant that we're looking for. We can look for Lou astronaut to move to the second element thirteen element for settlement and Sonal Self-worth until we eventually find the index of the element. We're looking for or we figure out that this adamant does not exist. India with binary search. We can perform a much more efficient searching to nick. Only because the elements of the array are sorted and we know that there is no point looking at all of them before we go to the actual algorithm. I wanted to share a couple of examples examples when I had to do to implement binary search myself because often when we look at Algorithms. It doesn't often seem necessary for us to implement any of them. We are very frequent looking at a very small data structures we are traversing an array with. Let's say like ten or even one hundred or even thousands. Thousands elements and optimizing algorithm which operates on such a small input is very rarely required. It needs to be an extremely performance critical system in order for us to implement an efficient algorithm when we work with no that much data however ever in the past I have had to implement binary search for a couple of particular reasons so I I was working a company which was developing different widgets. So we wanted to optimize our data grit a lot in particular. We wanted to optimize the searching functionality. In the day the Grit we knew that oldest sales in the data grid are sorted so eventually when we have millions of items in the grit we didn't want our implementation of the searching shing functionality to get the bottleneck so we went ahead and we implemented binary search.

We just started looking at this. The elements in a particular way we just follow the binary search algorithm and we optimized searching functionality. Dramatically another example is when I was working on the learning thank platform Ryan Dot Com. Their use case was slightly more complicated. I was working on recording where you were able to record the screen of a virtual machine that is running inside of the browser and on top of that. We're also recording the different user events. For instance one user moving their cursor cursor we were recording wear the cursor is on the screen we are also recording which keys the user was pressing a while interacting with the virtual machine. So this way we had different events these events were stored in an array and for each individual event. We had the type of event what actually happens and a time stamp associated with it. Since we're recording these events sequentially. They were ordered ordered based on the time stamp. Now we had two streams we had the video stream of the virtual machine and we also have this events three of course yes we also had some extreme but it is not interesting for this particular example that will be looking at so once the user recourse this. We were saving this somewhere in Google cloud or aid that wes. It doesn't really matter once so we'll start looking at the media they were. You're supposed to see the video and also and also see some cassation of of the of the events for example when the recording was yes Capturing interaction of the author of the recording with a particular element with their mouths we were visualizing leising demolished events in some special way. Because because we wanted to make them be more noticeable and more intuitive for actual consumer of the recording so once the consumer of the recording navigates to a particular time in the video. We wanted to find out which is the a particular mouse or keyboard events ASSOC- associated with this time stamp and now imagine a long recording a long recording. Kim consist list of maybe hundreds or even thousands of events or even menial servants. Now imagine you as a consumer of the video you navigates to particular Ornstein. Veto you click there and well if we're using If we were using a exhaustive search or leaner leaner search. We had to start looking at the individual events one by one. We're going to start from the beginning in to find out whether this is the closest event onto the time stamp that you navigated to Ashrat. We had to navigate to the second element third element and so on and so forth until we eventually reach the closest closest events to the time that you went to now in the large video. This could be terrible in John Scripts in general. We we have about when running in the browser. We have about sixty seconds to perform an operation to perform computation because otherwise alternatively the browser is going to start dropping frames and drop the user experience in general making. We're going to make it worse from sixty percents. We're going to directly drop to thirty frames per seconds or less now if you have to perform an exhaustive for dinner searching on top of one hundred million items we'll have to perform one hundred million comparisons in the worst case because the item that we're looking at the item that we're actually looking for. It could be somewhere in the end of this list however her if we were using binary search we can drop the number of comparisons to log in which is only twenty seven comparisons imagine engine. We're dropping from a hundred million comparisons to twenty seven. This is an extremely powerful algorithm. So how does it work in research. Actually it has a simple idea we can think about it as looking for full number or dress address book. We know that the names in the address book there are ordered by the names of the people inside back within that we have their addresses. So if you're looking for a particular person we can. I look at the middle of the address book and depending on whether and depending on whether the names in the middle are Bir likely graphically or smaller again.

Lexi Lexi graphically compared to the name that we are looking for. We can decide whether we want to start doing the same in the first half of the address book doc or in the second half for example if we're looking for a person which WHO's name starts with E. I we're going to take a look at the middle. We're going to to figure out that well probably. That's not the place where we're looking at because all the names the middle are with a letter which is much bigger than e graphic traffic. So we're going to ignore the second half of the address book and we're going to entirely focused on the first half of the address book so we're going to look at the half exactly exactly the middle of the first half and we're going to figure out that well that's probably not the place to look for dispersal whose name star Suite E so we're going going to ignore the second half of this first half of the address book so we're going to ignore pretty much the second quarter alter and we're going to focus on the first quarter. See How on each step. We're ignoring half of the input. This is extremely powerful technique. Because you'll see drops the number of operations that were performing while searching dramatic so this is why finery search is much more efficient compared to a linear exhaustive search but how much more efficient. This is a very interesting question to ask in fact because it relates to analyzing the complexity of this algorithm by complexity I mean the runtime performance of the Algorithm. There is a very solid foundation for analyzing Complexity of particular algorithm by looking at how algorithm is going to scale over time when we increase increase the input to understand better. Let's actually look at the particular implementation of binary search so we have our rate which has n elements. We also have a particular element that we're looking for. In general we want to find whether element x exists in the IRA and what's index. So first we're going to take our array and look in the Middle Adamant we'll get to compare X.. With the element of the middle and if x equals to this element which is going to return the index of the Middle Element which is perfect. We already found the tremendous. When does we're looking for alternatively if the element of the middle is bigger compared to the elements X? We're going to completely ignore the second in half of the IRA. Because we know that all the elements in the second half of the array. They're bigger than the first settlement. Remember one of the preconditions for binary search coach was our array to be sorted as the next step. We're going to look at the first half. We're going to find the middle element in the first half and we're going to compare it with X.. If this element in the first half the middle one equals X.. Them well we found the exact no matter who looking for so. We're just returning cynics. Alternatively if it is bigger compared to x then we're going to ignore the second a quarter and we're going to focus on the first quarter now see how we represent binary search with this decision. Trie we I have the decision to figure out whether adamant equals whether x equals the Middle Element or alternatively we look at the first half or the second half the afterwards if we go to the first branch look either at the first quarter or the second quarter if we go to the second half look at the third quarter or fourth quarter this way we actually truly have a binary a balanced binary tree and finding whether our element existing there or not you just a pass from the rules of the of this three to any of the leaf notes so this is actually fascinating but the decision tree that we're going through during the execution of binary search you just another another data structure that is pretty common. boggles binary tree the height of such panels binary is look and by local I mean logarithms. With based to from end since well we're not calculating logarithms. Didi and I don't do that. While I wanNA vacation or like walking in the park let's make a very quick recap of lot walk and logarithms Based to avenge his rhythm of with base to from sixteen is going to be poor. This means that we need to take to and put it on on the power four in order to get sixteen so if we have luxury him from one hundred million with base to we need to figure out on which power we need to put to in order to get one hundred million and this is not much it is bright beautiful twenty seven now you can see how fuhr how much fewer operations in binary search were performing compared to appear linear exhaustive search.