ProgrammingMethodology-Lecture23.pdf

(47 KB) Pobierz
Programming Methodology-Lecture23
Instructor (Mehran Sahami): So welcome to the 106A lecture just before
Thanksgiving. I thought attendance might not be spectacular today. People are still
filtering in, but it’s good to see so many of you here, though. I’m sure some of you have
already headed home.
You may be wondering who I am. I would have thought before we got back mid-quarter
evaluations that you stood a chance of recognizing me as the TA of your class, but the
comment of more than half of the people who responded to the question of how is Ben
doing was, “I don’t know Ben. I’ve never interacted with Ben. So I assume he’s doing a
great job.”
Most of you jumped to that conclusion. But – so I sort of chased at this, but not much. I
realize, though, that I would enjoy the opportunity to develop a little more exposure. And
so Maron and I decided to switch roles for today. In fact, he’s making copies as we speak,
and there’s going to be one handout that’s sort of pertinent to what you’ll be doing for
[inaudible] the next assignment. And he should be back any time, but I certainly can’t
blame him for being – running a little late since I was supposed to make the copies before
class myself. So anyway, in my capacity as Maron, I don’t hope to be as spontaneous or
as funny as he is, but I do hope to talk about something that is a little bit off of the sort of
charted path of CS106A’s curriculum.
So one way of describing the things that you’re learning in 106A is that you’re learning
how to take the ways that you already know of doing things that may take you some time
and maybe error-prone in the doing of them and tell a computer precisely how to do them
so that you reap the benefits of a machine being able to execute those instructions. And
they’ll be faster. There’ll be fewer errors as long as you get the instructions right in the
first place. And that’s got to be an exhilarating feeling. It’s got to be empowering, or at
least it was when I was sitting in your place as a freshman, to know that, all of a sudden,
if you have a good idea – if you can come up with a procedure for solving a problem that
you’ve always had and tell a computer how to do it, then the computer will be able to do
that over and over and over again. All right? So that’s 106A in a nutshell. So it’s sort of –
it’s not a perfect metaphor. What does it mean to run breakout by hands? I don’t know.
So what I – the distinction I’m trying to make between what you’ve been learning so far
and what you will learn, hopefully, in this lecture – and certainly if you go on to CS106B
– is that there are things that computers can do that you couldn’t possibly have simulated
by hand. Computers can handle so much more data than you could ever manage to sort
through on your own manually that it’s worth you’re learning a bit about how to make
them do that – how to think about instructing a computer to do something that you
couldn’t possibly have done. So today, we’re going to talk about two of the most
important kinds of those problems in computer science, and I’m going to stop just talking
at you and show you some examples in a second. But those two topics are searching and
sorting. How do you find something in a set of things that you have? And how, if you
could impose some structure on that set, would you both find it more quickly, and – I
guess I got ahead of myself – how would you go about imposing the sort of structure that
would help you find it more quickly?
All right. So if I’m seeming to ramble, that stops now.
So searching and sorting – I say they are important, and I hope you believe me, and you
think that for that reason, it’s worth talking about for an entire lecture. But they really are
just a way of getting to talk about something a little deeper, which is this concept of
algorithmic efficiency that we haven’t said much about in the class so far. So that’s the
third part of this two-part lecture. What’s the deal with searching? Well, searching
doesn’t quite fit the mold as something you couldn’t do by hand. You find things all the
time. Chapter 12 of the book looks at two operations of the searching and sorting. This is
pretty generic. All right. So searching is simpler. You can define a search problem – say
you have an array or some other collection of things, and you have something you want
to find. The typical way this is done is that you want to find the index into the array –
where that element was – or, in a more generic case, you want to find out if that element
is even in the array. So you want to have some way of determining that it wasn’t actually
found.
And that may be all you care about. It may be the case that, if the search routine returns
anything other than negative one, you don’t actually care what its value was. But we
adopt – I will adopt the convention for this lecture that if you don’t find what you’re
searching for, then the method that you wrote to do the search should return a negative
value since that’s not a valid index into the array. All right. So if you have a set of things
that you’re trying to search through, the easiest way of doing that is just to look at all of
them and see if each one, in turn, is what you’re looking for. That’s called a linear search
because, to sort of pre-figure what we’re going to talk about at the end of the lecture, the
number of operations you have to do to find the element that you’re looking for is
linearly proportional to the number of elements that you had to begin with. Now it may
not be obvious right now that there’s something better that you can do, and with an
arbitrary set of data, there is not anything better that you can do. But this procedure that
I’ve written up here is an implementation of what you already could have written – a
procedure for finding an element in an array of integers.
So there’s nothing very tricky about this. If it had been a string, you might have even
avoided having to write the function and called it something like, “Index of” with a
character. And inside of “Index of”, though you don’t have to write the code, it would
have had a four loop that iterated over each of the characters of the string, tested each one
for quality with the key that you were looking for and then, presuming it found one that
equaled that character – or in this case, that integer – it would return the index of it. And
if the four loop completes, then you must not have examined any elements that matched,
and so you return negative one. Okay? So here’s a simulation of how that would work,
although the – leaving nothing to the imagination here, this is pretty easy to sort of
envision.
So we have this array of primes, and it has 17 in it. So we are going to expect to find that
this linear search shouldn’t return negative one. But the second one, we’re looking for 27
– which sort of looks prime, but of course, it’s not because it’s nine times three. All right?
Okay. So we called linear search, and here’s our new stat frame, which has the local
variables I and the parameters P and array in it. This is sort of Eric Robert’s – who wrote
the textbooks – way of displaying the execution model of JAVA. All right? So we’re just
looping over it, and we’re testing as I equals zero and then one and then two and then
three and four, five, six – where should we find it? Well, we were looking for seven – no.
Right now, we’re looking for 27. So we’re just gonna go to the – I think there was a
problem with the way that – well, anyway. The console here prints out that, indeed, we
found 17 at position six. There may have been a problem converting PowerPoint slides to
Keynote slides, so I apologize for the sort of shakiness of these animations. But the
content of the slides should be adequate. Cool.
All right. So now we’re in our second called linear search, and the only thing that’s
changed at this point from the beginning of the first called was that the key is different.
So I is still gonna start at zero, and it’s still gonna iterate over all of the positions of the
array. But we’re gonna go through the entire array, which means going all the way up to
ten where I is no longer less than ten. And we didn’t find 27 anywhere. So I hope this is
all fairly clear. But these are the results that you would have expected. So we found the
position of 17, and then we looked for 27 but didn’t find it and returned negative one.
Cool.
How many people think they could have written that in their sleep? That that was too
much time to spend on such a simple idea? All right. Well, it gets more interesting from
here on out, but – so talk to me. What is the problem with linear search?
Student: It takes a lot of time.
Instructor (Mehran Sahami): It takes a lot of time, right? It takes as much time as you
have elements. So if I asked a question, which was – if I knew the SUID number of one
of you in the room, and I asked, “Is this your SUID number?
And you say, “No.”
And I ask you, “Is 05179164 your SUID number?”
Is it yours? Be kind. Tell me it’s yours.
Student: [Inaudible].
Instructor (Mehran Sahami): It’s not. So it actually turns out that it’s mine, so I would
have had to search the entire room before I finally got back to myself if I didn’t make that
shortcut. Okay.
So another illustration of what is really a very simple idea – that if you’re forced to look
at all of the elements that you are searching among, then that’s bad in and of itself if you
could do better. So we’ll see if we can do better, but here is a rather larger data set – 286
area codes – the three-digit numbers that come before the rest of the seven digits in your
phone numbers.
But we’re trying to find 650, which is this county’s area code. So here’s a larger array.
What would happen if we were searching in this? Well, it depends. If we have a fast
computer, it may not matter. These are – there’s only about 300 numbers here, and doing
something 300 times is something that computers are reasonably good at.
But I thought I’d put together this sort of compelling example of why this can become a
bad idea.
All right. So I wrote a program in JAVA, which has this implementation of linear search
that we were just looking at. And the difference that you may notice is that I have an
instance variable, which is sort of behaving as though it were an array, and you’ll see
why that is in a second. But all that I need it for is to get values from it at certain
positions, and I wanna know what its length is so that I can iterate over it.
And I’m also – so why is it called DIS? It’s short for display, and it’s this – an instance of
this number display class that I’ve written. And if there’s time and interest, I’ll talk about
how that was implemented. But it’s intended to sort of mimic the behavior of an array
and, at the same time, show you a linear search in action.
All right. So this is a dialogue program that’s just reading line to ask me for some input.
And here are all of those numbers.
So I slowed it down – even if this looks like it’s going relatively fast – so that you could
see it sequentially considering each of these numbers. All right?
You may notice that you can – you could probably beat this. Right? It’s going slowly
enough that, since these numbers are already in sorted order, you can jump ahead.
So what is it – I want you to think about – while you’re waiting on this to finish, I want
you to think about what it is that you’re doing when you search for the number in this list
of numbers that happens to be sorted, which allows you to go so much more quickly than
the computer. All right.
So why didn’t we find 650 here?
Student: [Inaudible].
Instructor (Mehran Sahami): It wasn’t even there? Yeah. Tricky. Let’s fix that.
Where do we wanna stick it? Probably between 630 – they skipped whole swats of – I
just pulled this off of the Internet, which is notoriously unreliable. All right. They
probably aren’t gonna let this run all the way through the next time. But 650, as you will
notice, is now right here. So it would eventually be found. I’ll let that tell me when it’s
finished.
So now I’m going to switch over to a different kind of search, which I hope has
something to do with the way a human looks at a sorted array and finds the element that
they’re looking for quickly. But – no. Help me out before I just show you that
implementation. What is the insight that lets you find elements more quickly if there’s
some structure – if there’s a sorted order to the elements that you’re looking for?
Go ahead.
Student: [Inaudible].
Instructor (Mehran Sahami): Okay. So if I start with the first number in the array, and I
ask myself, “Is the number I’m looking for higher or lower than the first number in the
array?”
What’s the answer probably going to be?
Student: [Inaudible].
Instructor (Mehran Sahami): It’s probably higher if it’s a sorted array, right?
So I know that’s not quite what you were getting at, but I’m pushing on the definition of
this procedure so that you – would you like to add a clarification to what you were
intending?
Student: [Inaudible].
Instructor (Mehran Sahami): Yeah. The number in the middle. Okay?
So if you look at the number in the middle, and you see that that number is less than the
number you’re looking for, there’s a whole set of numbers that you now don’t have to
worry about. Right? It’s all of those numbers up to and including the number in the
middle of the array. Right?
So now you’re looking at half of the array, and in some sense, your problem is simpler. In
fact, it’s a lot simpler because if you apply the same procedure again, then you can cut
that remaining half in half, and you can cut the remaining half in half. Right?
So this kind of searching is called binary searching because you make a binary choice at
each decision point, and you can then cut the space that you’re searching in half. So what
would that look like if we wrote it out in code?
Zgłoś jeśli naruszono regulamin