ProgrammingMethodology-Lecture23.pdf

Programming Methodology-Lecture23

Instructor (Mehran Sahami): So welcome to the 106A lecture just before

Thanksgiving. I thought attendance might not be spectacular today. People are still

filtering in, but it’s good to see so many of you here, though. I’m sure some of you have

already headed home.

You may be wondering who I am. I would have thought before we got back mid-quarter

evaluations that you stood a chance of recognizing me as the TA of your class, but the

comment of more than half of the people who responded to the question of how is Ben

doing was, “I don’t know Ben. I’ve never interacted with Ben. So I assume he’s doing a

great job.”

Most of you jumped to that conclusion. But – so I sort of chased at this, but not much. I

realize, though, that I would enjoy the opportunity to develop a little more exposure. And

so Maron and I decided to switch roles for today. In fact, he’s making copies as we speak,

and there’s going to be one handout that’s sort of pertinent to what you’ll be doing for

[inaudible] the next assignment. And he should be back any time, but I certainly can’t

blame him for being – running a little late since I was supposed to make the copies before

class myself. So anyway, in my capacity as Maron, I don’t hope to be as spontaneous or

as funny as he is, but I do hope to talk about something that is a little bit off of the sort of

charted path of CS106A’s curriculum.

So one way of describing the things that you’re learning in 106A is that you’re learning

how to take the ways that you already know of doing things that may take you some time

and maybe error-prone in the doing of them and tell a computer precisely how to do them

so that you reap the benefits of a machine being able to execute those instructions. And

they’ll be faster. There’ll be fewer errors as long as you get the instructions right in the

first place. And that’s got to be an exhilarating feeling. It’s got to be empowering, or at

least it was when I was sitting in your place as a freshman, to know that, all of a sudden,

if you have a good idea – if you can come up with a procedure for solving a problem that

you’ve always had and tell a computer how to do it, then the computer will be able to do

that over and over and over again. All right? So that’s 106A in a nutshell. So it’s sort of –

it’s not a perfect metaphor. What does it mean to run breakout by hands? I don’t know.

So what I – the distinction I’m trying to make between what you’ve been learning so far

and what you will learn, hopefully, in this lecture – and certainly if you go on to CS106B

– is that there are things that computers can do that you couldn’t possibly have simulated

by hand. Computers can handle so much more data than you could ever manage to sort

through on your own manually that it’s worth you’re learning a bit about how to make

them do that – how to think about instructing a computer to do something that you

couldn’t possibly have done. So today, we’re going to talk about two of the most

important kinds of those problems in computer science, and I’m going to stop just talking

at you and show you some examples in a second. But those two topics are searching and

sorting. How do you find something in a set of things that you have? And how, if you

could impose some structure on that set, would you both find it more quickly, and – I

guess I got ahead of myself – how would you go about imposing the sort of structure that

would help you find it more quickly?

All right. So if I’m seeming to ramble, that stops now.

So searching and sorting – I say they are important, and I hope you believe me, and you

think that for that reason, it’s worth talking about for an entire lecture. But they really are

just a way of getting to talk about something a little deeper, which is this concept of

algorithmic efficiency that we haven’t said much about in the class so far. So that’s the

third part of this two-part lecture. What’s the deal with searching? Well, searching

doesn’t quite fit the mold as something you couldn’t do by hand. You find things all the

time. Chapter 12 of the book looks at two operations of the searching and sorting. This is

pretty generic. All right. So searching is simpler. You can define a search problem – say

you have an array or some other collection of things, and you have something you want

to find. The typical way this is done is that you want to find the index into the array –

where that element was – or, in a more generic case, you want to find out if that element

is even in the array. So you want to have some way of determining that it wasn’t actually

found.

And that may be all you care about. It may be the case that, if the search routine returns

anything other than negative one, you don’t actually care what its value was. But we

adopt – I will adopt the convention for this lecture that if you don’t find what you’re

searching for, then the method that you wrote to do the search should return a negative

value since that’s not a valid index into the array. All right. So if you have a set of things

that you’re trying to search through, the easiest way of doing that is just to look at all of

them and see if each one, in turn, is what you’re looking for. That’s called a linear search

because, to sort of pre-figure what we’re going to talk about at the end of the lecture, the

number of operations you have to do to find the element that you’re looking for is

linearly proportional to the number of elements that you had to begin with. Now it may

not be obvious right now that there’s something better that you can do, and with an

arbitrary set of data, there is not anything better that you can do. But this procedure that

I’ve written up here is an implementation of what you already could have written – a

procedure for finding an element in an array of integers.

So there’s nothing very tricky about this. If it had been a string, you might have even

avoided having to write the function and called it something like, “Index of” with a

character. And inside of “Index of”, though you don’t have to write the code, it would

have had a four loop that iterated over each of the characters of the string, tested each one

for quality with the key that you were looking for and then, presuming it found one that

equaled that character – or in this case, that integer – it would return the index of it. And

if the four loop completes, then you must not have examined any elements that matched,

and so you return negative one. Okay? So here’s a simulation of how that would work,

although the – leaving nothing to the imagination here, this is pretty easy to sort of

envision.

So we have this array of primes, and it has 17 in it. So we are going to expect to find that

this linear search shouldn’t return negative one. But the second one, we’re looking for 27

– which sort of looks prime, but of course, it’s not because it’s nine times three. All right?

Okay. So we called linear search, and here’s our new stat frame, which has the local

variables I and the parameters P and array in it. This is sort of Eric Robert’s – who wrote

the textbooks – way of displaying the execution model of JAVA. All right? So we’re just

looping over it, and we’re testing as I equals zero and then one and then two and then

three and four, five, six – where should we find it? Well, we were looking for seven – no.

Right now, we’re looking for 27. So we’re just gonna go to the – I think there was a

problem with the way that – well, anyway. The console here prints out that, indeed, we

found 17 at position six. There may have been a problem converting PowerPoint slides to

Keynote slides, so I apologize for the sort of shakiness of these animations. But the

content of the slides should be adequate. Cool.

All right. So now we’re in our second called linear search, and the only thing that’s

changed at this point from the beginning of the first called was that the key is different.

So I is still gonna start at zero, and it’s still gonna iterate over all of the positions of the

array. But we’re gonna go through the entire array, which means going all the way up to

ten where I is no longer less than ten. And we didn’t find 27 anywhere. So I hope this is

all fairly clear. But these are the results that you would have expected. So we found the

position of 17, and then we looked for 27 but didn’t find it and returned negative one.

Cool.

How many people think they could have written that in their sleep? That that was too

much time to spend on such a simple idea? All right. Well, it gets more interesting from

here on out, but – so talk to me. What is the problem with linear search?

Student: It takes a lot of time.

Instructor (Mehran Sahami): It takes a lot of time, right? It takes as much time as you

have elements. So if I asked a question, which was – if I knew the SUID number of one

of you in the room, and I asked, “Is this your SUID number?

And you say, “No.”

And I ask you, “Is 05179164 your SUID number?”

Is it yours? Be kind. Tell me it’s yours.

Student: [Inaudible].

Instructor (Mehran Sahami): It’s not. So it actually turns out that it’s mine, so I would

have had to search the entire room before I finally got back to myself if I didn’t make that

shortcut. Okay.

So another illustration of what is really a very simple idea – that if you’re forced to look

at all of the elements that you are searching among, then that’s bad in and of itself if you

could do better. So we’ll see if we can do better, but here is a rather larger data set – 286

area codes – the three-digit numbers that come before the rest of the seven digits in your

phone numbers.

But we’re trying to find 650, which is this county’s area code. So here’s a larger array.

What would happen if we were searching in this? Well, it depends. If we have a fast

computer, it may not matter. These are – there’s only about 300 numbers here, and doing

something 300 times is something that computers are reasonably good at.

But I thought I’d put together this sort of compelling example of why this can become a

bad idea.

All right. So I wrote a program in JAVA, which has this implementation of linear search

that we were just looking at. And the difference that you may notice is that I have an

instance variable, which is sort of behaving as though it were an array, and you’ll see

why that is in a second. But all that I need it for is to get values from it at certain

positions, and I wanna know what its length is so that I can iterate over it.

And I’m also – so why is it called DIS? It’s short for display, and it’s this – an instance of

this number display class that I’ve written. And if there’s time and interest, I’ll talk about

how that was implemented. But it’s intended to sort of mimic the behavior of an array

and, at the same time, show you a linear search in action.

All right. So this is a dialogue program that’s just reading line to ask me for some input.

And here are all of those numbers.

So I slowed it down – even if this looks like it’s going relatively fast – so that you could

see it sequentially considering each of these numbers. All right?

You may notice that you can – you could probably beat this. Right? It’s going slowly

enough that, since these numbers are already in sorted order, you can jump ahead.

So what is it – I want you to think about – while you’re waiting on this to finish, I want

you to think about what it is that you’re doing when you search for the number in this list

of numbers that happens to be sorted, which allows you to go so much more quickly than

the computer. All right.

So why didn’t we find 650 here?

Student: [Inaudible].

Instructor (Mehran Sahami): It wasn’t even there? Yeah. Tricky. Let’s fix that.

Where do we wanna stick it? Probably between 630 – they skipped whole swats of – I

just pulled this off of the Internet, which is notoriously unreliable. All right. They

probably aren’t gonna let this run all the way through the next time. But 650, as you will

notice, is now right here. So it would eventually be found. I’ll let that tell me when it’s

finished.

So now I’m going to switch over to a different kind of search, which I hope has

something to do with the way a human looks at a sorted array and finds the element that

they’re looking for quickly. But – no. Help me out before I just show you that

implementation. What is the insight that lets you find elements more quickly if there’s

some structure – if there’s a sorted order to the elements that you’re looking for?

Go ahead.

Student: [Inaudible].

Instructor (Mehran Sahami): Okay. So if I start with the first number in the array, and I

ask myself, “Is the number I’m looking for higher or lower than the first number in the

array?”

What’s the answer probably going to be?

Student: [Inaudible].

Instructor (Mehran Sahami): It’s probably higher if it’s a sorted array, right?

So I know that’s not quite what you were getting at, but I’m pushing on the definition of

this procedure so that you – would you like to add a clarification to what you were

intending?

Student: [Inaudible].

Instructor (Mehran Sahami): Yeah. The number in the middle. Okay?

So if you look at the number in the middle, and you see that that number is less than the

number you’re looking for, there’s a whole set of numbers that you now don’t have to

worry about. Right? It’s all of those numbers up to and including the number in the

middle of the array. Right?

So now you’re looking at half of the array, and in some sense, your problem is simpler. In

fact, it’s a lot simpler because if you apply the same procedure again, then you can cut

that remaining half in half, and you can cut the remaining half in half. Right?

So this kind of searching is called binary searching because you make a binary choice at

each decision point, and you can then cut the space that you’re searching in half. So what

would that look like if we wrote it out in code?

Plik z chomika:

Inne pliki z tego folderu:

Inne foldery tego chomika: