Phil vs. the Computer



I suspect the different ways were carefully tested to avoid giving either Watson or humans an advantage.

My impression, from watching Jennings’ and Rutter’s reactions, is that they weren’t even getting to ring in. Was there a case where one of them did ring in first, got it wrong, then Watson rang in getting it right? Honestly I don’t remember, that might be something to watch for.

DGMS (don't get me started).

Don't get yourself started. It’s a competition, and it’s entertaining just the way it is.

Link to comment
Share on other sites


A better test and quiz show based on general knowledge of people (and computers) would not be about button pressing or reaction times at all. I think there was a quiz show from the 50's in which each contestant was in a separate glass booth so he couldn't hear anything said.

Putting contestants in a glass booth is unfair to people who are uncomfortable in small spaces. The proper purpose of a game show should be to reward knowledge and thinking skills, not to punish and reward people for how they react to being confined.

Contestants also shouldn't have to appear on camera or in front of an audience, since it makes some people more nervous than others. They should be allowed to play the game from the comfort of their homes, or however they feel most safe and comfortable. The proper, objective way to run a game show wouldn't be about rewarding extroverts who like to ham it up and be the center of attention. Introverts should be allowed to correspond with game show officials electronically or by mail. If they're too shy to make contact with anyone, they should be allowed to not be observed in any way, and should be trusted not to cheat.

And they had more than an instant (five? ten? seconds) to rack their brains while a bit of background music played. That's the proper way to do it. More about accuracy and knowledge than speed.

The actual "proper way to do it" wouldn't be "more" about accuracy than speed, but completely about accuracy while completely eliminating speed. No time limit is the proper, objective, rational, non-insane way to do it.

That way each of the three contestants gets to give his (or in this case its) answer.

Why do you hate women? Only men and computers are good enough to play games in your "proper" world? Male chauvinist pig!

Points then should be awarded to each right answer, not the fastest, and not only the fastest gets to answer. Fraction of a second is really, really dumb. As is having to worry about pressing too fast. Give me a @#$%^&*## break.

Then any time limit is really dumb.

This is just one symptom of a cognitive disorder, of superficiality. We live in an age where speed is prized over careful reflection, over knowledge, over mulling over: The lightning quiz. Spritzing out bits of attention across the internet. Multitasking. Short attention spans.

Yeah, let's "mull it over" for 5 whole seconds while being locked in a glass booth, you fricking claustrophiliac, short-attention-spanned, woman-hating, center-of-attention wannabe.

J

Link to comment
Share on other sites

A better test and quiz show based on general knowledge of people (and computers) would not be about button pressing or reaction times at all. I think there was a quiz show from the 50's in which each contestant was in a separate glass booth so he couldn't hear anything said or be affected by anyone else's answer. And they had more than an instant (five? ten? seconds) to rack their brains while a bit of background music played.

That's the proper way to do it. More about accuracy and knowledge than speed. That way each of the three contestants gets to give his (or in this case its) answer. Points then should be awarded to each right answer, not the fastest, and not only the fastest gets to answer. Fraction of a second is really, really dumb. As is having to worry about pressing too fast. Give me a @#$%^&*## break.

This is just one symptom of a cognitive disorder, of superficiality. We live in an age where speed is prized over careful reflection, over knowledge, over mulling over: The lightning quiz. Spritzing out bits of attention across the internet. Multitasking. Short attention spans.

DGMS (don't get me started).

No, it is not "cognitive disorder." The speed element reflects the skill of being able to perform under pressure and it adds drama to the show, which is a form of commercial entertainment, not an objective measure of knowledge.

Link to comment
Share on other sites

Unfair, Jonathan! Phil is just using the Randian "he" to mean person as we all do. I do it here too, it's easier than he/she or pluralizing everything.

Phil does not hate all women, as I know for sure he has a healthy, respectful interest in beautiful mature actresses.

Link to comment
Share on other sites

I have only one response to the last three posters:

Hahahahahaha. Hahaha. Ha ha. Ha.

Since there was a time limit, I've had to abbreviate my response. Plus I have to get ready for tonight's competition by trying to catch a dollar bill dropped between my two palms by clapping them together.

Very dramatic and worth watching.

Edited by Philip Coates
Link to comment
Share on other sites

How quickly a contestant presses the button is clearly a factor. It obviously is for humans. Suppose two people both think of the answer, but one thinks of it before the clue is completely given and the other thinks of it a few nanoseconds after the clue is completely given. The first has the speed advantage.

A lot of the questions seemingly could have been solved by a search engine.

In post #7 I gave a link to a tv show about Watson. During that show it was said that Watson had Wikipedia and previous Jeopardy shows in its memory. A search engine is a component of Watson, but there is more to it.

Did anybody else see David Ferrucci's face after Watson completely blew the Final Jeopardy answer/question? But since the category was U.S. cities, I guess "Toronto" probably wouldn't elicit the happiest response from him.

Watson's answer to the Final Jeopardy clue was indeed a surprise. There is at least one city named Toronto in the U.S. -- in Ohio with a population < 6,000. In addition to Toronto, Ontario, Canada not being a U.S. city, I learned the following using a search engine.

Pearson International is Toronto's largest airport. The airport was named in honor of Lester B. Pearson, the 14th Prime Minister of Canada and recipient of the Nobel Peace Prize. Pearson did military service during WWI, but didn't attain hero status. I'm not sure which airport is 2nd largest, but two candidates follow. John C. Munro Hamilton International Airport is in Hamilton, Ontario, Canada. It is named for John C. Munro, a Hamilton Member of Parliament and cabinet minister. Billy Bishop Toronto City Airport was named after a famous WWI fighter pilot. Hence, none of the names of the three airports fit the Jeopardy clue.

I understand that Watson is more than just a search engine. The NOVA about it that aired last week was fascinating. My point is that the goal of creating Watson was to get a computer to deal with the 'nuances' of language, and that its ability in that regard hasn't been tested very much. That's what I want to see tonight. If we don't see more funky language then it will be another boring domination by Watson.

Mike

Link to comment
Share on other sites

Phil:

Try this to enhance reaction time, it is much more real!

Clap hands Magnificent Seven

Adam

Link to comment
Share on other sites

Watson won big.

Along with his Final Jeopardy answer, Ken Jennings wrote 'I for one welcome our new computer overlords' or something very similar. :)

If somebody else knows or learns what Ken's exact words were and notifies me in time, then I will correct the above. Done, using Ninth Doctor's version 2 posts below.

I believe Watson failed the Turing test :) (on one interpretation).

Edited by Merlin Jetton
Link to comment
Share on other sites

"I welcome our new computer overlords." Cute and funny...at the end he tried to choke the computer pedestal ... :-)

I love it!

I'm just figuring out my score...will post it soon...computer bet 17000 on F.J....no way I could catch that...I bet 5000

Link to comment
Share on other sites

I just did the arithmetic.

[I'm rounding the numbers, but none of the ranks was close.]

Watson = 77,200

Phil = between*** 50,000 and 59,300

Ken Jennings = 24,000

Brad (Rutter?) = 22,600

I thought Final Jeopardy might have made the difference and had I bet more I might have beat the computer....or without it beat the computer... but no:

Before F.J., the scores were 59,200 - 45,000/54,300 - 23,000 - 17,600. So, nope, I would have still come in second. All four of us got the Final Jeopardy question. It was easy because the places mentioned as settings for a 19th-century novel all sounded sort of Romania-Transylvania-Moldova, so I immediately knew it had to be the author of Dracula. It took me a few seconds to dredge up his name. (I quickly discarded Frankenstein and Mary Shelley because it said the author had written more than one novel and I was pretty sure she hadn't...plus Dracula was -clearly- set there.)

Now the bets were: 1000 by Ken, 5000 by both B and myself...but a whopping, ridiculous nearly 18,000 ! by Watson.
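As a quick check, the rounded totals listed above follow from the pre-Final-Jeopardy scores plus the bets, since all four answers were correct. A minimal Python sketch, assuming Watson's "nearly 18,000" bet is rounded up to exactly 18,000 and carrying the uncertain score as a low/high pair (the names and layout are only illustrative):

```python
# Pre-Final-Jeopardy scores and bets as posted in this thread (rounded).
# Assumption: Watson's "nearly 18,000" bet is treated as exactly 18,000.
pre_scores = {
    "Watson": 59_200,
    "Phil (low)": 45_000,
    "Phil (high)": 54_300,
    "Ken": 23_000,
    "Brad": 17_600,
}
bets = {
    "Watson": 18_000,
    "Phil (low)": 5_000,
    "Phil (high)": 5_000,
    "Ken": 1_000,
    "Brad": 5_000,
}


def final_scores(pre, wagers):
    """All four answered correctly, so each bet is added to the score."""
    return {name: pre[name] + wagers[name] for name in pre}


finals = final_scores(pre_scores, bets)

# Print the standings from highest to lowest total.
for name, total in sorted(finals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:12s} {total:>7,}")
```

With these rounded inputs the totals come out to 77,200 for Watson, 50,000 to 59,300 for Phil, 24,000 for Ken and 22,600 for Brad, matching the list above.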

Damn I didn't know a computer had that much balls. I really should have bet more because the category was 19th Century novelists, which I have pretty good knowledge about...but I had no idea how close I was to Watson...I was pretty sure I couldn't bet enough to lose to K and B. Had I SHOWN SOME BALLS (and I was not able to see Watson's score), I would have bet a lot more to make sure to beat him.

But I probably would not have beat him, because there is +no freakin' way+ I would have bet more than 18,000 and it turns out the machine was already more than twenty thousand ahead of me so I would have had to bet about forty K.

***A 'between' range is because I still had a problem writing down point values and messed up a few (and crossing off or subtracting if I didn't answer or answered wrong) while trying to keep up with the questions. That was a -very big minus- for my getting as high a score as possible compared to the others. But on the other hand, I didn't have to press a buzzer, just blurt out the answer first, which is a -very big plus- for me compared to the others. So probably it's a wash.

Anyway, I'm okay I guess with second place. For decades, whenever I happen to tune it in, (infrequently) ever since high school I always get about 2/3 of the questions and it seems like that's what happened these three days.

(Although I think my scores have risen since high school...which would make sense. Obviously.)

Link to comment
Share on other sites

(Although I think my scores have risen since high school...which would make sense. Obviously.)

Did you factor in dementia?

--Brant

Phil, I assume your scores would rise since high school because you have engaged in Lifelong Learning. That would make sense for you. But have some pity on the rest of humanity who engage in Lifelong Trying to Hold on to the Few Clues We Have.

Edited by daunce lynam
Link to comment
Share on other sites

> Without commas. What would E.B. White think?

E.B. White wants to take me on in Jeopardy or basketball I'll kick his tiny little journalist butt.

Link to comment
Share on other sites

> Phil, I assume your scores would rise since high school because you have engaged in Lifelong Learning. That would make sense for you. But have some pity on the rest of humanity who engage in Lifelong Trying to Hold on to the Few Clues We Have.

Daunce, that's it. General knowledge is what Jeopardy is testing and it can only grow.

I'd hope you and I and everyone else has more information accumulated and cross-filed/integrated than we did at sixteen.

Edited by Philip Coates
Link to comment
Share on other sites

But that's the thing Phil. Of course we all have more information, much more than we want to have, we can't help acquiring it, or deliberately collecting it. It's the integration and crossfiling that matters - how we do it. The knowledge grows: do we?

Link to comment
Share on other sites

Well, some of it has to. If we learn a thousand things about world history or gardening or what it's like to live in a big city, even if we forget some of it or it's slow to access, still the accessible stuff is a net plus, right?

Link to comment
Share on other sites

The researchers also acknowledged that the machine had benefited from the “buzzer factor.”

Both Mr. Jennings and Mr. Rutter are accomplished at anticipating the light that signals it is possible to “buzz in,” and can sometimes get in with virtually zero lag time. The danger is to buzz too early, in which case the contestant is penalized and “locked out” for roughly a quarter of a second.

Watson, on the other hand, does not anticipate the light, but has a weighted scheme that allows it, when it is highly confident, to hit the buzzer in as little as 10 milliseconds, making it very hard for humans to beat. When it was less confident, it took longer to buzz in. In the second round, Watson beat the others to the buzzer in 24 out of 30 Double Jeopardy questions.

“It sort of wants to get beaten when it doesn’t have high confidence,” Dr. Ferrucci said. “It doesn’t want to look stupid.”

Both human players said that Watson’s button pushing skill was not necessarily an unfair advantage. “I beat Watson a couple of times,” Mr. Rutter said.

Source: http://www.nytimes.com/2011/02/17/science/17jeopardy-watson.html?_r=3&pagewanted=all
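To make the "weighted scheme" described in the article a little more concrete, here is a purely hypothetical Python sketch. The article gives no formula, so the confidence threshold, the linear ramp, and the 250 ms ceiling are all assumptions made for illustration; only the 10 ms floor comes from the quoted text. It is not a description of how Watson was actually built.

```python
MIN_DELAY_MS = 10.0   # from the article: "as little as 10 milliseconds"
MAX_DELAY_MS = 250.0  # assumption: slowest buzz when barely confident


def buzz_delay_ms(confidence, threshold=0.5):
    """Hypothetical buzz delay: higher confidence means a faster buzz.

    Returns None when confidence is below the threshold (no buzz at all).
    The threshold and the linear ramp are illustrative assumptions.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if not 0.0 <= threshold < 1.0:
        raise ValueError("threshold must be in [0, 1)")
    if confidence < threshold:
        return None
    # Map threshold -> MAX_DELAY_MS and 1.0 -> MIN_DELAY_MS linearly.
    fraction = (confidence - threshold) / (1.0 - threshold)
    return MAX_DELAY_MS - fraction * (MAX_DELAY_MS - MIN_DELAY_MS)


# Share of Double Jeopardy buzzes Watson won, per the article: 24 of 30.
watson_buzz_share = 24 / 30
```

Under these made-up parameters, full confidence gives the 10 ms floor and a borderline answer gives the slow 250 ms buzz, which is one way a machine could "want to get beaten" when unsure. The 24-of-30 figure from the article works out to 80 percent.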

Link to comment
Share on other sites

That means the computer can react within one one-hundredth of a second when, according to this website (http://hypertextbook.com/facts/2006/reactiontime.shtml), the average human fingertip reaction time is 20-25 one-hundredths of a second, with 5 one-hundredths of a second being the fastest recorded time.

I call shenanigans!
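For scale, the figures quoted above can be put side by side in a few lines of Python. This is only arithmetic on the numbers already cited in the thread (converted to milliseconds); the variable and function names are made up for the example.

```python
# Figures quoted in the thread, converted to milliseconds.
WATSON_FASTEST_MS = 10          # "as little as 10 milliseconds" (Times article)
HUMAN_AVERAGE_MS = (200, 250)   # 20-25 one-hundredths of a second
HUMAN_FASTEST_MS = 50           # 5 one-hundredths of a second


def times_slower(human_ms, machine_ms=WATSON_FASTEST_MS):
    """How many times longer the human reaction takes than the machine's."""
    return human_ms / machine_ms


average_ratios = tuple(times_slower(ms) for ms in HUMAN_AVERAGE_MS)
fastest_ratio = times_slower(HUMAN_FASTEST_MS)

print(f"Average human: {average_ratios[0]:.0f}x to {average_ratios[1]:.0f}x slower")
print(f"Fastest recorded human: {fastest_ratio:.0f}x slower")
```

This compares raw reaction times only. The Times article also notes that Jennings and Rutter anticipate the light rather than react to it, which is how they can sometimes get in with almost no lag.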

Link to comment
Share on other sites

I agree, if the Times article is correct. My suspicion in post 24 is now suspect.

Link to comment
Share on other sites


IBM Watson Project

~~~~~~~~~~~~~~~~

Boydstun Note on Turing Test (2000)

The basic idea of the Turing test is that any machine capable of conversing in natural language would be exhibiting the general competence we call intelligence. More particularly, any machine (any digital computer) that could engage in verbal conversation with a normal grown human being, so well as does one such human with another, would surely be a machine possessing intelligence.

The ability to engage in natural-language conversation with normal grown humans is an apt and very demanding test for machine general intelligence. One can converse about most anything, notably about one's own intellectual abilities. Unless one has in ample degree the abilities one claims, one will quickly trip during a conversation with someone truly having those abilities. Conversation will manifestly falter unless there is mutual understanding by-and-large of what is being expressed in the language of the conversation, of what is meant by the linguistic expressions, one's own and those of one's partner. To understand what some linguistic expression of one's partner means is to make sense, the right sense, of that expression (Haugeland 1998, chapter 2, and 1997, chapter 1).

An adequate sense-making, linguistic machine must implement background common sense in order to disambiguate ordinary language. Take Bar-Hillel's example: "The box was in the pen." Common-sense knowledge that boxes are typically larger than ink pens is enough to eliminate that interpretation of pen. Making the right sense of "the box was in the pen" also requires understanding of the topic of conversation and how the declaration fits the topic. Pen might mean animal pen or child playpen, depending on fit with topic. Then too, making the right sense requires understanding of metaphor and fancy. "The box was in the pen" might mean that a drawing of a box flowed out of an ink pen. Moreover, to understand make-believe in conversation, a participant must comprehend its contrast with real-believe. And to understand "the box was in the pen" as a real-believe assertion, one must have the notions truth and falsity, being so in reality and not being so in reality. Holding up one's end of an elementary verbal conversation requires profound intelligence.
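The first step of the Bar-Hillel example, ruling out a sense of "pen" by common-sense size knowledge, can be sketched as a toy filter. This is only an illustration under stated assumptions: the sizes are rough guesses, the names are invented for the example, and it deliberately ignores the harder parts described above (topic fit, metaphor, make-believe, truth).

```python
# Rough typical sizes in centimetres. These are illustrative guesses,
# not measured data, and the sense labels are made up for this example.
PEN_SENSES = {
    "writing pen": 15,
    "playpen": 100,
    "animal pen": 500,
}
BOX_SIZE_CM = 30


def plausible_containers(senses, content_size_cm):
    """Keep only the senses large enough to physically contain the object."""
    return [sense for sense, size in senses.items() if size > content_size_cm]


remaining = plausible_containers(PEN_SENSES, BOX_SIZE_CM)
print(remaining)  # ['playpen', 'animal pen']
```

The size check eliminates the ink pen but still leaves two candidates, so choosing between the playpen and the animal pen would need the topic of conversation, exactly as the paragraph above argues.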

There are nonlinguistic sorts of performances that indicate prima facie intelligence within well-partitioned arenas. Such is the world-champion chess-playing machine Deep Blue. The performance of Deep Blue is not by exhaustive reckoning of all possible eventualities stemming from the legal next moves. Such a reckoning would never finish. Intelligence requires avoidance of combinatorial explosions.

There is something basic about chess that normal human chess players understand about it and that Deep Blue does not: chess is play, a game. One does not understand what playing a game is unless one understands its resonance with, and its contradistinction to, real life. Still, lacking that understanding should not warrant defeasance of the prima facie removed-from-life specialty-intelligence of the chess machine.

A machine passing the Turing test would exhibit a general intelligence, one subsuming the various specialty-intelligences it has, one enabling it to talk intelligently about those specialties and their wider contexts (Haugeland 1997).

~~~~~~~~~~~~~~~~

See also Intelligence, Animality, and Machinery (2000).

Edited by Stephen Boydstun
Link to comment
Share on other sites

I don't watch Jeopardy often enough to know if the question categories get repeated a lot. If so, and Watson is programmed to 'parse' those, that might have been one of its biggest advantages: The game moves so fast that there's little time for a human to relate the question to the category (that can be very helpful). But I read somewhere that the computer's memory banks contain -every- previous Jeopardy show in its entirety.

So if Watson knows the kind of questions that have been asked in that category (often a pun or a play on words), it can sometimes narrow its search to that type of question.

This makes the computer more able to approximate the highest human skill: -integration-, rather than just brute force, numerical frequency 'association'.

,,,,,,,,,,

On a related note, I'm excited by the abilities of Watson and their applicability to improving "search" technology. Today I did a Google search on "tnt" and the software was smart enough to know that it is more likely someone is searching for the cable tv channel, not the explosive. But many other times, I find googling something to be frustrating because of the boatload of crap it returns.

Another area that can be -vastly- improved is language translation. ==>

(1) Literature and articles: DGMS, but I've suffered through really bad translations of French novels (Victor Hugo), Greek (the Odyssey and the Iliad), and I'm wary of reading some Russian novels in English (I have Brothers Karamazov staring at me in all its baleful thickness on my shelf right now....)

(2) Travel and conversation: According to reviews on Amazon, CNET, etc., many of the little portable gadgets you'd take with you on a foreign trip are junk: either /a/ they only translate one word at a time rather than complete thoughts, or /b/ they only focus on 'tourist phrases', or /c/ incompleteness - they leave out a complete dictionary-full of all possible words you might need, or /d/ inaccuracy - a word has multiple senses [end: bound or limit, goal, to finish, etc.] and you get the wrong one or only some meanings.

[Aside: The best-selling (dedicated) portable translators dominating that market (as opposed to apps on an iphone or laptop or web-based google translate) apparently are the Franklins and they apparently are...hmmm...let me look in my electronic dictionary for a four letter word that describes it --> Crap.]

Edited by Philip Coates
Link to comment
Share on other sites

One of the questions on the first episode was, "Bang, bang, his silver hammer left them dead." Watson responded, "What is Maxwell's Silver Hammer?" and was awarded the points.

In effect, it was a case of the proctors throwing the Turing Test in Watson's favor. The computer's answer was strictly incorrect.

The problem with the Turing Test is that it tells you nothing about whether the computer itself is actually conscious, only how smart the test administrators are.

Link to comment
Share on other sites
