The Stuff of Thought

June 5th, 2009

I have been reading The Stuff of Thought by Steven Pinker, a survey of the different ways language gives clues to how the mind works.  (See my related entry dated September 28, 2007.)  In the fourth chapter, he gives a very enlightening portrayal of how language describes objects in space.

“Languages tend to have terms for contact, vertical alignment, attachment, containment, and proximity, as if there were a cognitive alphabet of spatial relationships more basic than the prepositions of a given language.”  (p. 178)

“A light bulb is considered to be in a socket when its base is inserted, since that allows it to be illuminated, but a person is not in a car if only his arm extends in through a window, since that doesn’t allow the car to move him or even shelter him.” (p. 187)  The meaning of the preposition in depends on the objects that are being described.

“If Sally has one big stone and Jenny three much smaller stones, who has more?  The question by itself is unanswerable: it depends on whether you mean “more stone”, or “more stones.”  (p. 173)   The meaning of more depends on whether it is referring to the number of objects (stones) or the mass/volume/weight of the object (stone).

“The part of the mind that interfaces with language treats objects schematically.  … Every morsel of matter has a length, a width, and a thickness, but when we speak of these morsels we pretend that some of the dimensions aren’t there. … A road, a river, or a ribbon is conceptualized as an unbounded line (its length which serves as its single primary dimension) fattened out by a bounded line (its width which serves as a secondary dimension), resulting in a surface.”  (pp. 179, 180)

“Since words and syllables aren’t free, languages economize when they can. … Imagine you are in a rainstorm, ten feet away from an overhanging ledge.  Move one foot toward it, you still get wet.  Move over another foot; you still get wet.  Keep moving, and at some point you no longer get wet.  Continue to move another foot in the same direction, you don’t get any dryer.  So nature has set up a discontinuity between the segment of the path where gradual changes of position leave you equally wet and the segment where gradual changes leave you equally dry.  And it is exactly at that discontinuity that one would begin to describe your position using under rather than near.”  (p. 186)

“Spatial terms quantize space at the cusps where causal events play out differently on each side.  As your palm gradually [wraps] around a marble, the curvature at which you stop saying the marble is on the hand and start saying it’s in is more or less the shape that would prevent it from rolling off when you jiggle it.”  (p. 186)

Dr. Pinker’s premise in this book is that language reflects our thoughts.  By dissecting our language, we get a glimpse of how the thought engine behind the language works.  We use count nouns and mass nouns in language because our minds see countable items such as chairs or dogs and our minds also see non-countable mass objects such as water or furniture.  We use a preposition like along to describe proximity to a one-dimensional line and we use inside to describe containment in a two- or three-dimensional object.

All languages take slightly different approaches to describing space, but there are similarities that can possibly be used to infer an underlying brain structure that helps define our language.  “Most of the world’s languages divide the space around the speaker into just two regions, though about a quarter of them (including Spanish) make a three-way distinction among ‘near me’, ‘far from me’, and ‘in between,’ and a very few go to four, adding ‘very far from me’.” (p. 178)  He is referring to the English terms here (near me) and there (far from me).

“Not all languages carve [spatial relationships] up in the same way.  Presumably this is because each language trades off expressiveness, precision, word length, and vocabulary size in a different way.  But the quantization of spatial relations is universal, and causally important relations like contact, attachment, alignment, verticality and proximity make their appearance in all the spatial vocabularies of the world.”  (p. 187)

The book is an excellent example of Dr. Pinker’s writing - it is entertaining while at the same time being specific and to the point.  He digs into the issues and comes at them from all aspects - cognitive psychology, neuroscience, pathology, and child language acquisition.  He is an academic, and at the same time he presents his material in a way that is concise and engaging.

Open Domain Question Answering

May 19th, 2008

As part of the CLMA program here at University of Washington, we were asked to investigate an area of interest and write a short summary of what we read about.

I chose to look at the TREC Question Answering challenge and read about some of the techniques that were used for submissions to that evaluation.  The kinds of questions that are used for this include simple factoid questions such as “How many calories are in a Big Mac?”, list questions such as “Which past and present NFL players have the last name of Johnson?”, and definition questions such as “What is a Golden Parachute?”.

Many of the system descriptions submitted to TREC have pipelined architectures.  Here are some examples of components that were described:

  • Question Type Classification - Decide what form the answer should take: a name, a date, a description, a list, etc.
  • Question Rewrite - Convert the question to search terms that are used for the IR search.  This step typically expands the list of words used in the search by adding synonyms.  This gives a broader list of pages returned that can then be further analyzed.
  • IR Search - generally a simple search using established technologies.
  • Passage Selection - deciding which portions of the text returned by the IR search to include in the results.
  • Answer extraction and ranking - creating the actual text that will be returned as answers.
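
The stages listed above can be sketched as a chain of small functions.  This is only a toy illustration, not any actual TREC submission: the two-document corpus, the synonym table, the stopword list and every function name are made up for the example, and real systems use far more sophisticated components at each step.

```python
import re

# Invented toy corpus and synonym table (assumptions for illustration only).
DOCUMENTS = [
    "The Example Burger is sold downtown. The Example Burger has 500 calories.",
    "A golden parachute is a payment promised to a departing executive.",
]
SYNONYMS = {"calories": ["calorie", "kcal"]}
STOPWORDS = {"how", "many", "are", "in", "a", "an", "is", "what", "the", "which"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def classify_question(question):
    """Question Type Classification: guess the expected answer type."""
    q = question.lower()
    if q.startswith("how many"):
        return "number"
    if q.startswith("what is"):
        return "definition"
    if q.startswith("which"):
        return "list"
    return "other"


def rewrite_question(question):
    """Question Rewrite: drop stopwords, then add synonyms to widen the search."""
    terms = [t for t in tokenize(question) if t not in STOPWORDS]
    expanded = list(terms)
    for term in terms:
        expanded.extend(SYNONYMS.get(term, []))
    return expanded


def ir_search(terms, documents):
    """IR Search: keep documents that share at least one term with the query."""
    return [d for d in documents if set(terms) & set(tokenize(d))]


def select_passages(documents, terms):
    """Passage Selection: rank sentences by how many query terms they contain."""
    sentences = [s.strip() for d in documents for s in d.split(".") if s.strip()]
    scored = [(len(set(terms) & set(tokenize(s))), s) for s in sentences]
    scored = [pair for pair in scored if pair[0] > 0]
    return [s for _, s in sorted(scored, key=lambda pair: -pair[0])]


def extract_answer(passages, question_type):
    """Answer extraction: return a number for 'number' questions, else the best passage."""
    for passage in passages:
        if question_type == "number":
            match = re.search(r"\d+", passage)
            if match:
                return match.group()
        else:
            return passage
    return None


def answer(question, documents=DOCUMENTS):
    """Run the whole pipeline on one question."""
    question_type = classify_question(question)
    terms = rewrite_question(question)
    hits = ir_search(terms, documents)
    passages = select_passages(hits, terms)
    return extract_answer(passages, question_type)


print(answer("How many calories are in the Example Burger?"))
print(answer("What is a golden parachute?"))
```

Because each stage only consumes the output of the previous one, any single component can be swapped for a better one without touching the rest, which is presumably part of the appeal of the pipelined design.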

The paper is not intended to be a thorough description of the field.  It helped me get a handle on what current designs are being developed.

Here is the link to my exploration paper: QuestionAnsweringAtTrec

Entailment Revisited

February 3rd, 2008

In a previous entry, I wrote about the Entailment Challenge. Entailment is a linguistic knowledge concept that concerns two sentences about an event or idea. It is said that sentence A ‘entails’ sentence B when all of the meaning in B is contained in A. Follow this link to see more discussion: Recognizing Entailment.

As part of the Master’s program in Computational Linguistics at the University of Washington, we are preparing for internship appointments in the summer of 2008. As part of that preparation, we were asked to pick a topic that concerns Computational Linguistics and write a short summary of a few papers that covered the topic. This is a PDF of that paper. PreInternship Topic

Comparing Bernoulli and Multinomial

February 3rd, 2008

As part of our NLP statistical processing class in the CLMA program at the University of Washington, we did a comparison of Naive-Bayes learning algorithms. We compared the Bernoulli and Multinomial approaches to this problem and the results are shown in the table below.

This test was run by ‘training’ a classifier on a set of data instances. Each instance has a vector of features. In the Bernoulli case, we treat the features as binary: a word is either present in the instance or not. In the multinomial case, we treat each feature as a numeric value from 0 to n, where n is the number of times the given word was found in the instance. Both sets of data have a ‘class’ assigned to each instance. After we train on the data, we compare our system’s predictions with the actual class value assigned to each instance. This is where the percent accuracy comes from.

Of course these numbers are from a single test on one set of training and test data, but the fact that the Multinomial results are 91% accurate compared to 88% accurate for the Bernoulli is pretty telling. Also you can see that the elapsed runtime for the test is significantly different as well. The ‘cross_prob_delta’ is a parameter for tweaking the ‘add-one’ smoothing in the training stage.
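
To make the difference between the two models concrete, here is a minimal sketch of both variants with additive smoothing.  It is not the code we ran for the table: the function names and the four-instance training set are invented, and the `delta` argument is only assumed to play the same role as ‘cross_prob_delta’ (it must be greater than zero).

```python
import math
from collections import Counter, defaultdict


def train(instances, delta, bernoulli):
    """instances: list of (class_label, {word: count}) pairs; delta > 0."""
    vocab = {w for _, feats in instances for w in feats}
    class_docs = Counter(label for label, _ in instances)
    word_stats = defaultdict(Counter)
    for label, feats in instances:
        for word, count in feats.items():
            if count > 0:
                # Bernoulli counts instances containing the word;
                # multinomial counts every occurrence of the word.
                word_stats[label][word] += 1 if bernoulli else count
    model = {}
    for label, n_docs in class_docs.items():
        if bernoulli:
            denom = n_docs + 2 * delta
        else:
            denom = sum(word_stats[label].values()) + delta * len(vocab)
        probs = {w: (word_stats[label][w] + delta) / denom for w in vocab}
        model[label] = (math.log(n_docs / len(instances)), probs)
    return model


def classify(model, feats, bernoulli):
    """Return the class with the highest log-probability for one instance."""
    best_label, best_score = None, -math.inf
    for label, (log_prior, probs) in model.items():
        score = log_prior
        if bernoulli:
            # Every vocabulary word contributes, whether present or absent.
            for word, p in probs.items():
                present = feats.get(word, 0) > 0
                score += math.log(p) if present else math.log(1 - p)
        else:
            # Only the words in the instance contribute, weighted by count.
            for word, count in feats.items():
                if word in probs:
                    score += count * math.log(probs[word])
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, instances, bernoulli):
    """Fraction of instances whose predicted class matches the assigned class."""
    correct = sum(classify(model, f, bernoulli) == label for label, f in instances)
    return correct / len(instances)


# Invented toy data, just to exercise the code.
TRAIN = [
    ("sports", {"game": 3, "team": 2}),
    ("sports", {"team": 1, "score": 2}),
    ("politics", {"vote": 2, "election": 1}),
    ("politics", {"election": 3, "team": 1}),
]
TEST = [
    ("sports", {"game": 1, "score": 1}),
    ("politics", {"vote": 1, "election": 2}),
]

for use_bernoulli in (True, False):
    nb_model = train(TRAIN, delta=0.1, bernoulli=use_bernoulli)
    name = "Bernoulli" if use_bernoulli else "Multinomial"
    print(name, accuracy(nb_model, TEST, use_bernoulli))
```

In this sketch the Bernoulli scorer visits every word in the vocabulary for every instance, while the multinomial scorer only visits the words that actually occur.  That may be one reason for a runtime gap like the one we measured, though I have not verified that this is what happened in our implementation.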

Bernoulli

Cross prob delta    Training Accuracy    Test Accuracy    Wall Clock (seconds)
0.1                 0.9303               0.8800           303.23
0.5                 0.9103               0.8633           318.10
1.0                 0.8970               0.8400           305.45
2.0                 0.8796               0.8233           305.56

Multinomial

Cross prob delta    Training Accuracy    Test Accuracy    Wall Clock (seconds)
0.1                 0.9570               0.9133           8.33
0.5                 0.9503               0.9066           8.50
1.0                 0.9448               0.9000           8.29
2.0                 0.9400               0.8966           8.26

While we were working on this evaluation, we were brainstorming on other ways to compare the Bernoulli and Multinomial approaches. This is what we came up with.

  • Office chair comparison. Put signs on two contestants that are sitting in office chairs. The signs are ‘Bernoulli’ and ‘Multinomial’. The contestants race down the hallway without lifting themselves out of the chair. The first sign to the finish line is the method of choice.
  • Date getting comparison. Again, put signs on two contestants. Have them stand back to back in the center of the student lounge and randomly ask girls for dates. The sign that gets the most dates is the method of choice.

All kidding aside, the NLP stats processing class is challenging and fun. We are gaining insight into how these methods can be used for basic classification of data.

Diphones in Text To Speech

January 5th, 2008

For a phonetics class that is part of the Master’s program at the University of Washington, I wrote a research paper on how Diphones are used in text to speech systems. Essentially, Diphones are portions of words that are extracted from a recording of words or sentences.

One of the main problems with Text To Speech systems is making them sound natural by varying the prosody of the output. Prosody is the term for the variation in pitch, duration and intensity that all people use when speaking an utterance. By splitting a recording into Diphones, the system can select from a list of candidates for each slot in the output. The system finds the Diphone candidate that is closest to the desired prosody.
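
The candidate selection step can be sketched as a nearest-match search.  This is a simplified illustration, not the method from any particular system: the `Prosody` fields, the relative-difference cost and the candidate values are all assumptions, and it covers only the match against the desired prosody.

```python
from dataclasses import dataclass


@dataclass
class Prosody:
    pitch_hz: float
    duration_ms: float
    intensity_db: float


def prosody_distance(candidate, target):
    """Sum of squared relative differences (an assumed, simplistic cost).

    Relative differences keep any one unit from dominating the total.
    Assumes every target value is non-zero.
    """
    pairs = [
        (candidate.pitch_hz, target.pitch_hz),
        (candidate.duration_ms, target.duration_ms),
        (candidate.intensity_db, target.intensity_db),
    ]
    return sum(((c - t) / t) ** 2 for c, t in pairs)


def pick_candidate(candidates, target):
    """Return the recorded candidate whose prosody is closest to the target."""
    return min(candidates, key=lambda c: prosody_distance(c, target))


# Invented recordings of the same Diphone with different prosody.
candidates = [
    Prosody(pitch_hz=120, duration_ms=80, intensity_db=60),
    Prosody(pitch_hz=180, duration_ms=95, intensity_db=65),
    Prosody(pitch_hz=240, duration_ms=150, intensity_db=70),
]
target = Prosody(pitch_hz=175, duration_ms=100, intensity_db=64)
print(pick_candidate(candidates, target))
```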

Here is an image showing the word ‘maybe’. There are four phones or segments: ‘m’ ‘ay’ ‘b’ ‘e’. A Diphone is made of two halves of two adjacent phones: the second half of one phone and the first half of the next. The middle of the phone is the most stable portion. By splitting the recording at the middle of each phone, there is less disturbance at the joints between Diphones that are concatenated in the simulated speech output.
Maybe
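
As a small sketch of the segmentation idea, the snippet below cuts a phone sequence at the midpoint of each phone to get the Diphone boundaries.  The start and end times for ‘maybe’ are invented for illustration and were not measured from the recording; the function name is also made up.

```python
def diphones(phones):
    """Split a phone sequence into Diphones.

    phones: list of (label, start_sec, end_sec) tuples in time order.
    Each Diphone runs from the midpoint of one phone to the midpoint
    of the next, so every cut falls in the stable middle of a phone.
    """
    mids = [(label, (start + end) / 2) for label, start, end in phones]
    return [
        (f"{a}-{b}", mid_a, mid_b)
        for (a, mid_a), (b, mid_b) in zip(mids, mids[1:])
    ]


# Hypothetical timings (in seconds) for the four phones of 'maybe'.
MAYBE = [("m", 0.00, 0.08), ("ay", 0.08, 0.24), ("b", 0.24, 0.30), ("e", 0.30, 0.46)]

for name, start, end in diphones(MAYBE):
    print(f"{name}: {start:.2f}s to {end:.2f}s")
```

Four phones give three Diphones (m-ay, ay-b, b-e).  The half phones at the very start and end of the word are left over in this sketch; a fuller treatment would need to handle the word edges as well.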
Here is a link to the paper that describes the technique of using Diphones for text to speech systems.

Text To Speech Using Diphones.pdf

Corpus size for ngram training

November 8th, 2007

As part of my graduate courses at the University of Washington, we are studying ngram based language models.  This means learning which groups of N consecutive words occur in a corpus, and how often.  If N is three, then the model covers all the groups of 3 consecutive words found in the corpus.
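
A minimal sketch of collecting the groups of three words from a corpus follows; the one-sentence corpus and the function name are made up for illustration, and a real language model would go on to turn these counts into smoothed probabilities.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Tiny invented corpus, split on whitespace.
corpus = "the dog chased the cat and the dog chased the ball".split()
counts = Counter(ngrams(corpus, 3))

for trigram, count in counts.most_common(3):
    print(" ".join(trigram), count)
```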

Yesterday in class we were discussing the size of corpus required to train an ngram language model.  Our professor said that for a tri-gram model, perhaps a billion words would be sufficient, but for larger ngram sizes as much as a trillion words would be required.

Philosophically, I think this points out the limitations of n-gram training.  A human has a total corpus that is much smaller than this.

15 yr * 365 d/yr * 16 hr/d * 3600 sec/hour * 1 word/sec = ~315 million

This number gets smaller if you realize that humans don’t have constant input for 16 hours a day.  This number gets larger if you think that adults need more than 15 years to be fully functional in today’s world of specialized knowledge.

Nevertheless, this is less than a billion words and certainly less than a trillion words.  Having the ability to process input in a way that detects linguistic structure gives us humans an advantage over systems that can’t pick out structure.
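
The back-of-the-envelope estimate above can be checked, and its assumptions varied, with a few lines of code.  The function name and default values are just the figures used in the estimate; one word per second for 16 hours a day is a generous upper bound, not a measurement.

```python
def words_heard(years, hours_per_day=16, words_per_second=1.0):
    """Upper-bound estimate of the number of words a person hears."""
    seconds = years * 365 * hours_per_day * 3600
    return seconds * words_per_second


estimate = words_heard(15)
print(f"{estimate:,.0f} words")
print(f"fraction of a billion:  {estimate / 1e9:.3f}")
print(f"fraction of a trillion: {estimate / 1e12:.6f}")
```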

Dolphin Speak

October 10th, 2007

Here is another Gary Larson perspective on trying to understand another species.

Of course, a human language is composed of constituents (phrases) that can be combined in many different ways.  If these dolphins were capable of human-type communication, then these ‘scientists’ would at least be looking for pieces of sentences and recombinations of those pieces in novel ways instead of just repetition of the top level sentences.

But still, it is funny.

DolphinSpeak

My apologies to the copyright holder of this image.

Bender’s Axiom

October 6th, 2007

I am taking Ling566, Introduction to Syntax for Computational Linguistics, at the University of Washington as part of a Master’s program in Computational Linguistics.  The course is taught by Dr. Emily Bender who is also the director of the program.

This week Emily was introducing how feature structures are used to create a grammar description for English.  A big part of the grammar is the syntax portion, how words are formed into phrases and phrases are joined into sentences.  Feature structures are a way of adding detail to a grammar so that things like agreement can be accounted for.

As part of her lecture she said, “There is no magic in syntax.”

What she means by this is that when specifying the grammar using feature structures, all of the details have to be specified.  If something is left out of the definition, then the grammar will not work correctly.

A statement that is similar to this that I am fond of repeating is “It does exactly what you tell it to.”  What is meant by this is that the computer is a machine that executes the instructions given to it - it executes them faithfully.  When a program runs correctly and performs the desired actions without any negative side effects, this is because the program was written that way.  And just the same, when a program crashes and you lose your data, this is because the instructions in the program have been arranged in a way that makes it crash.

At any rate, I am thoroughly enjoying taking classes at UW.  It is a thrill to be spending all of my time focused on CL.

Steven Pinker in Person

September 28th, 2007

We went to see Steven Pinker at the Seattle Town Hall.  He is promoting his new book - The Stuff of Thought.  I have read several of his books, so I was pleased to have a chance to hear him speak.

One of the focus points of his talk was how English uses prepositions to designate space and time.  For example, he asked why we say something is under water when the object is truly surrounded by water, and why we say after dark when we really mean a time period surrounded by darkness.  His proposition is that the mind simplifies its perspective when possible (Occam’s Razor?).  The surface of water becomes a 2-D boundary that an object can then be above or under.  Similarly, the boundary of nighttime (darkness) becomes a point in time after which we say ‘after dark’.  As further illustration of the dimensional reduction, he pointed out that we don’t say, “an ant walks along a plate”, because the preposition along requires a one-dimensional object, and that we do say, “an ant walks along the edge of a plate” because in one sense, the edge can be thought of as a one-dimensional object.

The most entertaining portion of his talk was about how swearing is used.  I suspect the reason it was so funny was the contrast between his clinical descriptions (formal register) of swearing and the familiar register that is used when someone is swearing.  He gave the example of someone accepting an award for popular music saying “this is really f***ing brilliant” and saying how in this case “f***ing” is used as an adverb.

Another example he gave that was astonishing was the case of the World Trade Center disaster.  Apparently, the insurance contract has a phrase of “3.5 billion dollars per event”.  The court case was held up on interpreting whether the 9/11 incident was one event, as in one master plan of destruction was executed, or if it was two events, as in two airplanes were used to destroy two buildings.  The effect of this distinction was whether the insurance should pay $3.5 billion or $7 billion.  Quite a substantial difference that is based on the judgement of a linguist.

Overall, Dr. Pinker’s presentation was very entertaining and enlightening.  If you have a chance to hear him speak, I recommend that you go.

Gary Larson As Linguist

September 16th, 2007

This year I have been enjoying a Gary Larson daily calendar.  It goes without saying that Gary has a unique insight into reality in our lives.  Many of his cartoons use issues that are illuminated by a linguistic view point.

For example, in this frame, the dog has written a threat letter to the cats, but the dog only uses one word.

Our dog certainly has a wider vocabulary than one kind of bark, but for each situation, such as barking at a stranger, he only uses one ‘word’.  However, he does vary his barks.  Some barks are louder and there is variation in pitch.  His series of barks could be interpreted as having prosody (variation in pitch and emphasis).  Of course, we as humans can’t tell if there is any information that can be interpreted from the variation, or if it just means that he is not capable of generating a series of barks that are identical.

DogThreatLetters.JPG

Here is another frame relating to dogs.  Dogs certainly understand many human words - their name, ‘out’, ‘walk’, ‘sit’, ‘go lay down’, etc.  But dogs don’t make a relationship between words when uttered in a series.  My interpretation is that they hear one or two words in a context and use that as the entire meaning of the situation.  Our dog is very tuned into ‘walk’.

WhatDogsHear

This frame is about meeting aliens and trying to communicate through translation of language.  The assumption is that if we ever do meet an alien, the same technique for language translation we use between human languages will also work with aliens.  This will certainly be the place to start, but what if the alien brain language structures are different than ours?  In other words, Chomsky has helped us see that all human languages are based on similar structures, but if we do meet aliens, we won’t necessarily be able to rely on the existence of that similarity.

TakeMeToYourStove

This frame shows how a misinterpretation of a foreign word can be used as a joke.  Of course, Webster’s gives us the definition for Kemosabe as “faithful friend”.

Kemosabe

This frame shows a common play on words.  Take a phrase or frequent saying and replace one or more of the words.  Also in this case, he is using a homophone (same sound, different spelling) for ‘ate’ versus ‘8’.

I_8_NY.JPG

This frame makes fun of our basic drives for attracting mates.  The truth is that many of our instincts come from our more simple ancestors.  The only real difference between us and lower animals is that we are self-aware and are able to modify our behavior in much more complex ways.

AnimalsAndTheirMatingSongs.JPG

My apologies to the copyright holder of these images.