Research Project Name
A Cognitively Plausible Model of Child Language Learning: Categories,
Agreement, and Morphology
Objective
The objective of this research project is to create a computational model of
early child language that obeys cognitive constraints on first language
learning. The input to the computational model is drawn from the Roger Brown
Corpus of recorded mother-baby interactions, and the learning achieved by the
model is compared with the learning achieved by babies as documented in the
Brown Corpus.
Results
CAM (Categories, Agreement, and Morphology) is a computational model of
several important aspects of language acquisition. CAM is based on Steven
Pinker's Semantic Bootstrapping Hypothesis and respects widely accepted
psychological constraints on child language learning, such as no negative
evidence and no memory of previous inputs. CAM learns in a largely bottom-up
manner: it learns categories first, then context-free grammar rules based on
those categories, and finally agreement rules on top of the context-free
grammar rules.
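This bottom-up staging can be illustrated with a minimal sketch. The code below is a toy assumption, not CAM's actual algorithm: a small semantics-to-category seed mapping stands in for semantic bootstrapping (words for things seed N, words for actions seed V), and flat context-free rules are then read off the observed category sequences.

```python
# Illustrative sketch of bottom-up staged learning -- NOT CAM's actual code.
# Stage 1 mirrors the semantic-bootstrapping idea; Stage 2 induces one flat
# context-free rule per observed category pattern.

SEED = {"thing": "N", "action": "V"}  # hypothetical semantic seed mapping

def infer_categories(lexicon):
    """Stage 1: assign each word a category from its semantic type."""
    return {word: SEED[sem] for word, sem in lexicon.items()}

def induce_cfg(utterances, categories):
    """Stage 2: collect one flat rule per observed category sequence."""
    rules = set()
    for utt in utterances:
        rhs = tuple(categories[w] for w in utt.split())
        rules.add(("S", rhs))
    return rules

lexicon = {"doggie": "thing", "ball": "thing", "run": "action"}
categories = infer_categories(lexicon)
grammar = induce_cfg(["doggie run", "ball run"], categories)
```

Agreement-rule learning (the third stage) would then operate over the rules produced here, rather than over the raw input.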
CAM duplicates the partial order relations observed by Brown in children for
the progressive, the plural, the third-person regular, and the auxiliary
verbs.
CAM solves the negative-evidence problem for agreement-rule learning: although
it receives no negative evidence in the input, it nonetheless supplies both
positive and (internally generated) negative examples to its built-in Boolean
learning algorithm, which creates the agreement rules.
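The internally generated negative evidence can be sketched with a toy example. The representation and learner below are illustrative assumptions, not CAM's actual Boolean algorithm: agreement contexts are (subject-number, verb-form) pairs, every combination never attested in the positive input is manufactured as a negative example, and the learned rule admits exactly the attested combinations.

```python
from itertools import product

# Toy illustration of internally generated negative evidence -- the
# representation and learner here are assumptions, not CAM's algorithm.
NUMBERS = ["sg", "pl"]   # subject number
FORMS = ["base", "3sg"]  # verb form, e.g. "run" vs. "runs"

def generate_negatives(positives):
    """Manufacture negatives: any (number, form) combination never
    attested in the positive input is assumed ungrammatical."""
    return [p for p in product(NUMBERS, FORMS) if p not in positives]

def learn_agreement(positives):
    """Trivial Boolean learner: the rule admits exactly the attested
    (number, form) combinations."""
    return set(positives), generate_negatives(positives)

# Positive examples, as from input like "doggie runs" / "doggies run":
positives = [("sg", "3sg"), ("pl", "base")]
rule, negatives = learn_agreement(positives)
```

The key point the sketch captures is that the negatives never appear in the input; they are derived internally from the positives.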
CAM learns parts of both English and Cheyenne, a highly morphological American
Indian language. Procedures for syntactic category inference are described,
along with an approach to integrating semantic bootstrapping and syntax-driven
syntactic category inference into one system.
CAM shows how the form of X-bar Theory is
influenced by acquisition, parsability, and syntactic category inference.
Finally, the full range of grammars learnable by CAM is described in precise
mathematical detail. Three results are shown: (1) CAM's correctness, i.e., its
ability to identify a target grammar correctly from inputs based on that
grammar; (2) its order invariance, i.e., it learns the same grammar regardless
of the order of the inputs; and (3) its robustness, i.e., its ability to learn
a target language correctly from a vastly more complex input language.
A major open question in cognitive science is the extent to which language
learning is the result of hard-wired syntactic rules, meta-rules, or
parameters. The results of this work support the Semantic Bootstrapping
Hypothesis (Pinker 1984), which holds that at least some of the acquisition of
syntax (and, in this case, morphology) is semantics-driven.
Publications
- Nicholl, S. and Wilkins, D. C., "Efficient Learning of
Language Categories," Proceedings of the Twelfth Cognitive
Science Conference, Cambridge, Mass., July 25–28, 1990, 455–562.
- Nicholl, S. and Wilkins, D. C., "Computer Modeling of
Acquisition Orders in Child Language," Proceedings of the Eighth
International Machine Learning Workshop: Computational Models of
Human Learning Track, Northwestern University, June 27–29, 1991,
100–104.
- Nicholl, S. S., "Language Acquisition by Computer: Learning Categories,
Agreement, and Morphology Under Psychological Constraints," Ph.D.
Dissertation, Department of Computer Science, University of Illinois at
Urbana-Champaign, June 1992, 213 pages.