Parsing-Lexicalization Text Mining

Lexicalization of PCFGs
Introduction
Christopher Manning
(Head) Lexicalization of PCFGs

[Magerman 1995, Collins 1997; Charniak 1997]
• The head word of a phrase gives a good representation of the phrase’s structure and meaning
• Puts the properties of words back into a PCFG

• Word-to-word affinities are useful for certain ambiguities

• PP attachment is now (partly) captured in a local PCFG rule.
• Think about: What useful information isn’t captured?
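[Figure: two lexicalized VP trees for the competing PP attachments. In “announce RATES FOR January” the PP attaches inside the NP headed by “rates”; in “ANNOUNCE rates IN January” the PP attaches to the VP headed by “announce”.]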
• Also useful for: coordination scope, verb complement patterns
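
To make the idea of head lexicalization concrete, here is a minimal sketch that percolates head words up a tree and prints the resulting local rules for the two attachments. The head-rule table, the tuple tree encoding, and the function names are assumptions made for illustration; real head tables (e.g., those of Magerman or Collins) are much larger.

```python
# Toy head rules: for each parent category, child labels to try in priority order.
# Illustrative assumption only; real head tables are much larger.
HEAD_RULES = {
    "VP": ["VB", "VBD", "VP"],
    "NP": ["NNS", "NN", "NNP", "NP"],
    "PP": ["IN"],
}

def lexicalize(tree):
    """Return (tree with head-annotated labels, head word).

    A tree is a tuple (label, child, ...); a preterminal is (tag, word).
    """
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        word = children[0]
        return (f"{label}[{word}]", word), word
    results = [lexicalize(child) for child in children]
    child_labels = [child[0] for child in children]
    head_index = 0  # fall back to the leftmost child
    for candidate in HEAD_RULES.get(label, []):
        if candidate in child_labels:
            head_index = child_labels.index(candidate)
            break
    head = results[head_index][1]
    return (f"{label}[{head}]", *(sub for sub, _ in results)), head

def top_rule(tree):
    """Render the local rule at the root of an annotated tree."""
    label, *children = tree
    return f"{label} -> " + " ".join(child[0] for child in children)

# PP attached to the VP: "announce rates in January"
vp_attach = ("VP", ("VB", "announce"),
                   ("NP", ("NNS", "rates")),
                   ("PP", ("IN", "in"), ("NP", ("NNP", "January"))))
# PP attached inside the NP: "announce rates for January"
np_attach = ("VP", ("VB", "announce"),
                   ("NP", ("NP", ("NNS", "rates")),
                          ("PP", ("IN", "for"), ("NP", ("NNP", "January")))))

print(top_rule(lexicalize(vp_attach)[0]))     # rule pairing "announce" with "in"
print(top_rule(lexicalize(np_attach)[0][2]))  # rule pairing "rates" with "for"
```

After lexicalization, each attachment decision sits inside a single local rule that mentions both the attachment site's head word and the preposition, which is what lets a PCFG over these symbols score word-to-word affinities.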

Lexicalized parsing was seen as the parsing breakthrough of the late 1990s
• Eugene Charniak, 2000 JHU workshop: “To do better, it is necessary to condition probabilities on the actual words of the sentence. This makes the probabilities much tighter:
• p(VP → V NP NP) = 0.00151
• p(VP → V NP NP | said) = 0.00001
• p(VP → V NP NP | gave) = 0.01980”
• Michael Collins, 2003 COLT tutorial: “Lexicalized Probabilistic Context-Free Grammars … perform vastly better than PCFGs (88% vs. 73% accuracy)”
Lexicalization of PCFGs
The model of Charniak (1997)
Charniak (1997)
• A very straightforward model of a lexicalized PCFG

• Probabilistic conditioning is “top-down” like a regular PCFG
• But actual parsing is bottom-up, somewhat like the CKY algorithm we saw
Charniak (1997) example

Lexicalization models argument selection by sharpening rule expansion probabilities
• The probability of different verbal complement frames (i.e., “subcategorizations”) depends on the verb:

Local Tree       come    take    think   want
VP → V           9.5%    2.6%    4.6%    5.7%
VP → V NP        1.1%   32.1%    0.2%   13.9%
VP → V PP       34.5%    3.1%    7.1%    0.3%
VP → V SBAR      6.6%    0.3%   73.0%    0.2%
VP → V S         2.2%    1.3%    4.8%   70.8%
VP → V NP S      0.1%    5.7%    0.0%    0.3%
VP → V PRT NP    0.3%    5.8%    0.0%    0.0%
VP → V PRT PP    6.1%    1.5%    0.2%    0.0%

“monolexical” probabilities
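
A table like this can be read as a relative-frequency estimate of P(rule | head verb). The sketch below shows one way such an estimate could be computed; the function name and the toy counts are invented for illustration and are not the treebank statistics in the table.

```python
from collections import Counter

def rule_probs_given_head(observations):
    """Relative-frequency estimate of P(rule | head word).

    observations: list of (head_word, rule) pairs, one per VP in a treebank.
    """
    pair_counts = Counter(observations)
    head_counts = Counter(head for head, _ in observations)
    return {(head, rule): n / head_counts[head]
            for (head, rule), n in pair_counts.items()}

# Invented toy counts, not the treebank statistics in the table above.
observations = (
    [("think", "VP -> V SBAR")] * 3
    + [("think", "VP -> V")] * 1
    + [("take", "VP -> V NP")] * 4
    + [("take", "VP -> V PRT NP")] * 1
)
probs = rule_probs_given_head(observations)
print(probs[("think", "VP -> V SBAR")])  # 3 of the 4 "think" VPs
print(probs[("take", "VP -> V NP")])     # 4 of the 5 "take" VPs
```

With real data many (head, rule) pairs are rare or unseen, which is why the estimates need smoothing.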
Lexicalization sharpens probabilities: Predicting heads
“Bilexical probabilities”
• P(prices | n-plural) = .013
• P(prices | n-plural, NP) = .013
• P(prices | n-plural, NP, S) = .025
• P(prices | n-plural, NP, S, v-past) = .052
• P(prices | n-plural, NP, S, v-past, fell) = .146
Charniak (1997) linear interpolation/shrinkage
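
As a rough illustration of linear interpolation (shrinkage), the sketch below mixes estimates conditioned on progressively less context. It is a generic sketch, not Charniak's exact formulation: the weights are invented, and in practice they would depend on how much data supports each estimate. The input values reuse the P(prices | …) estimates from the “Predicting heads” slide purely as example numbers.

```python
def interpolate(estimates, weights):
    """Linearly interpolate probability estimates, most specific first.

    The weights are assumed to be non-negative and to sum to 1.
    """
    if len(estimates) != len(weights):
        raise ValueError("need one weight per estimate")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * p for w, p in zip(weights, estimates))

# Estimates of P(prices | context), from most to least specific context.
estimates = [0.146, 0.052, 0.025, 0.013]
# Hypothetical weights; a real model would set these from counts.
weights = [0.4, 0.3, 0.2, 0.1]

smoothed = interpolate(estimates, weights)
print(smoothed)
```

The sparse, highly specific estimate is pulled toward the more reliable, less specific ones, so an unseen word pair does not receive probability zero.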
Charniak (1997) shrinkage example

PCFG Independence Assumptions
PCFGs and Independence
• The symbols in a PCFG define independence assumptions:
[Figure: a tree in which S dominates NP and VP, labeled with the rules S → NP VP and NP → DT NN]
• At any node, the material inside that node is independent of the material outside that node, given the label of that node
• Any information that statistically connects behavior inside and outside a node must flow through that node’s label
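
One consequence of these assumptions is that the probability of a tree is just the product of the probabilities of its rules. A minimal sketch, assuming a made-up toy grammar and trees encoded as nested tuples:

```python
import math

# Hypothetical toy grammar: P(rhs | lhs) for phrasal rules and P(word | tag).
RULE_PROBS = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("DT", "NN")): 0.6,
    ("NP", ("PRP",)): 0.4,
    ("VP", ("VBD", "NP")): 0.7,
    ("VP", ("VBD",)): 0.3,
}
LEX_PROBS = {
    ("PRP", "he"): 1.0,
    ("VBD", "saw"): 1.0,
    ("DT", "the"): 1.0,
    ("NN", "dog"): 0.5,
    ("NN", "cat"): 0.5,
}

def tree_log_prob(tree):
    """Sum of log rule probabilities; a tree is (label, child, ...)."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return math.log(LEX_PROBS[(label, children[0])])
    rhs = tuple(child[0] for child in children)
    # The rule probability depends only on the label of this node,
    # not on anything outside the node (parent, siblings, position).
    return math.log(RULE_PROBS[(label, rhs)]) + sum(
        tree_log_prob(child) for child in children)

tree = ("S", ("NP", ("PRP", "he")),
             ("VP", ("VBD", "saw"),
                    ("NP", ("DT", "the"), ("NN", "dog"))))
print(math.exp(tree_log_prob(tree)))  # 0.4 * 0.7 * 0.6 * 0.5, up to rounding
```

Note that NP → PRP and NP → DT NN get the same probabilities whether the NP is a subject or an object; the grammar has no way to tell the two positions apart.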
Non-Independence I
• The independence assumptions of a PCFG are often too strong
• Example: the expansion of an NP is highly dependent on the parent of the NP (i.e., subjects vs. objects)
[Figure: three bar charts (All NPs, NPs under S, NPs under VP) showing the relative frequency of the expansions NP PP, DT NN, and PRP; bar labels range from 4% to 23%]
Non-Independence II
• Symptoms of overly strong assumptions:
• Rewrites get used where they don’t belong
[Figure annotation: In the PTB, this construction is for possessives]
Refining the Grammar Symbols
• We can relax independence assumptions by encoding dependencies into the PCFG symbols, by state splitting:
[Figure: two annotated trees, captioned “Parent annotation [Johnson 98]” and “Marking possessive NPs”]
• Too much state-splitting → sparseness (no smoothing used!)
• What are the most useful features to encode?
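
Parent annotation is easy to express as a tree transform. The sketch below is one simple variant, assuming trees as nested tuples; it leaves POS tags and the root unannotated, which is a simplification chosen here rather than something the slides specify.

```python
def parent_annotate(tree, parent=None):
    """Append ^PARENT to every phrasal label; a tree is (label, child, ...)."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return tree  # leave POS tags and words unchanged in this sketch
    new_label = f"{label}^{parent}" if parent else label
    # Children see the original (unannotated) label as their parent.
    return (new_label, *(parent_annotate(child, label) for child in children))

tree = ("S", ("NP", ("PRP", "he")),
             ("VP", ("VBD", "saw"),
                    ("NP", ("DT", "the"), ("NN", "dog"))))
annotated = parent_annotate(tree)
print(annotated)
```

After the transform, subject NPs (NP^S) and object NPs (NP^VP) are different symbols, so a PCFG read off the annotated trees can give them different expansion probabilities.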
The Return of Unlexicalized PCFGs
Accurate Unlexicalized Parsing

[Klein and Manning 2003]
• What do we mean by an “unlexicalized” PCFG?
• Grammar rules are not systematically specified down to the level of lexical items
• NP-stocks is not allowed
• NP^S-CC is fine
• Closed vs. open class words
• Long tradition in linguistics of using function words as features or markers for selection (VB-have, SBAR-if/whether)
• Different to the bilexical idea of semantic heads
• Open-class selection is really a proxy for semantics
• Thesis
• Most of what you need for accurate parsing, and much of what lexicalized PCFGs actually capture, isn’t lexical selection between content words but just basic grammatical features, like verb form, finiteness, presence of a verbal auxiliary, etc.
Experimental Approach
• Corpus: Penn Treebank, WSJ; iterate on small dev set
Training: sections 02-21
Development: section 22 (first 20 files)
Test: section 23
• Size – number of symbols in grammar.
• Passive / complete symbols: NP, NP^S
• Active / incomplete symbols: @NP_NP_CC [from binarization]
• We state-split as sparingly as possible
• Highest accuracy with fewest symbols
• Error-driven, manual hill-climb, one annotation at a time
Horizontal Markovization
• Horizontal Markovization: Merges States
[Figure: two plots against horizontal Markov order (0, 1, 2v, 2, inf): accuracy (axis 70% to 74%) and number of grammar symbols (axis 0 to 12000)]
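
To see why a lower horizontal Markov order merges states, consider binarizing long rules with intermediate symbols that remember only the last h sisters already generated. The sketch below uses a simple left-to-right binarization, which is an assumption made for brevity (it is not head-outward, and it does not model the variable “2v” order); the function names and toy rules are hypothetical.

```python
def binarize_rule(lhs, rhs, h=None):
    """Binarize lhs -> rhs left to right.

    Intermediate symbols look like @NP_DT_JJ and record the sisters generated
    so far; h limits how many are remembered (None means all of them).
    """
    if len(rhs) <= 2:
        return [(lhs, tuple(rhs))]
    rules = []
    current = lhs
    for i in range(len(rhs) - 2):
        seen = rhs[: i + 1]
        if h is not None:
            seen = seen[-h:] if h > 0 else ()
        nxt = "@" + "_".join((lhs, *seen))
        rules.append((current, (rhs[i], nxt)))
        current = nxt
    rules.append((current, (rhs[-2], rhs[-1])))
    return rules

def intermediate_symbols(rules, h):
    """Collect the distinct @-symbols needed to binarize a set of rules."""
    symbols = set()
    for lhs, rhs in rules:
        for parent, _ in binarize_rule(lhs, rhs, h):
            if parent.startswith("@"):
                symbols.add(parent)
    return symbols

toy_rules = [("NP", ("DT", "JJ", "JJ", "NN")),
             ("NP", ("PRP$", "JJ", "JJ", "NN"))]
for h in (None, 1, 0):
    print(h, sorted(intermediate_symbols(toy_rules, h)))
```

With full history the two rules share no intermediate states after the first word; with h = 1 they share @NP_JJ; with h = 0 everything collapses to @NP. Fewer symbols means less sparsity, at the cost of weaker conditioning.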
Vertical Markovization
• Vertical Markov order: rewrites depend on past k ancestor nodes (i.e., parent annotation).
[Figure: example trees for Order 1 and Order 2]
[Figure: two plots against vertical Markov order (1, 2v, 2, 3v, 3): accuracy (axis 72% to 79%) and number of grammar symbols (axis 0 to 25000)]

Model     F1    Size
v=h=2v    77.8  7.5K
Unary Splits
• Problem: unary rewrites are used to transmute categories so a high-probability rule can be used.
• Solution: Mark unary rewrite sites with -U

Annotation  F1    Size
Base        77.8  7.5K
UNARY       78.3  8.0K
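
A minimal sketch of the -U marking, assuming that a “unary rewrite site” is any phrasal node with exactly one child and that POS tags over words are left alone; the tuple tree encoding and function name are choices made here for illustration.

```python
def mark_unary(tree):
    """Append -U to phrasal nodes with exactly one child."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return tree  # a POS tag over a word is not treated as a rewrite site
    new_label = label + "-U" if len(children) == 1 else label
    return (new_label, *(mark_unary(child) for child in children))

tree = ("S", ("NP", ("NN", "revenue")), ("VP", ("VBD", "fell")))
marked = mark_unary(tree)
print(marked)
```

Because NP-U and NP are now distinct symbols, a rule that wants a full NP can no longer be satisfied cheaply by a category that was reached through a unary rewrite.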
Tag Splits
• Problem: Treebank tags are too coarse.
• Example: SBAR sentential complementizers (that, whether, if), subordinating conjunctions (while, after), and true prepositions (in, of, to) are all tagged IN.
• Partial Solution:
• Subdivide the IN tag.

Annotation  F1    Size
Previous    78.3  8.0K
SPLIT-IN    80.3  8.1K
Yield Splits
• Problem: sometimes the behavior of a category depends on something inside its future yield.
• Examples:
• Possessive NPs
• Finite vs. infinitival VPs
• Lexical heads!
• Solution: annotate future elements into nodes.

Annotation  F1    Size
tag splits  82.3  9.7K
POSS-NP     83.1  9.8K
SPLIT-VP    85.7  10.5K
Distance / Recursion Splits
• Problem: vanilla PCFGs cannot distinguish attachment heights.
• Solution: mark a property of higher or lower sites:
• Contains a verb.
• Is (non)-recursive.
• Base NPs [cf. Collins 99]
• Right-recursive NPs
[Figure: tree fragment with NP, VP, and PP nodes, in which the VP dominates a verb (v) and the NP is marked -v]

Annotation    F1    Size
Previous      85.7  10.5K
BASE-NP       86.0  11.7K
DOMINATES-V   86.9  14.1K
RIGHT-REC-NP  87.0  15.2K
A Fully Annotated Tree

Final Test Set Results
Parser              LP    LR    F1
Magerman 95         84.9  84.6  84.7
Collins 96          86.3  85.8  86.0
Klein & Manning 03  86.9  85.7  86.3
Charniak 97         87.4  87.5  87.4
Collins 99          88.7  88.6  88.6
• Beats “first generation” lexicalized parsers
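
For reference, F1 here is the harmonic mean of labeled precision (LP) and labeled recall (LR). The short check below recomputes the F1 column from the LP and LR columns of the table; the variable and function names are just for this sketch.

```python
def f1(lp, lr):
    """Harmonic mean of labeled precision (LP) and labeled recall (LR)."""
    return 2 * lp * lr / (lp + lr)

# (LP, LR, reported F1) rows copied from the results table.
rows = {
    "Magerman 95": (84.9, 84.6, 84.7),
    "Collins 96": (86.3, 85.8, 86.0),
    "Klein & Manning 03": (86.9, 85.7, 86.3),
    "Charniak 97": (87.4, 87.5, 87.4),
    "Collins 99": (88.7, 88.6, 88.6),
}
for parser, (lp, lr, reported) in rows.items():
    print(f"{parser}: computed {f1(lp, lr):.1f}, reported {reported}")
```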

Latent Variable PCFGs
Extending the idea to induced syntactico-semantic classes
Learning Latent Annotations

[Petrov and Klein 2006, 2007]
Can you automatically find good symbols?
• Brackets are known
• Base categories are known
• Induce subcategories
• Clever split/merge category refinement
EM algorithm, like Forward-Backward for HMMs, but constrained by tree
[Figure: parse tree of “He was right.” whose nodes carry latent subcategories X1 to X7, with the Inside and Outside regions marked]
POS tag splits’ commonest words: effectively a semantic class-based model
• Proper Nouns (NNP):
NNP-14  Oct.   Nov.       Sept.
NNP-12  John   Robert     James
NNP-2   J.     E.         L.
NNP-1   Bush   Noriega    Peters
NNP-15  New    San        Wall
NNP-3   York   Francisco  Street
• Personal pronouns (PRP):
PRP-0   It     He         I
PRP-1   it     he         they
PRP-2   it     them       him
[Figure: bar chart of the number of phrasal subcategories (axis 0 to 40) for each category: NP, VP, PP, ADVP, ADJP, SBAR, QP, WHNP, PRN, NX, SINV, PRT, WHPP, SQ, CONJP, FRAG, NAC, UCP, WHADVP, INTJ, SBARQ, RRC, WHADJP, ROOT, LST]
The Latest Parsing Results… (English PTB3 WSJ train 2-21, test 23)

Parser                                                      F1 ≤ 40 words   F1 all words
Klein & Manning unlexicalized 2003                          86.3            85.7
Matsuzaki et al. simple EM latent states 2005               86.7            86.1
Charniak generative, lexicalized (“maxent inspired”) 2000   90.1            89.5
Petrov and Klein NAACL 2007                                 90.6            90.1
Charniak & Johnson discriminative reranker 2005             92.0            91.4

Fossum & Knight 2009, combining constituent parsers: 92.4
