The Kam4D Linguistic Knowledge Graph: Putting Smurfs, Ducks, Lemurs, and Party Terms to the Service of African Languages
• As a knowledge resource
• As a data resource
• As a basis for any-to-any translation
In service since 1994 - originally at Yale Council on African Studies
International NGO since 2009
• Registered non-profit in 🇺🇸 and 🇨🇭
Academic Home since 2013:
EPFL - Swiss Federal Institute of Technology in Lausanne
First at LSIR - Distributed Information Systems Laboratory
Now at the Swiss EdTech Collider
ACALAN (Intergovernmental language agency for 55 member states of the African Union):
Platform for African Language Empowerment development partner
This is a word
light
Lemur =
• Lemma
• Lemmatic form
• Dictionary form
• Canonical form
• Citation form
light
light
lumineux
light
léger léger
allégé
light
Party term =
• Multiword Expression
• MWE
en: light fr: léger fi: kevyt th: เบา
en: light fr: léger fi: tyhjänpäiväinen th: ซึ่งไร้สาระ
light
DUCKS = Data Unified Concept Knowledge Set
light (not dark) fr: lumineux sw: -enye mwanga fi: valoisa th: สว่าง
light (not heavy) fr: léger sw: -epesi fi: kevyt th: เบา
light (not serious) fr: léger sw: -a kuchekesha fi: tyhjänpäiväinen th: ซึ่งไร้สาระ
light (not fattening) fr: allégé sw: pungufu fi: kaloriton th: ที่แคลอรี่ต่ำ
How Kamusi makes a multilingual dictionary possible
light (not dark) fr: lumineux sw: -enye mwanga fi: valoisa th: สว่าง
light (not heavy) fr: léger sw: -epesi fi: kevyt th: เบา
light (not serious) fr: léger sw: -a kuchekesha fi: hölynpöly th: ซึ่งไร้สาระ
light (not fattening) fr: allégé sw: pungufu fi: kaloriton th: ที่แคลอรี่ต่ำ
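A minimal sketch of how concept sets like the sense rows above could support sense selection and any-to-any lookup. The `ConceptSet` class, the `senses` and `translate` helpers, and the concept IDs are hypothetical names invented for illustration, not Kam4D's actual schema; only the terms themselves come from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptSet:
    """One sense plus its term in each language (hypothetical schema)."""
    concept_id: str                              # invented ID, not a Kam4D identifier
    gloss: str                                   # short disambiguating label
    terms: dict = field(default_factory=dict)    # language code -> term


# Terms copied from the slides; IDs are assumptions made for this sketch.
CONCEPTS = [
    ConceptSet("light-dark", "not dark",
               {"en": "light", "fr": "lumineux", "sw": "-enye mwanga", "fi": "valoisa"}),
    ConceptSet("light-heavy", "not heavy",
               {"en": "light", "fr": "léger", "sw": "-epesi", "fi": "kevyt"}),
    ConceptSet("light-serious", "not serious",
               {"en": "light", "fr": "léger", "sw": "-a kuchekesha", "fi": "hölynpöly"}),
    ConceptSet("light-fattening", "not fattening",
               {"en": "light", "fr": "allégé", "sw": "pungufu", "fi": "kaloriton"}),
]


def senses(lang, term):
    """Return every concept set in which `term` is the `lang` term."""
    return [c for c in CONCEPTS if c.terms.get(lang) == term]


def translate(concept, target_lang):
    """Return the target-language term for an already chosen concept, or None."""
    return concept.terms.get(target_lang)


if __name__ == "__main__":
    # The user sees the candidate senses and picks one before translating.
    for c in senses("fr", "léger"):
        print(c.gloss, "->", translate(c, "sw"))
```

Because every term hangs off a concept rather than off an English headword, a Finnish-to-Swahili lookup needs no English pivot: `translate(senses("fi", "kevyt")[0], "sw")` goes straight through the shared concept.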
• 4D = Four Dimensional
• Time is the fourth dimension - capacity to treat language
change and historical languages
• Graph database structure for a complete matrix of
human expression across time and space
• The structure is realistic; the final goal is an impossible
aspiration
• Molecular lexicography design
Diagram: meaning, shape, place, sound, time, relationships
Diagram: shape, place, sound, time, relationships, with the example sense "compel someone"
DRIVE
drives, drove, driving, driven
Diagram: meaning, shape, place, sound, time, relationships
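A small sketch of how inflected forms such as those on the DRIVE slide might be pointed back at their lemma (the "Lemur"). The `LEMMA_FORMS` table, `build_index`, and `lemmatize` are hypothetical names for illustration; the deck does not describe how Kam4D stores this mapping.

```python
# Lemma and forms taken from the DRIVE slide; the structure is an assumption.
LEMMA_FORMS = {
    "drive": ["drives", "drove", "driving", "driven"],
}


def build_index(lemma_forms):
    """Map every form, and the lemma itself, to its lemma."""
    index = {}
    for lemma, forms in lemma_forms.items():
        index[lemma] = lemma
        for form in forms:
            index[form] = lemma
    return index


INDEX = build_index(LEMMA_FORMS)


def lemmatize(word):
    """Return the lemma for a known form (case-insensitive), else None."""
    return INDEX.get(word.lower())


if __name__ == "__main__":
    print(lemmatize("Drove"))
    print(lemmatize("driven"))
```

A flat lookup like this only covers forms that are listed explicitly; languages with productive morphology would need rules or a morphological analyzer instead.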
• User selects their meaning on the source side (predisambiguation)
• Users can suggest missing senses
• SlowBrew suggests Party Terms (MWEs), or users can mark their own
• Party Terms are treated as Smurfs in Kam4D
• Separated expressions easily rejoined (unlike NMT)

• Smurfs and Ducks
• Kam4D – kamu.si/kam4d
• SlowBrew
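One way to picture how a separated Party Term could be rejoined is a gap-tolerant match over the tokens of a sentence. This is only a sketch under stated assumptions: `find_party_term`, the `max_gap` parameter, and the example expression "look ... up" are invented for illustration and are not taken from SlowBrew or Kam4D.

```python
def find_party_term(tokens, parts, max_gap=3):
    """Return the token indexes of `parts`, in order, allowing up to
    `max_gap` other words between consecutive parts; None if no match."""
    lowered = [t.lower() for t in tokens]
    for start, tok in enumerate(lowered):
        if tok != parts[0]:
            continue
        positions = [start]
        for part in parts[1:]:
            prev = positions[-1]
            # Search the next max_gap + 1 tokens for the following part.
            window = range(prev + 1, min(prev + 2 + max_gap, len(lowered)))
            nxt = next((i for i in window if lowered[i] == part), None)
            if nxt is None:
                break
            positions.append(nxt)
        else:
            # Every part was found, so report the matched indexes.
            return positions
    return None


if __name__ == "__main__":
    sentence = "Please look the word up".split()
    print(find_party_term(sentence, ["look", "up"]))
```

Once the indexes are known, the matched tokens can be treated as a single unit for sense selection, which is the step the slide contrasts with NMT.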
Unanswered Questions:
• Will users take the time to predisambiguate?
• People take time to choose images
• People take time to spellcheck
• Syntax on the target side?
• Outside Kamusi wheelhouse – partners needed
• How to pay for it?
• Kam4D will serve as the linguistic data core for the ACALAN-AU platform
• Integrated with Kamusi systems for gathering and disseminating data
• Initial focus on 20 VCBLs (Vehicular Cross-Border Languages)