The code for the paper "Learning to Ask Questions in Open-domain Conversational Systems with Typed Decoders" (ACL 2018).
Cite this paper:
@inproceedings{wang2018learning,
title={Learning to Ask Questions in Open-domain Conversational Systems with Typed Decoders},
author={Wang, Yansen and Liu, Chenyi and Huang, Minlie and Nie, Liqiang},
booktitle={ACL},
year={2018}
}
You can download our data here. Press Ctrl+F and search for "Learning to Ask Questions in Open-domain Conversational Systems with Typed Decoders" to find the link to our dataset.
IMPORTANT NOTE: Our code is not compatible with newer versions of TensorFlow, so please use TensorFlow 1.0.0 to run it.
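If you set up a fresh environment with pip, something like the following should work (note that TensorFlow 1.0.0 wheels only exist for older Python versions, e.g. Python 2.7 or 3.5):
pip install tensorflow==1.0.0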
For the STD model:
Command: python main.py {--[option1]=[value1] --[option2]=[value2] ... }
Options(=[default_value]):
--is_train=True
    Set to True for training and False for inference.
--symbols=20000
    Size of the vocabulary.
--embed_units=100
    Size of the word embeddings.
--units=512
    Size of each model layer.
--layers=4
    Number of layers in the model.
--batch_size=50
    Batch size to use during training.
--data_dir=./data
    Data directory.
--train_dir=./train
    Training directory.
--per_checkpoint=1000
    How many steps to run per checkpoint.
--check_version=0
    The checkpoint version for continuing training or for inference. Set to 0 if you don't want to continue from an existing checkpoint.
--log_parameters=True
    Set to True to print the parameters.
--inference_path=""
    Filename of the inference input; leave empty for screen input.
--PMI_path=./PMI
    PMI directory.
--keywords_per_sentence=20
    How many keywords will be included. This flag does not need to be set for STD.
--question_data=True
    (Deprecated, please set to True) An unused option in the final version.
The files train.sh and infer.sh contain example commands for training and inference. You can run them with the sh command.
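For example, a training run with the default settings and an inference run from a file might look like the following (the bracketed values and paths are placeholders; refer to train.sh and infer.sh for the exact commands we used):
python main.py --is_train=True --data_dir=./data --train_dir=./train
python main.py --is_train=False --check_version=[checkpoint_version] --train_dir=./train --inference_path=[your_test_file]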
For the HTD model:
Command: python main.py {--[option1]=[value1] --[option2]=[value2] ... }
Options(=[default_value]):
--is_train=True
    Set to True for training and False for inference.
--symbols=20000
    Size of the vocabulary.
--embed_units=100
    Size of the word embeddings.
--units=512
    Size of each model layer.
--layers=4
    Number of layers in the model.
--batch_size=50
    Batch size to use during training. Please set it to 1 during inference, or the PMI mechanism can't work properly.
--data_dir=./data
    Data directory.
--train_dir=./train
    Training directory.
--per_checkpoint=1000
    How many steps to run per checkpoint.
--check_version=0
    The checkpoint version for continuing training or for inference.
--log_parameters=True
    Set to True to print the parameters.
--inference_path=""
    Filename of the inference input; leave empty for screen input.
--PMI_path=./PMI
    PMI directory.
--keywords_per_sentence=20
    How many keywords will be included.
--question_data=True
    (Deprecated, please set to True) An unused option in the final version.
The files train.sh and infer.sh contain example commands for training and inference. You can run them with the sh command.
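For example, inference from a file might look like the following (the bracketed values are placeholders; see infer.sh for the exact command). Note that --batch_size=1 is required at inference time so that the PMI mechanism works:
python main.py --is_train=False --batch_size=1 --check_version=[checkpoint_version] --PMI_path=./PMI --inference_path=[your_test_file]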
We're sorry that, due to our regulations, we can't share our pretrained word vectors. You can make your own vector.txt in this format:
[word1] 1.0 -2.0 5.0
[word2] 3.14 2.72 -1.41
[word3] 0.86 -1.71 -0.04
... ...
and set --embed_units=[vector dimension]. For the 3-dimensional vectors in this example, you would set --embed_units=3.
Here is an excerpt from our word vector file:
冉津 -0.007428 -0.018109 0.017502 0.127934 0.090787 -0.008699 -0.181448 -0.117719 -0.130669 0.007109 -0.048784 -0.083871 -0.041926 -0.016476 0.026685 -0.094259 -0.097639 0.049795 0.077781 -0.027308 -0.000205 0.117830 -0.033821 -0.088984 0.150127 -0.065157 0.018675 -0.105137 0.001134 -0.026754 0.026742 -0.127951 -0.006684 -0.080394 0.003453 -0.031691 -0.013896 0.051936 0.034658 0.079686 0.026027 0.130313 0.011976 -0.154662 -0.065610 0.079444 -0.036182 -0.042820 0.040647 -0.009277 -0.094344 0.352311 -0.100773 -0.167505 -0.071562 0.182705 0.087977 -0.077308 0.121469 -0.076466 0.045806 0.029080 -0.120310 0.112574 0.027545 0.130245 0.060847 -0.087550 -0.072264 -0.061106 0.045996 -0.048654 0.036791 -0.324380 -0.129975 -0.151802 0.055080 0.108745 0.072554 0.063584 -0.183879 -0.088556 -0.189840 -0.028041 -0.130920 -0.110319 -0.043854 -0.124681 0.027615 -0.096786 0.024738 -0.112449 -0.041501 -0.016814 -0.026927 0.213262 0.127977 -0.085883 -0.056919 0.074451
冉徽 0.014493 -0.009604 -0.056103 0.137076 0.136810 0.003288 -0.162282 -0.142987 -0.111230 -0.007172 -0.036456 -0.059875 -0.034977 -0.000799 0.010098 -0.087427 -0.089052 0.052306 0.095106 -0.078993 -0.038151 0.072410 -0.069268 -0.057892 0.117272 -0.029470 0.013380 -0.051824 -0.039586 -0.041293 0.059040 -0.148370 -0.015987 -0.074139 0.048661 -0.056333 0.022390 0.077231 -0.010541 0.071275 0.015923 0.151031 0.013858 -0.166912 -0.053901 0.057671 -0.070033 -0.044730 0.011594 0.016944 -0.148096 0.327251 -0.109722 -0.195073 -0.074526 0.209270 0.096594 -0.008418 0.120976 -0.057380 0.039540 0.050772 -0.150347 0.127315 -0.023129 0.164845 0.086893 -0.053719 -0.042148 -0.030370 0.064161 -0.070620 0.031359 -0.297059 -0.092481 -0.101616 0.105090 0.139352 0.058642 0.080823 -0.226540 -0.081144 -0.161620 -0.055791 -0.109781 -0.082259 -0.023754 -0.115139 0.023207 -0.117227 0.025099 -0.098476 -0.039537 0.056101 0.011074 0.201935 0.127134 -0.081476 -0.025416 0.024106
冉红平 0.001560 -0.005889 0.025941 0.063590 0.079942 0.007259 -0.176020 -0.105751 -0.107272 0.005988 -0.078503 -0.030769 -0.029349 -0.039878 0.007160 -0.075574 -0.121881 0.030458 0.070573 -0.030429 -0.009549 0.063056 -0.024280 -0.122451 0.073607 -0.017913 0.002592 -0.099109 0.039369 -0.054562 0.044947 -0.135777 -0.023722 -0.065398 0.039630 -0.058899 -0.034931 0.051255 0.051398 0.016336 -0.003559 0.133971 0.088922 -0.220131 0.006107 0.022170 -0.056472 -0.061360 0.019423 0.018444 -0.161037 0.362732 -0.118108 -0.157995 -0.071416 0.118341 0.083489 -0.036985 0.103561 -0.086170 0.029961 0.045517 -0.165905 0.122532 0.004158 0.116590 0.024232 -0.052038 -0.053199 -0.038042 0.089462 -0.087992 0.033044 -0.303940 -0.160299 -0.150656 0.062613 0.156578 0.015454 0.124571 -0.198247 -0.060708 -0.183756 -0.080255 -0.135093 -0.155833 0.029361 -0.091097 0.032860 -0.103119 0.099081 -0.114630 -0.045474 0.023771 -0.044274 0.185795 0.088140 -0.072055 -0.031876 0.074563
......
If you do not have pretrained vectors, just leave a blank vector.txt file for the program to load. The program will automatically initialize the vectors for any words that do not appear in this file.
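If your pretrained vectors are stored in some other format, a minimal sketch like the following writes them out in the expected layout (the embeddings dict and its values are placeholders for your own vectors):

# Write a dict of word -> vector into the vector.txt format described above:
# one word per line, followed by its space-separated components.
embeddings = {
    "你好": [0.12, -0.03, 0.55],
    "天气": [0.01, 0.40, -0.22],
}
dim = len(next(iter(embeddings.values())))  # pass this number via --embed_units
with open("vector.txt", "w", encoding="utf-8") as f:
    for word, vec in embeddings.items():
        assert len(vec) == dim, "all vectors must have the same dimension"
        f.write(word + " " + " ".join("%f" % x for x in vec) + "\n")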