Lang embeddings loading #87

Open
voiteshonok opened this issue Mar 1, 2023 · 0 comments


The preprocessing command is: !python -m codegen_sources.preprocessing.preprocess data/test_dataset/ --langs cpp java python --mode=monolingual --local=True --fastbpe_vocab_path=/content/CodeGen/data/bpe/cpp-java-python/vocab --fastbpe_code_path=/content/CodeGen/data/bpe/cpp-java-python/codes --bpe_mode=fast --train_splits=1 --percent_test_valid=10
When you resume training TransCoder from a previous checkpoint, the log contains these lines:
INFO - 03/01/23 08:40:48 - 0:00:09 - ============ Model Reloading
INFO - 03/01/23 08:40:48 - 0:00:09 - Reloading encoder from /content/drive/MyDrive/transcoder/transcoder/l2hpmxrljh/checkpoint.pth ...
WARNING - 03/01/23 08:41:13 - 0:00:33 - No match found for lang cpp_sa cpp in dict_keys(['cpp_sa', 'java_sa', 'python_sa']). Initializing randomly.
WARNING - 03/01/23 08:41:13 - 0:00:33 - No match found for lang java_sa java in dict_keys(['cpp_sa', 'java_sa', 'python_sa']). Initializing randomly.
WARNING - 03/01/23 08:41:13 - 0:00:33 - No match found for lang python_sa python in dict_keys(['cpp_sa', 'java_sa', 'python_sa']). Initializing randomly.
INFO - 03/01/23 08:41:13 - 0:00:33 - Reloading decoders from /content/drive/MyDrive/transcoder/transcoder/l2hpmxrljh/checkpoint.pth ...
WARNING - 03/01/23 08:41:28 - 0:00:49 - No match found for lang cpp_sa cpp in dict_keys(['cpp_sa', 'java_sa', 'python_sa']). Initializing randomly.
WARNING - 03/01/23 08:41:28 - 0:00:49 - No match found for lang java_sa java in dict_keys(['cpp_sa', 'java_sa', 'python_sa']). Initializing randomly.
WARNING - 03/01/23 08:41:28 - 0:00:49 - No match found for lang python_sa python in dict_keys(['cpp_sa', 'java_sa', 'python_sa']). Initializing randomly.
I guess this is not the desired behavior. It looks like a consequence of this code:

for lang in [l for i, l in sorted(params.id2lang.items())]:
    if lang in lang_mapping:
        lang_ = lang_mapping[lang]
    else:
        lang_ = lang

Would simply using lang_ = lang allow the previous language embeddings to be reused, or is something else wrong?
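
To make the suspected mismatch concrete, here is a minimal standalone sketch. It is not the actual CodeGen code: the contents of lang_mapping and reloaded_langs are assumptions chosen only to mirror the warnings above, and resolve_lang is a hypothetical helper showing one possible fallback (try the mapped name first, then the original name).

```python
def resolve_lang(lang, lang_mapping, reloaded_langs):
    """Return the name under which `lang` is found in the checkpoint, or None.

    Hypothetical helper: tries the mapped name first, then falls back
    to the unmapped name before giving up.
    """
    mapped = lang_mapping.get(lang, lang)
    if mapped in reloaded_langs:
        return mapped
    if lang in reloaded_langs:
        return lang
    return None


# Assumed values, picked to reproduce the situation in the log:
# the mapping strips the "_sa" suffix, but the checkpoint keeps it.
lang_mapping = {"cpp_sa": "cpp", "java_sa": "java", "python_sa": "python"}
reloaded_langs = {"cpp_sa": 0, "java_sa": 1, "python_sa": 2}

for lang in ["cpp_sa", "java_sa", "python_sa"]:
    mapped = lang_mapping.get(lang, lang)
    # Looking up only the mapped name fails, as in the warnings.
    mapped_found = mapped in reloaded_langs
    # With the fallback, the original name is matched instead.
    resolved = resolve_lang(lang, lang_mapping, reloaded_langs)
    print(f"{lang}: mapped={mapped}, mapped_found={mapped_found}, resolved={resolved}")
```

Under these assumptions the mapped name is never found, while the unmapped name is, which is why lang_ = lang (or a fallback like the one above) looks like it would let the embeddings be reloaded. Whether the mapping is needed for other checkpoints is something I can't tell from the code.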
