DBpedia RDF2Vec Graph Embeddings
Description
DBpedia graph embeddings generated with RDF2Vec. The RDF2Vec embedding generation code can be found here and is based on a publication by Portisch et al. [1].
The embeddings dataset consists of 200-dimensional vectors of DBpedia entities (from 1/9/2021).
Generating Embeddings
The code for generating these embeddings can be found here.
Run the run.sh script, which wraps all the commands needed to generate the embeddings:
bash run.sh
The script downloads a set of DBpedia files, which are listed in dbpedia_files.txt. It then builds a Docker image and runs a container from that image, which generates the embeddings for the DBpedia graph defined by those files.
A folder files is created containing all the downloaded DBpedia files, and a folder embeddings/dbpedia is created containing the embeddings in vectors.txt along with a set of random walk files.
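Reading vectors.txt back in can be sketched as follows. This is a minimal sketch that assumes each line holds an entity URI followed by its space-separated vector components; that layout is not stated here, so inspect a few lines of the file before relying on it. The names iter_vectors and load_vectors are hypothetical helpers, not part of the generation code. Because the file is large, the sketch streams it line by line instead of loading everything into memory.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def iter_vectors(path: str) -> Iterator[Tuple[str, List[float]]]:
    """Yield (entity, vector) pairs from a vectors file, one line at a time.

    Assumed line layout (not confirmed by the dataset description):
        <entity URI> <v1> <v2> ... <vN>
    """
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            parts = line.split()
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            yield parts[0], [float(value) for value in parts[1:]]


def load_vectors(path: str, wanted: Iterable[str]) -> Dict[str, List[float]]:
    """Return vectors only for the requested entities.

    Stops reading as soon as every requested entity has been found.
    """
    remaining = set(wanted)
    found: Dict[str, List[float]] = {}
    if not remaining:
        return found
    for entity, vector in iter_vectors(path):
        if entity in remaining:
            found[entity] = vector
            remaining.discard(entity)
            if not remaining:
                break
    return found
```

A call such as load_vectors("embeddings/dbpedia/vectors.txt", ["http://dbpedia.org/resource/Berlin"]) would then return a dictionary with one 200-dimensional list, provided the entity is present and the assumed layout holds.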
Run Time of Embeddings Generation
Generating the embeddings can take more than a day, depending on how many DBpedia files are downloaded. The following run times were measured on a machine with 64 GB RAM, 8 cores (AMD EPYC, 1996.221 MHz), and a 1 TB SSD:
- Total: 1 day, 8 hours, 52 minutes, 41 seconds
- Walk generation: 0 days, 7 hours, 24 minutes, 36 seconds
- Training: 1 day, 1 hour, 28 minutes, 5 seconds
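The breakdown can be checked with a few lines of arithmetic. The sketch below assumes the walk generation phase took 7 hours, 24 minutes, 36 seconds, which is the reading under which the two phases add up exactly to the reported total.

```python
from datetime import timedelta

# Phase durations from the list above (walk generation read as 7 hours).
walk_generation = timedelta(hours=7, minutes=24, seconds=36)
training = timedelta(days=1, hours=1, minutes=28, seconds=5)
reported_total = timedelta(days=1, hours=8, minutes=52, seconds=41)

combined = walk_generation + training
print(combined)                    # 1 day, 8:52:41
print(combined == reported_total)  # True

# Share of the total run time spent in word2vec training.
training_share = training / reported_total
print(f"training share: {training_share:.1%}")
```

Training dominates the run at roughly three quarters of the total, so on comparable hardware the training settings, rather than walk generation, are likely to have the most influence on overall run time.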
Parameters Used
The following parameters were used to generate the embeddings provided here:
- Number of walks per entity: 100
- Depth (hops) per walk: 4
- Walk generation mode: RANDOM_WALKS_DUPLICATE_FREE
- Threads: # of processors / 2
- Training mode: sg
- Embeddings vector dimension: 200
- Minimum word2vec word count: 1
- Sample rate: 0.0
- Training window size: 5
- Training epochs: 5
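To make the walk parameters above concrete, here is a minimal sketch of duplicate-free random walk generation on a toy graph. It is an illustration under stated assumptions, not the code that produced this dataset: the graph, the ex: identifiers, and the function duplicate_free_walks are hypothetical, and "duplicate free" is interpreted here as "identical walks for the same entity are kept only once".

```python
import os
import random
from typing import Dict, List, Tuple

# Values taken from the parameter list above.
WALKS_PER_ENTITY = 100
DEPTH = 4  # hops per walk
# "# of processors / 2", but never fewer than one thread.
THREADS = max(1, (os.cpu_count() or 2) // 2)

# Hypothetical toy graph: subject -> list of (predicate, object) edges.
Graph = Dict[str, List[Tuple[str, str]]]
TOY_GRAPH: Graph = {
    "ex:a": [("ex:p", "ex:b"), ("ex:q", "ex:c")],
    "ex:b": [("ex:p", "ex:c"), ("ex:r", "ex:a")],
    "ex:c": [("ex:q", "ex:a")],
    "ex:d": [],  # entity without outgoing edges
}


def duplicate_free_walks(
    graph: Graph,
    start: str,
    walks: int = WALKS_PER_ENTITY,
    depth: int = DEPTH,
    seed: int = 0,
) -> List[Tuple[str, ...]]:
    """Draw `walks` random walks from `start` and keep each distinct walk once.

    Every hop appends a predicate and an object, so a full walk has
    1 + 2 * depth tokens. Walks stop early at nodes without outgoing edges.
    """
    rng = random.Random(seed)
    unique = set()
    for _ in range(walks):
        walk = [start]
        node = start
        for _ in range(depth):
            edges = graph.get(node)
            if not edges:
                break  # dead end: keep the shorter walk
            predicate, node = rng.choice(edges)
            walk.extend([predicate, node])
        unique.add(tuple(walk))
    return sorted(unique)


if __name__ == "__main__":
    for walk in duplicate_free_walks(TOY_GRAPH, "ex:a")[:3]:
        print(" ".join(walk))
    print("threads:", THREADS)
```

Each resulting walk is a sequence of tokens that serves as one "sentence" for word2vec training, which is where the remaining parameters (skip-gram mode, 200 dimensions, window size 5, 5 epochs) apply.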
Files
Name | Size | MD5 checksum |
---|---|---|
embeddings.zip | 32.5 GB | dd76884a3b27fea8c2f82a333e14d71d |
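Given the size of embeddings.zip, it is worth checking the download against the MD5 checksum listed above before unpacking. The following is a minimal sketch using only the Python standard library; the helper names md5_of_file and verify_download and the local file path are assumptions for illustration.

```python
import hashlib

# MD5 checksum published for embeddings.zip.
EXPECTED_MD5 = "dd76884a3b27fea8c2f82a333e14d71d"


def md5_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Compute the MD5 hex digest of a file without loading it into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file's MD5 digest matches the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

Calling verify_download("embeddings.zip") on the downloaded archive should return True; a False result suggests an incomplete or corrupted download.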
References
- [1] Portisch, J., Hladik, M. and Paulheim, H., 2020. RDF2Vec Light--A Lightweight Approach for Knowledge Graph Embeddings. arXiv preprint arXiv:2009.07659.