There is a newer version of the record available.

Published March 22, 2022 | Version 1.0.0
Dataset | Open access

DBpedia RDF2Vec Graph Embeddings

  • 1. Aalborg University

Description

DBpedia graph embeddings generated with RDF2Vec. The RDF2Vec embedding generation code can be found here; it is based on a publication by Portisch et al. [1].

The dataset consists of 200-dimensional vectors for DBpedia entities, taken from the DBpedia data dated 1/9/2021.


Generating Embeddings

The code for generating these embeddings can be found here.

Run the run.sh script, which wraps all the commands needed to generate the embeddings:

bash run.sh

The script downloads the DBpedia files listed in dbpedia_files.txt. It then builds a Docker image and runs a container from that image, which generates the embeddings for the DBpedia graph defined by the downloaded files.
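The download step can be pictured with a short Python sketch. This is not the code that run.sh executes. It assumes that dbpedia_files.txt lists one download URL per line, with blank lines and lines starting with # ignored. The helper names read_file_list, local_path and download_all are hypothetical, and the Docker build and container run are not reproduced.

```python
import os
import urllib.request
from urllib.parse import urlparse


def read_file_list(text):
    """Return the URLs in dbpedia_files.txt-style text (assumed: one URL per line)."""
    urls = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):  # skip blank and comment lines
            urls.append(line)
    return urls


def local_path(url, dest_dir="files"):
    """Path under dest_dir where a URL is saved, named after its last path segment."""
    return os.path.join(dest_dir, os.path.basename(urlparse(url).path))


def download_all(urls, dest_dir="files"):
    """Download each URL into dest_dir, skipping files that are already present."""
    os.makedirs(dest_dir, exist_ok=True)
    for url in urls:
        target = local_path(url, dest_dir)
        if not os.path.exists(target):
            urllib.request.urlretrieve(url, target)


# Usage (performs network downloads):
# with open("dbpedia_files.txt", encoding="utf-8") as f:
#     download_all(read_file_list(f.read()))
```

Skipping files that already exist lets an interrupted download be resumed file by file without fetching everything again.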

The script creates a folder files containing all the downloaded DBpedia files, and a folder embeddings/dbpedia containing the embeddings in vectors.txt along with a set of random walk files.
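To use the embeddings, vectors.txt has to be parsed into entity/vector pairs. The sketch below assumes that each line holds an entity identifier followed by its space-separated vector components (200 for this dataset). It skips any line with a different number of components, such as a word2vec-style "count dim" header. This format is an assumption, so check the first lines of the real file. parse_vectors, load_vectors, cosine and most_similar are hypothetical helpers. Plain Python lists keep the example short. For the full file, a NumPy array or a streaming approach would be more practical.

```python
import math


def parse_vectors(lines, expected_dim=200):
    """Parse 'entity v1 v2 ... vN' lines into a dict of float lists.

    Lines whose component count differs from expected_dim (for example a
    word2vec-style 'count dim' header, or empty lines) are skipped.
    """
    vectors = {}
    for line in lines:
        parts = line.split()
        if len(parts) != expected_dim + 1:
            continue
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def load_vectors(path, expected_dim=200):
    """Read a vectors.txt-style file (format assumed as above)."""
    with open(path, encoding="utf-8") as f:
        return parse_vectors(f, expected_dim)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(vectors, entity, k=5):
    """The k other entities with the highest cosine similarity to `entity`."""
    query = vectors[entity]
    scored = [(other, cosine(query, vec)) for other, vec in vectors.items() if other != entity]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```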


Run Time of Embeddings Generation

Generating embeddings can take more than a day, depending on how many DBpedia files are downloaded. The following run times were measured on a machine with 64 GB RAM, 8 cores (AMD EPYC, 1996.221 MHz) and a 1 TB SSD:

  • Total: 1 day, 8 hours, 52 minutes, 41 seconds
  • Walk generation: 7 hours, 24 minutes, 36 seconds
  • Training: 1 day, 1 hour, 28 minutes, 5 seconds
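As a consistency check, the walk-generation time is taken as 7 hours, 24 minutes, 36 seconds. With that value, walk generation plus training add up exactly to the reported total, and walk generation accounts for about 22.5% of the run.

```python
from datetime import timedelta

# Reported phase run times (walk generation read as 7 h 24 min 36 s).
walk_generation = timedelta(hours=7, minutes=24, seconds=36)
training = timedelta(days=1, hours=1, minutes=28, seconds=5)
total = timedelta(days=1, hours=8, minutes=52, seconds=41)

# The two phases should account for the whole run.
print(walk_generation + training == total)  # prints True
print(f"walk generation: {walk_generation / total:.1%}, training: {training / total:.1%}")
# prints walk generation: 22.5%, training: 77.5%
```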


Parameters Used

The following parameters were used to generate the embeddings in this dataset:

  • Number of walks per entity: 100
  • Depth (hops) per walk: 4
  • Walk generation mode: RANDOM_WALKS_DUPLICATE_FREE
  • Threads: number of processors / 2
  • Training mode: sg (skip-gram)
  • Embeddings vector dimension: 200
  • Minimum word2vec word count: 1
  • Sample rate: 0.0
  • Training window size: 5
  • Training epochs: 5
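The walk parameters can be illustrated on a toy graph. The sketch below generates forward random walks of at most 4 hops, where each hop adds a predicate token and an object token. It discards repeated walks, which is one reading of RANDOM_WALKS_DUPLICATE_FREE. It then lists the (center, context) pairs that skip-gram (sg) training with window size 5 would see in a walk. This is an illustration under those assumptions, not the implementation used to build this dataset. The graph, the function names and the retry cap max_tries are hypothetical, and training the 200-dimensional word2vec model itself is not shown.

```python
import random
from collections import defaultdict


def build_adjacency(triples):
    """Map each subject to its outgoing (predicate, object) edges."""
    adj = defaultdict(list)
    for s, p, o in triples:
        adj[s].append((p, o))
    return adj


def duplicate_free_walks(adj, entity, num_walks=100, depth=4, rng=None, max_tries=None):
    """Sample up to num_walks distinct forward random walks starting at entity.

    A walk is [entity, p1, o1, p2, o2, ...] with at most `depth` hops; it ends
    early at a node without outgoing edges. Repeated walks are dropped, so a
    small neighbourhood yields fewer than num_walks walks.
    """
    rng = rng if rng is not None else random.Random()
    # Hypothetical cap on sampling attempts so small graphs terminate.
    max_tries = max_tries if max_tries is not None else num_walks * 10
    walks = set()
    for _attempt in range(max_tries):
        if len(walks) >= num_walks:
            break
        walk = [entity]
        node = entity
        for _hop in range(depth):
            edges = adj.get(node)
            if not edges:
                break  # dead end: stop this walk early
            predicate, obj = rng.choice(edges)
            walk.extend([predicate, obj])
            node = obj
        walks.add(tuple(walk))  # a set keeps each distinct walk once
    return sorted(list(w) for w in walks)


def skipgram_pairs(walk, window=5):
    """(center, context) token pairs at most `window` positions apart."""
    pairs = []
    for i, center in enumerate(walk):
        for j in range(max(0, i - window), min(len(walk), i + window + 1)):
            if j != i:
                pairs.append((center, walk[j]))
    return pairs


# Toy graph with made-up entity and predicate names.
triples = [("A", "p", "B"), ("A", "q", "C"), ("B", "r", "D"), ("C", "r", "D")]
adj = build_adjacency(triples)
walks = duplicate_free_walks(adj, "A", num_walks=100, depth=4, rng=random.Random(0))
for walk in walks:
    print(walk)
print(len(skipgram_pairs(walks[0], window=5)))
```

Only two distinct walks exist from A in this graph, so the duplicate-free generator returns two walks even though 100 were requested. With 100 walks per entity and depth 4 on the real graph, each walk has at most 9 tokens, so a window of 5 covers most of a walk.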

Files

  • embeddings.zip (32.5 GB), md5:dd76884a3b27fea8c2f82a333e14d71d
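The listed MD5 checksum can be used to check that the 32.5 GB archive downloaded intact. This minimal sketch hashes the file in chunks, so the archive never has to fit in memory. md5_of_file and verify are hypothetical helper names.

```python
import hashlib

EXPECTED_MD5 = "dd76884a3b27fea8c2f82a333e14d71d"  # checksum listed for embeddings.zip


def md5_of_file(path, chunk_size=1 << 20):
    """MD5 hex digest of a file, read in chunks (1 MiB by default) to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_MD5):
    """True if the file's MD5 digest matches the expected checksum (case-insensitive)."""
    return md5_of_file(path) == expected.lower()
```

For example, verify("embeddings.zip") returns True when the local archive matches the listed checksum.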

Additional details

References

  • [1] Portisch, J., Hladik, M. and Paulheim, H., 2020. RDF2Vec Light – A Lightweight Approach for Knowledge Graph Embeddings. arXiv preprint arXiv:2009.07659.