CGGNet: Compiler-Guided Generation Network for Smart Contract Data Augmentation

SJ Hwang, S Ho Ju, YH Choi - IEEE Access, 2024 - ieeexplore.ieee.org
The emergence of blockchain and smart contracts has revolutionized various industries by enabling automated code execution. However, smart contract development, being rooted in programming languages, inherits the common challenges of traditional software development, notably efficiency, reliability, and security. Deep learning techniques hold promise for addressing these challenges, but a critical obstacle to applying them to smart contracts is the lack of extensive datasets, since smart contracts have emerged only recently compared with traditional programming languages. To address this problem, we propose a novel approach called Compiler-Guided Generation Network (CGGNet) for augmenting smart contract datasets. In contrast to existing methods, CGGNet uses a compiler as an oracle within the generative network, ensuring that every augmented smart contract is valid. By incorporating Monte Carlo tree search, CGGNet significantly enhances the diversity and validity of the generated contracts, overcoming the limitations of GAN-based models in code augmentation. To the best of our knowledge, this is the first study on code augmentation targeting smart contracts. Our experiments show that millions of unique and valid smart contracts can be generated from thousands of valid smart contracts, and that the augmented datasets can mitigate the underfitting problem in practical deep learning applications.
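
The "compiler as an oracle" idea in the abstract can be illustrated with a minimal sketch. This is not the CGGNet architecture: there is no generative network and no Monte Carlo tree search here, only a rejection filter that keeps candidates which are both new and accepted by a validity check. The names `augment`, `toy_mutate`, and `toy_oracle` are hypothetical, and the toy oracle only checks brace balance; a real setup would presumably call an actual smart contract compiler in its place.

```python
import random
from typing import Callable, Iterable, List, Optional, Set


def augment(
    seeds: Iterable[str],
    mutate: Callable[[str, random.Random], str],
    is_valid: Callable[[str], bool],
    target: int,
    max_attempts: int = 10_000,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Collect up to `target` unique candidates accepted by the oracle."""
    rng = rng or random.Random(0)
    seed_list = list(seeds)
    seen: Set[str] = set(seed_list)  # seeds themselves never count as new
    accepted: List[str] = []
    for _ in range(max_attempts):
        if len(accepted) >= target:
            break
        candidate = mutate(rng.choice(seed_list), rng)
        if candidate in seen:
            continue  # uniqueness: skip anything already produced
        seen.add(candidate)
        if is_valid(candidate):  # validity: the oracle has the final say
            accepted.append(candidate)
    return accepted


def toy_mutate(source: str, rng: random.Random) -> str:
    """Hypothetical mutation: delete or duplicate one whitespace-separated token."""
    tokens = source.split()
    if not tokens:
        return source
    i = rng.randrange(len(tokens))
    if rng.random() < 0.5:
        del tokens[i]
    else:
        tokens.insert(i, tokens[i])
    return " ".join(tokens)


def toy_oracle(source: str) -> bool:
    """Stand-in for a compiler (assumption): accepts text with balanced braces."""
    depth = 0
    for ch in source:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0
```

Calling `augment([seed], toy_mutate, toy_oracle, target=5)` on a single contract-like string returns a list of distinct strings that all pass the oracle. Keeping the oracle as a plain callable means the validity check can be swapped for a real compiler invocation without touching the loop.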