Diffusion-based valuable NFT generation
Multimedia Tools and Applications
(2026) 85:557
https://doi.org/10.1007/s11042-026-21724-6
Emir Ulurak1 · Beyza Kaya1 · Emre Sefer1
Received: 19 November 2024 / Revised: 29 January 2026 / Accepted: 1 June 2026
© The Author(s) 2026
Abstract
Non-fungible tokens (NFTs) have revolutionized digital ownership, offering unique provenance and value to digital assets. Existing text-to-image models lack incentive
mechanisms to generate statistically rare features, even when they optimize for visual
fidelity. This paper introduces DiffNFTGen, a new generative framework that is the first
to combine a customized RarityReward measure derived from a Vision Transformer (ViT)
with reinforcement learning. The proposed method ensures fidelity to NFT styles while
explicitly maximizing the generation of rare features by fine-tuning Stable Diffusion using
Proximal Policy Optimization (PPO) and Kullback-Leibler (KL) divergence regularization. DiffNFTGen achieves a 2.4x greater rarity score than baseline models while keeping
competitive visual quality, according to quantitative evaluation using Fréchet Inception Distance (FID) and Rarity Score. To examine the trade-off between fidelity
and rarity, we also perform ablation studies on reward weighting. The model's
capacity to generalize NFT styles to new domains is confirmed by qualitative evaluations.
The datasets, analysis code, and proposed approach are accessible at https://github.com/seferlab/diffnftgen.
Keywords NFTs · Diffusion models · Reinforcement learning · Text to image generation
Emir Ulurak and Beyza Kaya contributed equally to this work.
1 Artificial Intelligence and Data Engineering Department, Ozyegin University, Orman Sokak, Cekmekoy 34794, Istanbul, Turkey
1 Introduction
Non-fungible tokens (NFTs) are blockchain-based tokens that may represent a wide range
of assets such as art pieces, digital content, video, and other media [1]. Creating NFT
images has gained tremendous popularity in recent years because of their
visual uniqueness, attractiveness, and richness of visual elements. NFTs serve as irreversible and unique certificates of authenticity and ownership. Even though
NFTs can define ownership of different digital items, they have mainly been applied to
images and digital art. NFTs are typically minted in collections, which are
bought and sold through auctions [2, 3]. It is reported that the NFT market is expected to grow at an annual rate of 35.0%, reaching $13.6 billion by 2027. From a
financial perspective, NFTs can also be seen as an alternative investment class [4]. Although
human artists have created many profitable collections, the systematic generation of such collections through
artificial intelligence remains limited.
Even though art-style NFT images are quite popular, designing them
is a demanding task, since the artwork must be both unique and visually appealing. Human artists must take into account a number of elements while designing, such
as uniqueness, rarity, and artistic and aesthetic appeal. These elements are required
for NFTs to appear valuable in the competitive NFT market. Additionally, as demand for
NFT images increases, so does the need for innovative and unique designs that can catch the market's attention. Here, the NFT market reflects
the demand from potential NFT collectors and buyers.
AI-based image generation has been extensively studied, and promising results
have been obtained with variational autoencoders (VAEs) [5, 6], generative adversarial networks (GANs) [7–9], and more recent diffusion-based models [10–15]. Among them, Stable
Diffusion [16], which performs diffusion in a latent
space, has achieved the best performance. Because it operates in a compressed latent space rather than in pixel space, Stable Diffusion has a lower computational cost while delivering
remarkable visual quality.
In this paper, we introduce DiffNFTGen, a novel generative model based on deep reinforcement learning that is designed to create high-quality NFT images with a focus on rarity.
Building upon the Stable Diffusion framework [16], DiffNFTGen combines reinforcement
learning with a custom RarityReward metric, built on the Vision Transformer (ViT)
model [17], that estimates the rarity of NFTs. DiffNFTGen employs Proximal Policy Optimization (PPO) [18] to address the complexity of the reinforcement learning problem: by limiting the size of each policy update, PPO enables more gradual and stable learning.
Additionally, the approach incorporates Kullback-Leibler (KL) divergence [19] regularization to ensure that the generated images adhere closely to the original NFT styles while also
promoting their uniqueness and value. DiffNFTGen generates NFT images
in response to user prompts, with a specific focus on integrating features that enhance the
perceived value of these NFTs. Our goal is to generate rare NFTs, since rarity
has been identified as a crucial determinant of an NFT's market value among its various attributes.
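To make the interplay between the PPO update limit and the KL regularizer concrete, the following is a minimal sketch of a clipped PPO surrogate loss with a KL penalty toward a frozen reference model. It assumes the standard clipped objective from PPO and a simple sample-based KL estimate; the function name `ppo_kl_loss`, the mean-reward baseline, and the default values of `clip_eps` and `kl_coef` are illustrative assumptions and are not taken from DiffNFTGen itself.

```python
import math


def ppo_kl_loss(logp_new, logp_old, logp_ref, rewards, clip_eps=0.2, kl_coef=0.1):
    """Clipped PPO surrogate loss plus a KL penalty (illustrative sketch).

    logp_new: log-probabilities of the sampled outputs under the current policy
    logp_old: log-probabilities under the policy that generated the samples
    logp_ref: log-probabilities under a frozen reference (pretrained) model
    rewards:  scalar reward per sample (e.g. a rarity reward)
    All hyperparameter defaults are assumptions, not values from the paper.
    """
    n = len(rewards)
    mean_reward = sum(rewards) / n
    # Mean-subtracted rewards act as a simple advantage estimate.
    advantages = [r - mean_reward for r in rewards]

    surrogate = 0.0
    kl_estimate = 0.0
    for lp_new, lp_old, lp_ref, adv in zip(logp_new, logp_old, logp_ref, advantages):
        ratio = math.exp(lp_new - lp_old)
        # Clipping keeps the probability ratio within [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        surrogate += min(ratio * adv, clipped * adv)
        # Crude Monte Carlo estimate of KL(current || reference); reasonable
        # only while the current policy stays close to the sampling policy.
        kl_estimate += lp_new - lp_ref

    # Minimize the negative surrogate and penalize drift from the reference.
    return -(surrogate / n) + kl_coef * (kl_estimate / n)
```

In this formulation the clipping term bounds how far a single update can move the policy, while `kl_coef` controls how strongly generations are pulled back toward the style of the pretrained model.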
We compare DiffNFTGen with alternative diffusion models [20], which have limited capacity for the comprehensive needs of NFT generation. Relative to the DPOK [21]
framework, which combines reinforcement learning techniques with a diffusion model,
alternative diffusion-based approaches do not fully align with our dataset and the unique
requirements of the rarity feature we aim to incorporate.
Transforming the rarity feature into a quantitative evaluation metric facilitates
its integration into DiffNFTGen. While experimenting with various metrics to ensure
that the generated images maintain high quality without significant loss of fidelity, we found
that the specificity of prompts plays a critical role in the quality of the generated images.
Moreover, detailed prompts and real NFT references significantly enhanced the output,
highlighting the importance of prompt engineering in our generative process. Overall, these
advancements position DiffNFTGen as a significant contribution to the field of NFT generation, offering a model that meets market demands and adheres to high aesthetic standards.
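As an illustration of how a rarity feature could be made quantitative, the sketch below scores each item by the summed inverse frequency of its traits within a collection. This is one common trait-rarity convention and is assumed here purely for illustration; it is not necessarily the formulation behind RarityReward, and the function names and the toy collection are hypothetical.

```python
from collections import Counter


def trait_rarity_weights(collection):
    """Return a weight for each (trait_type, value) pair in a collection.

    The weight is the inverse of the pair's relative frequency, so a trait
    value carried by fewer items receives a larger weight. `collection` is a
    list of dicts mapping trait types to values (a hypothetical format).
    """
    n_items = len(collection)
    counts = Counter(
        (trait, value) for item in collection for trait, value in item.items()
    )
    return {pair: n_items / count for pair, count in counts.items()}


def rarity_score(item, weights):
    """Sum the rarity weights of all traits carried by a single item."""
    return sum(weights[(trait, value)] for trait, value in item.items())


# Toy collection: "gold" backgrounds and "crown" hats each appear only once.
collection = [
    {"background": "blue", "hat": "none"},
    {"background": "blue", "hat": "crown"},
    {"background": "blue", "hat": "none"},
    {"background": "gold", "hat": "none"},
]
weights = trait_rarity_weights(collection)
scores = [rarity_score(item, weights) for item in collection]
```

In the toy collection, the items carrying a one-off trait receive higher scores than the items built only from common traits, which is the ordering a rarity-driven reward would need to reproduce.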
1.1 Contributions
To summarize, the main contributions of DiffNFTGen are as follows:
● Objective Definition of Generative Value: We propose a formalized definition of "value" for generated NFTs based on statistical rarity rather than speculative market pricing.
By calculating attribute rarity weights derived from collection-level trait distributions,
we establish a quantitative ground truth for uniqueness. This enables the generative process to optimize for
quantitative distinctiveness rather than depending only on subjective aesthetic quality.
● (...truncated)