The Collaborative Development of Web3 Technology and AI: Unlocking New Opportunities for Computing Power Resources and Data Value

AI+Web3: Towers and Squares

TL;DR

  1. Web3 projects built around AI concepts have become capital magnets in both primary and secondary markets.

  2. Web3's opportunities in the AI industry lie in using distributed incentives to coordinate long-tail supply across data, storage, and computation, and in building decentralized markets for open-source models and AI agents.

  3. AI's main applications in the Web3 industry are on-chain finance (crypto payments, trading, and data analysis) and development assistance.

  4. The value of AI+Web3 lies in their complementarity: Web3 is expected to counteract AI centralization, while AI is expected to help Web3 break through its boundaries.


Introduction

In the past two years, AI development has felt like someone pressed the fast-forward button. The butterfly effect set off by ChatGPT has not only opened up a new world of generative artificial intelligence but has also stirred a tide in Web3 on the other side.

Buoyed by AI concepts, financing in the otherwise slowing cryptocurrency market has visibly picked up. Media statistics show that in the first half of 2024, a total of 64 Web3+AI projects completed financing, with the AI-based operating system Zyber365 raising a round-high of 100 million dollars in its Series A.

The secondary market is even more buoyant. Data from one aggregator site shows that in just over a year, the AI sector's total market value reached 48.5 billion dollars, with 24-hour trading volume close to 8.6 billion dollars. The spillover benefits of mainstream AI advances are evident: the release of one company's Sora text-to-video model drove an average price increase of 151% across the AI sector. The AI effect has also radiated to Meme coins, one of crypto's fundraising sectors: GOAT, the first AI Agent concept MemeCoin, quickly gained popularity, reached a valuation of 1.4 billion dollars, and successfully sparked an AI Meme craze.

Research and discussion of AI+Web3 are equally heated, rotating from AI+DePIN to AI Memecoins and now to AI Agents and AI DAOs, with FOMO sentiment struggling to keep up with the pace of narrative rotation.

AI+Web3, a term laden with hot money, hype, and futuristic fantasy, is easily dismissed as a marriage arranged by capital. It is hard to tell whether, beneath this glamorous robe, lies a playground for speculators or the eve of a genuine breakthrough.

To answer this question, the crucial consideration for both sides is whether each is better off with the other involved: can each benefit from the other's paradigm? In this article, we stand on the shoulders of earlier analyses to examine this pattern: how can Web3 play a role at each layer of the AI technology stack, and what new vitality can AI bring to Web3?

Part.1 What Opportunities Does Web3 Have Under the AI Stack?

Before we delve into this topic, we need to understand the technical stack of AI large models:

In simpler terms, the entire process can be described as follows: the "large model" is like the human brain. In its early stages, this brain belongs to a newborn baby who has just arrived in the world, needing to observe and absorb massive amounts of information from its surroundings to understand this world. This is the "collection" phase of data. Since computers do not possess human senses such as vision and hearing, before training, large-scale unlabelled information from the outside world needs to be converted into a format that computers can understand and use through "preprocessing."

After data is input, the AI builds a model with comprehension and predictive abilities through "training", which can be seen as the process of a baby gradually understanding and learning about the outside world. The model's parameters are like the language abilities the baby continuously adjusts while learning. When learning begins to specialize, or feedback from communicating with others leads to corrections, the model enters its "fine-tuning" stage.

As children grow up and learn to speak, they can understand meaning and express feelings and thoughts in new conversations. This stage resembles the "inference" of large AI models, where the model predicts and analyzes new language and text inputs. Children use language to express feelings, describe objects, and solve problems, much as large AI models, once trained and deployed, apply inference to specific tasks such as image classification and speech recognition.

The AI Agent is closer to the next form of the large model: one that can independently execute tasks and pursue complex goals, possessing not only reasoning ability but also memory, planning, and the capacity to interact with the world through tools.
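The agent loop described above, memory plus planning plus tool use, can be sketched in a few lines. Everything here (the `plan` function, the `TOOLS` table) is a hypothetical toy for illustration, not a real agent framework:

```python
# Minimal toy sketch of an AI-agent control loop: the agent keeps a
# memory of past steps, plans, and calls "tools" to act on the world.
# All names (plan, TOOLS, run_agent) are illustrative assumptions.

TOOLS = {
    "add": lambda a, b: a + b,
    "upper": lambda s: s.upper(),
}

def plan(goal):
    # A real agent would query an LLM here; we hard-code a trivial plan.
    if goal == "sum 2 and 3":
        return [("add", (2, 3))]
    return [("upper", (goal,))]

def run_agent(goal):
    memory = []                      # episodic memory of past steps
    for tool_name, args in plan(goal):
        result = TOOLS[tool_name](*args)
        memory.append((tool_name, args, result))
    return memory[-1][2]             # return the final step's result

print(run_agent("sum 2 and 3"))      # 5
```

A production agent replaces the hard-coded `plan` with model-generated plans and adds re-planning after each observation; the loop structure stays the same.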

Currently, in response to the pain points of AI across various stacks, Web3 has preliminarily formed a multi-layered, interconnected ecosystem that encompasses all stages of the AI model process.


1. Basic Layer: The Airbnb of Computing Power and Data

Computing Power

Currently, one of the highest costs of AI is the computing power and energy required to train and infer models.

For example, a certain company's model requires 16,000 units of a certain GPU model (a top-tier graphics processing unit designed specifically for artificial intelligence and high-performance computing workloads) to complete training in 30 days. The 80GB version of this GPU is priced between $30,000 and $40,000, implying an investment of $400 million to $700 million in computing hardware (GPUs plus networking chips). Meanwhile, training consumes 1.6 billion kilowatt-hours per month, with energy costs of nearly $20 million per month.
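The figures above can be sanity-checked with simple arithmetic; the prices and consumption numbers are the article's estimates, not authoritative data:

```python
# Back-of-the-envelope check of the hardware and energy figures quoted
# above. All inputs are the article's own estimates.

gpus = 16_000
price_low, price_high = 30_000, 40_000          # USD per 80GB GPU

gpu_cost_low = gpus * price_low                 # 480 million USD
gpu_cost_high = gpus * price_high               # 640 million USD
print(f"GPU cost: ${gpu_cost_low/1e6:.0f}M - ${gpu_cost_high/1e6:.0f}M")

monthly_kwh = 1.6e9                             # 1.6 billion kWh per month
energy_bill = 20e6                              # ~20 million USD per month
implied_rate = energy_bill / monthly_kwh        # implied electricity price
print(f"Implied electricity price: ${implied_rate:.4f}/kWh")
```

The GPUs alone land at $480-640 million; the quoted $400-700 million range presumably widens the band to account for networking hardware and price variance.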

Unlocking AI computing power is precisely the earliest intersection of Web3 and AI: DePIN (Decentralized Physical Infrastructure Networks). A data site currently lists over 1,400 such projects, with representative GPU computing power sharing projects including io.net, Aethir, Akash, Render Network, and others.

The main logic is as follows: the platform lets individuals or entities with idle GPU resources contribute computing power in a permissionless, decentralized way, via an online marketplace for buyers and sellers similar to Uber or Airbnb. This raises the utilization of otherwise idle GPUs, while end users obtain more cost-effective computing resources. At the same time, a staking mechanism ensures that resource providers face penalties if they violate quality-control rules or suffer network disruptions.

Its characteristics are:

  • Gathering idle GPU resources: the suppliers are mainly independent small and medium-sized data centers, surplus computing power from operators such as crypto mining farms, and mining hardware from networks with PoS consensus mechanisms, such as FileCoin and Ethereum miners. Some projects are also lowering the hardware entry threshold: exolab, for example, uses local devices such as MacBooks, iPhones, and iPads to build a computing power network for running large model inference.

  • Facing the long-tail market of AI computing power:

a. On the technical side, a decentralized computing power market is better suited to inference. Training depends on the data throughput of super-large GPU clusters, whereas inference demands relatively less GPU performance; Aethir, for example, focuses on low-latency rendering and AI inference workloads.

b. On the demand side, small and medium computing power consumers will not train their own large models from scratch; instead, they optimize and fine-tune around a handful of leading large models, and these scenarios are naturally suited to distributed idle computing resources.

  • Decentralized Ownership: The technological significance of blockchain lies in the fact that resource owners always retain control over their resources, allowing for flexible adjustments based on demand while also generating revenue.
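The stake-and-slash mechanism mentioned above can be sketched as follows. The `Provider` class, `SLASH_RATE`, and `settle_job` are illustrative assumptions, not the design of any specific protocol:

```python
# Hypothetical sketch of the stake-and-slash logic a decentralized GPU
# marketplace might use: providers stake tokens, get paid for jobs that
# pass quality checks, and lose part of their stake on failures.

class Provider:
    def __init__(self, name, stake):
        self.name = name
        self.stake = stake   # tokens locked as collateral

SLASH_RATE = 0.10   # fraction of stake lost per violation (assumed)

def settle_job(provider, passed_quality_check, payment):
    """Pay the provider on success; slash their stake on failure."""
    if passed_quality_check:
        return payment, provider.stake
    penalty = provider.stake * SLASH_RATE
    provider.stake -= penalty
    return 0, provider.stake

p = Provider("idle-gpu-node", stake=1000.0)
pay, stake = settle_job(p, passed_quality_check=False, payment=50.0)
print(pay, stake)    # 0 900.0
```

The economic point is that collateral makes misbehavior (fake work, downtime) costly even when providers are anonymous, which is what lets the market stay permissionless.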

Data

Data is the foundation of AI. Without data, computation is rootless and useless, and the relationship between data and models echoes the saying "Garbage in, garbage out": the quantity and quality of data determine the quality of the model's final output. For training today's AI models, data determines the model's language ability, comprehension, and even its values and human-like behavior. Currently, AI's data-demand dilemma centers on four aspects:

  • Data Hunger: AI model training relies on a large amount of data input. Public data shows that a certain company trained a model with a parameter count reaching the trillion level.

  • Data Quality: With the integration of AI and various industries, the timeliness of data, diversity of data, professionalism of vertical data, and the incorporation of emerging data sources such as social media sentiment have raised new requirements for its quality.

  • Privacy and compliance issues: Currently, various countries and enterprises are gradually recognizing the importance of high-quality datasets and are imposing restrictions on dataset scraping.

  • High data processing costs: data volumes are large and processing is complex. Public data shows that more than 30% of AI companies' R&D costs are allocated to basic data collection and processing.

Currently, Web3 solutions are reflected in the following four aspects:

  1. Data Collection: The availability of freely provided real-world data is rapidly depleting, and AI companies' spending on data is increasing year by year. However, this spending has not benefitted the true contributors of the data, as platforms fully enjoy the value creation brought by the data, such as a certain platform that achieved a total revenue of 203 million dollars through data licensing agreements with AI companies.

The vision of Web3 is to allow users who truly contribute to participate in the value creation brought by data, and to obtain more private and valuable data from users in a low-cost manner through a distributed network and incentive mechanisms.

  • Grass is a decentralized data layer and network that allows users to run Grass nodes, contribute idle bandwidth and relay traffic to capture real-time data from across the internet, and earn token rewards;

  • Vana introduces a unique Data Liquidity Pool (DLP) concept, allowing users to upload their private data (such as shopping records, browsing habits, social media activities, etc.) to a specific DLP and flexibly choose whether to authorize the use of this data by specific third parties;

  • In PublicAI, users can use #AI or #Web3 as a classification tag on a certain platform and @PublicAI to achieve data collection.

  2. Data Preprocessing: Collected data is often noisy and contains errors, so it must be cleaned and converted into a usable format before model training, which involves repetitive tasks such as standardization, filtering, and handling missing values. This stage is one of the few manual steps in the AI industry and has given rise to the profession of data annotator. As models' data quality requirements rise, so does the bar for annotators, and this work is naturally suited to Web3's decentralized incentive mechanisms.
  • Currently, Grass and OpenLayer are both considering adding data labeling as a key step.

  • Synesis introduced the concept of "Train2earn," emphasizing data quality, where users can earn rewards by providing labeled data, annotations, or other forms of input.

  • The data labeling project Sapien gamifies the labeling tasks and allows users to stake points to earn more points.
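The preprocessing steps listed above, filtering bad records, handling missing values, and standardization, can be sketched as a pure-Python toy (not a production pipeline; the records and fields are invented for illustration):

```python
# Toy data-preprocessing pass: filter, impute, standardize.
import statistics

raw = [
    {"text": "good sample", "score": 0.9},
    {"text": "", "score": 0.4},           # empty text -> filtered out
    {"text": "ok sample", "score": None}, # missing score -> imputed
    {"text": "fine sample", "score": 0.5},
]

# 1. Filter: drop records with empty text.
clean = [r for r in raw if r["text"]]

# 2. Impute: fill missing scores with the mean of the known scores.
known = [r["score"] for r in clean if r["score"] is not None]
mean_score = statistics.mean(known)
for r in clean:
    if r["score"] is None:
        r["score"] = mean_score

# 3. Standardize: shift scores to zero mean.
mu = statistics.mean(r["score"] for r in clean)
for r in clean:
    r["score"] -= mu

print([round(r["score"], 2) for r in clean])
```

In practice these steps run at scale (and annotation adds human labels on top), but the shape of the work, repetitive record-level cleanup, is what makes it decomposable into small incentivized tasks.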

  3. Data Privacy and Security: It is important to distinguish the two concepts. Data privacy concerns the handling of sensitive data, while data security protects data from unauthorized access, destruction, and theft. The advantages of Web3 privacy technologies, and their potential application scenarios, fall into two areas: (1) training on sensitive data; (2) data collaboration: multiple data owners can jointly participate in AI training without sharing their raw data.

Current common privacy technologies in Web3 include:

  • Trusted Execution Environment (TEE), such as Super Protocol;

  • Fully Homomorphic Encryption (FHE), such as BasedAI, Fhenix.io or Inco Network;

  • Zero-knowledge technology (zk), such as the Reclaim Protocol using zkTLS technology, generates zero-knowledge proofs of HTTPS traffic, allowing users to securely import activity, reputation, and identity data from external websites without exposing sensitive information.
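The data-collaboration idea, multiple owners contributing to a computation without revealing raw data, can be illustrated with additive secret sharing, a basic building block behind MPC schemes. This is a toy sketch, not the protocol of any project named above:

```python
# Additive secret sharing: each owner splits a private value into
# random shares; parties sum shares locally; only the aggregate is
# ever reconstructed, never any individual value.
import random

PRIME = 2**31 - 1   # all arithmetic is modulo a prime

def share(value, n_parties):
    """Split a value into n random shares that sum to it mod PRIME."""
    shares = [random.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

# Three data owners each hold a private count; none reveals it.
private_counts = [120, 45, 300]
all_shares = [share(v, 3) for v in private_counts]

# Each party sums the shares it received (one share per owner)...
partial_sums = [sum(col) % PRIME for col in zip(*all_shares)]

# ...and only the combined total is revealed.
total = sum(partial_sums) % PRIME
print(total)   # 465 -- the aggregate, with no raw value exposed
```

Real systems (federated learning with secure aggregation, FHE, TEE-based training) are far more involved, but they share this core property: computation proceeds on data no single party can read.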

However, the field is still in its early stages, and most projects remain exploratory. One current dilemma is that computing costs are far too high. For example:

  • The zkML framework EZKL takes approximately 80 minutes to generate a proof.