"Principle of AI Development: Application-Driven"
1. AI Applications Drive the Next Reshuffle: Moving Beyond the AI Development Lull and Advancing AI Towards Practicality
The field of artificial intelligence is entering its first "lull" since the wave of large models began.
On one hand, OpenAI's next-generation language model, Orion, has encountered significant bottlenecks, with results falling short of internal expectations. The release of Sora, its multimodal model promoted at the beginning of 2024, has also been repeatedly delayed.
On the other hand, the "Scaling Law" of the AI industry seems to be failing: the performance of large models no longer improves as rapidly as it once did when the number of parameters, data volume, and computing resources increase.
AI needs a new driving force, and being application-driven will become the first principle. Entering a "lull" is in line with the laws of industry development. As with many technological waves in history, the initial phase of high expectations inevitably produces bubbles. When technology fails to meet those expectations, the industry enters a period of calm and waits for new technologies and application changes to bring about a major reshuffle.
2. Two Routes of Development, Both Domestic and International, Converging on AI Applications
Overseas AI development was once technology-driven, and achieving AGI (Artificial General Intelligence) was treated as equivalent to achieving the ultimate commercial value of an AI enterprise. ChatGPT showed the industry the potential of technology-driven efforts aimed at AGI, but after two years of development, the industry has realized that AGI is still far away and that the objective conditions for a purely technology-driven approach do not yet exist. Overseas AI giants gradually adjusted their direction in 2024, making application development and industry integration the priority of the current stage. OpenAI has shifted towards profitable commercial operations, and Google, Microsoft, and others have started to focus on consumer (To C) products, enterprise offerings, and cooperation with developer communities.
China's AI development, by contrast, is characterized by being application-driven. In 2023, the ChatGPT wave sparked the "Hundred Models War," which laid an early foundation for China's AI market. According to the "Global Digital Economy White Paper," as of November 2024 China had as many as 478 large models, accounting for about 36% of the global total. The Chinese market has invested heavily in developing basic large models, which to some extent has led to an overall lag in the development of deep AI technologies. On the other hand, it has also increased social recognition and acceptance of generative AI, prompting individuals and enterprises to care about the fit between AI products and the market and laying the foundation for "overtaking on the bend" in application-driven AI development.
Native AI companies are the engines driving technological progress and industrialization in the industry. In November 2024, Frost & Sullivan released the "2024 Global AI Ecosystem Overview," in which the native AI giants include Google, Baidu, and OpenAI. The path that led to the current "lull," and the way beyond it, can be traced through the technological and industrial innovation of these native AI companies. For example, Google proposed the Transformer architecture in 2017, which became the most critical technology in the evolution of pre-trained models into deployable large models. At the end of 2022, OpenAI released ChatGPT, and general-purpose large models entered the public eye for the first time, reawakening an AI sector that had been dormant for many years. In 2024, the Baidu World Conference was held under the theme of "AI applications towards reality," and China's AI development officially entered the stage of overtaking on the bend.
3. Emphasize the "New Three Elements" of the AI Industry: Illusion Elimination, Development Acceleration, and Intelligent Agent Development
3.1 Large Models Lack Authenticity, Eliminating Intelligent Illusions is the Primary Task
The "illusion" phenomenon in large language models (LLMs), more commonly called "hallucination," refers to model-generated content that seems reasonable but actually contains factual errors, or to AI-generated text, images, and videos that intuitively conflict with human cognition and lack authenticity. "Illusions" fall into three main categories: logical fallacies, fabricated facts, and data biases. They are usually caused by the model's limited reasoning ability, weaknesses in the algorithmic framework, data compression, and data inconsistency.
The rise of multimodal large models has extended the illusion phenomenon to image, audio, and video models. For example, OpenAI's Sora, although excellent at video generation, also produces videos that violate physical laws or scramble spatial and temporal relationships. This is one of the main reasons the official version of Sora has not been able to launch.
The industry's main solution to the "illusion" problem is RAG (Retrieval-Augmented Generation), which retrieves relevant reference material and uses it to ground the generated output, reducing the influence of "negative samples" in the learning data and thereby reducing illusions in large models. So far, however, this has been limited to the LLM field. In the multimodal field, Baidu has developed iRAG (Image-based RAG), which combines the search engine's billion-level image resources with the capabilities of the Wenxin base model. With retrieval-augmented generation, it outputs a wide range of realistic images, with overall results far exceeding traditional native "text-to-image" systems and effectively eliminating the "AI flavor." In the future, multimodal RAG, represented by iRAG, will become the industry's main direction for alleviating AI "illusions," providing the accuracy needed to develop more mature and realistic multimodal applications.
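The retrieve-then-generate idea behind RAG can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the tiny corpus, the word-overlap similarity, and the function names `retrieve` and `build_grounded_prompt` are hypothetical, and the final call to a language model is left out. It is not how any particular product, including iRAG, is implemented.

```python
import math
import re
from collections import Counter

# Hypothetical mini-corpus; a real system would query a search index.
CORPUS = [
    "The Transformer architecture was proposed by Google in 2017.",
    "ChatGPT was released by OpenAI.",
    "Retrieval-augmented generation grounds model output in retrieved documents.",
]

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def retrieve(query, corpus, k=2):
    """Return up to k passages with non-zero similarity to the query."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(doc))), doc) for doc in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]

def build_grounded_prompt(query, passages):
    """Build a prompt that asks the model to answer only from the sources."""
    sources = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using only the sources below. "
        "If they do not contain the answer, say so.\n"
        f"Sources:\n{sources}\n"
        f"Question: {query}"
    )

question = "Who proposed the Transformer architecture?"
prompt = build_grounded_prompt(question, retrieve(question, CORPUS, k=1))
```

The resulting `prompt` would then be passed to a generative model. Because the model is asked to answer from retrieved sources rather than from memory alone, fabricated facts become less likely, though not impossible.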
3.2 Programming AI Applications Accelerate the Industry Towards the Next Inflection Point
AI programming applications will accelerate the iteration of AI itself, giving enterprises and individuals stronger development capabilities. Global AI companies have entered the "efficiency era." Because software development is costly and slow, and algorithm engineers are expensive and scarce, building AI programming assistants that improve developer efficiency and shorten development cycles has become a key direction.
One approach is to use AI inside the company to assist directly with development tasks. For example, Amazon uses Amazon Q for internal software upgrade tasks, saving the company the equivalent of "4,500 developers' annual working hours," while also improving accuracy and security and reducing infrastructure costs.
A second approach is to collaborate with developer communities on AI programming assistants that improve individual coding efficiency. For example, GitHub Copilot, launched by GitHub, OpenAI, and the Microsoft Azure team, provides code suggestions and auto-completion, helping developers write code faster.
A third approach skips manual coding altogether, with AI completing the entire development process. "Miao Da," released at the 2024 Baidu World Conference, is a milestone development tool. Unlike the two kinds of assistant above, Miao Da requires no coding ability: software can be developed through natural language, giving everyone the abilities of a programmer. It works by breaking a task down into four steps (core requirements, content structure, engineering development, and data requirements) and then having multiple agents collaborate to complete them, making it the most complex multi-agent collaboration tool to date. In the future, as base models improve and the agents' own technical capabilities iterate, Miao Da will be able to handle more complex development demands and reach system-level development capability.
The step Baidu has taken in AI programming applications not only removes the barrier to becoming a developer but also reshapes the business model of the AI era. Realizing a business idea no longer requires building an organization: agents can act as project managers, designers, and development engineers. With AI tools and creativity, individuals can develop products and create business models.
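The four-step breakdown described above can be pictured as a pipeline of cooperating agents that share a workspace. The sketch below is purely illustrative: the `Workspace` class, the four agent functions, and `run_pipeline` are hypothetical names, each agent is a plain function rather than a model call, and nothing here reflects how Miao Da is actually built.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass
class Workspace:
    """Shared state that every agent can read and extend."""
    request: str
    artifacts: Dict[str, str] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)

# Each "agent" is a stand-in function; a real system would call a model here.
def requirements_agent(ws: Workspace) -> str:
    return f"Core requirements extracted from: {ws.request!r}"

def structure_agent(ws: Workspace) -> str:
    return "Content outline derived from: " + ws.artifacts["core requirements"]

def engineering_agent(ws: Workspace) -> str:
    return "Code scaffold implementing: " + ws.artifacts["content structure"]

def data_agent(ws: Workspace) -> str:
    return "Data fields needed by: " + ws.artifacts["engineering development"]

# Step names follow the four steps named in the text.
PIPELINE: List[Tuple[str, Callable[[Workspace], str]]] = [
    ("core requirements", requirements_agent),
    ("content structure", structure_agent),
    ("engineering development", engineering_agent),
    ("data requirements", data_agent),
]

def run_pipeline(request: str) -> Workspace:
    """Run each agent in order, storing its output for later agents."""
    ws = Workspace(request)
    for step, agent in PIPELINE:
        ws.artifacts[step] = agent(ws)
        ws.log.append(step)
    return ws

result = run_pipeline("a sign-up page for a book club")
```

After the run, `result.artifacts` holds one entry per step, and each later entry builds on the one before it, which is the essential property of handing work from one agent to the next.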
3.3 Intelligent Agents are the Evolution of LLMs: They Can Reflect, Evolve, and Use Tools
Intelligent agents will become the new carriers of content, information, and services in the AI era, and they are the next inflection point. Large models are the initial form of AI applications: they are fairly general but lack depth. The next step in turning large models into applications is the intelligent agent (AI Agent), which not only reasons about and solves tasks on the basis of an LLM but also has memory, planning capabilities, and stronger self-learning abilities, and can draw on industry-specific knowledge to complete more complex tasks at higher quality.
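The combination of planning, memory, and domain knowledge can be made concrete with a toy example. The sketch below assumes a fixed two-step plan and a dictionary as memory; `ToyAgent`, `lookup`, `summarize`, and the `KNOWLEDGE` entry are all hypothetical, and a real agent would delegate planning and summarizing to an LLM.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical domain knowledge standing in for "industry-specific knowledge".
KNOWLEDGE = {
    "refund policy": "Refunds are accepted within 30 days. A receipt is required.",
}

def lookup(topic: str) -> str:
    """Tool: fetch the stored entry for a topic."""
    return KNOWLEDGE.get(topic, "No entry found.")

def summarize(text: str) -> str:
    """Tool: keep only the first sentence."""
    return text.split(".")[0] + "."

@dataclass
class ToyAgent:
    tools: Dict[str, Callable[[str], str]]
    memory: Dict[str, str] = field(default_factory=dict)
    tool_calls: int = 0

    def plan(self, goal: str) -> List[str]:
        # Stand-in for LLM planning: always look up, then summarize.
        return ["lookup", "summarize"]

    def run(self, goal: str) -> str:
        # Memory: reuse an earlier answer instead of repeating the work.
        if goal in self.memory:
            return self.memory[goal]
        observation = goal
        for tool_name in self.plan(goal):
            # Each tool receives the previous step's output.
            observation = self.tools[tool_name](observation)
            self.tool_calls += 1
        self.memory[goal] = observation
        return observation

agent = ToyAgent(tools={"lookup": lookup, "summarize": summarize})
first = agent.run("refund policy")
second = agent.run("refund policy")  # answered from memory, no new tool calls
```

The second call returns the same answer without touching the tools, which is the simplest form of the memory described above; richer agents would also revise their plan based on what each tool returns.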
The first category of intelligent agents is embodied agents, typically represented by self-driving cars and general-purpose humanoid robots. As NVIDIA CEO Huang Renxun (Jensen Huang) has said, embodied intelligent agents are the super-intelligent form of AI entering the physical world. Advanced autonomous driving at L3 and above is the first step, currently represented by Baidu Apollo in China and Tesla's Robotaxi in the United States. The second step is to use the widespread application of autonomous driving to accumulate data and experience with driving algorithms, and then build virtual environments in which to train humanoid robot algorithms, helping them reach mass production and deployment. For example, Tesla Optimus, a leader in humanoid robots, uses Tesla's FSD driving algorithm.
The second category is platform intelligent agents, which provide AI-powered upgrades and customized solutions for tools, enterprises, roles, industries, and industrial sectors.
Tool intelligent agents offer strong creativity and value in personal work and hobby scenarios and will become the new representatives of productivity in the AI industry. Typical examples include AI writing assistants, Canva's design assistant, and logo generators. The core of tool intelligent agents lies in AI's "freedom": by connecting the public domain (public data, search engine content, social platform content) with the private domain (personal data, audio, images, text), they can generate content that meets user expectations, unlocking "creativity" and "uniqueness." For example, Baidu's newly released Free Canvas is a creativity tool powered by the Wenxin multimodal large model. It offers three kinds of freedom (input, editing, and creation) and covers scenarios from creative painting and AI writing to professional reporting, helping users complete every task from finding materials and editing to generation and sharing.
Enterprise intelligent agents are the equivalent of a company's official AI in the AI era, combining the functions of an official website and a service desk. Traditional corporate websites tend to suffer from cluttered information, hard-to-understand professional jargon, visual fatigue, cumbersome search, and low service efficiency, and can no longer meet users' changing needs. Enterprise intelligent agents offer customized recommendations, timely responses, and efficient service. In automotive retail, BYD's official website is a mature example. For users who do not understand the jargon, the agent can find matching specifications like a human customer service representative, provide clear and concise comparisons in one click, remove the need for manual filtering, and give suggestions that meet user expectations.
Role intelligent agents, also known as AI digital humans, have their own backgrounds, settings, and knowledge bases. They can be AI roles based on real people or on specific professions for online services, or they can be entirely virtual characters. In the past, most virtual digital humans suffered from mismatched voice and lip movements, mechanical body movements, and dull expressions. With the support of LLMs and multimodal technologies, role intelligent agents can display far more human-like expressions and emotions. They can act as tutors, health consultants, online entertainment hosts, and more, providing knowledge and value through interaction with people. In fact, live broadcasts hosted by digital humans already exceed the conversion rate of broadcasts hosted by real people in many scenarios, which gives them considerable commercial value.
Industry intelligent agents break down information asymmetry for users and provide professional services in their fields, with great potential in law, healthcare, finance, sports, travel, and other areas. For example, Fa Xing Bao, Baidu's intelligent agent for the legal industry, is a professional legal assistant for ordinary people, offering free services across the full process from case analysis and citation of legal provisions to compensation calculation and drafting of litigation documents. In the more than half a year since its launch, Fa Xing Bao has provided efficient and reliable legal services to more than 9.4 million people.
Industrial intelligent agents provide solutions for each link in business decision-making, serving companies in different industries and with different divisions of labor. For example, site selection and after-sales service are two major difficulties in the catering industry. Yum China, the leading restaurant company in China, chose to cooperate with Baidu and used large model capabilities for site selection evaluation, improving the site selection efficiency and sales results of thousands of stores. After this initial success, Yum China carried out a digital upgrade of its entire business, with peak daily calls to large models reaching the millions and the AI customer service resolution rate rising to 90%. Providing AI upgrades for the transformation of traditional industries is only the first step in exploring applications for industrial intelligent agents. In the future, as models learn more deeply and are trained on more data, industrial intelligent agents may even become core decision-makers in corporate strategy.
4. Standing in the AI Wave: Idealism and a Focus on Talent are the Engines of AI Progress
The original force behind technological progress is idealism. The earliest waves of frontier technology have all been driven by the idealism of a few individuals, and this is particularly evident in AI. The 2024 Nobel Prize in Physics was awarded to John Hopfield and Geoffrey Hinton. On the one hand, it recognizes the contributions of these two academic pioneers to AI theory and industrialization. On the other hand, it is a tribute to their idealism: for more than 30 years, while the deep learning route was neglected, they continued to believe firmly in this research direction.
Over the past 30 years, the US AI industry has been shaped by Hinton's team in academia and by the Microsoft and Google teams in industry, while OpenAI and NVIDIA turned those achievements into a commercial wave. Behind this lies the personal idealism of scholars such as Hinton, LeCun, and Sutskever and of entrepreneurs such as Musk, Altman, and Huang Renxun, which has driven the industry forward.
The same is true of the development and future of China's AI industry: entrepreneurs with AI idealism are key to driving the industry forward. As early as 2012, Li Yanhong (Robin Li) set his sights on artificial intelligence. He was among the first to realize that a turning point in AI development was coming, and he initiated a plan to acquire the Hinton team that had just completed AlexNet. The team also included Ilya Sutskever, later known as the "Father of ChatGPT." Google, Microsoft, and DeepMind also took part in the auction, and Baidu bid the highest throughout, going as high as 44 million US dollars. Although the cooperation with Baidu ultimately did not happen, owing to Hinton's physical condition and other reasons, the global vision shown by a Chinese company at this pivotal moment in the AI revolution impressed the AI academic community. Yann LeCun, the "AI Godfather" and one of the founders of deep learning, remarked that Baidu was one of the first large companies to deploy commercial deep learning systems, ahead even of Google and Microsoft.
Over the past decade, Li Yanhong the entrepreneur has focused Baidu's long-term strategy on AI, while Li Yanhong the individual has continued to spread his idealism. Whether speaking to national leaders, entrepreneurs, and the media, or to friends, students, and geeks, he never misses an opportunity to "preach" AI. At eight consecutive "Two Sessions," he submitted 13 AI-related proposals. Through a decade of persistent "preaching" and industrial deployment, more and more companies have come to see the value of AI technology and have begun to invest in it, and Internet technology companies have grown more confident about developing artificial intelligence.
AI idealism has moved from concept to deployment, and the foundation of that progress is a focus on talent training. In 2013, Li Yanhong established the Baidu Deep Learning Laboratory and served as its director. It was the world's first enterprise-level laboratory named after "deep learning." From then on, the Chinese AI field began to recruit talent, consolidate its foundation, and embark on a ten-year journey of exploration. In 2014, Andrew Ng (Wu Enda), who had led Google's "cat" project, joined Baidu as chief scientist, took charge of the Baidu Brain project, and trained many of the core technical leaders of China's AI industry. In 2017, Wang Haifeng took over from Andrew Ng and built AIG (AI Technology Platform System), which later developed into the National Engineering Research Center for Deep Learning Technology and Application. In 2023, Wang Haifeng announced Baidu's talent training initiative, the Star River Plan, with a vision of training 5 million large model professionals for society.
Over the past decade, Baidu has not only built a backbone of AI technical talent from 0 to 1, consolidating the foundation of China's AI development, but has also continuously expanded the ranks of AI idealists, preparing the Chinese AI industry for the next transformative wave. In 2021, Li Yanhong wrote in his letter to shareholders: "Baidu has determination and patience, because we know that the most cutting-edge technology waves cannot be waited for; you must position yourself 10 or 20 years in advance." Today, not only has Baidu, which positioned itself 10 years in advance, achieved ecosystem leadership, but China's AI industry has also taken its place in the global wave of artificial intelligence.