Pre-trained Language Models: What Can GPT-3, Switch Transformer, and Enlightenment 2.0 Do?

Oct 25, 2021

In just two years, the parameter scale of pre-trained language models has grown tenfold, a reminder that competition in the 21st century really is everywhere. What is a pre-trained language model, and what can these models do?

In May 2020, OpenAI released GPT-3, a pre-trained model with 175 billion parameters. It can not only write articles, answer questions, and translate text, but also hold multi-turn dialogues, write code, and perform mathematical calculations. As one of the "star" models of artificial intelligence in 2020, GPT-3 pushed the popularity of ultra-large-scale pre-trained models to a new high.

In January 2021, less than a year after GPT-3 appeared, Google launched the Switch Transformer, which raised the parameter count from GPT-3's 175 billion to 1.6 trillion, making it the first trillion-parameter language model in history.

On June 1, 2021, less than half a year after Switch Transformer came out, the Beijing Zhiyuan Conference opened as scheduled at the Conference Center of the Zhongguancun National Independent Innovation Demonstration Zone. At the opening ceremony, Tang Jie, a Tsinghua University professor and academic vice dean of the Zhiyuan Research Institute, released the super-large-scale intelligent model "Enlightenment 2.0". Enlightenment 2.0 has 1.75 trillion parameters, breaking the 1.6-trillion-parameter record previously set by Google's Switch Transformer. The "Enlightenment" project, led by Professor Tang Jie and developed by more than 100 AI experts from Peking University, Tsinghua University, Renmin University of China, the Chinese Academy of Sciences, and other institutions, together with companies such as Alibaba, is the first super-large-scale pre-trained language model system.

In recent years, driven by deep learning and big data, natural language processing technology has developed rapidly. Pre-trained language models bring natural language processing…


