The full name of GPT is Generative Pre-Trained Transformer. As the name suggests, it is a family of generative artificial intelligence models: models built to produce new content, whether that is text, images, music, a program, a function, or the result of a data analysis. GPT-3 is a language-processing model developed by OpenAI, a research lab founded in San Francisco in 2015.
The first GPT, released in 2018, contained 117 million parameters. Parameters are the weights on the connections between network nodes, and their count is a good proxy for a model's complexity. GPT-2, released in 2019, contained 1.5 billion parameters. GPT-3, by comparison, has 175 billion parameters, more than 100 times its predecessor and roughly 10 times more than comparable programs.
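To make these parameter counts concrete, here is a minimal sketch, assuming the Hugging Face transformers library (which the article itself does not mention), that loads the publicly available GPT-2 base checkpoint and counts its learned weights:

```python
# Minimal sketch: count the parameters (connection weights) of the public
# GPT-2 base checkpoint. Assumes `transformers` and `torch` are installed.
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2")  # the 124M-parameter base model

# Sum the number of elements in every weight tensor of the network.
total = sum(p.numel() for p in model.parameters())
print(f"GPT-2 (base) parameters: {total:,}")
```

The base checkpoint reports roughly 124 million parameters; the 1.5 billion figure quoted above refers to the largest GPT-2 variant.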
01 GPT-3 model
The first GPT was relatively small, as was its training set: it was trained on only a few thousand books, using an 8-GPU machine. GPT-2 greatly expanded the scope of training, collecting posts with at least 3 karma on Reddit, a popular discussion forum, as training data, so that GPT-2's training data was at least 10 times larger than its predecessor's. Although this data set is still relatively limited, it already produced amazing results…
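For a sense of the kind of output that made GPT-2 notable, here is a minimal sketch, again assuming the Hugging Face transformers library, that asks the public GPT-2 checkpoint to continue a short prompt:

```python
# Minimal sketch: text completion with the public GPT-2 checkpoint.
# Assumes `transformers` and `torch` are installed; the prompt is illustrative.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
result = generator(
    "Artificial intelligence is",  # prompt to be continued by the model
    max_length=30,                 # cap the total length of prompt + continuation
    num_return_sequences=1,
)
print(result[0]["generated_text"])
```

The continuation is sampled, so each run produces a different (and not always sensible) completion; GPT-3's larger scale is what pushes this behavior toward consistently fluent text.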