The Journey from GPT-2 to Artificial Superintelligence (ASI)

With the help of Claude2, I read a paper from OpenAI titled "Weak-to-Strong Generalization: Eliciting Strong Capabilities with Weak Supervision." The link is https://cdn.openai.com/papers/weak-to-strong-generalization.pdf

This paper offers an important perspective on the development of artificial superintelligence (ASI): the concept of "weak-to-strong learning," in which a weaker AI model supervises a more powerful one. This setup serves as an analogy for the future problem of humans supervising and training AI systems smarter than themselves.

The paper shows that simple methods can significantly improve generalization: for example, fine-tuning GPT-4 with a GPT-2-level supervisor plus an auxiliary confidence loss recovers close to GPT-3.5-level performance on NLP tasks.

The main findings include:

  1. Weak-to-strong generalization occurs: strong models trained on weak-model labels outperform their weak supervisors, but still fall short of strong models trained directly on ground-truth labels.

  2. Simple methods help substantially: techniques such as an auxiliary confidence loss and bootstrapping through intermediate model sizes significantly improve weak-to-strong generalization. For instance, the auxiliary confidence loss can recover about 80% of the performance gap between weak and strong models on NLP tasks.

  3. Imitating weak errors is a risk: strong models may overfit to the mistakes in the weak supervision, although larger models find it harder to imitate the errors of much smaller ones.
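The auxiliary confidence loss mentioned in finding 2 can be understood as a weighted mix of two cross-entropy terms: one against the weak supervisor's labels, and one against the strong student's own hardened (argmax) predictions, which keeps the student from simply copying the weak model's mistakes. A minimal NumPy sketch under my own simplifications (the function names and the mixing weight `alpha` are illustrative choices, not the paper's exact implementation):

```python
import numpy as np

def cross_entropy(probs, target_onehot):
    # Mean cross-entropy between predicted probabilities and target distributions.
    eps = 1e-12
    return float(-np.mean(np.sum(target_onehot * np.log(probs + eps), axis=1)))

def aux_confidence_loss(strong_probs, weak_labels, alpha=0.5):
    """Sketch of an auxiliary-confidence-style loss.

    strong_probs: (N, C) softmax outputs of the strong student
    weak_labels:  (N,) hard labels produced by the weak supervisor
    alpha:        weight on the student's own hardened predictions (illustrative)
    """
    n, c = strong_probs.shape
    # One-hot targets from the weak supervisor's labels.
    weak_onehot = np.eye(c)[weak_labels]
    # "Hardened" self-predictions: the student's own argmax, one-hot encoded.
    self_onehot = np.eye(c)[strong_probs.argmax(axis=1)]
    # Trust the weak labels, but also reinforce the student's confident
    # self-predictions instead of pulling it fully toward weak-label errors.
    return (1 - alpha) * cross_entropy(strong_probs, weak_onehot) \
         + alpha * cross_entropy(strong_probs, self_onehot)
```

With `alpha = 0` this reduces to plain fine-tuning on the weak labels; raising `alpha` lets the strong model disagree with weak labels it is confident are wrong.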

The researchers evaluated NLP, chess, and reward modeling tasks using pretrained language models from the GPT-4 family. The results show that when powerful pretrained models are naively fine-tuned on labels generated by weak models, they consistently outperform their weak supervisors, a phenomenon the paper calls "weak-to-strong generalization."
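The paper summarizes these results with a "performance gap recovered" (PGR) metric: the fraction of the gap between the weak supervisor and the strong ceiling (a strong model trained on ground-truth labels) that weak-to-strong training closes. A one-function sketch (the function and argument names are mine):

```python
def performance_gap_recovered(weak_acc, w2s_acc, strong_ceiling_acc):
    """PGR: fraction of the weak-to-strong-ceiling gap that is recovered.

    0 means the weak-to-strong model is no better than its weak supervisor;
    1 means it matches a strong model trained on ground-truth labels.
    """
    return (w2s_acc - weak_acc) / (strong_ceiling_acc - weak_acc)
```

For example, if the weak supervisor scores 60%, the strong ceiling 80%, and weak-to-strong training reaches 76%, then PGR is 0.8, matching the roughly 80% figure reported for NLP tasks.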

However, naive fine-tuning does not elicit the full capabilities of the strong models. This suggests that current alignment techniques, such as reinforcement learning from human feedback (RLHF), may scale poorly to superhuman models unless further improvements are made; human supervision alone appears insufficient for aligning advanced ASI systems. Even with weak-to-strong generalization, strong AI models still fall significantly short of their full potential.