DeepSeek Identifies As OpenAI When Questioned By Users

According to a Reddit forum of AI enthusiasts, DeepSeek, when questioned, refers to itself as a model developed by OpenAI, a response similar to the one ChatGPT gives.

This is the post most often cited by users arguing that OpenAI data was used as the basis for R1.

This underscores the broader claim that OpenAI's curated data and refined outputs were key to the development of DeepSeek's large language model, R1.

OpenAI has driven home the claim that China's new open-source platform, DeepSeek, used a training technique known as "distillation". The technique involves training a smaller model on the outputs of a larger one, creating a simplified copy without the need for the original's extensive and costly training data. The claim also drove down NVIDIA's stock, since NVIDIA makes the market-leading GPUs that are essential for LLM training and development.
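To make the mechanics concrete, here is a minimal sketch of classic logit-based distillation in PyTorch. Note that API-based distillation of the kind alleged here would only see the teacher's generated text, not its logits; the temperature, loss weighting, and training-step shape below are illustrative assumptions, not details of DeepSeek's actual pipeline.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend a soft-label KL term (mimic the teacher's distribution)
    with a hard-label cross-entropy term (fit the ground truth)."""
    # Soften both distributions with temperature, then match them.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    kl = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean")
    kl = kl * (temperature ** 2)  # standard scaling from Hinton et al.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kl + (1 - alpha) * ce

def train_step(student, teacher, optimizer, tokens, labels):
    """One illustrative step: the teacher is frozen, the student learns.
    `student` and `teacher` are any models producing [batch, vocab] logits."""
    with torch.no_grad():
        teacher_logits = teacher(tokens)
    student_logits = student(tokens)
    loss = distillation_loss(student_logits, teacher_logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The key design point is that the student never needs the teacher's original training corpus, only its outputs, which is exactly why distillation is so much cheaper than training from scratch.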

Market Impact

The notion that DeepSeek "distilled" OpenAI's outputs for its training has stirred financial markets because it raises the possibility of OpenAI-like performance without the need for costly infrastructure.

However, this may not be a sustainable approach for organizations that need tighter constraints on hallucinations. The weights are public, but the training data is undisclosed and apparently bootstrapped from another model's outputs, which adds to that performance concern. Still, the power now available to solo practitioners and smaller organizations is certainly opening the door to new applications.

In a nod to the achievement, Microsoft has announced that the DeepSeek R1 model will be made available on the Azure AI platform and GitHub. In addition, Microsoft plans to create a smaller, distilled version of the R1 model for use with Windows 11 Copilot+ PCs.

In spite of the controversy, DeepSeek continues to pique interest from users and tech companies. AMD, for example, has jumped on the bandwagon, encouraging users to run DeepSeek R1 distilled models on its Ryzen™ AI processors and Radeon™ graphics cards. Despite the impressive capabilities, some users have raised concerns about how well the distilled models actually perform.

It's like this: There's China's DeepSeek AI, which is garbage AI chat being overhyped. There's AMD's DeepSeek R1, the more tangible, verifiable application of DeepSeek model for graphic performance and AI advancement. Last the Open-Source Models that the others derive from has widespread adoption

— Momala (@likeaking.bsky.social) January 29, 2025 at 8:34 PM

Several users have reported slow processing times, with one referring to the distilled model as "dumb". Interestingly, the model weights are open, so the models can be downloaded and run entirely locally on a personal computer.
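For anyone who wants to try that locally, here is a minimal sketch using the ollama Python client. Both the model tag deepseek-r1:7b and the prompt are illustrative assumptions; any distilled R1 tag that has been pulled locally works the same way.

```python
# Minimal local-inference sketch using the `ollama` Python client.
# Assumes Ollama is installed and a distilled R1 model has been pulled,
# e.g. with `ollama pull deepseek-r1:7b` (the tag is an assumption).
import ollama

response = ollama.chat(
    model="deepseek-r1:7b",  # assumed tag for a distilled R1 variant
    messages=[
        {"role": "user", "content": "Which company developed you?"},
    ],
)

# The reply text is available under message.content in the response.
print(response["message"]["content"])
```

Because inference runs entirely on the local machine, no prompt or response ever leaves the computer, which is the "safely executed locally" point users have been making.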

Irony In Training Data

The question of data rights has also arisen: OpenAI has itself faced criticism for using scraped data without acquiring rights from website owners. Many see the purported distillation of OpenAI's outputs by DeepSeek as a form of poetic justice. The controversy surrounding data usage in AI underscores the need for careful deliberation about data rights within the industry, highlighting the fine line between advancement and infringement. Regardless, OpenAI's complaint marks a significant moment in AI development and in how data rights are perceived.