DeepSeek Might Be In Deep Shit: Microsoft and OpenAI Investigate Whether API Access Was Abused To Make A Distilled Model

The recent breakthrough from DeepSeek was much lauded by the open-source community. OpenAI's closed-source GPT-4 was also praised, but its opaque training data and methodology prevent researchers with limited budgets from joining in the exploration of generative AI applications.

Yesterday, the excitement was so palpable that it raised doubts about whether all of the data-center and GPU-centric infrastructure was truly necessary. After all, the narrative that a Chinese hedge fund, High-Flyer, could bootstrap an LLM and open-source the resulting model was a huge gain for the field. Their official communication was that the training run, a necessary step in creating an LLM, cost under 6 million dollars. Training an LLM at that cost flew in the face of the CAPEX figures of many publicly traded companies, much of which has been sunk into NVIDIA's ever-deepening pockets. As a result, NVIDIA's stock sank to levels last seen in December 2023.

Subsequently, investors began piling back into NVIDIA, enough to claw back much of the historic loss the stock suffered on Monday. It closed at approximately $128.80 per share, but the damage was done in terms of perception. The fact that the R1 LLM was trained on NVIDIA GPUs did not matter, since the narrative was hyper-fixated on future demand for high-end H100 and Blackwell chips.

Distilled Model Claims

However, the story changed just a few short hours ago when the well-connected David Sacks, the appointed AI Czar for the Trump administration, appeared to confirm suspicions that the model was not all it seemed from an engineering standpoint. Sure, the weights and biases are valuable to the community of developers at the application layer. But the engineering feat behind them mattered because it suggested a talent-based way to circumvent today's hefty and expensive approach to building LLMs.

As a result, a new thesis has emerged that largely reinforces the market's prior love and adulation of NVIDIA. DeepSeek's stated figures could only be achieved with IP theft, well-versed engineers, and oodles of money to test and train. Perhaps this particular R1 run did cost under 6 million dollars, but a great deal was likely spent elsewhere to arrive at that figure.

Scale AI's CEO, Alexandr Wang, stated that there is also the possibility that high-end NVIDIA chips were used for training. This, paired with a distilled model allegedly derived from OpenAI without permission, negates much of the supposed engineering achievement. In a sense, NVIDIA's industry position is reaffirmed.

Data Breach

For Microsoft and OpenAI, the DeepSeek issue at play is as follows:

  1. A group connected to DeepSeek allegedly extracted data programmatically. In the fall, the group's API requests surged, pulling batches of data that were presumably stored.
  2. Microsoft identified the group, and a third party was able to trace its ties to DeepSeek.
  3. The batches of data sourced via API access were sufficiently large to serve as a partial basis for training the R1 model (a rough sketch of how such harvesting works follows this list).
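To make the alleged mechanism concrete, the sketch below shows the general shape of such a pipeline: send prompts to a chat-completions endpoint, store the responses, and accumulate prompt/answer pairs that a smaller "student" model could later be fine-tuned on. This is a generic, hypothetical illustration of the distillation technique, not a reconstruction of what any party actually did; the prompts, model name, and output file are placeholders.

```python
# Hypothetical sketch: harvesting chat-completion outputs as distillation data.
import json
from openai import OpenAI  # official OpenAI Python client

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompts = [
    "Explain the difference between supervised and reinforcement learning.",
    "Write a Python function that reverses a linked list.",
]

with open("distillation_pairs.jsonl", "w") as out:
    for prompt in prompts:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}],
        )
        answer = response.choices[0].message.content
        # Each stored (prompt, answer) pair becomes one supervised
        # training example for a smaller "student" model.
        out.write(json.dumps({"prompt": prompt, "completion": answer}) + "\n")
```

Done at sufficient scale and batched across many accounts, this kind of collection is exactly what rate limits and terms of service are meant to catch, which is reportedly how the unusual API traffic was noticed in the first place.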

Any subsequent interactions would necessarily be a way to crowdsource additional data. There are also techniques for attaching new trainable layers on top of released weights, such as LoRA, which, ironically, was pioneered at Microsoft (a minimal example is sketched below). Thus, much of the engineering behind DeepSeek, while clever, cannot on its own account for a bear case against NVIDIA.
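For readers unfamiliar with LoRA (Low-Rank Adaptation), the sketch below shows the typical pattern using Hugging Face's peft library: the released checkpoint is frozen, and small low-rank adapter matrices are the only parameters that get trained. The base model here is a tiny stand-in (gpt2) purely for illustration; the same pattern applies to any open set of weights, such as the R1 release.

```python
# Minimal LoRA sketch: freeze a released checkpoint and train only small
# low-rank adapter matrices on top of it. "gpt2" is a stand-in base model
# chosen so the example stays small and runnable.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")

lora_cfg = LoraConfig(
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # scaling factor applied to the update
    target_modules=["c_attn"],  # attention projection layers in GPT-2
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # only the adapter weights are trainable
```

The appeal of this approach is cost: because the frozen base model already encodes most of the capability, the adapters can be trained on a fraction of the compute that full pretraining requires, which is why building on someone else's weights is so much cheaper than producing them from scratch.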