I’ve been reading various engineering blogs about Tesla’s Autopilot (FSD), simply because for the last month and a half I’ve been riding around almost constantly as if in a taxi: you set the destination and hardly ever need to intervene, and the car gets from point A to point B entirely on its own. This is certainly the future.
Tesla is not the only one with such a system. Mercedes, for example, has one (Drive Pilot). Others help only in traffic jams at best. Tesla, though, seems to be the only one whose system works on all roads.
So, back to the engineering curiosities. Tesla produces its AI models on its own “farm” called Dojo, an exaFLOP supercomputer built on Tesla chips. Video from the cameras is fed into it, and it trains the models that are then sent out across the entire fleet of Tesla cars for autonomous operation.
The FSD architecture comprises about 48 specialized neural networks, trained on Dojo, which together produce about 1,000 different prediction tensors. Tesla is gradually moving from modular networks (object recognition + planning) to end-to-end training, which converts video frames directly into a steering trajectory or action. This is akin to a “black box”: the neural network learns directly from human behavior, without manual tuning of knobs. An extremely cool engineering solution but, I suspect, hard to debug.
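To make the modular vs. end-to-end distinction concrete for myself, here is a toy sketch. Everything in it is hypothetical (the function names, the one-dimensional "lane offset" input, the gains) and has nothing to do with Tesla's actual code. The modular policy chains a perception step and a hand-tuned rule; the end-to-end policy has no rule at all, only a single weight fitted by least squares to imitate recorded human steering.

```python
# Toy contrast between a modular pipeline and an end-to-end policy.
# All names and numbers are hypothetical; nothing here reflects Tesla's code.

def perceive(frame):
    """Modular stage 1: extract a hand-defined quantity (lane offset in metres)."""
    return frame["lane_offset_m"]

def plan(lane_offset_m, gain=-0.5):
    """Modular stage 2: a hand-tuned rule that steers against the offset."""
    return gain * lane_offset_m

def modular_policy(frame):
    """Perception followed by planning, each written and tuned by hand."""
    return plan(perceive(frame))

def fit_end_to_end(demos):
    """One-dimensional behaviour cloning: least-squares fit of steering = w * input.

    demos is a list of (raw_input, human_steering) pairs.
    """
    num = sum(x * y for x, y in demos)
    den = sum(x * x for x, _ in demos)
    return num / den

# Made-up human demonstrations: the driver steers against the offset with gain -0.4.
demos = [(x, -0.4 * x) for x in (-1.0, -0.5, 0.5, 1.0)]
w = fit_end_to_end(demos)

def end_to_end_policy(raw_input):
    """No hand-written rule: behaviour comes entirely from the fitted weight."""
    return w * raw_input
```

The fitted weight ends up close to the driver's gain rather than the engineer's, which is the whole point. It also hints at the debugging problem: there is no rule to read, only a number (in the real system, millions of them).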
By the way, it is claimed that Tesla has switched from C++ to Python, and that the shift to end-to-end training made 300,000 lines of C++ code unnecessary. That code handled the various corner cases and rules for resolving different scenarios; now all of that lives at the model level.
Tesla has abandoned radar and ultrasonic sensors, switching to a purely camera-based solution (Vision Only) with “Hardware 4” (HW4, FSD Computer 2): 16 GB RAM, 256 GB flash memory, performance 3–8× higher than HW3.
Consider the performance: 22 milliseconds to build a 3D scene of the surrounding cars, pedestrians, and cyclists, with data collected from 8 cameras 36 times per second.
And 85 ms for the entire cycle, from receiving an image to updating the plan and sending commands to the wheels. Fantastic!
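To put those numbers in perspective, here is a quick back-of-the-envelope calculation. The figures (8 cameras, 36 frames per second, 22 ms, 85 ms) are the ones quoted above; the 100 km/h speed in the usage note is just my own example, not something from the blogs.

```python
# Back-of-the-envelope timing based on the figures quoted above.
CAMERAS = 8
FPS = 36          # frames per second per camera
SCENE_MS = 22     # time to build the 3D scene
CYCLE_MS = 85     # image in -> commands to the wheels

frame_period_ms = 1000 / FPS                   # time between consecutive frames
frames_per_second_total = CAMERAS * FPS        # images arriving per second, all cameras
scene_fits_in_frame = SCENE_MS < frame_period_ms
cycle_in_frames = CYCLE_MS / frame_period_ms   # full cycle measured in frame periods

def distance_travelled_m(speed_kmh, latency_ms):
    """Distance covered at a constant speed during the given latency."""
    return speed_kmh / 3.6 * latency_ms / 1000

if __name__ == "__main__":
    print(f"frame period: {frame_period_ms:.1f} ms")
    print(f"images per second: {frames_per_second_total}")
    print(f"scene built within one frame period: {scene_fits_in_frame}")
    print(f"full cycle: {cycle_in_frames:.2f} frame periods")
    # 100 km/h is an assumed example speed.
    print(f"distance at 100 km/h: {distance_travelled_m(100, CYCLE_MS):.2f} m")
```

So the 3D scene is ready before the next frame arrives (about 27.8 ms later), the whole loop takes roughly three frame periods, and at 100 km/h the car covers a bit under 2.4 m in that time.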
More than 4 million Teslas on the road collect data daily, and FSD Beta has logged more than a billion miles of autonomous driving. This “live” dataset is used to train the networks on real-world scenarios, including rare edge cases (strange accidents, road conditions, etc.).
In June 2025, Tesla for the first time delivered a Model Y from the factory in Austin to a customer’s home without a driver or remote operator — fully autonomously. This is very cool.
The Vision network not only analyzes the current frame but also stores features from previous ones (at a distance of ≈1 m). This lets it remember markings and signs it has recently passed, even after they have left the field of view, much like human memory.
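Here is how I picture that memory, as a minimal sketch. I am assuming that "≈1 m" means a new snapshot of features is stored roughly every metre travelled; the class name, the `max_entries` limit, and the string "features" are all made up for illustration, and the real system stores neural-network feature tensors, not labels.

```python
from collections import deque

class SpatialFeatureQueue:
    """Keeps one feature snapshot per `spacing_m` metres travelled (assumed behaviour)."""

    def __init__(self, spacing_m=1.0, max_entries=50):
        self.spacing_m = spacing_m
        self.entries = deque(maxlen=max_entries)  # oldest snapshots fall off automatically
        self._odometer_m = 0.0     # total distance travelled so far
        self._since_last_m = 0.0   # distance since the last stored snapshot

    def update(self, distance_moved_m, features):
        """Advance the car; store `features` if enough distance has passed.

        Returns True when a snapshot was stored.
        """
        self._odometer_m += distance_moved_m
        self._since_last_m += distance_moved_m
        if not self.entries or self._since_last_m >= self.spacing_m:
            self.entries.append((self._odometer_m, features))
            self._since_last_m = 0.0
            return True
        return False

    def recall(self, within_m):
        """Return stored features seen within the last `within_m` metres of travel."""
        return [f for pos, f in self.entries if self._odometer_m - pos <= within_m]

queue = SpatialFeatureQueue(spacing_m=1.0, max_entries=3)
queue.update(0.0, "stop sign")    # first snapshot is always stored
queue.update(0.5, "nothing new")  # too soon, skipped
queue.update(0.5, "lane arrow")   # one metre travelled, stored
recent = queue.recall(within_m=1.0)
```

Keying the memory to distance rather than time means that a car waiting at a red light does not forget the lane arrow it drove over just before stopping.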
