Exploring the Technological Marvels of Tesla’s Full Self-Driving Capabilities | July 11 2025, 03:59

I have been reading various engineering blogs about Tesla’s Full Self-Driving (FSD) system, simply because for the last month and a half I’ve been riding almost constantly as if in a taxi: you set the destination and hardly ever need to intervene, and the car travels from point A to point B completely on its own. This is certainly the future.

Tesla is not the only company with such a system. Mercedes, for example, has Drive Pilot, while most others help in traffic jams at best. Still, Tesla’s seems to be the only one that works on all roads.

So, back to the engineering curiosities. Tesla trains its AI models on its own “farm” called Dojo, an exaFLOP-class supercomputer built on Tesla chips. Video from the cameras is fed into it, and the models it trains are then sent out for autonomous operation across the entire fleet of Tesla cars.

The FSD architecture comprises about 48 specialized neural networks, trained on Dojo, which together produce about 1,000 different prediction tensors. Tesla is gradually moving from modular networks (object recognition plus planning) to end-to-end training, which converts video frames directly into a steering trajectory or action. This is akin to a “black box”: the neural network learns directly from human behavior, without manual tuning of knobs. It is an extremely cool engineering solution but, I suspect, hard to debug.

By the way, it is claimed that Tesla has switched from C++ to Python, and that the shift to end-to-end training has made 300,000 lines of C++ code unnecessary. That code accounted for various corner cases and rules for resolving different scenarios; now all of this lives at the model level.

Tesla has abandoned radar and ultrasonic sensors, switching to a purely camera-based solution (Vision Only) running on “Hardware 4” (HW4, FSD Computer 2): 16 GB of RAM, 256 GB of flash memory, and performance 3–8× higher than HW3.

Consider the performance: it takes 22 milliseconds to build a 3D scene of the surrounding cars, pedestrians, and cyclists, with information collected from 8 cameras 36 times per second.

The entire cycle, from receiving the image to updating the plan and sending commands to the wheels, takes 85 ms. Fantastic!
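To put these numbers in perspective, here is a small back-of-the-envelope calculation. The 36 frames per second, 22 ms, and 85 ms figures come from the text above; the driving speed of 100 km/h is my own assumption for illustration, and the function names are made up.

```python
# Figures quoted in the post.
FPS = 36            # camera frames per second
SCENE_MS = 22.0     # time to build the 3D scene
CYCLE_MS = 85.0     # full cycle: image in -> commands to the wheels


def frame_period_ms(fps: float) -> float:
    """Time between two consecutive camera frames, in milliseconds."""
    return 1000.0 / fps


def distance_travelled_m(speed_kmh: float, latency_ms: float) -> float:
    """Distance the car covers during the given latency."""
    speed_m_per_s = speed_kmh / 3.6
    return speed_m_per_s * latency_ms / 1000.0


if __name__ == "__main__":
    period = frame_period_ms(FPS)
    print(f"frame period: {period:.1f} ms")
    print(f"scene building fits within one frame: {SCENE_MS < period}")
    print(f"full cycle spans about {CYCLE_MS / period:.1f} frame periods")
    # Assumed speed, not from the post.
    print(f"distance covered in one cycle at 100 km/h: "
          f"{distance_travelled_m(100, CYCLE_MS):.2f} m")
```

In other words, the scene is rebuilt faster than new frames arrive, and at the assumed motorway speed the car moves a little over two metres between seeing something and reacting to it.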

More than 4 million Teslas on the roads collect data daily, and more than a billion miles of autonomous driving have been recorded with FSD Beta. This “live” dataset is used to train the networks on real-world scenarios, including rare edge cases (strange accidents, unusual road conditions, etc.).

In June 2025, Tesla for the first time delivered a Model Y from the factory in Austin to a customer’s home without a driver or remote operator — fully autonomously. This is very cool.

The Vision network not only analyzes the current frame but also stores features from previous ones (spaced roughly 1 m of travel apart). This lets it remember markings and signs it has recently passed, even if they have already left the field of view. It is very similar to human memory.
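The idea of remembering features by distance rather than by time can be sketched with a small buffer. This is only my illustration of the concept, not Tesla's implementation: the class name, the odometer input, and the buffer size are all assumptions, and the "features" here are plain strings instead of tensors.

```python
from collections import deque


class SpatialFeatureQueue:
    """Keeps one feature snapshot per `spacing_m` metres travelled (hypothetical sketch)."""

    def __init__(self, spacing_m: float = 1.0, max_items: int = 20):
        self.spacing_m = spacing_m
        # Oldest snapshots fall off automatically once the buffer is full.
        self.items = deque(maxlen=max_items)
        self._last_push_odometer = None

    def update(self, odometer_m: float, features) -> bool:
        """Store `features` only if the car moved far enough since the last push."""
        if (self._last_push_odometer is None
                or odometer_m - self._last_push_odometer >= self.spacing_m):
            self.items.append((odometer_m, features))
            self._last_push_odometer = odometer_m
            return True
        return False

    def recall(self, odometer_m: float, lookback_m: float) -> list:
        """Return the features recorded within the last `lookback_m` metres."""
        return [f for pos, f in self.items if odometer_m - pos <= lookback_m]


if __name__ == "__main__":
    queue = SpatialFeatureQueue(spacing_m=1.0, max_items=3)
    for pos, feat in [(0.0, "stop line"), (0.4, "ignored"), (1.0, "arrow"), (2.1, "sign")]:
        queue.update(pos, feat)
    print(queue.recall(odometer_m=2.1, lookback_m=1.5))
```

Keying the memory to distance means a car standing at a red light does not overwrite what it saw just before stopping, which a purely time-based buffer would do.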

Advancing Full-Text Search: Testing and Refining with Multi-User Platforms | July 06 2025, 04:35

I have developed expertise in full-text search testing and turned it into a tool. Essentially, it’s a turnkey multi-user platform that, given roughly 1,000 queries and several search engine configurations, can produce reports with graphs, metrics, and conclusions by morning, explaining why configuration A performs better than B. It calculates NDCG@k, MAP, precision, recall, and about a dozen other metrics. It uses an LLM, but only at the final stage, after all the math is done.
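For readers unfamiliar with these metrics, here is a minimal sketch of how the four named ones are usually computed for a ranked result list. This is textbook math rather than the platform's actual code, all function names are mine, and the NDCG variant shown uses linear gains (some tools use 2^gain - 1 instead).

```python
import math


def precision_at_k(ranked: list, relevant: set, k: int) -> float:
    """Share of the top-k results that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def recall_at_k(ranked: list, relevant: set, k: int) -> float:
    """Share of all relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for doc in ranked[:k] if doc in relevant) / len(relevant)


def average_precision(ranked: list, relevant: set) -> float:
    """Mean of precision values taken at each rank where a relevant doc appears."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs: list) -> float:
    """MAP over (ranked, relevant) pairs, one pair per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


def dcg(gains: list) -> float:
    """Discounted cumulative gain with a log2 position discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg_at_k(ranked: list, gains: dict, k: int) -> float:
    """DCG of the actual ranking divided by the DCG of the ideal ranking."""
    actual = [gains.get(doc, 0) for doc in ranked[:k]]
    ideal = sorted(gains.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(actual) / ideal_dcg if ideal_dcg > 0 else 0.0


if __name__ == "__main__":
    ranked = ["d1", "d2", "d3", "d4"]
    print(precision_at_k(ranked, {"d1", "d3"}, k=2))
    print(average_precision(ranked, {"d1", "d3"}))
    print(ndcg_at_k(ranked, {"d1": 3, "d3": 1}, k=4))
```

Each metric is computed per query and then averaged over the whole query set, which is what makes two configurations comparable.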

So, here’s my question. I’m looking for someone who has faced the same issue in their project, to understand the demand and the ask.

The problem the system solves is this: there is a working search over products or documents (Solr, Coveo, Elasticsearch, Algolia, it doesn’t matter which), and there are hypotheses on how to improve it, but there is also the fear that improving one aspect might break another. My tool helps you see this in numbers and graphs, and provides a justified conclusion that includes statistical significance and other metrics.
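One common way to attach statistical significance to an "A is better than B" claim is a paired randomization test over per-query scores. The sketch below is my assumption about what such a check could look like, not a description of the platform's internals; the function name and defaults are hypothetical.

```python
import random


def paired_permutation_test(scores_a: list, scores_b: list,
                            n_resamples: int = 10_000, seed: int = 0) -> float:
    """Two-sided p-value for the mean per-query difference between A and B.

    Under the null hypothesis the two configurations are interchangeable,
    so the sign of each per-query difference can be flipped at random.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both configurations must be scored on the same queries")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs) / len(diffs))
    rng = random.Random(seed)  # fixed seed keeps reports reproducible
    extreme = 0
    for _ in range(n_resamples):
        total = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(total / len(diffs)) >= observed:
            extreme += 1
    # Add-one smoothing so the p-value is never exactly zero.
    return (extreme + 1) / (n_resamples + 1)


if __name__ == "__main__":
    # Hypothetical per-query NDCG@10 values for two configurations.
    config_a = [0.82, 0.75, 0.91, 0.60, 0.77, 0.88, 0.69, 0.93]
    config_b = [0.78, 0.74, 0.85, 0.61, 0.70, 0.80, 0.66, 0.90]
    print(f"p-value: {paired_permutation_test(config_a, config_b):.4f}")
```

The test is paired on purpose: both configurations are scored on the same queries, so per-query difficulty cancels out and only the difference between configurations is tested.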

It also acts as a virtual search assessor: for each search result, it can rate how well the document matches the query. This is a very non-trivial task (especially for large documents), involving chunking, embeddings, LLM evaluation of relevant chunks, etc. Non-trivial, but it works.
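To give a feel for the chunking and embedding steps, here is a deliberately simplified sketch. It splits a document into overlapping word windows and ranks them against the query. The `toy_embed` function is a bag-of-words stand-in for a real embedding model, the chunk sizes are arbitrary, and the final LLM grading step is left out entirely; all of these are my assumptions.

```python
import math
from collections import Counter


def chunk_words(text: str, size: int = 50, overlap: int = 10) -> list:
    """Split text into overlapping windows of `size` words."""
    step = size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    if not words:
        return []
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, last_start, step)]


def toy_embed(text: str) -> Counter:
    """Stand-in for a real embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_chunks(query: str, document: str, top_n: int = 3,
                size: int = 50, overlap: int = 10) -> list:
    """Return the top_n (score, chunk) pairs most similar to the query."""
    q = toy_embed(query)
    scored = [(cosine(q, toy_embed(c)), c) for c in chunk_words(document, size, overlap)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    doc = "filler " * 60 + "red running shoes for women " + "filler " * 60
    score, chunk = best_chunks("red running shoes", doc, top_n=1, size=20, overlap=5)[0]
    print(f"{score:.3f}: {chunk}")
```

In a real pipeline, only the few best chunks would be passed to the LLM for grading, which keeps large documents within the model's context limits.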

It can also analyze search queries and break them into groups based on similarity. For instance, such segmentation might show that users sometimes separate the words forming a brand name with a space, and sometimes not. These different variants would be grouped together.
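The brand-name example can be illustrated with a very simple normalization key. This is only a sketch of the spacing case described above: the function names and sample queries are made up, and a real system would more likely combine this with embeddings or edit distance to catch other kinds of variants.

```python
import re
from collections import defaultdict


def normalise(query: str) -> str:
    """Case-fold and drop spaces and punctuation so spacing variants collide."""
    return re.sub(r"[\W_]+", "", query.casefold())


def group_queries(queries: list) -> dict:
    """Group raw queries that share the same normalised key."""
    groups = defaultdict(list)
    for query in queries:
        groups[normalise(query)].append(query)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical queries from a search log.
    log = ["Play Station 5", "playstation 5", "PlayStation5", "xbox"]
    for key, variants in group_queries(log).items():
        print(key, "->", variants)
```

A grouping like this makes it easy to see when one spelling variant returns noticeably worse results than the others.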

I would like to discuss this with someone who knows more about this topic than I do, someone who has/had such problems and has somehow solved them.

I currently feel like my product is unique in the market. Actually, it’s not even on the market yet. But I really don’t see anything similar out there. Maybe nobody needs it?

I won’t publicly post screenshots yet. The picture is merely for attracting attention.

Please share if there might be relevant people in your network.