Data – Page 3 – Hi, I'm Rauf Aliev.

Designing 3D Volleyball Training Tools on the Fly | January 01 2026, 21:21

What I worked on during the flights to and from vacation, and sometimes in between: a tool for visualizing and editing volleyball drills in 3D for Nadya (she’s a coach). The court in the attached image rotates freely, players can be placed on it, and the ball and player paths are shown, all in 3D.

Players can take several poses. Right now there are hastily made poses for serve, attack, block, and pass/receive. Interestingly, I had to write a bit of “volleyball brains” into the code. The system calculates the ball’s trajectory from A to B as a Bezier curve so that it always passes over the net and never through it. The launch height also depends on the type of action: for an attack, the ball starts from a higher point than for a pass. I also added auto-rotation: each 3D model turns to face the direction in which, according to the scheme, it needs to pass or run.
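Here is a minimal sketch of how a Bezier path can be forced over the net, assuming a side view (x along the court, z height), the net in the plane x = 0, and a quadratic curve. The launch heights per action and the clearance margin are hypothetical values, and this is an illustration of the idea rather than the tool's actual code. If the control point sits at the x-midpoint, x(t) becomes linear in t. That makes the net-crossing parameter known in advance, so the control point's height can be solved directly.

```python
from typing import List, Tuple

NET_X = 0.0         # assumption: the net lies in the plane x = 0
NET_HEIGHT = 2.24   # women's indoor net height, metres
CLEARANCE = 0.3     # hypothetical margin above the tape, metres

# Hypothetical launch heights per action (metres): attacks start higher than passes.
LAUNCH_HEIGHT = {"attack": 3.0, "block": 2.8, "serve": 2.5, "pass": 1.0}

Point = Tuple[float, float]  # (x along the court, z height)


def quad_bezier(p0: Point, p1: Point, p2: Point, t: float) -> Point:
    """Point on a quadratic Bezier curve at parameter t in [0, 1]."""
    u = 1.0 - t
    return (u * u * p0[0] + 2 * u * t * p1[0] + t * t * p2[0],
            u * u * p0[1] + 2 * u * t * p1[1] + t * t * p2[1])


def control_point(p0: Point, p2: Point) -> Point:
    """Pick a control point so the curve clears the net at x = NET_X."""
    mid_x = (p0[0] + p2[0]) / 2          # makes x(t) linear in t
    ctrl_z = max(p0[1], p2[1])
    if (p0[0] - NET_X) * (p2[0] - NET_X) < 0:   # start and end on opposite sides
        t = (NET_X - p0[0]) / (p2[0] - p0[0])   # parameter where x(t) == NET_X
        u = 1.0 - t
        # Solve z(t) = NET_HEIGHT + CLEARANCE for the control point's height.
        needed = (NET_HEIGHT + CLEARANCE - u * u * p0[1] - t * t * p2[1]) / (2 * u * t)
        ctrl_z = max(ctrl_z, needed)
    return (mid_x, ctrl_z)


def ball_path(start_x: float, end_x: float, action: str,
              end_z: float = 0.0, steps: int = 20) -> List[Point]:
    """Sampled (x, z) points from the launch point to the landing point."""
    p0 = (start_x, LAUNCH_HEIGHT[action])
    p2 = (end_x, end_z)
    p1 = control_point(p0, p2)
    return [quad_bezier(p0, p1, p2, i / steps) for i in range(steps + 1)]
```

In 3D, the same curve would be laid along the line from A to B on the floor, with z as height.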

The longest and most difficult part was creating a 3D model of a female volleyball player. To generate a realistic one, I used the Tripo3D service, which gave me a model in a neutral pose for free. In theory, you can then use Blender and its Rigify add-on to attach an armature (a skeleton) to the model and move its arms and legs, with the mesh deforming accordingly.

In practice, however, this approach does not work well. The AI-generated model contains many geometric errors that the renderer forgives but Rigify does not. They fall roughly into two types: incorrect polygon normals, and non-manifold geometry, which is much harder to fix. Inside the body there may be “floating” clusters of polygons or intersecting surfaces. When Blender calculates automatic weights for the rig (which bone affects which part of the skin), this internal noise confuses the algorithm. As a result, the weights end up distributed chaotically: moving the arm, for example, might start pulling the mesh on the stomach. On top of that, the model is slightly asymmetrical.

Non-manifold geometry is an error where the topology of an object stops being valid for a three-dimensional body. Edges may be shared by more than two polygons, polygons may touch only at a vertex or an edge without enclosing a common volume, and “hanging” surfaces or zero-thickness parts may appear inside the model. Such geometry does not formally describe a closed volume, which causes problems with rigging and deformation. In addition, the model had to be simplified, because millions of polygons are far more than real-time rendering in a browser needs.
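The edge part of that definition is easy to check mechanically. Here is a minimal sketch, assuming a mesh given as a list of faces (tuples of vertex indices): count how many faces use each undirected edge. An edge used by one face is an open boundary, and an edge used by three or more faces is non-manifold. This covers only edge non-manifoldness, not vertex-only contacts or interior floating shells. MeshLab and Blender have built-in selection tools for these cases, and this snippet is only an illustration of the rule.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Tuple

Edge = Tuple[int, int]


def edge_face_counts(faces: Iterable[Sequence[int]]) -> Counter:
    """Count how many faces use each undirected edge of a polygon mesh."""
    counts: Counter = Counter()
    for face in faces:
        verts = list(face)
        # Pair each vertex with the next one, wrapping around to close the polygon.
        for a, b in zip(verts, verts[1:] + verts[:1]):
            counts[(min(a, b), max(a, b))] += 1
    return counts


def classify_edges(faces: Iterable[Sequence[int]]) -> Dict[str, List[Edge]]:
    """Split problem edges into open boundaries and non-manifold edges."""
    counts = edge_face_counts(faces)
    return {
        "boundary": sorted(e for e, n in counts.items() if n == 1),      # holes
        "non_manifold": sorted(e for e, n in counts.items() if n > 2),   # 3+ faces
    }


tetra = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (2, 3, 0)]   # a closed, manifold solid
print(classify_edges(tetra))                # {'boundary': [], 'non_manifold': []}
print(classify_edges(tetra + [(0, 1, 4)]))  # a "fin" glued onto edge (0, 1)
```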

I fixed these issues in MeshLab and then finished the job by hand (“with a file”). As a result, the model differs slightly from the original almost everywhere. The original model’s “skin” was a set of textures: the face, shirt, and shorts all had their own colors. How do you transfer all of that to a simplified model? Blender has a special operation for this called baking, and it involves some tricks too. In the end, the transfer wasn’t perfect, but perfection isn’t necessary yet.

Next, I attached the armature to the “joints”. After about three hours of figuring out why nothing worked as it should, it finally did. I made four poses, and now each circle (player) on the court can be assigned a pose.

I’ll also need to make the uniform colors changeable on the fly, which shouldn’t be difficult. There’s also an idea to transfer poses from photographs. That is more complicated but generally feasible. MediaPipe or AlphaPose can detect key points in 2D, and then models such as HMR or HybrIK can “lift” those flat coordinates into 3D, outputting relative joint rotation angles. The resulting data could then be retargeted onto the Rigify skeleton. Since the proportions of the generated volleyball player and the person in the photo may not match, this is exactly where inverse kinematics (IK) comes in. This part is quite complex and not strictly necessary, but it’s interesting to figure out and get working.
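To show what IK actually solves, here is the textbook planar two-bone case (say, upper arm plus forearm), assuming the shoulder at the origin and made-up bone lengths. Given a target point for the hand, the law of cosines yields the shoulder and elbow angles. That is why IK helps when proportions differ: you keep the target positions and let the solver find angles that fit this skeleton. Blender’s IK constraint and Rigify’s IK controls handle longer chains in 3D. This snippet is only the simplest analytic instance, not what Blender does internally.

```python
import math
from typing import Tuple


def two_bone_ik(l1: float, l2: float, tx: float, ty: float) -> Tuple[float, float]:
    """Analytic planar IK for a two-bone chain rooted at the origin.

    Returns (shoulder, elbow) in radians: shoulder is measured from the +x axis,
    elbow is the bend relative to the upper bone (one of the two mirror solutions).
    Targets outside the reachable ring are clamped to its edge."""
    d = math.hypot(tx, ty)
    d = max(abs(l1 - l2), 1e-9, min(l1 + l2, d))   # keep d in the reachable range
    # Law of cosines: interior angle at the elbow ...
    cos_elbow = (l1 * l1 + l2 * l2 - d * d) / (2 * l1 * l2)
    elbow = math.pi - math.acos(max(-1.0, min(1.0, cos_elbow)))
    # ... and the angle between the upper bone and the shoulder-to-target line.
    cos_inner = (l1 * l1 + d * d - l2 * l2) / (2 * l1 * d)
    shoulder = math.atan2(ty, tx) - math.acos(max(-1.0, min(1.0, cos_inner)))
    return shoulder, elbow


def forward(l1: float, l2: float, shoulder: float, elbow: float) -> Tuple[float, float]:
    """Forward kinematics: where the end of the chain lands for given angles."""
    x = l1 * math.cos(shoulder) + l2 * math.cos(shoulder + elbow)
    y = l1 * math.sin(shoulder) + l2 * math.sin(shoulder + elbow)
    return x, y


# Hypothetical arm: 0.30 m upper arm, 0.25 m forearm, hand target at (0.40, 0.20).
s, e = two_bone_ik(0.30, 0.25, 0.40, 0.20)
print(forward(0.30, 0.25, s, e))   # lands on the target (up to rounding)
```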

Video in the comments

Decoding the Beast: Migrating from Excel to Code | December 17 2025, 18:56

We’ve all encountered it: the “Main Excel Spreadsheet That Runs the Business”, the very one B2B companies use to calculate million-dollar quotes. It has 12 tabs, 1,000+ nested formulas, and zero documentation. For ten years, people slapped “quick fixes” onto it and hid constants away in it. It’s no longer just a file but a living organism that no one fully understands, except for the guy who quit years ago. That was the spreadsheet I had to puzzle over. It was also unclear whether even half of the formulas were still needed or were just vestiges of the past.

Typical cell:

=IF($D11=$D10,"", IF(ISNUMBER( INDEX(Data!$T$10:$U$17,
 MATCH(TabCalc!$F11,Data!$T$10:$T$17,0),2)),
 INDEX(Data!$T$10:$U$17, MATCH(TabCalc!$F11,Data!$T$10:$T$17,0),2),
 INDEX(TabProd!$C$8:$U$112,TabCalc!$D11,I$1)))
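To make that cell readable, here is one hedged interpretation of it as an ordinary function. The parameter names are invented, Data!T10:U17 is modeled as a dictionary (assuming the keys in column T are unique, since MATCH returns the first hit), and INDEX’s 1-based indexing is kept explicit. One subtlety is worth preserving: a failed MATCH yields #N/A, and ISNUMBER(#N/A) is FALSE. A missing key therefore falls through to TabProd just like a non-numeric value does.

```python
from typing import Any, Dict, Optional, Sequence


def tabcalc_cell(d_this: int, d_prev: int, f_code: str,
                 data_lookup: Dict[str, Any],
                 tab_prod: Sequence[Sequence[float]], col: int) -> Optional[Any]:
    """Hypothetical Python reading of the typical cell.

    d_this, d_prev ~ TabCalc!$D11 and $D10 (row index into TabProd)
    f_code         ~ TabCalc!$F11 (the key looked up in Data!T10:T17)
    data_lookup    ~ Data!T10:U17 as {column T value: column U value}
    tab_prod, col  ~ TabProd!C8:U112 and I$1 (INDEX uses 1-based indices)
    """
    if d_this == d_prev:                      # IF($D11=$D10, "", ...)
        return None                           # stands in for Excel's ""
    # MATCH(..., 0) is an exact lookup; a miss gives #N/A, and ISNUMBER(#N/A)
    # is FALSE, so dict.get() returning None behaves the same way.
    override = data_lookup.get(f_code)
    # Excel's ISNUMBER is FALSE for TRUE/FALSE, so exclude Python bools.
    if isinstance(override, (int, float)) and not isinstance(override, bool):
        return override                       # INDEX(Data!..., MATCH(...), 2)
    return tab_prod[d_this - 1][col - 1]      # INDEX(TabProd!..., $D11, I$1)
```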

I was tasked with moving this logic into code so that software could compute all of it. The Excel file seemed to have everything it needed, but in reality it was a complicated black box of 1,069 formulas.

The challenge was translating a thousand interdependent formulas into clean code without losing any edge cases.

Here’s what I ended up doing.

Instead of rewriting everything from scratch at once and risking an unknown number of new bugs, I used a strategy of lazy evaluation and mocks.

I built a structure in Groovy that mimicked Excel’s behavior. I defined each computation (one per cell) as a function that ran only when it was called, and stored these functions in a multidimensional dictionary.

I started from the end of the computation graph: from results to inputs. If a formula depended on something I hadn’t yet written, I “mocked” it in the code, simply substituting the value from the Excel sheet.

Bit by bit, I replaced these mocks with real logic. By comparing my code’s output with the Excel values at each step, I could clearly see where my logic diverged.

At each step, it was clear which mocks still needed to be turned into code, and I could compare each new version with the previous one: the results had to match. As soon as all the mocks were replaced with real calculations, the task was done.
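The mock-then-replace loop can be sketched in a few lines. This Python version is an illustration, not the Groovy code from the project, and all cell names and numbers are hypothetical. Each cell is a zero-argument function. A mock simply returns the value copied from the spreadsheet, and after each replacement the result is recomputed and checked against Excel again.

```python
from typing import Callable, Dict

# Values copied from the spreadsheet (hypothetical names and numbers).
EXCEL = {"base_price": 100.0, "markup": 0.2, "price": 120.0,
         "qty": 5.0, "discount": 0.1, "total": 540.0}

cells: Dict[str, Callable[[], float]] = {}


def mock(name: str) -> None:
    """Stand in for a formula not ported yet: return Excel's own value."""
    cells[name] = lambda: EXCEL[name]


def matches_excel(name: str, tol: float = 1e-9) -> bool:
    """Lazily evaluate a cell (and everything it depends on) and compare."""
    return abs(cells[name]() - EXCEL[name]) <= tol


# Version 1: the final result is real code, and its inputs are mocks.
cells["total"] = lambda: cells["price"]() * cells["qty"]() * (1 - cells["discount"]())
for name in ("price", "qty", "discount"):
    mock(name)
assert matches_excel("total")

# Version 2: one mock becomes real logic, which needs new mocks further upstream.
cells["price"] = lambda: cells["base_price"]() * (1 + cells["markup"]())
for name in ("base_price", "markup"):
    mock(name)
assert matches_excel("price") and matches_excel("total")
```

The work is finished when no `mock` calls remain. At that point every value is computed from the true inputs.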

The real “secret ingredient” was using Groovy’s dynamic nature to build a multidimensional map of functions. Instead of static variables, I used a deeply nested structure in which each “leaf” was a closure. This allowed access to any part of the table (an input parameter, a config constant, or a complex intermediate result) through a simple, unified syntax, and some of the keys were themselves generated dynamically.

Here’s an example:

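def nested
nested = { -> [:].withDefault { nested() } }  // maps that create missing levels on first access
def conf = nested()
def calculate = nested()
// conf["someConstant"] and the "Someparameter" closures are defined elsewhere

conf["group"] = { -> ["a", "b", "c"] }

conf["group"]().each { g ->
    calculate["Group"]["Subgroup"][g]["TotalQuantity"] = { ->
        calculate["Group"]["Subgroup"][g]["Someparameter"]() * conf["someConstant"]()
    }
}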

Using dynamic keys and closures, I could iterate through product groups or data sets. Since these were dynamic functions, not stored values, the entire system worked like a living graph of dependencies.

Testing was possible from the moment the first formulas were transferred. The charm was that you seemed to be addressing a cell with syntax like calculate["Totals"]["A"](), but in reality you were launching an entire tree of calculations at that moment. This was incredibly convenient for debugging.
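The debugging convenience comes from the fact that every cell is a function call, so it can be wrapped. Here is a small Python sketch (hypothetical names, not the original Groovy) that logs each evaluation indented by its depth. Asking for a single total then prints the whole tree of calculations behind it.

```python
from typing import Callable, Dict, List

trace: List[str] = []
_depth = [0]  # current nesting level of cell evaluations


def traced(name: str, fn: Callable[[], float]) -> Callable[[], float]:
    """Wrap a cell's function so every evaluation is logged, indented by depth.

    Children finish first, so they appear above their parent in `trace`."""
    def run() -> float:
        _depth[0] += 1
        try:
            value = fn()
        finally:
            _depth[0] -= 1
        trace.append("  " * _depth[0] + f"{name} = {value}")
        return value
    return run


# Hypothetical cells: asking for "total" evaluates its whole subtree.
calc: Dict[str, Callable[[], float]] = {}
calc["price"] = traced("price", lambda: 120.0)
calc["qty"] = traced("qty", lambda: 5.0)
calc["total"] = traced("total", lambda: calc["price"]() * calc["qty"]())

calc["total"]()
print("\n".join(trace))
```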

In two weeks, the “Black Box” was transformed into a transparent, modular library with clear logic, which produced exactly the same result as the original table.

P.S. Of course, all the data in all the screenshots are thoroughly obfuscated, or rather, written from scratch for this text.

Decoding Complex Queries: A Transformative Approach to Search Functionality | December 17 2025, 03:25

Oh, I just solved a really cool problem. It’s tricky to explain though. But I’ll try.

So, the client has 10 search websites. They all use one index but send different queries to it. Whatever the user enters is combined with a very long and complex query generated by a Sitecore module, which includes template and page IDs that need to be included or excluded. Ultimately, it was impossible to understand what was going on there. There could be ten opening brackets and a few seemingly random closing ones, yet Coveo still accepted it. Reformatting helped, but not much.

And each site has its own version of this, while the same IDs keep reappearing across them. I first tried to figure it out manually, but it was a nightmare, and nothing helped. There are also nested conditions, for example “exclude this template” not globally, but only if a certain field equals one.

Here’s what I did:

I wrote a script that parses this textual “mess” into an abstract syntax tree (AST). This turned an unreadable string into a structured JSON object that makes everything clear: here’s an AND, there’s an OR, and here’s a specific condition.
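Here is a minimal sketch of such a parser, assuming a deliberately simplified grammar: `@field==value` terms, `AND`/`OR`/`NOT`, and parentheses, with the usual precedence (NOT binds tightest, then AND, then OR). Real Coveo query syntax is much richer, and the node format below is invented for illustration. To mimic the tolerance for unbalanced brackets, a missing `)` is closed implicitly and stray trailing `)` are ignored.

```python
import json
import re
from typing import Any, Dict, List, Optional

# Hypothetical simplified grammar; real Coveo query syntax has far more forms.
TOKEN = re.compile(r"\s*(\(|\)|AND\b|OR\b|NOT\b|@\w+==[\w-]+)")


def tokenize(query: str) -> List[str]:
    query, tokens, pos = query.strip(), [], 0
    while pos < len(query):
        m = TOKEN.match(query, pos)
        if not m:
            raise ValueError(f"cannot parse near: {query[pos:pos + 20]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    """Recursive descent: OR binds loosest, then AND, then NOT."""

    def __init__(self, tokens: List[str]) -> None:
        self.tokens, self.i = tokens, 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def take(self) -> Optional[str]:
        tok = self.peek()
        self.i += 1
        return tok

    def parse_or(self) -> Dict[str, Any]:
        args = [self.parse_and()]
        while self.peek() == "OR":
            self.take()
            args.append(self.parse_and())
        return args[0] if len(args) == 1 else {"op": "OR", "args": args}

    def parse_and(self) -> Dict[str, Any]:
        args = [self.parse_not()]
        while self.peek() == "AND":
            self.take()
            args.append(self.parse_not())
        return args[0] if len(args) == 1 else {"op": "AND", "args": args}

    def parse_not(self) -> Dict[str, Any]:
        if self.peek() == "NOT":
            self.take()
            return {"op": "NOT", "arg": self.parse_not()}
        return self.parse_atom()

    def parse_atom(self) -> Dict[str, Any]:
        tok = self.take()
        if tok == "(":
            node = self.parse_or()
            if self.peek() == ")":   # a missing ")" is closed implicitly
                self.take()
            return node
        if tok is None or not tok.startswith("@"):
            raise ValueError(f"unexpected token {tok!r}")
        field, value = tok[1:].split("==", 1)
        return {"field": field, "eq": value}


def parse(query: str) -> Dict[str, Any]:
    parser = Parser(tokenize(query))
    tree = parser.parse_or()
    leftover = parser.tokens[parser.i:]
    if any(t != ")" for t in leftover):   # only stray trailing ")" are tolerated
        raise ValueError(f"unparsed tokens: {leftover}")
    return tree


print(json.dumps(parse("(@templateid==a OR @templateid==b) AND NOT (@pageid==p1"), indent=2))
```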

Then I turned these conditions into Boolean algebra formulas. Using the SymPy library, I “fed” these formulas to simplification algorithms. Mathematics itself eliminated duplicates, collapsed excessive nesting, and removed conditions that are logically absorbed by others. As a result, the “trees” became flat and understandable.
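Here is a small illustration of this step using SymPy’s `simplify_logic`, where each condition becomes a Boolean symbol. The symbols and the formula are hypothetical, but they reproduce the kinds of redundancy described above. One clause is absorbed by a shorter one, and a conditional exclusion (“exclude template T2 only if F1 holds”) appears both directly and in a weaker copy. The `equivalent` helper checks that the simplified formula means exactly the same thing: two formulas are equivalent if and only if their XOR is unsatisfiable.

```python
from sympy import symbols
from sympy.logic.boolalg import And, Not, Or, Xor, simplify_logic
from sympy.logic.inference import satisfiable

# Hypothetical conditions: T1/T2 = "@templateid==...", P1 = "@pageid==...",
# F1 = "some field equals 1".
T1, T2, P1, F1 = symbols("T1 T2 P1 F1")

raw = And(
    Or(T1, T2),
    Or(T1, T2, P1),            # absorbed by Or(T1, T2)
    Not(And(T2, F1)),          # "exclude template T2 only if F1 holds"
    Or(Not(T2), Not(F1), P1),  # absorbed by the previous clause
)

simplified = simplify_logic(raw, form="cnf")


def equivalent(a, b) -> bool:
    """Two formulas are equivalent iff their XOR is unsatisfiable."""
    return satisfiable(Xor(a, b)) is False


print(simplified)                   # two clauses remain
print(equivalent(raw, simplified))  # True
```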

In the attachment — the original tree and the simplified one.

To make sure I hadn’t broken anything during simplification, I wrote a test generator. It takes the simplified logic, turns it back into a working curl request, and checks whether the number of documents found (totalCount) matches that of the original request. The numbers matched, which confirmed that the logic was preserved.
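The round trip can be sketched without a live index. This sketch assumes a hypothetical tuple-based AST and a small in-memory list of documents standing in for the real search index. It renders a tree back into a query string, which is the part that would go into the curl request, and it counts matches for the original and simplified trees. A matching count on one dataset is strong evidence rather than proof. The formal guarantee comes from the simplification itself being equivalence-preserving.

```python
from typing import Any, Dict, List

# AST nodes (hypothetical format): ("EQ", field, value), ("NOT", node),
# ("AND", [nodes]) or ("OR", [nodes]).


def to_query(node: Any) -> str:
    """Render an AST back into a query string for a search request."""
    kind = node[0]
    if kind == "EQ":
        return f"@{node[1]}=={node[2]}"
    if kind == "NOT":
        return f"NOT ({to_query(node[1])})"
    return "(" + f" {kind} ".join(to_query(child) for child in node[1]) + ")"


def matches(node: Any, doc: Dict[str, str]) -> bool:
    """Evaluate an AST against one document's fields."""
    kind = node[0]
    if kind == "EQ":
        return doc.get(node[1]) == node[2]
    if kind == "NOT":
        return not matches(node[1], doc)
    results = (matches(child, doc) for child in node[1])
    return all(results) if kind == "AND" else any(results)


def total_count(node: Any, docs: List[Dict[str, str]]) -> int:
    """Local stand-in for the totalCount a real search request would return."""
    return sum(1 for doc in docs if matches(node, doc))


T1, T2, P1 = ("EQ", "templateid", "t1"), ("EQ", "templateid", "t2"), ("EQ", "pageid", "p1")
original = ("AND", [("OR", [T1, T2]), ("OR", [T1, T2, P1])])
simplified = ("OR", [T1, T2])

docs = [  # hypothetical sample documents
    {"templateid": "t1", "pageid": "p1"},
    {"templateid": "t2"},
    {"templateid": "t3", "pageid": "p1"},
    {"templateid": "t3"},
]
assert total_count(original, docs) == total_count(simplified, docs)
print(to_query(simplified))  # (@templateid==t1 OR @templateid==t2)
```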

With simplified, standardized structures for each site in hand, I built a comparison matrix. The script analyzed them and highlighted the Common Core and the Specifics. The Common Core consists of conditions that are guaranteed to be required (or prohibited) on all sites without exception. The Specifics are the unique “tails” that distinguish one site from another.

In the attached screenshot, REQ means that the condition is guaranteed to hold for any document returned by the request. NOT means it definitely does not hold. OPT means the condition is present in the request but is not strict by itself and only works in combination with something else. “.” means the condition is not mentioned in the request at all.
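These four labels can be computed as implication checks. A condition is REQ if the site’s formula implies it (formula AND NOT condition is unsatisfiable), NOT if the formula implies its negation, OPT if it appears but neither holds, and “.” if it does not appear at all. Here is a sketch with SymPy, assuming two hypothetical site formulas. The Common Core is then every row that is REQ everywhere or NOT everywhere.

```python
from typing import Dict, List

from sympy import Symbol, symbols
from sympy.logic.boolalg import And, Not, Or
from sympy.logic.inference import satisfiable


def classify(expr, cond: Symbol) -> str:
    """REQ / NOT / OPT / '.' for one condition in one site's query formula."""
    if cond not in expr.free_symbols:
        return "."
    if satisfiable(And(expr, Not(cond))) is False:   # expr implies cond
        return "REQ"
    if satisfiable(And(expr, cond)) is False:        # expr implies NOT cond
        return "NOT"
    return "OPT"


def matrix(sites: Dict[str, object], conds: List[Symbol]) -> Dict[Symbol, List[str]]:
    """One row per condition, one column per site."""
    return {c: [classify(expr, c) for expr in sites.values()] for c in conds}


def common_core(sites: Dict[str, object], conds: List[Symbol]) -> Dict[Symbol, str]:
    """Conditions that are REQ on every site, or NOT on every site."""
    core = {}
    for c, row in matrix(sites, conds).items():
        if row[0] in ("REQ", "NOT") and all(v == row[0] for v in row):
            core[c] = row[0]
    return core


# Hypothetical conditions and per-site formulas.
T1, T2, P1, F1 = symbols("T1 T2 P1 F1")
sites = {
    "site_a": And(Or(T1, T2), Not(P1)),
    "site_b": And(T1, Not(P1), F1),
}
for c, row in matrix(sites, [T1, T2, P1, F1]).items():
    print(c, row)
print(common_core(sites, [T1, T2, P1, F1]))   # {P1: 'NOT'}
```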

For 3 sites the comparison finishes instantly; for 10 it takes about 30 minutes.

And of course, all data in all screenshots are thoroughly obfuscated.

Exploring Open Data: A Deep Dive into Loudoun County’s 1.5 Million Trees | December 15 2025, 15:40

I was looking through the open data our county publishes, to play with some data analysis over the weekend, and discovered, for instance, an open database of all 1.5 million trees in the county. The screenshot shows just a tiny part of it around my house.

Modern Reading: More Words, Digital Shifts, and Surprising Data Insights from 2008 | December 14 2025, 22:33

An interesting study from 2009 caught my eye. According to it, people today really do read significantly more than in the past, although the format of that reading has changed. The study estimates that in 2008 the average American consumed about 100,000 words a day (roughly a quarter of “War and Peace”). This is the approximate number of words that passed through a person’s consciousness per day, via ears or eyes, calculated from the time spent on each activity. That is 140% more than in 1980.

So, contrary to the myth that reading is in decline, in 2008 at least we processed 2.4 times as much text as our parents’ generation. Moreover, the study only counted information consumed outside of work (at home, in transit, during leisure).

As for the structure of reading: in 1960, 26% of words came from print, and by 2008 that share had fallen to 9%. However, digital media (the internet, email, social networks) not only made up for this decline but also tripled total reading time. The reason is the internet, which is a predominantly textual environment (web surfing, email).

Interestingly, although the internet accounts for 25% of consumed words, it makes up only 2% of bytes (internet video in 2008 was low quality). The authors also estimated the information flow from each channel in bytes 🙂 Radio accounted for 19% of the time but only 0.3% of bytes, since audio requires less data. Voice communication (telephone) accounted for only 5% of words and a negligible share of bytes, but it was the only fully interactive channel before the internet era. TV remained the main source of information in 2008 both by time (41% of all hours) and by number of words (45%). In terms of data volume (bytes), however, television came only second (35%), behind computer games.

Now about games, which is quite interesting. The report’s main finding: in 2008, games generated 55% of all “bytes” consumed by households, while accounting for only 8% of users’ time. This is one of the more controversial points in the report.

Those 100,500 words are an estimate of the actual words a person read or heard. This is not a metaphorical “equivalent” but an attempt to count verbal information precisely. The authors took the time spent on each medium and multiplied it by the average word rate for that channel. Reading (books, newspapers, internet texts): 240 words per minute. Email and web surfing: 240 words per minute. Television (dialogue in shows and movies): 153 words per minute. Radio: 80 words per minute (fewer because of pauses and music). Music: 41 words per minute (song lyrics).
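The counting method is a weighted sum: minutes on each channel times that channel’s words per minute. Here is a sketch using the rates quoted above. The daily minutes are invented purely for illustration and are not the study’s figures. The helper also spells out the percentage arithmetic, where “140% more” means 1 + 1.4 = 2.4 times as much.

```python
from typing import Dict

# Words-per-minute rates quoted in the post.
RATES_WPM: Dict[str, int] = {
    "reading": 240,    # books, newspapers, internet texts
    "email_web": 240,  # email and web surfing
    "tv": 153,         # dialogue in shows/movies
    "radio": 80,       # fewer words: pauses and music
    "music": 41,       # song lyrics
}


def words_per_day(minutes: Dict[str, float]) -> float:
    """Sum of (minutes on channel) x (words per minute for that channel)."""
    return sum(RATES_WPM[channel] * m for channel, m in minutes.items())


def times_as_much(percent_more: float) -> float:
    """'X% more' as a multiplier: 140% more means 2.4 times as much."""
    return 1 + percent_more / 100


# Hypothetical daily minutes for illustration only, NOT the study's figures.
example = {"reading": 30, "email_web": 60, "tv": 120, "radio": 30, "music": 30}
print(words_per_day(example))   # 43590
print(times_as_much(140))
```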

Link in the comments

Living in the Valley of Data Centers: The NSA’s Gigantic Utah Facility | September 20 2025, 20:06

I live right in the valley of data centers; something like 80% of internet traffic passes through here (a dangerous place!). I drove past one of them today, and later at home, while googling about data centers, I stumbled upon the NSA’s data center in Bluffdale, Utah.

It serves as a data repository for the U.S. intelligence community. Its capacity is something like 5 trillion terabytes, or 5,000,000,000,000,000 gigabytes. Back in 2013 it was 100 to 1,000 times less, but 12 years have passed, Moore’s Law and all that. Hard drives in data centers usually last 3 to 5 years, so since the facility launched they have presumably all been replaced several times, each time with higher-capacity drives.

It is expected that the data center will be able to process “all types of communications, including the full content of private emails, mobile phone conversations, internet browsing, as well as all types of personal data: parking receipts, travel itineraries, purchases in bookstores, and data of other transactions made using digital technologies.”

The amount of data this facility can store is, of course, classified, but estimates put it at “several yottabytes”. One yottabyte = 1,000 zettabytes = 1,000,000 exabytes = 1 trillion terabytes. For comparison, storing all the books ever written, in every language, would take just 400 terabytes.

In 2013 it drew at least 65 MW, with the potential to reach 100 MW. It uses about 1.5–1.7 million gallons (5.7–6.4 million liters) of water per day to cool the servers. The water is treated with chemicals to prevent corrosion and then discharged. This draws criticism in arid Utah, especially amid the record heat of 2022–2025 and the shortage of fresh water. There is no closed-loop system, and it remains a “hot” topic in local discussions.
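The unit arithmetic in this post is easy to double-check. This sketch assumes decimal (SI) storage prefixes, as drive and data-center capacities are usually quoted, and uses the exact definition of the US gallon. It confirms that 5 trillion terabytes is 5 × 10^15 gigabytes, or 5 yottabytes, and that the gallon figures convert to the liter figures given above.

```python
# Decimal (SI) storage prefixes, in bytes.
GB, TB, EB, ZB, YB = 10**9, 10**12, 10**18, 10**21, 10**24
US_GALLON_L = 3.785411784  # litres per US gallon (exact by definition)

capacity = 5 * 10**12 * TB                   # "5 trillion terabytes"
assert capacity // GB == 5 * 10**15           # 5,000,000,000,000,000 GB
assert capacity // YB == 5                    # i.e. 5 yottabytes
assert YB // ZB == 1_000 and YB // EB == 10**6 and YB // TB == 10**12

books = 400 * TB                              # "all the books ever written"
print(YB // books)                            # 2500000000 copies fit in 1 YB

for gallons in (1.5e6, 1.7e6):
    print(f"{gallons:,.0f} gal/day = {gallons * US_GALLON_L / 1e6:.1f} million L/day")
```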