Navigating Complexity: The Challenge of Wikipedia’s Expert-Driven Content | November 26 2025, 01:06

Wikipedia has one big problem. Or rather, we have one with Wikipedia. Open almost any Wikipedia page on a relatively complex mathematical or physical concept, and you often lose the will to read any further. Formally, everything there is correct, but the explanation relies on concepts that are often more complex than the one being explained. There is also often a lot of extraneous information: material that formally, academically, or taxonomically belongs to the topic but essentially “pollutes” the first impression.

This problem arises because the authors of Wikipedia (often mathematicians) prioritize rigor and completeness over didactics and comprehensibility.

In English-speaking circles, this is sometimes called “drift into pedantry”. Articles are often written by experts for experts, not for people trying to learn the subject from scratch.

Take the “tensor”, for example. Imagine a student who has heard that tensors are used in machine learning (Google’s TensorFlow) or in physics and wants to grasp the essence.

What the reader expects (intuition): “A tensor is a table of numbers (or some sort of data container) that describes the properties of an object and changes correctly when we rotate the coordinate system.”

What Wikipedia provides: “A tensor (from Latin tensus, ‘strained,’ as per the classical layout of mechanical stress at the sides of a deformable cube, see illustration) is a layout (arrangement in space) of numbers (components), used in mathematics and physics as a special type of multi-index object, possessing mathematical properties.” The article then immediately starts listing ranks and the covariance and contravariance of indices. This is formally correct, but it “pollutes” the first impression.

The illustration at the very top is captioned like this: “Mechanical stress, deforming a cube with faces perpendicular to the coordinate axes, in classic elasticity theory is described by the Cauchy stress tensor, which links 2 indices: the normal vector to the face with the stress vector T (force per unit area); there are 3 directions of normals and 3 directions of stress components, which gives a 2nd rank tensor 3×3 — consisting of 9 components.”

Formally, there is not a single error. In practice, it is a wall of text that requires knowledge of linear algebra just to read the definition.
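For contrast, the intuitive version can be checked in a few lines. What follows is a minimal Python sketch, not anything taken from the Wikipedia article: the stress values in `sigma`, the 30-degree rotation, and the helper names `matvec`, `matmul`, `transpose`, and `rotation_z` are all made up for illustration. It treats a rank-2 tensor as a 3×3 table that turns a face normal into a stress vector, and shows what “changes correctly under rotation” means in practice.

```python
import math

def matvec(a, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a[i][k] * v[k] for k in range(len(v))) for i in range(len(a))]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def rotation_z(angle):
    """Rotation matrix about the z axis (angle in radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

# Hypothetical stress tensor: 3 normal directions x 3 stress components.
sigma = [[2.0, 1.0, 0.0],
         [1.0, 3.0, 0.0],
         [0.0, 0.0, 1.0]]

# The tensor "links" a face normal to the stress vector on that face.
n = [1.0, 0.0, 0.0]          # face perpendicular to the x axis
t = matvec(sigma, n)         # stress vector on that face

# Rotate the coordinate system: vectors pick up one R,
# a rank-2 tensor picks up two (R * sigma * R^T).
R = rotation_z(math.radians(30))
sigma_rot = matmul(matmul(R, sigma), transpose(R))
n_rot = matvec(R, n)

# Same physics in the new coordinates: the rotated tensor applied to the
# rotated normal gives the rotated stress vector.
t_rot = matvec(sigma_rot, n_rot)
t_expected = matvec(R, t)
```

The nine numbers in `sigma_rot` differ from those in `sigma`, yet `t_rot` and `t_expected` agree up to floating-point rounding. That agreement is the whole content of “changes correctly”: the table is re-expressed, while the relationship it describes stays the same.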

It’s as if you asked “What is an apple?” and got this in reply: “An apple is a fruit of plants from the subfamily Amygdaloideae or Spiraeoideae, featuring an epicarp, mesocarp, and endocarp, often participating in Newton’s gravitational experiments.”

At first glance, it seems that with the emergence of LLMs, Wikipedia is no longer necessary. LLMs such as ChatGPT essentially paraphrase everything in Wikipedia in whatever form is required. But they can do so because they were trained on Wikipedia, and Wikipedia was very likely given much more weight during training than the rest of the internet junk. Without Wikipedia in the training set, this would be much harder. Meanwhile, Wikipedia is constantly being edited, and both LLMs and Google draw on it directly when answering questions.

Therefore, on the one hand, it seems to me that it is high time for Wikipedia to move to generating its articles from expert-curated data and packaging knowledge in whatever format the reader needs, for example as questions and answers. On the other hand, that would undermine the whole idea of the encyclopedia as master data for LLMs and RAG.

The paradox is that the LLM is, in essence, the only “interface” that has been able to read Wikipedia’s pedantic definitions, “understand” them (through thousands of examples of code and articles), and translate them back into human language. Wikipedia has become an excellent database for robots, but a poor textbook for people.
