The Geometry of Meaning: A Word Embedding Exploration Tool
Semantica is a navigational tool for the hidden "map" of human language. It treats language not as a sequence of letters but as a high-dimensional geometric space, in which each word's meaning is captured by its numerical coordinates.
Humans understand words through context and emotion. Computers, however, only understand numbers. To bridge this gap, we use Word Embeddings.
Imagine every word in the English language as a point floating in a dark, infinite void.
- Words with similar meanings (like Ocean and Sea) are clustered tightly together.
- Unrelated words (like Apple and Justice) are miles apart.
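To make "close" and "far apart" concrete, here is a small sketch of the two measures this README names later (Euclidean distance and cosine similarity). The 3-D vectors are made-up toy values for illustration only; real GloVe vectors have 300 dimensions.

```python
import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine of the angle between u and v: 1.0 = same direction, 0.0 = unrelated."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def euclidean_distance(u: np.ndarray, v: np.ndarray) -> float:
    """Straight-line distance between the two points."""
    return float(np.linalg.norm(u - v))

# Hypothetical toy vectors, not real GloVe values.
ocean = np.array([0.9, 0.8, 0.1])
sea = np.array([0.85, 0.75, 0.15])
justice = np.array([-0.2, 0.1, 0.95])

print(cosine_similarity(ocean, sea))      # close to 1.0
print(cosine_similarity(ocean, justice))  # near 0.0
print(euclidean_distance(ocean, sea) < euclidean_distance(ocean, justice))  # True
```

Cosine similarity ignores vector length and compares direction only, while Euclidean distance is sensitive to both; which one ranks "neighbors" better depends on the data.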
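In 3D software like Houdini or Maya, a point is defined by 3 coordinates (X, Y, Z). Semantica works the same way, except that each word is a point defined by 300 coordinates, one for each dimension of its GloVe vector.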
This project uses GloVe (Global Vectors for Word Representation). These are "static" embeddings: each word has one fixed vector, regardless of the sentence it appears in. The vectors were computed in advance from word co-occurrence statistics over billions of words of text (the 6B release used here was trained on Wikipedia and Gigaword newswire text). Roughly speaking, words that frequently appear near each other, or in similar contexts, end up closer together in the vector space.
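Semantica loads pretrained vectors rather than training them, but the co-occurrence statistics GloVe starts from are easy to illustrate. The hypothetical `cooccurrence_counts` function below counts how often words appear within a symmetric window of each other, weighting a pair that is d positions apart by 1/d, as described in the GloVe paper. The tiny corpus and window size of 2 are toy assumptions (GloVe's own training used much larger windows and corpora), and this covers only the counting step, not the model fit.

```python
from collections import defaultdict

def cooccurrence_counts(tokens: list[str], window: int = 2) -> dict[tuple[str, str], float]:
    """Weighted co-occurrence counts: a pair d positions apart adds 1/d."""
    counts: dict[tuple[str, str], float] = defaultdict(float)
    for i, word in enumerate(tokens):
        # Scan only to the right, then record both directions so counts stay symmetric.
        for d in range(1, window + 1):
            j = i + d
            if j >= len(tokens):
                break
            context = tokens[j]
            counts[(word, context)] += 1.0 / d
            counts[(context, word)] += 1.0 / d
    return dict(counts)

corpus = "the ocean is deep and the sea is deep".split()
counts = cooccurrence_counts(corpus, window=2)
print(counts[("ocean", "is")])  # 1.0
print(counts[("deep", "is")])   # 2.0
```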
The true magic of Semantica is that because words are numbers, you can perform math on them to uncover cultural relationships and navigate linguistic clusters.
By adding and subtracting vectors, you can "transport" meanings across the map.
Example: king - man + woman
By subtracting the "man" vector from "king," we mathematically strip away the concept of masculinity while retaining "royalty." Adding "woman" applies femininity to that royal essence, landing us closest to the coordinates for "queen." The computed point rarely matches any word exactly, so the answer is the word nearest to it.
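A minimal sketch of the analogy step, assuming cosine similarity as the "nearest" measure and assuming the three input words are excluded from the answer (a common convention in analogy benchmarks; Semantica's own ranking may differ). The `analogy` function name and the 4-D toy space are hypothetical. Normalizing every row once lets a single matrix-vector product score the whole vocabulary, which is the vectorized style the Technology list describes.

```python
import numpy as np

def analogy(words: list[str], vectors: np.ndarray, a: str, b: str, c: str) -> str:
    """Return the vocabulary word nearest (by cosine) to vec(a) - vec(b) + vec(c)."""
    index = {w: i for i, w in enumerate(words)}
    target = vectors[index[a]] - vectors[index[b]] + vectors[index[c]]
    # Normalize rows once; one matrix-vector product then yields every cosine.
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    scores = unit @ (target / np.linalg.norm(target))
    # The query words tend to be their own nearest neighbors, so rule them out.
    for w in (a, b, c):
        scores[index[w]] = -np.inf
    return words[int(np.argmax(scores))]

# Hypothetical toy space: [royalty, masculinity, femininity, fruitiness].
words = ["king", "queen", "man", "woman", "princess", "apple"]
vectors = np.array([
    [1.0, 1.0, 0.0, 0.0],  # king
    [1.0, 0.0, 1.0, 0.0],  # queen
    [0.0, 1.0, 0.0, 0.0],  # man
    [0.0, 0.0, 1.0, 0.0],  # woman
    [0.7, 0.0, 1.0, 0.2],  # princess
    [0.0, 0.0, 0.0, 1.0],  # apple
])
print(analogy(words, vectors, "king", "man", "woman"))  # queen
```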
The | operator allows you to perform Spatial Filtering. If you provide a list of words, Semantica will identify which word mathematically "doesn't belong."
Example: apple | banana | meat | orange
How it works:
- The Centroid: Semantica calculates the "Center of Gravity" (the average coordinate) for all words in your list.
- The Distance Scan: It then measures the distance from each word to that center.
- The Outlier: Words like apple, banana, and orange exist in a tight "Fruit Cluster." Meat, however, is located in a distant part of the 300D space. Semantica flags the word farthest from the centroid as the outlier.
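The three steps above can be sketched in a few lines of NumPy. This assumes Euclidean distance for the distance scan (the README mentions both Euclidean and cosine measures) and uses made-up 2-D toy vectors in place of the 300-D GloVe ones; `find_outlier` is a hypothetical name.

```python
import numpy as np

def find_outlier(words: list[str], embeddings: dict[str, np.ndarray]) -> str:
    """Return the word whose vector lies farthest from the group's centroid."""
    points = np.stack([embeddings[w] for w in words])       # shape (n, D)
    centroid = points.mean(axis=0)                          # "center of gravity"
    distances = np.linalg.norm(points - centroid, axis=1)   # Euclidean distance per word
    return words[int(np.argmax(distances))]

# Hypothetical toy vectors: dim 0 ~ "fruitiness", dim 1 ~ "savoriness".
toy = {
    "apple": np.array([0.9, 0.1]),
    "banana": np.array([0.85, 0.15]),
    "orange": np.array([0.95, 0.05]),
    "meat": np.array([0.1, 0.9]),
}
query = "apple | banana | meat | orange"
words = [w.strip() for w in query.split("|")]
print(find_outlier(words, toy))  # meat
```

Note that the outlier itself pulls the centroid toward it; with only a few words this still leaves it the farthest point, as in the toy example.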
Semantica is built as a lightweight, high-performance engine for spatial language analysis.
- Language: Python 3.10+
- Math Engine: NumPy. We use vectorized matrix operations to calculate Euclidean distance and cosine similarity across the full 400,000-word GloVe 6B vocabulary in milliseconds.
- Data Source: GloVe 6B (300-dimensional vectors).
- CLI / Web Interface: A minimalist interface designed for rapid expression evaluation.
- Parser: A custom regex-based expression evaluator that translates human-readable strings (like paris - france + italy) into NumPy-executable operations.
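Semantica's actual parser is not shown in this README. The sketch below is a hypothetical regex-based version of the idea for + and - expressions only: `parse`, `evaluate`, the word pattern `[\w']+`, and the lowercasing step (the GloVe 6B vocabulary is lowercase) are all assumptions made for illustration.

```python
import re
import numpy as np

# Hypothetical grammar: an optional +/- operator followed by a word.
TOKEN = re.compile(r"\s*([+\-])?\s*([\w']+)")

def parse(expr: str) -> list[tuple[str, str]]:
    """Turn 'paris - france + italy' into [('+', 'paris'), ('-', 'france'), ('+', 'italy')]."""
    expr = expr.strip()
    terms: list[tuple[str, str]] = []
    pos = 0
    while pos < len(expr):
        m = TOKEN.match(expr, pos)
        if m is None:
            raise ValueError(f"cannot parse near {expr[pos:]!r}")
        op, word = m.group(1), m.group(2).lower()
        if op is None:
            if terms:  # every term after the first needs an explicit operator
                raise ValueError(f"missing operator before {word!r}")
            op = "+"
        terms.append((op, word))
        pos = m.end()
    return terms

def evaluate(expr: str, embeddings: dict[str, np.ndarray]) -> np.ndarray:
    """Sum the word vectors with their signs; raises KeyError for unknown words."""
    terms = parse(expr)
    if not terms:
        raise ValueError("empty expression")
    return sum((1.0 if op == "+" else -1.0) * embeddings[word] for op, word in terms)

# Hypothetical toy vectors, not real GloVe values.
toy = {
    "paris": np.array([1.0, 0.0, 1.0]),
    "france": np.array([1.0, 0.0, 0.0]),
    "italy": np.array([0.0, 1.0, 0.0]),
}
print(parse("Paris - France + Italy"))
print(evaluate("paris - france + italy", toy))
```

The resulting vector would then be handed to a nearest-neighbor search over the vocabulary to find the closest word.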
Requirements:
- Python 3.10+
- NumPy
Database setup:
The core logic uses GloVe embeddings, which you need to download separately.
- Download glove.6B.zip.
- Extract the contents.
- Place glove.6B.300d.txt into the /database folder.
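A minimal loader sketch, assuming the plain-text GloVe layout (each line is a word followed by its space-separated vector components) and the database/glove.6B.300d.txt location from the setup steps. `load_glove` is a hypothetical name, not necessarily Semantica's own function. For the 300d file it builds a 400,000 × 300 float32 matrix, roughly 480 MB in memory.

```python
import numpy as np

def load_glove(path: str) -> tuple[list[str], np.ndarray]:
    """Read a GloVe .txt file into a vocabulary list and a (V, D) float32 matrix."""
    words: list[str] = []
    rows: list[np.ndarray] = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            words.append(parts[0])
            rows.append(np.array([float(x) for x in parts[1:]], dtype=np.float32))
    return words, np.vstack(rows)

# Assumed path; adjust to wherever you placed the file:
# words, vectors = load_glove("database/glove.6B.300d.txt")
```

Keeping the vocabulary as a list with a parallel matrix (rather than a dict of separate arrays) is what makes whole-vocabulary similarity searches a single matrix operation.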
Installation:
- Install Python 3.10+.
- Download or clone the repo.
- Download the embeddings file (see the database setup steps above).
- Edit semantica.bat so it points to your Python installation, then run it.

