Aramaic Root Atlas: A Cross-Corpus Triliteral Root Explorer

Fresco Benaim, Jose

doi:10.5281/zenodo.19358625

Guide to the Aramaic Root Atlas

The Aramaic Root Atlas is a cross-corpus triliteral root explorer spanning roughly 1,500 years of Aramaic literary history. It brings six corpora — the Peshitta (NT and OT), Biblical Aramaic, Targum Onkelos, Targum Jonathan, and the Hymns of Ephrem of Nisibis — under a single consonantal root index, enabling scholars to trace a single Semitic root across textual traditions, scripts, and centuries, with Hebrew and Arabic cognates visible at every layer.

47,358 verses

685,848 words

5,666 roots

1,604 cognate families

6 corpora

Corpora

The Atlas indexes six corpora of Aramaic text, representing two scripts and over 1,500 years of literary production:

Corpus	Verses	Words	Script	Source	License
Peshitta NT	7,440	101,469	Syriac	ETCBC	—
Peshitta OT	23,072	309,889	Syriac	ETCBC / Leiden	CC-BY-NC
Biblical Aramaic	269	4,880	Hebrew square	Sefaria / WLC	CC-BY-SA
Targum Onkelos	5,846	82,584	Hebrew square	Sefaria	CC-BY-SA
Ephrem of Nisibis (Hymns)	1,435	29,577	Syriac	Digital Syriac Corpus	CC-BY

Limitations & caveats — read before citing

The Aramaic Root Atlas is a research-aid prototype. Final scholarly conclusions should be checked against authoritative sources (HALOT, BDB, Sokoloff, Brockelmann, Lane, Wehr). Specific limitations:

No precision/recall study has been published. Root extraction is heuristic; the confidence indicator (High/Medium/Low) reflects the extraction path, not measured correctness.
Cognates are LLM-generated and have not been systematically validated against authoritative lexicons. The 1,604 entries are suggestions for verification, not authoritative claims.
The triliteral framing fails on non-CCC roots. Geminate, hollow, weak, and quadriliteral roots currently receive Low confidence rather than being represented in their proper morphological class.
Diachronic analysis confounds genre with chronology. The corpora differ simultaneously in genre, register, dialect, and translation source. The chronological ordering is editorial; the date of Targum Onkelos is debated among scholars.
Translation tracks introduce bias. The tracks are WEB (English), Reina-Valera 1909 (Spanish), Van Dyck (Arabic, 1865), and SBLGNT (Greek; not NA28). "Search by meaning" reflects translator choices, not the underlying Aramaic semantic range.
Researcher annotations and bookmarks are ephemeral. Stored in browser localStorage; lost when the cache is cleared. Export regularly.
Corpus coverage is a thin slice. The Babylonian Talmud, Palestinian Targums, Qumran Aramaic, Mandaic, Imperial Aramaic, and ~95% of Ephrem's surviving works are not indexed.

For the full disclosure of all 12 methodological caveats, see docs/VALIDATION.md. For the validation roadmap, see docs/ROADMAP-v3.1.md.

Methodology

Root Extraction

The engine extracts triliteral consonantal roots from unvocalized text through morphological analysis: systematic prefix/suffix stripping, weighted scoring against a dictionary of verified roots, and resolution of ambiguous candidates by frequency and context.
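The steps named above (affix stripping, scoring against a root dictionary, tie-breaking by frequency) can be sketched in a few lines of Python. This is a minimal illustration and not the Atlas's actual engine: the affix lists, the scoring weights, the three-entry root dictionary with its frequencies, and the one-character-per-consonant ASCII transliteration (`S` for shin, `'` for alaph) are all assumptions made for the example.

```python
from itertools import product

# Hypothetical affix inventories in a one-character-per-consonant
# ASCII transliteration (S = shin, ' = alaph). Not the Atlas's real rule set.
PROCLITICS = ("", "d", "w", "b", "l")
VERBAL_PREFIXES = ("", "'t", "n", "t", "m", "'")
SUFFIXES = ("", "w", "t", "n", "h", "y", "yn", "wn")

# Hypothetical dictionary: known root -> corpus frequency (used for tie-breaks).
KNOWN_ROOTS = {"ktb": 120, "qdS": 45, "Slm": 80}


def candidates(word):
    """Yield (stem, stripped_length) for every affix split leaving 3 consonants."""
    for pro, pre, suf in product(PROCLITICS, VERBAL_PREFIXES, SUFFIXES):
        affix = pro + pre
        if not word.startswith(affix):
            continue
        rest = word[len(affix):]
        if suf:
            if not rest.endswith(suf):
                continue
            rest = rest[:-len(suf)]
        if len(rest) == 3:
            yield rest, len(affix) + len(suf)


def extract_root(word):
    """Return the best-scoring triliteral candidate, or None if there is none."""
    best = None
    for stem, stripped in candidates(word):
        # Assumed weights: a dictionary hit dominates; each stripped
        # character costs a little, so lighter analyses are preferred.
        score = (10.0 if stem in KNOWN_ROOTS else 0.0) - 0.5 * stripped
        rank = (score, KNOWN_ROOTS.get(stem, 0))  # frequency breaks ties
        if best is None or rank > best[0]:
            best = (rank, stem)
    return best[1] if best else None


print(extract_root("d'tqdSw"))  # d- + 't- + qdS + -w
print(extract_root("wktb"))     # w- + ktb
```

A sketch like this fails in the ways the caveats above describe: weak, hollow, geminate, and quadriliteral forms do not reduce to a clean three-consonant stem, so they either produce no candidate or a wrong one.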

Cross-Script Normalization

The Aramaic texts employ two writing systems: Syriac (ܐ–ܬ) and Hebrew square script (א–ת). The Atlas normalizes both to a shared Latin key — for example, Syriac ܟܬܒ and Hebrew כתב both resolve to K-T-B — enabling unified root search across corpora.
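As a rough illustration of this normalization, the sketch below maps Syriac and Hebrew consonants to one shared Latin key by Unicode code point. The keys `SH`, `KH`, and `E` follow the Atlas's own examples (SH-L-M, R-W-KH, E-Y-N); every other key, including `TT` for teth and `TS` for tsade, is an assumption made for this example, as are the function and table names.

```python
# Latin keys in alphabet order (alaph/alef ... taw/tav).
# SH, KH and E follow the examples in this guide; the rest are assumed.
KEYS = ["A", "B", "G", "D", "H", "W", "Z", "KH", "TT", "Y", "K",
        "L", "M", "N", "S", "E", "P", "TS", "Q", "R", "SH", "T"]

# The 22 base consonants of each script, as Unicode code points.
SYRIAC = [0x710, 0x712, 0x713, 0x715, 0x717, 0x718, 0x719, 0x71A, 0x71B,
          0x71D, 0x71F, 0x720, 0x721, 0x722, 0x723, 0x725, 0x726, 0x728,
          0x729, 0x72A, 0x72B, 0x72C]
HEBREW = [0x5D0, 0x5D1, 0x5D2, 0x5D3, 0x5D4, 0x5D5, 0x5D6, 0x5D7, 0x5D8,
          0x5D9, 0x5DB, 0x5DC, 0x5DE, 0x5E0, 0x5E1, 0x5E2, 0x5E4, 0x5E6,
          0x5E7, 0x5E8, 0x5E9, 0x5EA]
# Hebrew final forms share the key of their base letter.
HEBREW_FINALS = {0x5DA: "K", 0x5DD: "M", 0x5DF: "N", 0x5E3: "P", 0x5E5: "TS"}

TABLE = {}
for codepoints in (SYRIAC, HEBREW):
    TABLE.update({chr(cp): key for cp, key in zip(codepoints, KEYS)})
TABLE.update({chr(cp): key for cp, key in HEBREW_FINALS.items()})


def to_key(word):
    """Map a Syriac or Hebrew string to a hyphenated Latin consonant key.

    Characters outside the table (vowel points, diacritics, punctuation)
    are dropped.
    """
    return "-".join(TABLE[ch] for ch in word if ch in TABLE)


print(to_key("ܟܬܒ"))  # Syriac
print(to_key("כתב"))  # Hebrew square script
```

Both calls print the same key, which is what allows a single index lookup to serve every corpus regardless of script.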

Morphological Analysis

Separate affix rule sets are applied for Syriac and for Biblical Aramaic in Hebrew script, accounting for differences in clitic prepositions, pronominal markers, and verbal morphology between the two traditions.

Root Confidence Scoring

Each root attribution carries a heuristic confidence indicator displayed as a coloured badge in the reader and word parser. The numeric value is a rule-based heuristic, not a calibrated probability — it reflects which extraction path produced the result, not measured accuracy against a gold standard.

• High — Word matched in the SEDRA Syriac lexicon cache, or a bare triliteral matching a known root. Empirically the most reliable tier, but not yet validated against ETCBC ground truth.
• Medium — Root extracted via affix stripping but not corroborated by the lexicon. Plausible; should be cross-checked for ambiguous prefix/suffix decompositions.
• Low — Root reconstructed via weak-letter expansion or heavy morphological stripping. Most likely to contain errors, particularly for weak roots (I-ʾAlap, II-Waw/Yod, III-ʾAlap), geminate, hollow, and quadriliteral forms — categories where the triliteral pattern itself is a poor fit.
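Read as a decision rule, the three tiers depend only on how a root was obtained. The sketch below is one possible encoding of that rule; the function name, its parameters, and the threshold for "heavy" stripping are assumptions, since this guide does not state them.

```python
def confidence_tier(in_lexicon_cache, is_bare_known_triliteral,
                    used_weak_letter_expansion, affixes_stripped,
                    heavy_threshold=3):
    """Return 'High', 'Medium' or 'Low' from the extraction path alone.

    heavy_threshold is an assumed cut-off for "heavy morphological
    stripping"; the guide does not give a number.
    """
    if in_lexicon_cache or is_bare_known_triliteral:
        return "High"
    if used_weak_letter_expansion or affixes_stripped >= heavy_threshold:
        return "Low"
    return "Medium"


# A form found in the lexicon cache is High regardless of stripping.
print(confidence_tier(True, False, False, 2))
# Affix stripping without lexicon support is Medium.
print(confidence_tier(False, False, False, 1))
# Weak-letter expansion pushes the result to Low.
print(confidence_tier(False, False, True, 1))
```

Nothing in such a rule measures whether the root is actually right, which is why the tier describes the extraction path and not accuracy.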

Important caveats:

The score reflects extraction path, not correctness. A High-confidence attribution can still be wrong if the SEDRA lemma itself was misassigned to the surface form, or if homographic ambiguity exists (e.g. consonantally identical roots with different meanings).
Stem (Peal, Pael, Aphel, …) is a separate classification; it is not factored into the confidence number, and in many cases it cannot be determined from unvocalized text.
No precision/recall study against a hand-annotated gold standard has been published yet. See docs/ROADMAP-v3.1.md Phase 2 for the planned validation work.
The confidence indicator is intended as a prior to inform researcher attention — not as evidence to cite. For formal scholarly work, verify root attributions against authoritative lexicons (Brockelmann, Sokoloff, Costaz, Payne Smith) and against the original manuscript tradition.

Greek Cognates

For New Testament roots, the Atlas provides Greek equivalents from the SBL Greek New Testament, enabling Syriac-Greek comparative analysis.

Cognates & Semantic Structure

Cognates are words in related Semitic languages that share a common root ancestor. For example, Syriac ܫܠܡܐ (shlama), Hebrew שָׁלוֹם (shalom), and Arabic سَلام (salām) all derive from the root SH-L-M, whose semantic core revolves around "peace / wholeness / completion."

The Atlas contains 1,604 cognate families. Each entry includes:

• the root key
• glosses in English, Spanish, Hebrew, and Arabic
• Hebrew and Arabic cognate words with transliteration and meaning
• Greek NT equivalents from the SBL GNT for NT roots (405 links in total)
• semantic bridges linking cognates whose meanings diverged
• outlier flags for cognates with significant semantic drift
• sister roots sharing 2 of 3 consonants
• the root flavor (sabor de raíz), a poetic one-liner capturing the Semitic intuition behind the consonantal skeleton
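The "sister roots sharing 2 of 3 consonants" relation can be made concrete with a small sketch. It assumes that sharing is positional (the same consonant in the same slot), which this guide does not specify; the function names and the sample inventory are hypothetical.

```python
def shared_positions(root_a, root_b):
    """Count slots where two hyphenated triliteral keys have the same consonant."""
    a, b = root_a.split("-"), root_b.split("-")
    if len(a) != 3 or len(b) != 3:
        return 0  # the relation is only defined here for triliteral keys
    return sum(x == y for x, y in zip(a, b))


def are_sisters(root_a, root_b):
    """True when exactly two of the three positions match (assumed definition)."""
    return shared_positions(root_a, root_b) == 2


def find_sisters(root, inventory):
    """Return the roots in inventory that are sisters of root, in input order."""
    return [other for other in inventory if are_sisters(root, other)]


# Hypothetical inventory of root keys.
print(find_sisters("K-T-B", ["K-T-R", "K-T-B", "Q-T-B", "SH-L-M"]))
```

A position-independent definition (any two consonants in common) would return a larger and noisier set.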

The majority of entries were generated with AI assistance (Claude API, Anthropic) from root lists, then manually reviewed and curated for linguistic accuracy.

Tools & Features

Root Search

Enter a root in Latin (SH-L-M), Syriac, Hebrew, or Arabic. Returns all attested forms, glosses, cognates, and verse references across corpora, with live autocomplete.

Example: SH-L-M →

Root Family Visualizer

D3.js force-directed graph showing a root's word family: attested Syriac forms, Hebrew and Arabic cognates, sister roots, semantic bridges, and a paradigmatic key verse.

Example: SH-L-M →

Passage Constellation

Interactive graph of all roots in a passage, showing co-occurrence and semantic clustering across verses.

Example: The Beatitudes →

Parallel Viewer

Side-by-side comparison of the same passage across corpora (Peshitta OT ↔ Targum Onkelos). Reveals interpretive choices between Aramaic traditions.

See: Genesis 1 →

Root Frequency Heat Map

Sortable table of root frequency across all corpora with filter and CSV/JSON export. Reveals distribution patterns: pan-Aramaic roots versus corpus-specific ones.


KWIC Search

Key Word In Context: click a verse reference to see the word highlighted in its immediate textual context, with transliteration and translation.

Verb Stem (Binyan) Analysis

Classifies word forms into seven Aramaic stems: Peal, Ethpeel, Pael, Ethpaal, Aphel, Shafel, Ettaphal. (In Hebrew biblical studies these are often called binyanim; Aramaic has its own stem inventory that overlaps with but is not identical to Hebrew's.) Color-coded stem badges in reader word popovers, stem distribution chart, and paradigm table in the root family visualizer.

Example: SH-L-M →

Hapax Legomena

Surfaces hapax legomena and other low-frequency roots and forms, attested 1–10 times across the corpus. Filter by corpus, scope (root or form), frequency threshold, and sort criterion. Export to CSV or JSON for external analysis.
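The underlying operation is a frequency count followed by a threshold filter. A minimal sketch, assuming the input is a flat list of root keys (or surface forms) with one entry per occurrence; the function name and sample data are hypothetical:

```python
from collections import Counter


def rare_items(occurrences, max_count=1):
    """Return (item, count) pairs attested at most max_count times.

    max_count=1 gives strict hapax legomena; higher values widen the
    net to other rare items. Results are sorted by count, then by item.
    """
    counts = Counter(occurrences)
    rare = [(item, n) for item, n in counts.items() if n <= max_count]
    return sorted(rare, key=lambda pair: (pair[1], pair[0]))


# Hypothetical occurrence list of root keys.
sample = ["SH-L-M", "K-T-B", "SH-L-M", "Q-D-SH", "K-T-B", "SH-L-M", "R-W-KH"]
print(rare_items(sample))               # strict hapax
print(rare_items(sample, max_count=2))  # hapax plus roots attested twice
```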


KWIC Concordance with Export

Full concordance page with left-context | keyword | right-context layout. Group by form or stem. Export to CSV, JSON, plain text, and TEI XML for integration in academic publications.
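The left-context | keyword | right-context layout can be produced with a sliding window over the tokens of a verse. The sketch below assumes each token is a `(surface_form, root_key)` pair and uses placeholder tokens; neither the data shape nor the function name comes from the Atlas.

```python
def kwic(tokens, target_root, width=3):
    """Return (left, keyword, right) rows for every token with target_root.

    tokens: list of (surface_form, root_key) pairs for one verse.
    width:  number of context tokens kept on each side.
    """
    rows = []
    for i, (form, root) in enumerate(tokens):
        if root != target_root:
            continue
        left = " ".join(f for f, _ in tokens[max(0, i - width):i])
        right = " ".join(f for f, _ in tokens[i + 1:i + 1 + width])
        rows.append((left, form, right))
    return rows


# Placeholder tokens; real input would carry Syriac or Hebrew surface forms.
verse = [("w1", "A-M-R"), ("w2", "SH-L-M"), ("w3", "K-T-B"), ("w4", "SH-L-M")]
for left, keyword, right in kwic(verse, "SH-L-M", width=1):
    print(f"{left:>10} | {keyword} | {right}")
```

Grouping by form or stem, as the concordance page offers, would amount to sorting or bucketing these rows on an extra field.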

Example: SH-L-M →

Diachronic Root Analysis

Compares the normalized frequency of a root across corpora in chronological order (Biblical Aramaic → Targum → Ephrem → Peshitta NT → Peshitta OT). The Shifts view ranks roots with the greatest frequency changes, identifying emerging or declining terms over 1,500 years. In the Shifts table, each colored dot represents one corpus, in the same chronological order, and dot size is proportional to the normalized frequency in that corpus.
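Normalization matters because the corpora differ in size by almost two orders of magnitude (4,880 words of Biblical Aramaic against 309,889 in the Peshitta OT). The sketch below assumes a per-10,000-words basis and a simple max-minus-min shift score; both are illustrative choices, and the occurrence counts are invented. Only the corpus word totals come from the table in this guide.

```python
def normalized_frequency(count, corpus_words, per=10_000):
    """Occurrences per `per` words (the per-10,000 basis is an assumption)."""
    return count * per / corpus_words


def frequency_shift(profile):
    """Spread between the highest and lowest normalized frequency.

    One possible way to rank roots by change; not the Atlas's documented metric.
    """
    values = list(profile.values())
    return max(values) - min(values)


# Corpus word totals from the corpus table; occurrence counts are invented.
corpus_words = {"Biblical Aramaic": 4_880, "Peshitta NT": 101_469}
occurrences = {"Biblical Aramaic": 12, "Peshitta NT": 150}

profile = {
    name: normalized_frequency(occurrences[name], corpus_words[name])
    for name in corpus_words
}
for name, value in profile.items():
    print(f"{name}: {value:.1f} per 10,000 words")
print(f"shift: {frequency_shift(profile):.1f}")
```

A shift computed this way still inherits the confound noted under the limitations: a difference between corpora may reflect genre or translation source, not chronology.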

Example: SH-L-M →

Collocations

Computes Pointwise Mutual Information (PMI) between roots co-occurring in the same verse or chapter. Filter by corpus and minimum co-occurrence count to surface statistically significant lexical associations.
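For reference, PMI compares how often two roots co-occur with how often they would co-occur if independent: PMI(x, y) = log2( p(x, y) / (p(x) · p(y)) ). The sketch below estimates each probability as the share of verses containing the root or the pair. The base-2 logarithm, the verse-level window, and the function name are assumptions; the sample verses are invented.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(verses, min_cooccurrence=1):
    """Return {(root_a, root_b): PMI} for root pairs sharing a verse.

    verses: list of lists of root keys, one inner list per verse.
    Pairs are stored with the two keys in sorted order.
    """
    n = len(verses)
    root_count = Counter()
    pair_count = Counter()
    for roots in verses:
        unique = sorted(set(roots))  # count each root once per verse
        root_count.update(unique)
        pair_count.update(combinations(unique, 2))

    table = {}
    for (a, b), joint in pair_count.items():
        if joint < min_cooccurrence:
            continue
        p_joint = joint / n
        p_a, p_b = root_count[a] / n, root_count[b] / n
        table[(a, b)] = math.log2(p_joint / (p_a * p_b))
    return table


# Invented sample: four verses reduced to their root keys.
verses = [["SH-L-M", "A-M-R"], ["SH-L-M", "A-M-R"], ["K-T-B"], ["K-T-B", "SH-L-M"]]
for pair, score in sorted(pmi_table(verses).items()):
    print(pair, round(score, 3))
```

The minimum co-occurrence filter is important in practice because PMI is inflated for rare pairs: two roots that each occur once, in the same verse, receive the highest possible score.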

Example: SH-L-M →

Semantic Fields

Organizes 1,604 roots into 15 semantic domains (legal/covenant, cultic, war, knowledge, etc.) via AI classification. Each domain lists roots sorted by frequency with corpus badges and links to the visualizer.


Word Parser

Breaks any Syriac word into prefixes (proclitics + verbal), root, and suffixes — displayed as color-coded morpheme boxes (teal / gold / purple). Shows verb stem badge, confidence score, Hebrew and Arabic cognates, and per-corpus attestation counts. Accepts Syriac script or Latin transliteration (shlm, sh-l-m, ktb).
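Accepting `shlm`, `sh-l-m`, and `ktb` as equivalent inputs implies a normalization step before lookup. A minimal sketch of one way to do it, assuming that `SH` and `KH` are the only two-letter keys (they are the only ones that appear in this guide) and that hyphens, when present, mark consonant boundaries; the function name is hypothetical:

```python
# Two-letter consonant keys seen in this guide (SH-L-M, R-W-KH).
# A real key scheme may define more; this set is an assumption.
DIGRAPHS = ("SH", "KH")


def parse_latin_root(text):
    """Normalize Latin root input such as 'shlm' or 'sh-l-m' to 'SH-L-M'."""
    text = text.strip().upper()
    if "-" in text:
        # Explicit boundaries: trust the user's segmentation.
        return "-".join(part for part in text.split("-") if part)
    keys = []
    i = 0
    while i < len(text):
        if text[i:i + 2] in DIGRAPHS:  # greedy match on two-letter keys
            keys.append(text[i:i + 2])
            i += 2
        else:
            keys.append(text[i])
            i += 1
    return "-".join(keys)


for raw in ("shlm", "sh-l-m", "ktb"):
    print(raw, "->", parse_latin_root(raw))
```

The greedy match is ambiguous for unhyphenated input: a root with S followed by H as separate consonants would be read as SH, which is one reason to offer the hyphenated form as well.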

Example: ܕܐܬܩܕܫܘ →

Passage Lexical Profile

Aggregates lexical statistics for any book and chapter range: unique roots, lexical density, hapax counts, rarity distribution (hapax / rare / common / very common), verb stem distribution, per-verse root density sparkline, and the top 15 most frequent roots with corpus attestation badges. Export as JSON or CSV.

Example: Sermon on the Mount →

Quadrilingual UI

Full interface in English, Spanish, Hebrew, and Arabic with RTL support. Five translation tracks: WEB (EN), Reina-Valera 1909 (ES), WLC (HE), Van Dyck (AR), and SBLGNT (Greek).

Script & Font Options

Transliteration in Latin, Syriac, Hebrew, or Arabic script. Three Syriac font styles: Estrangela (classical), Eastern (Madnḥāyā), Western (Serṭo).

Bookmarks

Save favourite verses and roots with custom tags. Export as CSV, JSON, BibTeX, or Zotero RDF for academic integration. Data stored in your browser (localStorage).


Research Notes

Add inline notes to individual verses and roots directly in the reader and visualizer. Manage and filter all notes by tag on the annotations page. Export as JSON, CSV, or Markdown. Data saved in localStorage.


Guided Tour

Interactive 12-step walkthrough introducing all main features. Available in all four UI languages. Launch via the help button in the navbar.

Ways to Explore

The Atlas is built for discovery, not citation. Everything below is a starting point for curiosity — a thread to pull, a pattern to notice — not a validated result. Root extraction and cognates are generated heuristically (see Methodological Notes below); verify anything you would put in a footnote against an authoritative lexicon first.

Follow a root through time — Watch a single three-letter root like SH-L-M travel across 1,500 years and two scripts, from Daniel to Ephrem. The Root Journey shows where it appears, when, and how often.
See languages as cousins — The cognate cards show how Aramaic SH-L-M echoes in Hebrew shalom and Arabic salām. Fascinating connections to explore — an explorer’s leads, not dictionary-grade etymology.
One skeleton, several meanings — Notice how the same three consonants can carry related, sometimes surprising senses: R-W-KH is both “wind” and “spirit,” E-Y-N both “eye” and “spring.” A window into how Semitic languages build meaning.
Read a real ancient text — Open any verse in the reader and crack it open word by word — each word’s root, gloss, and building blocks. A gentle on-ramp to reading Aramaic and Syriac.
Compare traditions side by side — The parallel viewer places the Peshitta and the Targums next to each other on the same verse, so you can see how different communities rendered the same line.
Learn words by family — Build vocabulary through root families rather than isolated words — the way the languages themselves organize meaning.

Methodological Notes

The Peshitta as translation. The mainstream scholarly position holds that the Peshitta New Testament is largely a translation from Greek originals, not an independent Aramaic composition. Root analysis therefore reflects the translator’s lexical choices, not necessarily the original author’s vocabulary. The Peshitta Old Testament was translated primarily from Hebrew, though some portions may preserve independent Aramaic traditions.

Targums as interpretive translations. Targum Onkelos is an interpretive Aramaic rendering of the Hebrew Torah. Its vocabulary reflects the targumist’s paraphrase and exegetical expansion, not a verbatim correspondence with the Hebrew source text.

Root extraction limitations. Root extraction is heuristic, based on affix stripping and dictionary matching, without morphological tagging or part-of-speech annotation. No formal error rate has been measured. Weak-letter roots (containing ʾAlap, Waw, or Yod) and quadriliteral forms are particularly prone to misidentification. Users should treat root attributions as provisional, not definitive.

AI-generated cognates. Of the 1,604 cognate root entries, the bulk were generated with AI assistance (Claude API) and manually reviewed. Researchers conducting formal work should independently verify cognate relationships.

Data Sources & Licenses

Resource	License	Usage
Peshitta OT (ETCBC / Leiden)	CC-BY-NC	Runtime
Biblical Aramaic (WLC / Sefaria)	CC-BY-SA	Runtime
Targum Onkelos (Sefaria)	CC-BY-SA	Runtime
Translations (WEB, RV1909, WLC, Van Dyck)	Public Domain	Runtime
Greek NT — SBLGNT (bible.helloao.org)	CC-BY-SA	7,939 NT verses
Noto Sans Syriac	OFL-1.1	Runtime (CDN)
D3.js	ISC	Runtime (CDN)
bible.helloao.org	—	Pipeline only
Sefaria API	CC-BY-SA	Pipeline only
Digital Syriac Corpus (TEI XML)	CC-BY	Ephrem Nisibis

Technical Notes

All data is loaded at startup from local CSV and JSON files — there are no runtime API dependencies. The application is built with Flask (Python) and D3.js, with vanilla JavaScript for the frontend. Syriac fonts are provided by the Noto Sans Syriac family (OFL-1.1) via Google Fonts.

Source code: github.com/Jossifresben/aramaic-root-atlas

API reference: Interactive Swagger documentation — all REST endpoints with parameters, examples, and try-it-out.

Created by Jossi Fresco.

If you use this software, please cite it as:
Fresco Benaim, Jose. (2026). Aramaic Root Atlas: A Cross-Corpus Triliteral Root Explorer (v3.1.1). Zenodo. https://doi.org/10.5281/zenodo.19358625

See also the Peshitta Root Finder — a focused tool for exploring roots in the Syriac New Testament.

License: Apache 2.0