About this site

Reading Sībawayhi is an independent publication launched in September 2025. The goal is to help people read and understand the Kitāb by providing:

- A free, digital, programmable version of the text of the Kitāb
  - UTF-8 encoding, XML markup
  - XML-based programming tools
  - Hypertext webpages
- Translation, notes, and commentary
- A detailed lexicon of key terms
- Reader's Guides to selected topics

If you subscribe today, you'll get full access to the website as well as email newsletters about new content when it's available. Your subscription makes this site possible, and helps me keep the lights on so I can keep working to achieve its goals (see Welcome for more information). Thank you!

Access all areas

By signing up, you'll get access to the full archive of everything that's been published before and everything that's still to come. Your very own private library.

Fresh content, delivered

Stay up to date with new content sent straight to your inbox! No more worrying about whether you missed something because of a pesky algorithm or news feed.

Meet people like you

Join a community of other subscribers who share the same interests.

About the text

The Kitāb is composed of 572 articles. Each article is published as a separate page. The Text link will bring you to a Table of Contents.

Each Kitāb article on this site is generated from XML source files, which are available at https://github.com/sibawayhi/kitab. Source files for the English translations are not publicly available.

WARNING: this is not a critical edition. The goal here is to make the Kitāb accessible and readable, not to offer a definitive version.

Modern printed editions usually include punctuation marks, quotation marks, etc., but these are all editorial emendations; in Sībawayhi's day the Arabic writing system had no such marks. The source files therefore exclude them.

On the other hand, Arabic does include quotation devices. In particular the verb قال (qāla, “he said”) functions as a quotation device (Sībawayhi explicitly describes this), and sometimes the “definite article” (the alif-lām, ال) functions as a quotation device. I've marked up the text to make such quotations explicit. For example, فَهُوَ قَوْلُكَ عَلَيْكَ زَيْدًا is marked up as:

فَهُوَ قَوْلُكَ ‮<قول هوية='٤٠٤٢٣'>عَلَيْكَ زَيْدًا</قول‏>‬

This allows us to highlight such dicta typographically in the generated webpages.

I make a distinction between dicta (أقوال “sayings”), which are passages like the above, where the quoted passage is sentence-like; and hurūf (حروف “terms”), which are quoted words or other sub-sentential phrases. For example, كَمَا حَمَلْتَهُ عَلَى لَكَ حِيْنَ ذَكَرْتَهَا بَعْدَ هَلُمَّ is marked up as:

كَمَا حَمَلْتَهُ عَلَى ‮<حرف>لَكَ</حرف> حِيْنَ ذَكَرْتَهَا بَعْدَ <حرف>هَلُمَّ</حرف>‬

Dicta are typeset in blue; hurūf in red.
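
As a rough sketch of how such markup could drive the typography, the Python fragment below turns قول and حرف elements into HTML spans. The wrapper element نص, the class names dictum and harf, and the function name to_html are assumptions made for illustration; the code is not taken from the site's own generator.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical mapping from markup tags to CSS class names.
CSS_CLASS = {"قول": "dictum", "حرف": "harf"}

# The two examples above, wrapped in a made-up root element <نص>
# so that the fragment parses as a single XML document.
SAMPLE = (
    "<نص>"
    "فَهُوَ قَوْلُكَ <قول هوية='٤٠٤٢٣'>عَلَيْكَ زَيْدًا</قول> "
    "كَمَا حَمَلْتَهُ عَلَى <حرف>لَكَ</حرف> حِيْنَ ذَكَرْتَهَا بَعْدَ <حرف>هَلُمَّ</حرف>"
    "</نص>"
)


def to_html(element):
    """Render an element's content, wrapping known tags in <span> elements."""
    inner = html.escape(element.text or "")
    for child in element:
        # Text that follows a child element is stored in child.tail.
        inner += to_html(child) + html.escape(child.tail or "")
    css_class = CSS_CLASS.get(element.tag)
    if css_class is None:
        return inner  # unknown tags (including the root) contribute text only
    return f'<span class="{css_class}">{inner}</span>'


page_fragment = to_html(ET.fromstring(SAMPLE))
print(page_fragment)
```

A stylesheet could then color the two classes, for example blue for .dictum and red for .harf.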

Tashkīl

The goal is to publish each article with full tashkīl (diacritical marks: the vowel marks, shadda, etc.) to help readers with limited Arabic. This requires manual editing of every word of every article. Fortunately, I found a website that partially automates this using some form of AI. It's surprisingly good but not perfect. Every article has been processed with that tool, but only about 300 to 350 articles have been manually edited.

The status is indicated by the color of the article title: blue means the article has been manually edited at least once (so it may still have errors; three edits should mean no errors), and black means it has not been manually edited (so it will certainly have errors). At the bottom of each article page, a Colophon indicates the editing status: "0/100%" means the article has been run through the automatic tashkīl engine but not manually edited; "1/100%" means it has been 100% manually edited once; "2/50%" would mean that the second manual edit is 50% complete.
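
For readers who want to process the Colophon status programmatically, here is a minimal Python sketch. It assumes the status always has the form "pass/percent%", which is inferred from the three examples above; the names EditStatus, parse_colophon, and completed_passes are hypothetical.

```python
import re
from typing import NamedTuple


class EditStatus(NamedTuple):
    pass_number: int  # 0 = automatic tashkīl only, 1 = first manual edit, ...
    percent: int      # how much of that pass is complete


def parse_colophon(status):
    """Parse a status string such as '2/50%' (format assumed from the examples)."""
    match = re.fullmatch(r"([0-9]+)/([0-9]+)%", status.strip())
    if match is None:
        raise ValueError(f"unrecognized colophon status: {status!r}")
    return EditStatus(int(match.group(1)), int(match.group(2)))


def completed_passes(status):
    """Return how many manual edits have been fully completed."""
    parsed = parse_colophon(status)
    if parsed.pass_number == 0:
        return 0
    # A pass below 100% is still in progress, so only the earlier passes count.
    return parsed.pass_number if parsed.percent == 100 else parsed.pass_number - 1


for example in ("0/100%", "1/100%", "2/50%"):
    print(example, parse_colophon(example), completed_passes(example))
```

Under this reading, "2/50%" counts as one completed manual edit, with a second one under way.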

Segmentation

In Sībawayhi's day the writing system had no full-stop punctuation mark demarcating the end of a sentence. In fact, Sībawayhi has no concept corresponding to our “sentence”. Printed editions of the Kitāb usually insert such marks. The text on this site does not, but it does segment the text in order to indicate parsing structure. In the XML sources, each segment is marked up with a مقطع tag, like this:

‮<مقطع هوية='٩٧٢٨٠'>...</مقطع>‬

Segmentation is meta-text, not text. The original text without segmentation or punctuation can easily be extracted from the XML sources. The segmentation reflects my editorial judgment. It is not based on an explicit principle; it's really an aesthetic judgment.

Each segment is typeset on a separate line; I find this often makes the structure and sense of the text more easily graspable. (Note that the code that generates the webpage could easily be modified to ignore the segmentation.) You may or may not agree with my segmentation, but the important point is that the goal is to make the text understandable even for those with limited knowledge of Arabic, not to offer an authoritative reading of the text.
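
The following Python sketch illustrates both options: one line per segment, or the running text with the segmentation ignored. It makes several assumptions: the root element باب and the هوية values in the sample are made up, segments are not nested, all text sits inside some مقطع element, and segments are separated by whitespace in the source. It is an illustration, not the site's actual generator.

```python
import xml.etree.ElementTree as ET

# Made-up sample: a hypothetical root <باب> holding two segments.
SAMPLE = """<باب>
<مقطع هوية='١'>فَهُوَ قَوْلُكَ <قول هوية='٢'>عَلَيْكَ زَيْدًا</قول></مقطع>
<مقطع هوية='٣'>كَمَا حَمَلْتَهُ عَلَى <حرف>لَكَ</حرف></مقطع>
</باب>"""


def squash(text):
    """Collapse runs of whitespace (including newlines) into single spaces."""
    return " ".join(text.split())


def render_lines(xml_text, use_segmentation=True):
    """Return the text as a list of lines, with all markup removed."""
    root = ET.fromstring(xml_text)
    if not use_segmentation:
        # Ignore the segment boundaries: one running line of plain text.
        return [squash("".join(root.itertext()))]
    # One line per <مقطع> element, in document order.
    return [squash("".join(seg.itertext())) for seg in root.iter("مقطع")]


print("\n".join(render_lines(SAMPLE)))
print("\n".join(render_lines(SAMPLE, use_segmentation=False)))
```

Because itertext() yields only character data, the quotation markup (قول and حرف) drops out along with the segment tags.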

All of the articles in the first part of the book have been segmented. Many of the articles in the second part have been segmented, but a substantial number have not. The segmentation status is indicated by the article number included in the title: no underline means no segmentation; underlined means segmentation is complete, at least in a first version. Like tashkīl, segmentation is versioned and may change.