# TOON Specification Implementation
This document describes the compliance of the **Mfonte\FastToon** PHP library with the [TOON Format Specification v3.0](https://github.com/toon-format/spec/blob/main/SPEC.md).
**Specification Version**: 3.0 (Working Draft, 2025-11-24)
**Library Version**: 1.0.0
**PHP Compatibility**: 7.0 - 8.4
---
## Table of Contents
1. [Overview](#overview)
2. [Data Model Compliance (§2)](#data-model-compliance-2)
3. [Type System (§3)](#type-system-3)
4. [Document Structure (§5)](#document-structure-5)
5. [Array Headers (§6)](#array-headers-6)
6. [Strings and Keys (§7)](#strings-and-keys-7)
7. [Objects (§8)](#objects-8)
8. [Arrays (§9)](#arrays-9)
9. [Objects as List Items (§10)](#objects-as-list-items-10)
10. [Delimiters (§11)](#delimiters-11)
11. [Indentation (§12)](#indentation-12)
12. [Key Folding (§13.4)](#key-folding-134)
13. [Strict Mode (§14)](#strict-mode-14)
14. [Error Handling (§15)](#error-handling-15)
15. [Divergences from helgesverre/toon](#divergences-from-helgesverretoon)
16. [Implementation Notes](#implementation-notes)
---
## Overview
The `Mfonte\FastToon` library provides a complete encoder and decoder for the TOON (Token-Oriented Object Notation) format. The implementation targets full compliance with TOON Specification v3.0.
### Key Classes
| Class | Purpose |
|-------|---------|
| `Encoder` | Main encoder entry point |
| `Decoder` | Main decoder entry point |
| `Encode\StringEncoder` | String quoting and escaping per §7 |
| `Encode\ObjectEncoder` | Object encoding per §8, §10 |
| `Encode\ArrayEncoder` | Array encoding per §9 |
| `Decode\Parser` | Main parsing logic |
| `Decode\HeaderParser` | Array header parsing per §6 |
| `Decode\ValueParser` | Primitive value parsing per §3 |
| `Decode\StringDecoder` | String unescaping per §7.6 |
---
## Data Model Compliance (§2)
### Canonicalization
Per §2, all encoders output in **canonical form**:
| Requirement | Status | Implementation |
|-------------|--------|----------------|
| Consistent key ordering | ✅ | Keys preserve original order |
| Number canonicalization | ✅ | `Encoder::formatCanonicalNumber()` |
| No exponent notation | ✅ | Scientific notation converted to decimal |
| No trailing zeros | ✅ | Trailing zeros removed from decimals |
| -0 → 0 | ✅ | Negative zero normalized |
### Number Examples
```php
// Scientific notation → decimal
1.5e2 → "150"
3.0e-3 → "0.003"
// Trailing zeros removed
1.500 → "1.5"
42.0 → "42"
// Negative zero
-0.0 → "0"
```
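
The canonicalization rules above can be sketched in a few lines. This is a minimal illustration and not the library's `Encoder::formatCanonicalNumber()`; the function name `toonCanonicalNumber()` and the 15-decimal precision cap are assumptions made for this example.

```php
<?php
// Hypothetical helper, written for illustration only.
function toonCanonicalNumber($value)
{
    if (is_int($value)) {
        return (string) $value;
    }
    // Non-finite floats have no numeric form and are encoded as null.
    if (is_nan($value) || is_infinite($value)) {
        return 'null';
    }
    // Covers both 0.0 and -0.0, since they compare equal.
    if ($value == 0.0) {
        return '0';
    }
    // Fixed-point rendering never produces exponent notation.
    // The cap of 15 decimals is an arbitrary choice for this sketch.
    $text = number_format($value, 15, '.', '');
    // Drop trailing zeros, then a dangling decimal point.
    $text = rtrim(rtrim($text, '0'), '.');

    // Tiny negative values can round to "-0" on older PHP versions.
    return $text === '-0' ? '0' : $text;
}
```

Calling `toonCanonicalNumber(1.5e2)` yields `"150"` and `toonCanonicalNumber(1.500)` yields `"1.5"`, matching the examples above.
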
---
## Type System (§3)
### Primitive Types
| Type | Encoder Support | Decoder Support |
|------|-----------------|-----------------|
| `null` | ✅ Outputs `null` | ✅ Parses `null` |
| `boolean` | ✅ Outputs `true`/`false` | ✅ Parses `true`/`false` |
| `integer` | ✅ Full range support | ✅ Big integers as string (optional) |
| `float` | ✅ Canonical form | ✅ Parses all numeric formats |
| `string` | ✅ Proper escaping | ✅ Proper unescaping |
### Special Values
| Value | Encoding | Note |
|-------|----------|------|
| `NaN` | `null` | Per §3 |
| `Infinity` | `null` | Per §3 |
| `-Infinity` | `null` | Per §3 |
---
## Document Structure (§5)
### Line Processing
| Requirement | Status | Implementation |
|-------------|--------|----------------|
| LF line endings | ✅ | `Scanner` normalizes line endings |
| Trailing newline | ✅ | Output ends with `\n` |
| UTF-8 encoding | ✅ | Full UTF-8 support |
---
## Array Headers (§6)
### Header Syntax
Per §6, array headers use the format `key[N{delim}]{fields}:` where the delimiter symbol appears inside the brackets when using tab or pipe delimiters.
| Delimiter | Header Symbol | Example |
|-----------|---------------|---------|
| Comma (default) | None | `users[3]{id,name}:` |
| Tab | `\t` inside brackets | `users[3\t]{id\tname}:` |
| Pipe | `\|` inside brackets | `users[3\|]{id\|name}:` |
### Implementation
```php
// ArrayEncoder::getDelimiterSymbol()
public function getDelimiterSymbol()
{
    $delimiter = $this->encoder->getDelimiter();
    if ($delimiter === "\t") {
        return "\t";
    }
    if ($delimiter === '|') {
        return '|';
    }

    return ''; // Comma is default, no symbol needed
}
```
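
On the decoding side, the same header shape can be read back with a single regular expression. The sketch below is an assumption-laden illustration, not the library's `Decode\HeaderParser`: the function name `parseToonHeader()` and the returned array layout are hypothetical, and only unquoted keys and unquoted field names are handled.

```php
<?php
// Hypothetical parser sketch for headers of the form key[N{delim}]{fields}:
function parseToonHeader($line)
{
    // 1: optional key, 2: length, 3: optional tab or pipe symbol,
    // 4: optional field list, 5: anything after the colon (inline values).
    $pattern = '/^([A-Za-z_][A-Za-z0-9_.]*)?\[(\d+)([\t|]?)\](?:\{([^}]*)\})?:(.*)$/';
    if (!preg_match($pattern, $line, $m)) {
        return null; // not an array header
    }
    // No symbol inside the brackets means the default comma delimiter.
    $delimiter = $m[3] === '' ? ',' : $m[3];
    $fields = $m[4] !== '' ? explode($delimiter, $m[4]) : [];

    return [
        'key' => $m[1] === '' ? null : $m[1],
        'length' => (int) $m[2],
        'delimiter' => $delimiter,
        'fields' => $fields,
        'inline' => ltrim($m[5], ' '),
    ];
}
```

For `users[3|]{id|name}:` this returns a length of 3, the pipe as delimiter, and the fields `id` and `name`.
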
---
## Strings and Keys (§7)
### Quoting Requirements
Per §7.2, strings MUST be quoted when containing:
| Pattern | Reason |
|---------|--------|
| Active delimiter | Per §11.1 |
| Colon (`:`) | Key-value separator |
| Quotes (`"`) | Must escape |
| Leading whitespace | Significant |
| Trailing whitespace | Significant |
| Equal to `-` or starting with `-` | List item ambiguity |
| Reserved words | `true`, `false`, `null` |
| Numeric strings | Would parse as numbers |
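
The table above translates into a short predicate. The following is a minimal sketch limited to the listed rows, not the library's `Encode\StringEncoder`; the function name `toonNeedsQuoting()` is hypothetical, the use of `is_numeric()` as the "numeric string" test is an approximation, and quoting the empty string is an assumption that the table does not state.

```php
<?php
// Hypothetical helper covering only the rows of the quoting table.
function toonNeedsQuoting($value, $activeDelimiter = ',')
{
    // Assumption: the empty string is quoted too (not listed in the table).
    if ($value === '') {
        return true;
    }
    // Leading or trailing whitespace is significant.
    if ($value !== trim($value)) {
        return true;
    }
    // Active delimiter, colon, or double quote anywhere in the string.
    if (strpbrk($value, $activeDelimiter . ':"') !== false) {
        return true;
    }
    // A leading hyphen could be read as a list item marker.
    if ($value[0] === '-') {
        return true;
    }
    // Reserved words would decode as booleans or null.
    if (in_array($value, ['true', 'false', 'null'], true)) {
        return true;
    }

    // Numeric-looking strings would decode as numbers.
    return is_numeric($value);
}
```

Because the delimiter is a parameter, `a,b` needs quotes under the default comma delimiter but not when the active delimiter is a pipe.
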
### Escape Sequences (§7.6)
| Sequence | Character |
|----------|-----------|
| `\n` | Newline |
| `\r` | Carriage return |
| `\t` | Tab |
| `\\` | Backslash |
| `\"` | Quote |
| `\uXXXX` | Unicode codepoint |
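
A round trip through the single-character escapes can be sketched as follows. These helpers are hypothetical and are not the library's `StringEncoder` or `StringDecoder`; `\uXXXX` handling is deliberately left out to keep the example short, and rejecting unknown escapes with an exception is an assumption of this sketch.

```php
<?php
// Hypothetical helpers for the five single-character escapes.
function toonEscape($value)
{
    // strtr() applies all pairs in one pass, so the backslashes it
    // inserts are never escaped a second time.
    return strtr($value, [
        '\\' => '\\\\',
        '"' => '\\"',
        "\n" => '\\n',
        "\r" => '\\r',
        "\t" => '\\t',
    ]);
}

function toonUnescape($value)
{
    $map = ['n' => "\n", 'r' => "\r", 't' => "\t", '\\' => '\\', '"' => '"'];

    // Match a backslash followed by any single character.
    return preg_replace_callback('/\\\\(.)/s', function ($m) use ($map) {
        if (!isset($map[$m[1]])) {
            throw new InvalidArgumentException('Invalid escape sequence: \\' . $m[1]);
        }

        return $map[$m[1]];
    }, $value);
}
```

Unescaping the output of `toonEscape()` returns the original string, which is the property an encoder and decoder pair relies on.
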
---
## Objects (§8)
### Key-Value Encoding
```toon
name: John Doe
age: 42
email: "[email protected]"
```
### Nested Objects
```toon
user:
  name: John
  address:
    city: New York
    zip: 10001
```
---
## Arrays (§9)
### §9.1 - Inline Primitive Arrays
Format: `key[N]: v1,v2,v3`
```toon
tags[3]: php,toon,parser
scores[4]: 100,95,87,92
```
### §9.2 - Arrays as List Items
Format: `- [N]: v1,v2,...`
```toon
matrix[2]:
  - [3]: 1,2,3
  - [3]: 4,5,6
```
### §9.3 - Tabular Arrays
Format with field definitions:
```toon
users[2]{id,name,email}:
  1,Alice,[email protected]
  2,Bob,[email protected]
```
### §9.4 - List Arrays
General arrays with list syntax:
```toon
items[3]:
  - First item
  - Second item
  - Third item
```
---
## Objects as List Items (§10)
Per §10, objects appearing as list items use special indentation:
1. First field on the hyphen line: `- field: value`
2. Remaining fields at hyphen depth + 1
3. Rows of a tabular array inside the item at hyphen depth + 2
### Example
```toon
users[2]:
  - name: Alice
    age: 30
    friends[2]{id,name}:
      1,Bob
      2,Carol
  - name: Bob
    age: 25
```
### Implementation
```php
// ObjectEncoder::encodeAsListItem()
public function encodeAsListItem($object, $depth, $delimiter)
{
    $fields = $this->getKeys($object);
    $firstField = array_shift($fields);

    // First field on the hyphen line
    $result = '- ' . $this->encodeField($firstField, $object[$firstField], $depth);

    // Remaining fields at hyphen depth + 1.
    // Simplified excerpt: assumes $depth is the hyphen depth and the default 2-space indent.
    $indent = str_repeat('  ', $depth + 1);
    foreach ($fields as $field) {
        $result .= "\n" . $indent . $this->encodeField($field, $object[$field], $depth);
    }

    return $result;
}
```
---
## Delimiters (§11)
### §11.1 - Document Delimiter vs Active Delimiter
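- **Document delimiter**: Chosen by the encoder (comma by default); governs quoting decisions outside any array scope
- **Active delimiter**: Declared in the brackets of the nearest array header; used to split that array's inline values and tabular rows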
### §11.2 - Active Delimiter Scoping
Per §11.2, parsers MUST split on ONLY the active delimiter:
```php
// Parser::parseRowValues()
private function parseRowValues($content, $expected, $delimiter = ',')
{
    // Split only on the active delimiter per §11.2
    // NOT on all possible delimiters
    $values = [];
    // ... splits only on $delimiter
}
```
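
The elided splitting step could look like the sketch below. It is an illustration of the §11.2 rule under stated assumptions, not the body of `Parser::parseRowValues()`: the function name `splitOnActiveDelimiter()` is hypothetical, values are returned still quoted and escaped, and surrounding spaces are trimmed.

```php
<?php
// Hypothetical splitter: only the active delimiter separates values,
// and delimiters inside double quotes are treated as literal text.
function splitOnActiveDelimiter($content, $delimiter = ',')
{
    $values = [];
    $current = '';
    $inQuotes = false;
    $length = strlen($content);

    for ($i = 0; $i < $length; $i++) {
        $char = $content[$i];
        if ($inQuotes && $char === '\\' && $i + 1 < $length) {
            // Keep the escape pair intact for later unescaping.
            $current .= $char . $content[++$i];
        } elseif ($char === '"') {
            $inQuotes = !$inQuotes;
            $current .= $char;
        } elseif (!$inQuotes && $char === $delimiter) {
            $values[] = trim($current, ' ');
            $current = '';
        } else {
            $current .= $char;
        }
    }
    $values[] = trim($current, ' ');

    return $values;
}
```

With a pipe as the active delimiter, the row `a,b|c,d` splits into two values, `a,b` and `c,d`, because the commas are no longer separators.
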
### Delimiter Configuration
```php
$encoder = new Encoder([
    'delimiter' => "\t", // Use tabs
]);
$encoder = new Encoder([
    'delimiter' => '|', // Use pipes
]);
```
---
## Indentation (§12)
### Configurable Indent Size
```php
$encoder = new Encoder([
    'indent_size' => 4, // 4 spaces per level (default: 2)
]);
```
### Depth Calculation
```php
// depth = leading_spaces / indent_size
$depth = intdiv($leadingSpaces, $indentSize);
```
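
A slightly fuller version of the depth calculation, including the leading-space count, might look like this. The function name `toonLineDepth()` is hypothetical, and treating a partial indent level as an error in strict mode is an assumption of this sketch rather than documented library behavior.

```php
<?php
// Hypothetical helper: derive the nesting depth of a raw line.
function toonLineDepth($line, $indentSize = 2, $strict = true)
{
    // strspn() returns the length of the leading run of spaces.
    $leadingSpaces = strspn($line, ' ');

    // Assumption: strict mode rejects indentation that is not a whole
    // number of levels; lenient mode rounds down.
    if ($strict && $leadingSpaces % $indentSize !== 0) {
        throw new InvalidArgumentException(
            "Indentation of {$leadingSpaces} spaces is not a multiple of {$indentSize}"
        );
    }

    return intdiv($leadingSpaces, $indentSize);
}
```

With the default indent size of 2, a line that starts with four spaces is at depth 2.
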
---
## Key Folding (§13.4)
### Path Expansion
The decoder supports dotted key expansion:
```php
$decoder = new Decoder([
    'path_expansion' => true,
]);
// Input:
// user.name: John
// user.email: [email protected]
// Output:
// ['user' => ['name' => 'John', 'email' => '[email protected]']]
```
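
The expansion step itself can be illustrated on a flat PHP array. This sketch is not the decoder's implementation: the function name `expandDottedKeys()` is hypothetical, every dot is treated as a path separator, and on a conflict the later key silently overwrites the earlier one.

```php
<?php
// Hypothetical helper: turn dotted keys into nested arrays.
function expandDottedKeys(array $flat)
{
    $result = [];
    foreach ($flat as $key => $value) {
        $segments = explode('.', (string) $key);

        // Walk (and create) the nested path by reference.
        $node = &$result;
        foreach ($segments as $segment) {
            if (!isset($node[$segment]) || !is_array($node[$segment])) {
                $node[$segment] = [];
            }
            $node = &$node[$segment];
        }
        $node = $value;

        // Break the reference before the next iteration.
        unset($node);
    }

    return $result;
}
```

For example, `['user.name' => 'John', 'user.age' => 42]` becomes `['user' => ['name' => 'John', 'age' => 42]]`.
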
---
## Strict Mode (§14)
### Enabling Strict Mode
```php
$encoder = new Encoder(['strict' => true]);
$decoder = new Decoder(['strict' => true]);
```
### Strict Validations
| Validation | Description |
|------------|-------------|
| Array count | Header count must match actual items |
| Blank lines | Forbidden inside arrays per §14.2 |
| Indentation | Must be consistent per §14.3 |
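
The first two checks in the table can be sketched as a single function. The name `assertStrictArrayBody()` is hypothetical, the use of `UnexpectedValueException` is a stand-in for the library's own exception classes, and the sketch assumes one body line per array item (tabular rows or primitive list items).

```php
<?php
// Hypothetical strict-mode check for the body lines of one array.
function assertStrictArrayBody($declaredCount, array $bodyLines)
{
    foreach ($bodyLines as $offset => $line) {
        // Blank lines are not allowed between the items of an array.
        if (trim($line) === '') {
            throw new UnexpectedValueException(
                'Blank line inside array at body line ' . ($offset + 1)
            );
        }
    }

    // The count declared in the header must match the items found.
    $actual = count($bodyLines);
    if ($actual !== $declaredCount) {
        throw new UnexpectedValueException(
            "Array header declares {$declaredCount} items but {$actual} were found"
        );
    }
}
```

A header such as `users[3]{id,name}:` followed by only two rows would fail the count check.
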
---
## Error Handling (§15)
### Exception Types
| Exception | Usage |
|-----------|-------|
| `SyntaxException` | Invalid TOON syntax |
| `DecodingException` | Semantic errors (type mismatches, etc.) |
| `EncodingException` | Values that cannot be encoded |
### Error Messages
All exceptions include line number information for debugging:
```php
try {
    $decoder->decode($toon);
} catch (SyntaxException $e) {
    echo "Line {$e->getLine()}: {$e->getMessage()}";
}
```
---
## Divergences from helgesverre/toon
This section documents intentional behavioral differences between `mfonte/fast-toon` and the reference implementation `helgesverre/toon`. **Both implementations are fully spec-compliant** - these are areas where the specification allows flexibility.
### 1. Single-Row Tabular Format
| Scenario | mfonte/fast-toon | helgesverre/toon |
|----------|-------------|------------------|
| Array with 1 object | List format | Tabular format |
**Example Input**: `[['a' => 1, 'b' => 2]]`
```toon
# mfonte/fast-toon output:
[1]:
  - a: 1
    b: 2
# helgesverre/toon output:
[1]{a,b}:
  1,2
```
**Rationale**: The TOON spec does not mandate minimum row counts for tabular format. We chose list format for single-row arrays because:
- **Readability**: List format is more intuitive for single items
- **Consistency**: Avoids format switching based on array length
- **Flexibility**: Adding fields to a single-item list doesn't require header updates
### 2. Key Order in Tabular Format
| Scenario | mfonte/fast-toon | helgesverre/toon |
|----------|-------------|------------------|
| Rows with different key order | Uses list format | Reorders keys to first row's order |
**Example Input**: `[['a' => 1, 'b' => 2], ['b' => 3, 'a' => 4]]`
```toon
# mfonte/fast-toon output (list format):
[2]:
  - a: 1
    b: 2
  - b: 3
    a: 4
# helgesverre/toon output (tabular, reordered):
[2]{a,b}:
  1,2
  4,3
```
**Rationale**: We preserve original key order because:
- **Data integrity**: Key order may be semantically significant
- **Predictability**: Output order matches input order
- **Debugging**: Easier to trace data transformations
- **Round-trip fidelity**: Decode-encode cycles preserve order
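
Taken together, divergences 1 and 2 amount to a format-selection rule that can be sketched as a predicate. This is an illustration, not the library's code: the function name `useTabularFormat()` is hypothetical, and the requirement that every cell be a scalar or `null` is an assumption about tabular eligibility.

```php
<?php
// Hypothetical predicate: should an array of rows be emitted as a table?
function useTabularFormat(array $rows)
{
    // Divergence 1: single-row arrays use list format.
    if (count($rows) < 2) {
        return false;
    }

    $fields = null;
    foreach ($rows as $row) {
        if (!is_array($row) || $row === []) {
            return false;
        }
        // Assumption: nested values cannot sit in a table cell.
        foreach ($row as $value) {
            if ($value !== null && !is_scalar($value)) {
                return false;
            }
        }
        // Divergence 2: keys must match the first row in name AND order.
        $keys = array_keys($row);
        if ($fields === null) {
            $fields = $keys;
        } elseif ($keys !== $fields) {
            return false;
        }
    }

    return true;
}
```

Both example inputs from sections 1 and 2 return `false` here, so they fall back to list format.
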
### 3. Control Characters (Null Byte)
| Character | mfonte/fast-toon | helgesverre/toon |
|-----------|-------------|------------------|
| `\x00` (null) | Quotes the string | Throws exception |
| Other control chars | Quotes the string | Throws exception |
**Example**: String containing `"before\x00after"`
```php
// mfonte/fast-toon: outputs quoted string
"before\x00after"
// helgesverre/toon: throws InvalidArgumentException
```
**Rationale**: We chose permissive handling because:
- **Robustness**: Real-world data may contain unexpected characters
- **Compatibility**: Works with binary-safe PHP strings
- **Graceful degradation**: Better to encode imperfectly than fail
- **Spec compliance**: §7.1 specifies allowed escapes but doesn't mandate rejection of others
### 4. String Quoting Thresholds
| Pattern | mfonte/fast-toon | helgesverre/toon |
|---------|-------------|------------------|
| Strings starting with `-digit` | Quotes (e.g., `"-123 text"`) | Quotes |
| Date-like strings | Unquoted (e.g., `2024-01-01`) | Unquoted |
| Strings with `#` | Unquoted | Unquoted |
| Strings with `\|` | Unquoted (unless delimiter) | Unquoted |
**Rationale**: Both implementations follow §7.2 requirements. We quote strings that could be ambiguous with negative numbers but allow date-like strings because they don't parse as valid numbers.
### 5. Performance vs Reference Implementation
| Metric | mfonte/fast-toon | helgesverre/toon | Difference |
|--------|-------------|------------------|------------|
| Encoding | ~180ms | ~600ms | **3.2x faster** |
| Decoding | ~60ms | ~950ms | **15x faster** |
| PHP version | 7.0+ | 8.1+ | Broader compat |
**Key Optimizations**:
- Inline type checking (avoids function call overhead)
- Pre-computed indent strings and delimiter symbols
- Direct array buffer appending vs object-based line writing
- Early returns for common primitive types ("fast paths")
- Minimal memory allocation through buffer reuse
---
## Implementation Notes
### PHP Version Compatibility
The library supports PHP 7.0 through 8.4:
- **PHP 7.0-7.1**: Uses `isset()` checks for array access, no type hints
- **PHP 7.2-7.4**: Parameter type hints where compatible
- **PHP 8.0**: Stringable interface support
- **PHP 8.1+**: Enum support, readonly properties
### Performance Architecture
The `FastEncoder` class implements several optimization strategies:
```php
// 1. Pre-computed values at construction
$this->indentCache[0] = '';
for ($i = 1; $i <= 10; $i++) {
    $this->indentCache[$i] = str_repeat($indent, $i);
}
// 2. Inline type checking (no is_*() calls)
if ($value === null) { /* ... */ }
if ($value === true) { /* ... */ }
if ($value === false) { /* ... */ }
// 3. Direct buffer appending
$this->lines[] = $indent . '- ' . $encodedValue;
// 4. Single strpbrk() for structural char detection
if (strpbrk($value, $this->quotingChars) !== false) {
    return true; // needs quoting
}
```
### Testing
```bash
# Run tests (PHP 7.0-7.1)
composer test:php70
# Run tests (PHP 7.2-8.0)
composer test:php72
# Run tests (PHP 8.1+)
composer test
# Run comparative benchmarks
composer benchmark:compare
```
### Code Style
This library follows [Spatie's PHP-CS-Fixer rules](https://github.com/spatie/ray/blob/main/.php-cs-fixer.php) for code formatting.
---
## References
- [TOON Specification v3.0](https://github.com/toon-format/spec/blob/main/SPEC.md)
- [TOON Format](https://github.com/toon-format/spec)
- [mfonte/fast-php-toon Repository](https://github.com/mauriziofonte/fast-php-toon)
- [helgesverre/toon (Reference Implementation)](https://github.com/helgesverre/toon)
---
*Last updated: 2025-12-20*