# Field Generation & Schema Validation

### Overview

This documentation covers our automatic schema generation system that converts JSON samples into validation schemas, and the subsequent field validation process. The system is designed to simplify API design by allowing developers to define data structures using annotated JSON samples rather than writing complex schema definitions manually.

<figure><img src="/files/8pfWYa9EzdvAdp3OunDJ" alt=""><figcaption></figcaption></figure>

### Table of Contents

1. [Schema Generation Process](#schema-generation-process)
2. [Field Annotation System](#field-annotation-system)
3. [Generated Schema Structure](#generated-schema-structure)
4. [Field Validation Engine](#field-validation-engine)
5. [Error Handling & Messages](#error-handling-and-messages)
6. [Best Practices](#best-practices)
7. [Examples & Use Cases](#examples-and-use-cases)
8. [Troubleshooting](#troubleshooting)

### Schema Generation Process

#### How It Works

The schema generation process transforms annotated JSON samples into structured validation schemas through the following steps:

1. **JSON Parsing**: The input sample is parsed to extract the basic structure and data types
2. **Annotation Processing**: Field names are analyzed for special suffix markers (`?`, `*`, or both combined)
3. **Type Inference**: Data types are automatically detected from sample values
4. **Schema Construction**: A comprehensive schema object is built with validation rules
5. **Metadata Addition**: Additional properties like `__field_required` are added for validation logic

#### Core Algorithm

The system uses recursive parsing to handle nested objects and arrays:

```javascript
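const generateSchema = (obj) => {
    if (Array.isArray(obj)) {
        // The first element defines the schema for every item.
        // Checking the length keeps falsy samples (0, "", false) intact;
        // an empty array falls back to the schema of an empty object.
        return { type: 'array', items: generateSchema(obj.length > 0 ? obj[0] : {}) };
    }

    if (obj === null) {
        return { type: 'null' };
    }

    if (typeof obj === 'object') {
        // Process object properties with annotation handling
        // (processObjectSchema is not shown on this page)
        return processObjectSchema(obj);
    }

    // Primitives: 'string', 'number', 'boolean'
    return { type: typeof obj };
};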
```
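
The snippet above delegates objects to `processObjectSchema`, which this page does not show. The sketch below fills it in so the generator can run end to end. It is an illustration under stated assumptions, not the system's source: the helper `parseAnnotatedKey` and the body of `processObjectSchema` are hypothetical, and treating a `null` sample on a nullable field as `["string", "null"]` is inferred from the `avatar_url` field in the Basic User Profile example under Examples & Use Cases.

```javascript
// Hypothetical helper: split an annotated key into its clean name and flags.
// Combined markers (?* or *?) are checked before single markers.
const parseAnnotatedKey = (key) => {
    if (key.endsWith('?*') || key.endsWith('*?')) {
        return { name: key.slice(0, -2), required: false, nullable: true };
    }
    if (key.endsWith('?')) {
        return { name: key.slice(0, -1), required: true, nullable: true };
    }
    if (key.endsWith('*')) {
        return { name: key.slice(0, -1), required: false, nullable: false };
    }
    return { name: key, required: true, nullable: false };
};

// Hypothetical processObjectSchema: one property schema per cleaned key
const processObjectSchema = (obj) => {
    const properties = {};
    for (const [rawKey, value] of Object.entries(obj)) {
        const { name, required, nullable } = parseAnnotatedKey(rawKey);
        const fieldSchema = { ...generateSchema(value) };
        if (nullable) {
            // ASSUMPTION: a null sample is treated as a nullable string,
            // matching the avatar_url example on this page
            const inferred = fieldSchema.type === 'null' ? 'string' : fieldSchema.type;
            fieldSchema.type = [inferred, 'null'];
            fieldSchema.nullable = true;
        }
        fieldSchema.__field_required = required;
        properties[name] = fieldSchema;
    }
    return { type: 'object', properties };
};

const generateSchema = (obj) => {
    if (Array.isArray(obj)) {
        // First element defines the item schema; empty arrays fall back to {}
        return { type: 'array', items: generateSchema(obj.length > 0 ? obj[0] : {}) };
    }
    if (obj === null) {
        return { type: 'null' };
    }
    if (typeof obj === 'object') {
        return processObjectSchema(obj);
    }
    return { type: typeof obj };
};
```

Running `generateSchema` on the Basic User Profile sample yields the schema shown in that example; nested objects and arrays go through the same functions recursively.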

#### Type Detection

The system automatically detects the following JavaScript types:

* **Primitive Types**: `string`, `number`, `boolean`
* **Complex Types**: `object`, `array`
* **Special Values**: `null`, `undefined`

Arrays are handled by analyzing the first element, whose schema is then applied to every item in the array. As a result, arrays must be homogeneous: validation fails if an array contains elements of more than one type.
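
Since every item is checked against the schema inferred from element 0, you can predict whether an array will pass by comparing each element's type with the first. This illustrative check uses hypothetical names (`inferType`, `firstMismatchedItem`) that are not part of the system, and it only compares top-level types, so differently shaped objects are not caught here:

```javascript
// Map a JSON value to the type names used in generated schemas
const inferType = (value) => {
    if (value === null) return 'null';
    if (Array.isArray(value)) return 'array';
    return typeof value; // 'string', 'number', 'boolean', or 'object'
};

// Index of the first element whose type differs from element 0,
// or -1 if the array is homogeneous (or empty)
const firstMismatchedItem = (arr) => {
    if (arr.length === 0) return -1;
    const expected = inferType(arr[0]);
    return arr.findIndex((item) => inferType(item) !== expected);
};

console.log(firstMismatchedItem(['url1.jpg', 'url2.jpg'])); // -1
console.log(firstMismatchedItem([1, '2', 3]));              // 1
```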

### Field Annotation System

#### Annotation Markers

Our system uses special suffix markers on field names to define validation behavior:

| Marker       | Meaning             | Required | Nullable | Example                 |
| ------------ | ------------------- | -------- | -------- | ----------------------- |
| (none)       | Standard field      | ✅ Yes    | ❌ No     | `"name": "John"`        |
| `?`          | Nullable field      | ✅ Yes    | ✅ Yes    | `"nickname?": "Johnny"` |
| `*`          | Optional field      | ❌ No     | ❌ No     | `"metadata*": {...}`    |
| `?*` or `*?` | Optional + Nullable | ❌ No     | ✅ Yes    | `"notes?*": null`       |

#### Annotation Processing Logic

The system processes annotations in the following priority order:

1. **Combined annotations** (`?*` or `*?`) are checked first
2. **Single annotations** (`?` or `*`) are processed next
3. **No annotations** default to required, non-nullable fields

#### Field Name Cleaning

After processing annotations, the system:

1. Removes suffix markers from field names
2. Stores the clean field name in the schema
3. Preserves the original validation requirements
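
The priority order matters: a check for the single `*` suffix would also match `notes?*` and leave `notes?` as the field name. A minimal table-driven sketch of the parsing and name cleaning, using a hypothetical `parseFieldName` helper that mirrors the marker table above:

```javascript
// Rules are tried in order, so combined markers win over single ones
const ANNOTATION_RULES = [
    { suffix: '?*', required: false, nullable: true },
    { suffix: '*?', required: false, nullable: true },
    { suffix: '?', required: true, nullable: true },
    { suffix: '*', required: false, nullable: false },
];

const parseFieldName = (rawName) => {
    for (const rule of ANNOTATION_RULES) {
        if (rawName.endsWith(rule.suffix)) {
            return {
                name: rawName.slice(0, -rule.suffix.length), // cleaned name
                required: rule.required,
                nullable: rule.nullable,
            };
        }
    }
    // No marker: required and non-nullable
    return { name: rawName, required: true, nullable: false };
};

console.log(parseFieldName('notes?*'));
// { name: 'notes', required: false, nullable: true }
```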

### Generated Schema Structure

#### Schema Properties

Each generated schema has the following structure (shown here for a single optional, nullable field):

```json
{
  "type": "object",
  "properties": {
    "fieldName": {
      "type": ["string", "null"],
      "nullable": true,
      "__field_required": false
    }
  }
}
```

#### Property Schema Fields

Each field schema includes:

* **`type`**: The expected data type(s) - can be a string or array of strings
* **`nullable`**: Boolean indicating if null values are allowed
* **`__field_required`**: Custom property indicating if the field must be present
* **`items`**: For arrays, contains the schema for array elements
* **`properties`**: For objects, maps each cleaned field name to its field schema

#### Type Representations

* **Single Type**: `"type": "string"`
* **Union Types**: `"type": ["string", "null"]`
* **Arrays**: `"type": "array"` with `"items": {...}`
* **Objects**: `"type": "object"` with `"properties": {...}`

### Field Validation Engine

#### Validation Process

The validation engine processes data through multiple phases:

**Phase 1: Structural Validation**

* Checks for undefined data or schema
* Validates basic data structure integrity

**Phase 2: Type Validation**

* Compares actual data types against schema expectations
* Handles union types (multiple allowed types)
* Validates array and object structures

**Phase 3: Field Presence Validation**

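* Checks that required fields are present
* Allows optional fields to be missing
* Rejects properties that are not defined in the schema
* Validates field name casing (for example, `firstname` is rejected when the schema expects `firstName`)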

**Phase 4: Recursive Validation**

* Validates nested objects and arrays
* Maintains path information for error reporting
* Handles deep object structures
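
The four phases can be combined into one recursive function. The following is a minimal sketch, not the engine's implementation: `validate`, `typeOf`, `allowsNull`, and `fail` are hypothetical names, the order in which problems are reported is an assumption, and the message for undefined data is invented for the example. The other messages follow the formats listed under Error Handling & Messages, and the schema shape is the one described under Generated Schema Structure.

```javascript
// Type name of a runtime value, using the names found in generated schemas
const typeOf = (value) =>
    value === null ? 'null' : Array.isArray(value) ? 'array' : typeof value;

// Null is allowed if the schema is nullable, is of type "null", or lists "null"
const allowsNull = (schema) =>
    schema.nullable === true ||
    schema.type === 'null' ||
    (Array.isArray(schema.type) && schema.type.includes('null'));

const fail = (error) => ({ valid: false, error });

const validate = (data, schema, path = '') => {
    // Phase 1: structural validation
    if (schema === undefined || schema === null) {
        return fail('Schema is null or undefined');
    }
    if (data === undefined) {
        return fail(`Data is undefined at path '${path}'`); // assumed message
    }

    // Phase 2: type validation (null first, then single or union types)
    const actual = typeOf(data);
    if (actual === 'null') {
        return allowsNull(schema) ? { valid: true } : fail(`Null value not allowed at path '${path}'`);
    }
    const allowed = [].concat(schema.type);
    if (!allowed.includes(actual)) {
        return fail(`Type mismatch at path '${path}': expected ${allowed.join(' or ')} but got ${actual}`);
    }

    // Phase 4 for arrays: every item must match the item schema
    if (actual === 'array') {
        for (let i = 0; i < data.length; i++) {
            const result = validate(data[i], schema.items, `${path}[${i}]`);
            if (!result.valid) return result;
        }
        return { valid: true };
    }
    if (actual !== 'object') return { valid: true };

    // Phase 3: unexpected keys and key casing
    const properties = schema.properties || {};
    const expectedKeys = Object.keys(properties);
    for (const key of Object.keys(data)) {
        if (Object.hasOwn(properties, key)) continue;
        const caseMatch = expectedKeys.find((k) => k.toLowerCase() === key.toLowerCase());
        return caseMatch
            ? fail(`Key case mismatch at path '${path}': expected '${caseMatch}' but found '${key}'`)
            : fail(`Unexpected property '${key}' at path '${path}'`);
    }

    // Phase 3: required fields; Phase 4: recurse into fields that are present
    for (const [key, fieldSchema] of Object.entries(properties)) {
        if (!Object.hasOwn(data, key)) {
            if (fieldSchema.__field_required === true) {
                return fail(`Missing required property '${key}' at path '${path}'`);
            }
            continue; // optional fields may be absent
        }
        const childPath = path ? `${path}.${key}` : key;
        const result = validate(data[key], fieldSchema, childPath);
        if (!result.valid) return result;
    }
    return { valid: true };
};
```

The sketch stops at the first error it finds, which keeps messages short and matches the single `error` string in the error object structure.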

#### Validation Rules

**Required Field Rules**

```javascript
// Field is required if __field_required === true
if (fieldSchema.__field_required === true) {
    // Field must be present in data
    // Can be null only if schema allows it
}
```

**Optional Field Rules**

```javascript
// Field is optional if __field_required === false
if (fieldSchema.__field_required === false) {
    // Field can be completely missing from data
    // If present, must match schema type requirements
}
```

**Null Value Handling**

```javascript
// Null is allowed if any of these conditions is true
const isNullAllowed = (schema) =>
    schema.nullable === true ||
    schema.type === 'null' ||
    (Array.isArray(schema.type) && schema.type.includes('null'));
```

#### Path Tracking

The validator maintains a path string for precise error location:

* Root level: `""` or `"root"`
* Object properties: `"user.name"`
* Array elements: `"items[0]"`
* Nested structures: `"user.addresses[0].street"`
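
A minimal sketch of building these paths, assuming the root path is the empty string; `propertyPath` and `itemPath` are hypothetical helper names:

```javascript
// Append an object key, omitting the leading dot at the root
const propertyPath = (base, key) => (base ? `${base}.${key}` : key);
// Append an array index in bracket notation
const itemPath = (base, index) => `${base}[${index}]`;

// Build the path for user.addresses[0].street step by step
const userPath = propertyPath('', 'user');                 // "user"
const addressesPath = propertyPath(userPath, 'addresses'); // "user.addresses"
const firstAddress = itemPath(addressesPath, 0);           // "user.addresses[0]"
const streetPath = propertyPath(firstAddress, 'street');   // "user.addresses[0].street"
```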

### Error Handling & Messages

#### Error Types

The system provides detailed error messages for various scenarios:

**Type Mismatch Errors**

```
"Type mismatch at path 'user.age': expected number but got string"
```

**Missing Required Field Errors**

```
"Missing required property 'email' at path 'user'"
```

**Unexpected Property Errors**

```
"Unexpected property 'extra_field' at path 'user'"
```

**Case Sensitivity Errors**

```
"Key case mismatch at path 'user': expected 'firstName' but found 'firstname'"
```

**Null Value Errors**

```
"Null value not allowed at path 'user.name'"
```

#### Error Object Structure

```javascript
{
    valid: false,       // Always false when validation fails
    error: string       // Descriptive error message with path
}
```

#### Success Response Structure

```javascript
{
    valid: true        // No error property for successful validation
}
```

### Best Practices

#### Schema Design

1. **Use Clear Field Names**: Choose descriptive names before adding annotations

   ```json
   // Good
   "user_email?": "john@example.com"

   // Avoid
   "e?": "john@example.com"
   ```
2. **Consistent Annotation Usage**: Apply the same patterns across your API

   ```json
   {
     "required_field": "value",
     "optional_field*": "value",
     "nullable_field?": null,
     "flexible_field?*": "value or null or missing"
   }
   ```
3. **Meaningful Sample Data**: Use realistic sample values

   ```json
   // Good - shows expected format
   "created_at": "2024-01-15T10:30:00Z"

   // Avoid - unclear format
   "created_at": "some date"
   ```

#### Validation Integration

1. **Early Validation**: Validate data as early as possible in your pipeline
2. **Error Propagation**: Pass validation errors with full path information
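3. **Graceful Degradation**: Handle validation failures explicitly, for example by rejecting the input with the validator's error message instead of processing invalid data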

#### Performance Considerations

1. **Schema Caching**: Cache generated schemas to avoid repeated parsing
2. **Validation Batching**: Group related validations together
3. **Path Optimization**: Use efficient string building for deep paths
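
Schema caching can be as simple as memoizing the generator on the sample text. The sketch below wraps any generator function; `createSchemaCache` is a hypothetical name, and the stand-in generator exists only to show that repeated lookups do not re-run generation.

```javascript
// Hypothetical caching wrapper: each distinct sample text is parsed
// and converted to a schema only once
const createSchemaCache = (generate) => {
    const cache = new Map(); // sample text -> generated schema
    return (sampleText) => {
        if (!cache.has(sampleText)) {
            cache.set(sampleText, generate(JSON.parse(sampleText)));
        }
        return cache.get(sampleText);
    };
};

// Usage with a stand-in generator that counts how often it runs
let generatorCalls = 0;
const getSchema = createSchemaCache((sample) => {
    generatorCalls += 1;
    return { type: typeof sample };
});

const first = getSchema('{"id": 123}');
const second = getSchema('{"id": 123}');
console.log(generatorCalls, first === second); // 1 true
```

Keying on the raw text is the simplest choice, but samples that differ only in whitespace are cached separately; normalizing the key with `JSON.stringify(JSON.parse(text))` avoids that at the cost of a parse per lookup. Because the cached schema object is shared, callers should treat it as read-only.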

### Examples & Use Cases

#### Basic User Profile

**Sample Input**

```json
{
  "id": 123,
  "username": "johndoe",
  "email": "john@example.com",
  "display_name?": "John Doe",
  "bio*": "Software developer",
  "avatar_url?*": null
}
```

**Generated Schema**

```json
{
  "type": "object",
  "properties": {
    "id": {
      "type": "number",
      "__field_required": true
    },
    "username": {
      "type": "string",
      "__field_required": true
    },
    "email": {
      "type": "string",
      "__field_required": true
    },
    "display_name": {
      "type": ["string", "null"],
      "nullable": true,
      "__field_required": true
    },
    "bio": {
      "type": "string",
      "__field_required": false
    },
    "avatar_url": {
      "type": ["string", "null"],
      "nullable": true,
      "__field_required": false
    }
  }
}
```

**Valid Data Examples**

```javascript
// All required fields present
{
  "id": 456,
  "username": "janedoe",
  "email": "jane@example.com",
  "display_name": "Jane Doe"
}

// Optional `bio` omitted; nullable `display_name` set to null
{
  "id": 789,
  "username": "bobsmith",
  "email": "bob@example.com",
  "display_name": null,
  "avatar_url": "https://example.com/avatar.jpg"
}
```

#### E-commerce Product

**Sample Input**

```json
{
  "product_id": "SKU-001",
  "name": "Wireless Headphones",
  "price": 99.99,
  "description?": "High-quality wireless headphones",
  "images": ["url1.jpg", "url2.jpg"],
  "categories*": ["electronics", "audio"],
  "metadata?*": {
    "weight": "200g",
    "color": "black"
  }
}
```

**Generated Schema Features**

* **Array Handling**: `images` array with string items
* **Nested Objects**: `metadata` object with its own properties
* **Mixed Requirements**: Required, optional, and nullable fields combined

#### API Response Wrapper

**Sample Input**

```json
{
  "success": true,
  "data": {
    "items": [],
    "total_count": 0
  },
  "error_message?*": null,
  "pagination*": {
    "page": 1,
    "per_page": 20,
    "total_pages": 1
  }
}
```

This example shows how to handle:

* **Conditional Fields**: Error messages only present on failures
* **Optional Nested Objects**: `pagination` is optional, but when present it must match its nested structure
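* **Empty Arrays**: `items` is sampled as `[]`, so there is no first element to infer an item type from; per the Core Algorithm, the item schema falls back to that of an empty object (`{}`)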

### Troubleshooting

#### Common Issues

**Issue: "Schema is null or undefined"**

**Cause**: Invalid JSON input or parsing failure

**Solution**: Verify JSON syntax and structure

```javascript
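// Check your JSON syntax before generating a schema
const yourSample = '{"name": "John", "nickname?": "Johnny"}'; // replace with your own sample text

try {
    JSON.parse(yourSample);
    console.log("Sample is valid JSON");
} catch (e) {
    console.log("JSON parsing error:", e.message);
}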
```

**Issue: "Unexpected property 'fieldName'"**

**Cause**: Data contains fields not defined in the schema

**Solutions**:

1. Add the field to the sample; if it is only sometimes present, mark it optional with `*`
2. Remove the unexpected field from the data

**Issue: "Missing required property"**

**Cause**: Required field is missing from the data

**Solutions**:

1. Add the missing field to your data
2. Make the field optional using the `*` annotation
3. If the field may be missing *or* `null`, use the combined `?*` annotation; note that `?` alone still requires the field to be present

**Issue: "Type mismatch" errors**

**Cause**: Data type doesn't match the inferred schema type

**Solutions**:

1. Ensure the sample data uses the types you actually expect
2. If the field may also be `null`, mark it nullable with `?`, which produces a `["<type>", "null"]` union
3. Verify data transformation logic

#### Performance Troubleshooting

**Large Object Performance**

* **Issue**: Slow validation on large nested objects
* **Solution**: Consider flattening structure or validating in chunks

**Memory Usage**

* **Issue**: High memory usage with complex schemas
* **Solution**: Implement schema sharing and caching strategies

**Validation Speed**

* **Issue**: Slow validation performance
* **Solutions**:
  1. Cache validation results for identical data
  2. Implement early validation failure returns
  3. Optimize path string building

***

### Conclusion

This schema generation and validation system provides a powerful way to define and validate data structures using intuitive JSON samples. By understanding the annotation system and validation rules, you can create robust APIs with clear data contracts and comprehensive validation.

Because it handles required, optional, and nullable fields, the system suits a wide range of use cases, from simple API responses to complex nested data structures. The troubleshooting steps above cover the most common validation errors and how to resolve them.


