Field Generation & Schema Validation

Overview

This documentation covers our automatic schema generation system that converts JSON samples into validation schemas, and the subsequent field validation process. The system is designed to simplify API design by allowing developers to define data structures using annotated JSON samples rather than writing complex schema definitions manually.

Schema Generation Process

How It Works

The schema generation process transforms annotated JSON samples into structured validation schemas through the following steps:

  1. JSON Parsing: The input sample is parsed to extract the basic structure and data types

  2. Annotation Processing: Field names are analyzed for suffix markers (?, *, or the combinations ?* and *?)

  3. Type Inference: Data types are automatically detected from sample values

  4. Schema Construction: A comprehensive schema object is built with validation rules

  5. Metadata Addition: Additional properties like __field_required are added for validation logic

Core Algorithm

The system uses recursive parsing to handle nested objects and arrays:

const generateSchema = (obj) => {
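    if (Array.isArray(obj)) {
        // Infer the item schema from the first element; an empty array falls back to {}.
        // A length check is used because `obj[0] || {}` would discard falsy samples such as 0, '' or false.
        const first = obj.length > 0 ? obj[0] : {};
        return { type: 'array', items: generateSchema(first) };
    }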
    
    if (obj === null) {
        return { type: 'null' };
    }
    
    if (typeof obj === 'object') {
        // Process object properties with annotation handling
        return processObjectSchema(obj);
    }
    
    return { type: typeof obj };
};

Type Detection

The system automatically detects the following JavaScript types:

  • Primitive Types: string, number, boolean

  • Complex Types: object, array

  • Special Values: null, undefined

Arrays are handled by analyzing the first element to determine the schema for all items in the array. Validation fails if an array contains elements of more than one type.
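
The first-element rule can be illustrated with a small sketch. The helper names below (typeName, inferItemType, findTypeMismatch) are hypothetical and only show the idea; they are not part of the documented API.

```javascript
// Hypothetical helpers for illustration; names are not part of the documented API.
const typeName = (value) => {
    if (value === null) return 'null';
    if (Array.isArray(value)) return 'array';
    return typeof value; // 'string', 'number', 'boolean', 'object', 'undefined'
};

// Infers the item type from the first element only, as described above.
const inferItemType = (arr) => typeName(arr[0]);

// Returns the index of the first element whose type differs from the
// inferred item type, or -1 when every element matches.
const findTypeMismatch = (arr) => {
    const expected = inferItemType(arr);
    return arr.findIndex((item) => typeName(item) !== expected);
};

console.log(inferItemType(['url1.jpg', 'url2.jpg'])); // string
console.log(findTypeMismatch(['a', 'b', 3]));         // 2
console.log(findTypeMismatch([1, 2, 3]));             // -1
```

Because only the first element is inspected, a sample array should start with a representative value.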

Field Annotation System

Annotation Markers

Our system uses special suffix markers on field names to define validation behavior:

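| Marker   | Meaning             | Required | Nullable | Example               |
|----------|---------------------|----------|----------|-----------------------|
| (none)   | Standard field      | ✅ Yes   | ❌ No    | "name": "John"        |
| ?        | Nullable field      | ✅ Yes   | ✅ Yes   | "nickname?": "Johnny" |
| *        | Optional field      | ❌ No    | ❌ No    | "metadata*": {...}    |
| ?* or *? | Optional + Nullable | ❌ No    | ✅ Yes   | "notes?*": null       |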

Annotation Processing Logic

The system processes annotations in the following priority order:

  1. Combined annotations (?* or *?) are checked first

  2. Single annotations (? or *) are processed next

  3. No annotations default to required, non-nullable fields

Field Name Cleaning

After processing annotations, the system:

  1. Removes suffix markers from field names

  2. Stores the clean field name in the schema

  3. Preserves the original validation requirements
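
The priority order and name cleaning described above could look roughly like the following. This is a minimal sketch: parseFieldName and the shape of its return value are assumptions made for illustration, not the system's actual internals.

```javascript
// Hypothetical helper: splits an annotated key into a clean name plus flags.
const parseFieldName = (rawKey) => {
    // 1. Combined annotations (?* or *?) are checked first.
    if (rawKey.endsWith('?*') || rawKey.endsWith('*?')) {
        return { name: rawKey.slice(0, -2), required: false, nullable: true };
    }
    // 2. Single annotations are processed next.
    if (rawKey.endsWith('?')) {
        return { name: rawKey.slice(0, -1), required: true, nullable: true };
    }
    if (rawKey.endsWith('*')) {
        return { name: rawKey.slice(0, -1), required: false, nullable: false };
    }
    // 3. No annotation: required and non-nullable.
    return { name: rawKey, required: true, nullable: false };
};

console.log(parseFieldName('notes?*'));
// { name: 'notes', required: false, nullable: true }
console.log(parseFieldName('nickname?'));
// { name: 'nickname', required: true, nullable: true }
```

Checking the two-character markers first matters: testing for a trailing * before ?* would misread "notes?*" as an optional field named "notes?".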

Generated Schema Structure

Schema Properties

Each generated schema contains the following structure:

{
  "type": "object",
  "properties": {
    "fieldName": {
      "type": ["string", "null"],
      "nullable": true,
      "__field_required": false
    }
  }
}

Property Schema Fields

Each field schema includes:

  • type: The expected data type(s) - can be a string or array of strings

  • nullable: Boolean indicating if null values are allowed

  • __field_required: Custom property indicating if the field must be present

  • items: For arrays, contains the schema for array elements

Type Representations

  • Single Type: "type": "string"

  • Union Types: "type": ["string", "null"]

  • Arrays: "type": "array" with "items": {...}

  • Objects: "type": "object" with "properties": {...}
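
To show how these representations fit together, here is a sketch of a generator that emits the structure described in this section. The names parseKey and generate are hypothetical, and details such as how a null sample combines with a nullable marker are assumptions rather than documented behavior.

```javascript
// Hypothetical sketch of a generator producing the structure described above.
const parseKey = (rawKey) => {
    // Two-character markers are listed first so they win over single ones.
    const marker = ['?*', '*?', '?', '*'].find((m) => rawKey.endsWith(m)) || '';
    return {
        name: rawKey.slice(0, rawKey.length - marker.length),
        required: !marker.includes('*'),
        nullable: marker.includes('?'),
    };
};

const generate = (value) => {
    if (Array.isArray(value)) {
        // Item schema comes from the first element; empty arrays fall back to {}.
        return { type: 'array', items: generate(value.length > 0 ? value[0] : {}) };
    }
    if (value === null) return { type: 'null' };
    if (typeof value !== 'object') return { type: typeof value };

    const properties = {};
    for (const [rawKey, sample] of Object.entries(value)) {
        const { name, required, nullable } = parseKey(rawKey);
        const fieldSchema = generate(sample);
        if (nullable && fieldSchema.type !== 'null') {
            fieldSchema.type = [fieldSchema.type, 'null']; // union type
        }
        if (nullable) fieldSchema.nullable = true;
        fieldSchema.__field_required = required;
        properties[name] = fieldSchema;
    }
    return { type: 'object', properties };
};

const schema = generate({ 'id': 123, 'display_name?': 'John Doe', 'bio*': 'Software developer' });
console.log(JSON.stringify(schema.properties.display_name));
// {"type":["string","null"],"nullable":true,"__field_required":true}
```

In this sketch the nullable property is only written for fields marked with ?, which mirrors the generated schema shown in the examples later in this document.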

Field Validation Engine

Validation Process

The validation engine processes data through multiple phases:

Phase 1: Structural Validation

  • Checks for undefined data or schema

  • Validates basic data structure integrity

Phase 2: Type Validation

  • Compares actual data types against schema expectations

  • Handles union types (multiple allowed types)

  • Validates array and object structures

Phase 3: Field Presence Validation

  • Checks required fields are present

  • Allows optional fields to be missing

  • Validates field name casing

Phase 4: Recursive Validation

  • Validates nested objects and arrays

  • Maintains path information for error reporting

  • Handles deep object structures

Validation Rules

Required Field Rules

// Field is required if __field_required === true
if (fieldSchema.__field_required === true) {
    // Field must be present in data
    // Can be null only if schema allows it
}

Optional Field Rules

// Field is optional if __field_required === false
if (fieldSchema.__field_required === false) {
    // Field can be completely missing from data
    // If present, must match schema type requirements
}

Null Value Handling

// Null is allowed if any of these conditions is true:
const nullAllowed =
    schema.nullable === true ||
    schema.type === 'null' ||
    (Array.isArray(schema.type) && schema.type.includes('null'));
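
The three rule groups above can be combined into a single per-field check. The following is a sketch only: checkPresenceAndNull and isNullAllowed are hypothetical names, and treating any value other than true for __field_required as optional is an assumption.

```javascript
// Hypothetical helpers combining the required, optional and null rules.
const isNullAllowed = (schema) =>
    schema.nullable === true ||
    schema.type === 'null' ||
    (Array.isArray(schema.type) && schema.type.includes('null'));

// Returns an error string, or null when the presence and null rules pass.
const checkPresenceAndNull = (obj, key, fieldSchema, path) => {
    const present = Object.prototype.hasOwnProperty.call(obj, key);
    if (!present) {
        // A missing field is only an error when the field is required.
        return fieldSchema.__field_required === true
            ? `Missing required property '${key}' at path '${path}'`
            : null;
    }
    if (obj[key] === null && !isNullAllowed(fieldSchema)) {
        return `Null value not allowed at path '${path}.${key}'`;
    }
    return null; // type checks would follow here
};

const nameSchema = { type: 'string', __field_required: true };
console.log(checkPresenceAndNull({}, 'name', nameSchema, 'user'));
// Missing required property 'name' at path 'user'
console.log(checkPresenceAndNull({ name: null }, 'name', nameSchema, 'user'));
// Null value not allowed at path 'user.name'
```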

Path Tracking

The validator maintains a path string for precise error location:

  • Root level: "" or "root"

  • Object properties: "user.name"

  • Array elements: "items[0]"

  • Nested structures: "user.addresses[0].street"
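
A path in these formats can be assembled from two small helpers. This is a sketch under the assumption that the root path is the empty string; propPath and indexPath are hypothetical names.

```javascript
// Hypothetical helpers for building error paths.
// Property segments are joined with a dot, except directly under the root.
const propPath = (parent, key) => (parent === '' ? key : `${parent}.${key}`);

// Array segments append the index in square brackets.
const indexPath = (parent, index) => `${parent}[${index}]`;

const street = propPath(indexPath(propPath('user', 'addresses'), 0), 'street');
console.log(street);                 // user.addresses[0].street
console.log(propPath('', 'user'));   // user
console.log(indexPath('items', 0));  // items[0]
```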

Error Handling & Messages

Error Types

The system provides detailed error messages for various scenarios:

Type Mismatch Errors

"Type mismatch at path 'user.age': expected number but got string"

Missing Required Field Errors

"Missing required property 'email' at path 'user'"

Unexpected Property Errors

"Unexpected property 'extra_field' at path 'user'"

Case Sensitivity Errors

"Key case mismatch at path 'user': expected 'firstName' but found 'firstname'"

Null Value Errors

"Null value not allowed at path 'user.name'"

Error Object Structure

{
    valid: false,       // Always false for errors
    error: "Null value not allowed at path 'user.name'"  // Descriptive error message with path
}

Success Response Structure

{
    valid: true        // No error property for successful validation
}
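
Putting the phases, messages and result objects together, a recursive validator might be sketched as follows. The function names, the order of the checks, and the wording of the "Data is undefined" message are assumptions; the other messages follow the formats listed above.

```javascript
// Sketch of a validator; names and rule order are assumptions.
const fail = (error) => ({ valid: false, error });
const typeOf = (v) => (v === null ? 'null' : Array.isArray(v) ? 'array' : typeof v);
const join = (path, key) => (path ? `${path}.${key}` : key);

const validate = (data, schema, path = '') => {
    // Phase 1: structural checks.
    if (schema === undefined || schema === null) return fail('Schema is null or undefined');
    if (data === undefined) return fail(`Data is undefined at path '${path}'`); // wording assumed

    // Phase 2: type checks, including union types and null handling.
    const allowed = Array.isArray(schema.type) ? schema.type : [schema.type];
    if (data === null) {
        return schema.nullable === true || allowed.includes('null')
            ? { valid: true }
            : fail(`Null value not allowed at path '${path}'`);
    }
    const actual = typeOf(data);
    if (!allowed.includes(actual)) {
        return fail(`Type mismatch at path '${path}': expected ${allowed.join(' or ')} but got ${actual}`);
    }

    // Phase 4 for arrays: validate every element against the item schema.
    if (actual === 'array') {
        for (let i = 0; i < data.length; i++) {
            const result = validate(data[i], schema.items, `${path}[${i}]`);
            if (!result.valid) return result; // stop at the first failure
        }
        return { valid: true };
    }

    if (actual === 'object') {
        const props = schema.properties || {};
        const names = Object.keys(props);
        // Phase 3: unexpected keys and key casing.
        for (const key of Object.keys(data)) {
            if (names.includes(key)) continue;
            const match = names.find((n) => n.toLowerCase() === key.toLowerCase());
            return fail(match
                ? `Key case mismatch at path '${path}': expected '${match}' but found '${key}'`
                : `Unexpected property '${key}' at path '${path}'`);
        }
        // Phase 3 and 4: required fields, then recurse into present fields.
        for (const name of names) {
            if (!Object.prototype.hasOwnProperty.call(data, name)) {
                if (props[name].__field_required === true) {
                    return fail(`Missing required property '${name}' at path '${path}'`);
                }
                continue;
            }
            const result = validate(data[name], props[name], join(path, name));
            if (!result.valid) return result;
        }
    }
    return { valid: true };
};

const schema = {
    type: 'object',
    properties: {
        user: {
            type: 'object',
            __field_required: true,
            properties: {
                firstName: { type: 'string', __field_required: true },
                age: { type: 'number', __field_required: true },
            },
        },
    },
};

console.log(validate({ user: { firstName: 'Jane', age: 30 } }, schema));
// { valid: true }
console.log(validate({ user: { firstName: 'Jane', age: '30' } }, schema).error);
// Type mismatch at path 'user.age': expected number but got string
```

Returning at the first failure keeps the result object simple; collecting every error would require a different return shape than the one documented here.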

Best Practices

Schema Design

  1. Use Clear Field Names: Choose descriptive names before adding annotations

    // Good
    "user_email?": "[email protected]"
    
    // Avoid
    "e?": "[email protected]"

  2. Consistent Annotation Usage: Apply the same patterns across your API

    {
      "required_field": "value",
      "optional_field*": "value",
      "nullable_field?": null,
      "flexible_field?*": "value or null or missing"
    }

  3. Meaningful Sample Data: Use realistic sample values

    // Good - shows expected format
    "created_at": "2024-01-15T10:30:00Z"
    
    // Avoid - unclear format
    "created_at": "some date"

Validation Integration

  1. Early Validation: Validate data as early as possible in your pipeline

  2. Error Propagation: Pass validation errors with full path information

  3. Graceful Degradation: Handle validation failures appropriately
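
As one possible way to apply these three points, the sketch below validates a request body before any processing and passes the validator's message through unchanged. The handler name, the status codes and the response shape are all assumptions for illustration; validate stands for any function that returns the result objects described earlier.

```javascript
// Hypothetical request handler; `validate` is any function returning
// { valid: true } or { valid: false, error }. Status codes are assumptions.
const handleCreateUser = (body, validate) => {
    // Early validation: check the payload before doing any other work.
    const result = validate(body);
    if (!result.valid) {
        // Error propagation: keep the message intact so the path is preserved.
        return { status: 400, body: { error: result.error } };
    }
    return { status: 201, body: { created: true } };
};

// Stub validator used only to demonstrate the flow.
const alwaysFail = () => ({ valid: false, error: "Missing required property 'email' at path 'user'" });
console.log(handleCreateUser({}, alwaysFail));
// { status: 400, body: { error: "Missing required property 'email' at path 'user'" } }
```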

Performance Considerations

  1. Schema Caching: Cache generated schemas to avoid repeated parsing

  2. Validation Batching: Group related validations together

  3. Path Optimization: Use efficient string building for deep paths
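
Schema caching can be as simple as a Map keyed by the sample's JSON text. In this sketch, generateSchema is a placeholder standing in for the real generator, and getSchema and the call counter are hypothetical additions used to show that repeated lookups skip regeneration.

```javascript
// Hypothetical cache wrapper; `generateSchema` is a placeholder generator.
let generateCalls = 0;
const generateSchema = (sample) => {
    generateCalls += 1; // counts how often generation actually runs
    return { type: Array.isArray(sample) ? 'array' : typeof sample };
};

const schemaCache = new Map();

const getSchema = (sampleJson) => {
    // The raw JSON text of the sample serves as the cache key.
    if (!schemaCache.has(sampleJson)) {
        schemaCache.set(sampleJson, generateSchema(JSON.parse(sampleJson)));
    }
    return schemaCache.get(sampleJson);
};

const first = getSchema('{"id": 123}');
const second = getSchema('{"id": 123}');
console.log(first === second); // true
console.log(generateCalls);    // 1
```

Keying on the raw text means two samples that differ only in whitespace are cached separately; normalizing the text first would avoid that at the cost of an extra parse.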

Examples & Use Cases

Basic User Profile

Sample Input

{
  "id": 123,
  "username": "johndoe",
  "email": "[email protected]",
  "display_name?": "John Doe",
  "bio*": "Software developer",
  "avatar_url?*": "https://example.com/avatar.jpg"
}

Generated Schema

{
  "type": "object",
  "properties": {
    "id": {
      "type": "number",
      "__field_required": true
    },
    "username": {
      "type": "string",
      "__field_required": true
    },
    "email": {
      "type": "string",
      "__field_required": true
    },
    "display_name": {
      "type": ["string", "null"],
      "nullable": true,
      "__field_required": true
    },
    "bio": {
      "type": "string",
      "__field_required": false
    },
    "avatar_url": {
      "type": ["string", "null"],
      "nullable": true,
      "__field_required": false
    }
  }
}

Valid Data Examples

// All required fields present; optional fields (bio, avatar_url) omitted
{
  "id": 456,
  "username": "janedoe",
  "email": "[email protected]",
  "display_name": "Jane Doe"
}

// Nullable field set to null; optional avatar_url present, optional bio omitted
{
  "id": 789,
  "username": "bobsmith",
  "email": "[email protected]",
  "display_name": null,
  "avatar_url": "https://example.com/avatar.jpg"
}

E-commerce Product

Sample Input

{
  "product_id": "SKU-001",
  "name": "Wireless Headphones",
  "price": 99.99,
  "description?": "High-quality wireless headphones",
  "images": ["url1.jpg", "url2.jpg"],
  "categories*": ["electronics", "audio"],
  "metadata?*": {
    "weight": "200g",
    "color": "black"
  }
}

Generated Schema Features

  • Array Handling: images array with string items

  • Nested Objects: metadata object with its own properties

  • Mixed Requirements: Required, optional, and nullable fields combined

API Response Wrapper

Sample Input

{
  "success": true,
  "data": {
    "items": [],
    "total_count": 0
  },
  "error_message?*": null,
  "pagination*": {
    "page": 1,
    "per_page": 20,
    "total_pages": 1
  }
}

This example shows how to handle:

  • Conditional Fields: Error messages only present on failures

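  • Nested Optional Objects: pagination is optional, but its fields are required whenever it is present

  • Empty Arrays: items is empty in the sample, so its item schema is derived from the empty-object fallback in the core algorithm rather than from a real element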

Troubleshooting

Common Issues

Issue: "Schema is null or undefined"

Cause: Invalid JSON input or parsing failure

Solution: Verify JSON syntax and structure

// Check your JSON syntax
try {
    JSON.parse(yourSample);
} catch (e) {
    console.log("JSON parsing error:", e.message);
}

Issue: "Unexpected property 'fieldName'"

Cause: Data contains fields not defined in schema

Solutions:

  1. Add the missing field to the sample with the appropriate annotation

  2. Remove the unexpected field from the data

  3. Use the optional annotation (*) if the field is only sometimes present

Issue: "Missing required property"

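Cause: Required field is missing from data

Solutions:

  1. Add the missing field to your data

  2. Make the field optional using the * annotation

  3. If null is acceptable, make the field nullable using the ? annotation and send null explicitly; a nullable field must still be present unless it is also marked optional (?*)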

Issue: "Type mismatch" errors

Cause: Data type doesn't match inferred schema type

Solutions:

  1. Ensure sample data represents actual expected types

  2. Use union types by making field nullable if needed

  3. Verify data transformation logic

Performance Troubleshooting

Large Object Performance

  • Issue: Slow validation on large nested objects

  • Solution: Consider flattening structure or validating in chunks

Memory Usage

  • Issue: High memory usage with complex schemas

  • Solution: Implement schema sharing and caching strategies

Validation Speed

  • Issue: Slow validation performance

  • Solutions:

    1. Cache validation results for identical data

    2. Implement early validation failure returns

    3. Optimize path string building


Conclusion

This schema generation and validation system provides a powerful way to define and validate data structures using intuitive JSON samples. By understanding the annotation system and validation rules, you can create robust APIs with clear data contracts and comprehensive validation.

The system's flexibility in handling required, optional, and nullable fields makes it suitable for a wide range of use cases, from simple API responses to complex nested data structures. Regular use of the troubleshooting techniques will help you quickly resolve any validation issues and maintain high-quality data processing in your applications.
