Field Generation & Schema Validation

Overview

This documentation covers our automatic schema generation system that converts JSON samples into validation schemas, and the subsequent field validation process. The system is designed to simplify API design by allowing developers to define data structures using annotated JSON samples rather than writing complex schema definitions manually.

Schema Generation Process

How It Works

The schema generation process transforms annotated JSON samples into structured validation schemas through the following steps:

JSON Parsing: The input sample is parsed to extract the basic structure and data types
Annotation Processing: Field names are analyzed for special suffix markers (?, *, or the combinations ?* and *?)
Type Inference: Data types are automatically detected from sample values
Schema Construction: A comprehensive schema object is built with validation rules
Metadata Addition: Additional properties like __field_required are added for validation logic

Core Algorithm

The system uses recursive parsing to handle nested objects and arrays:

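const generateSchema = (obj) => {
    if (Array.isArray(obj)) {
        // Infer the item schema from the first element only. Check the length
        // rather than truthiness so falsy elements (0, '', false) are kept.
        const first = obj.length > 0 ? obj[0] : {};
        return { type: 'array', items: generateSchema(first) };
    }

    if (obj === null) {
        return { type: 'null' };
    }

    if (typeof obj === 'object') {
        // Process object properties with annotation handling
        return processObjectSchema(obj);
    }

    // Primitives: 'string', 'number', 'boolean'
    return { type: typeof obj };
};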

Type Detection

The system automatically detects the following JavaScript types:

Primitive Types: string, number, boolean
Complex Types: object, array
Special Values: null, undefined

Arrays are handled by analyzing the first element to determine the schema for all items in the array. As a result, validation will fail if an array contains elements of more than one type.
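
As a rough illustration of the detection rules above, the sketch below maps a sample value to a type name and checks whether an array mixes types. The helper names detectType and hasMixedTypes are assumptions made for this example and are not part of the system's API.

```javascript
// Hypothetical helper: name the type of a sample value. Arrays and null
// are separated out because typeof reports both of them as 'object'.
const detectType = (value) => {
    if (Array.isArray(value)) return 'array';
    if (value === null) return 'null';
    return typeof value; // e.g. 'string', 'number', 'boolean', 'object'
};

// Hypothetical helper: true when some element's type differs from the
// type of the first element, the only one the generator inspects.
const hasMixedTypes = (arr) =>
    arr.some((item) => detectType(item) !== detectType(arr[0]));

console.log(detectType('abc'));          // string
console.log(detectType([1, 2]));         // array
console.log(detectType(null));           // null
console.log(hasMixedTypes([1, 2, 3]));   // false
console.log(hasMixedTypes([1, 'two']));  // true
```

A check like hasMixedTypes could be run on sample arrays before generation to catch samples whose schema would later reject their own elements.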

Field Annotation System

Annotation Markers

Our system uses special suffix markers on field names to define validation behavior:

Marker      Meaning               Required   Nullable   Example
(none)      Standard field        ✅ Yes     ❌ No      "name": "John"
?           Nullable field        ✅ Yes     ✅ Yes     "nickname?": "Johnny"
*           Optional field        ❌ No      ❌ No      "metadata*": {...}
?* or *?    Optional + Nullable   ❌ No      ✅ Yes     "notes?*": null

Annotation Processing Logic

The system processes annotations in the following priority order:

Combined annotations (?* or *?) are checked first
Single annotations (? or *) are processed next
No annotations default to required, non-nullable fields

Field Name Cleaning

After processing annotations, the system:

Removes suffix markers from field names
Stores the clean field name in the schema
Preserves the original validation requirements
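
The priority order and name cleaning described above can be sketched as a single helper. The name parseFieldName and the shape of its return value are assumptions for illustration only; the actual implementation may differ.

```javascript
// Hypothetical helper: split an annotated key into a clean name and flags.
// Combined markers are checked first; otherwise "notes?*" would match the
// single "*" rule and leave "notes?" as the field name.
const parseFieldName = (key) => {
    if (key.endsWith('?*') || key.endsWith('*?')) {
        return { name: key.slice(0, -2), required: false, nullable: true };
    }
    if (key.endsWith('?')) {
        return { name: key.slice(0, -1), required: true, nullable: true };
    }
    if (key.endsWith('*')) {
        return { name: key.slice(0, -1), required: false, nullable: false };
    }
    // No annotation: required and non-nullable by default
    return { name: key, required: true, nullable: false };
};

console.log(parseFieldName('notes?*'));
console.log(parseFieldName('nickname?'));
console.log(parseFieldName('metadata*'));
console.log(parseFieldName('name'));
```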

Generated Schema Structure

Schema Properties

Each generated schema contains the following structure:

{
  "type": "object",
  "properties": {
    "fieldName": {
      "type": ["string", "null"],
      "nullable": true,
      "__field_required": false
    }
  }
}

Property Schema Fields

Each field schema includes:

type: The expected data type(s) - can be a string or array of strings
nullable: Boolean indicating if null values are allowed
__field_required: Custom property indicating if the field must be present
items: For arrays, contains the schema for array elements

Type Representations

Single Type: "type": "string"
Union Types: "type": ["string", "null"]
Arrays: "type": "array" with "items": {...}
Objects: "type": "object" with "properties": {...}
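
To make the structure above concrete, here is a minimal sketch that builds one property schema from a sample value and its annotation flags. The function name buildFieldSchema and its flags argument are hypothetical, and nested items and properties are deliberately left out to keep the example short.

```javascript
// Hypothetical helper: build a single property schema. `flags` is assumed
// to hold the required/nullable booleans derived from the field annotation.
const buildFieldSchema = (sampleValue, { required, nullable }) => {
    const baseType = Array.isArray(sampleValue) ? 'array'
        : sampleValue === null ? 'null'
        : typeof sampleValue;

    const schema = {};
    // Nullable fields get a union type, unless the sample itself is null
    schema.type = nullable && baseType !== 'null' ? [baseType, 'null'] : baseType;
    if (nullable) schema.nullable = true;
    schema.__field_required = required;
    return schema;
};

// A nullable, required string field such as "nickname?": "Johnny"
console.log(buildFieldSchema('Johnny', { required: true, nullable: true }));

// An optional, non-nullable field such as "bio*": "Software developer"
console.log(buildFieldSchema('Software developer', { required: false, nullable: false }));
```

Note that a null sample value gives the sketch no base type to infer, which is one reason to prefer a representative non-null value in samples.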

Field Validation Engine

Validation Process

The validation engine processes data through multiple phases:

Phase 1: Structural Validation

Checks for undefined data or schema
Validates basic data structure integrity

Phase 2: Type Validation

Compares actual data types against schema expectations
Handles union types (multiple allowed types)
Validates array and object structures

Phase 3: Field Presence Validation

Checks required fields are present
Allows optional fields to be missing
Validates field name casing

Phase 4: Recursive Validation

Validates nested objects and arrays
Maintains path information for error reporting
Handles deep object structures

Validation Rules

Required Field Rules

// Field is required if __field_required === true
if (fieldSchema.__field_required === true) {
    // Field must be present in data
    // Can be null only if schema allows it
}

Optional Field Rules

// Field is optional if __field_required === false
if (fieldSchema.__field_required === false) {
    // Field can be completely missing from data
    // If present, must match schema type requirements
}

Null Value Handling

// Null is allowed if any of these conditions is true
const allowsNull = (schema) =>
    schema.nullable === true ||
    schema.type === 'null' ||
    (Array.isArray(schema.type) && schema.type.includes('null'));

Path Tracking

The validator maintains a path string for precise error location:

Root level: "" or "root"
Object properties: "user.name"
Array elements: "items[0]"
Nested structures: "user.addresses[0].street"
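
One way to build such paths is with two small helpers, sketched below. The names propertyPath and indexPath are assumptions for this example, and the sketch assumes the root path is the empty string.

```javascript
// Hypothetical helpers for building error paths during recursion.
// An empty parent means "root", so no leading dot is added.
const propertyPath = (parent, key) => (parent ? `${parent}.${key}` : key);
const indexPath = (parent, index) => `${parent}[${index}]`;

const user = propertyPath('', 'user');               // "user"
const addresses = propertyPath(user, 'addresses');   // "user.addresses"
const first = indexPath(addresses, 0);               // "user.addresses[0]"
console.log(propertyPath(first, 'street'));          // user.addresses[0].street
```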

Error Handling & Messages

Error Types

The system provides detailed error messages for various scenarios:

Type Mismatch Errors

"Type mismatch at path 'user.age': expected number but got string"

Missing Required Field Errors

"Missing required property 'email' at path 'user'"

Unexpected Property Errors

"Unexpected property 'extra_field' at path 'user'"

Case Sensitivity Errors

"Key case mismatch at path 'user': expected 'firstName' but found 'firstname'"

Null Value Errors

"Null value not allowed at path 'user.name'"

Error Object Structure

{
    valid: boolean,     // Always false for errors
    error: string       // Descriptive error message with path
}

Success Response Structure

{
    valid: true        // No error property for successful validation
}
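
The validation rules, path tracking, and result shapes described above can be combined into a minimal validator sketch. This is an illustration under stated assumptions, not the system's implementation: the function names validate, typeOf, allowsNull, and fail are hypothetical, the message wording only mirrors the examples in Error Types, and the key case mismatch check is omitted for brevity.

```javascript
// Hypothetical helpers (names are assumptions for this sketch)
const typeOf = (v) => (Array.isArray(v) ? 'array' : v === null ? 'null' : typeof v);
const allowsNull = (s) =>
    s.nullable === true || s.type === 'null' ||
    (Array.isArray(s.type) && s.type.includes('null'));
const fail = (error) => ({ valid: false, error });
const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

const validate = (data, schema, path = '') => {
    // Phase 1: structural checks
    if (schema === undefined || schema === null) return fail('Schema is null or undefined');
    if (data === undefined) return fail(`Data is undefined at path '${path}'`);

    // Null handling
    if (data === null) {
        return allowsNull(schema)
            ? { valid: true }
            : fail(`Null value not allowed at path '${path}'`);
    }

    // Phase 2: type check, with union types treated as a list of allowed types
    const allowed = Array.isArray(schema.type) ? schema.type : [schema.type];
    const actual = typeOf(data);
    if (!allowed.includes(actual)) {
        return fail(`Type mismatch at path '${path}': expected ${allowed.join(' or ')} but got ${actual}`);
    }

    // Phase 4: recurse into array elements
    if (actual === 'array' && schema.items) {
        for (let i = 0; i < data.length; i++) {
            const result = validate(data[i], schema.items, `${path}[${i}]`);
            if (!result.valid) return result;
        }
    }

    if (actual === 'object' && schema.properties) {
        // Phase 3: field presence, then recursion into each present field
        for (const [key, fieldSchema] of Object.entries(schema.properties)) {
            if (!has(data, key)) {
                if (fieldSchema.__field_required === true) {
                    return fail(`Missing required property '${key}' at path '${path}'`);
                }
                continue; // optional field may be absent
            }
            const childPath = path ? `${path}.${key}` : key;
            const result = validate(data[key], fieldSchema, childPath);
            if (!result.valid) return result;
        }
        for (const key of Object.keys(data)) {
            if (!has(schema.properties, key)) {
                return fail(`Unexpected property '${key}' at path '${path}'`);
            }
        }
    }
    return { valid: true };
};

const schema = {
    type: 'object',
    properties: {
        name: { type: 'string', __field_required: true },
        nickname: { type: ['string', 'null'], nullable: true, __field_required: true },
        bio: { type: 'string', __field_required: false },
    },
};
console.log(validate({ name: 'John', nickname: null }, schema)); // { valid: true }
console.log(validate({ name: 42, nickname: null }, schema).error);
// Type mismatch at path 'name': expected string but got number
```

The sketch returns on the first failure, which keeps the result shape to a single error string as described above.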

Best Practices

Schema Design

Use Clear Field Names: Choose descriptive names before adding annotations

// Good
"user_email?": "user@example.com"

// Avoid
"e?": "user@example.com"

Consistent Annotation Usage: Apply the same patterns across your API

{
  "required_field": "value",
  "optional_field*": "value",
  "nullable_field?": null,
  "flexible_field?*": "value or null or missing"
}

Meaningful Sample Data: Use realistic sample values

// Good - shows expected format
"created_at": "2024-01-15T10:30:00Z"

// Avoid - unclear format
"created_at": "some date"

Validation Integration

Early Validation: Validate data as early as possible in your pipeline
Error Propagation: Pass validation errors with full path information
Graceful Degradation: Handle validation failures appropriately

Performance Considerations

Schema Caching: Cache generated schemas to avoid repeated parsing
Validation Batching: Group related validations together
Path Optimization: Use efficient string building for deep paths
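
As one possible approach to schema caching, the sketch below memoizes generated schemas by the raw sample string. The wrapper createCachedGenerator is hypothetical, and the generator passed to it here is a stand-in that only counts its own invocations.

```javascript
// Hypothetical cache wrapper: `generate` stands in for the real generator
// and is only called the first time a given sample string is seen.
const createCachedGenerator = (generate) => {
    const cache = new Map();
    return (sampleJson) => {
        if (!cache.has(sampleJson)) {
            cache.set(sampleJson, generate(JSON.parse(sampleJson)));
        }
        return cache.get(sampleJson);
    };
};

// Stand-in generator that counts how often it runs
let calls = 0;
const cachedGenerate = createCachedGenerator((sample) => {
    calls += 1;
    return { type: typeof sample };
});

cachedGenerate('{"id": 1}');
cachedGenerate('{"id": 1}'); // served from the cache
console.log(calls); // 1
```

Keying on the raw string is simple but treats samples that differ only in whitespace or key order as distinct; a normalized key would be needed to share those entries.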

Examples & Use Cases

Basic User Profile

Sample Input

{
  "id": 123,
  "username": "johndoe",
  "email": "user@example.com",
  "display_name?": "John Doe",
  "bio*": "Software developer",
  "avatar_url?*": "https://example.com/avatar.jpg"
}

Generated Schema

{
  "type": "object",
  "properties": {
    "id": {
      "type": "number",
      "__field_required": true
    },
    "username": {
      "type": "string",
      "__field_required": true
    },
    "email": {
      "type": "string",
      "__field_required": true
    },
    "display_name": {
      "type": ["string", "null"],
      "nullable": true,
      "__field_required": true
    },
    "bio": {
      "type": "string",
      "__field_required": false
    },
    "avatar_url": {
      "type": ["string", "null"],
      "nullable": true,
      "__field_required": false
    }
  }
}

Valid Data Examples

// Required fields only; optional bio and avatar_url omitted
{
  "id": 456,
  "username": "janedoe",
  "email": "user@example.com",
  "display_name": "Jane Doe"
}

// Nullable display_name set to null; optional bio omitted, avatar_url present
{
  "id": 789,
  "username": "bobsmith",
  "email": "user@example.com",
  "display_name": null,
  "avatar_url": "https://example.com/avatar.jpg"
}

E-commerce Product

Sample Input

{
  "product_id": "SKU-001",
  "name": "Wireless Headphones",
  "price": 99.99,
  "description?": "High-quality wireless headphones",
  "images": ["url1.jpg", "url2.jpg"],
  "categories*": ["electronics", "audio"],
  "metadata?*": {
    "weight": "200g",
    "color": "black"
  }
}

Generated Schema Features

Array Handling: images array with string items
Nested Objects: metadata object with its own properties
Mixed Requirements: Required, optional, and nullable fields combined

API Response Wrapper

Sample Input

{
  "success": true,
  "data": {
    "items": [],
    "total_count": 0
  },
  "error_message?*": null,
  "pagination*": {
    "page": 1,
    "per_page": 20,
    "total_pages": 1
  }
}

This example shows how to handle:

Conditional Fields: Error messages only present on failures
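Optional Nested Objects: Pagination is optional but must match its nested structure when present
Empty Arrays: An empty sample array gives the generator no element to inspect, so the item schema is generated from an empty object ({}); include a representative element in the sample when the item type matters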

Troubleshooting

Common Issues

Issue: "Schema is null or undefined"

Cause: Invalid JSON input or parsing failure
Solution: Verify JSON syntax and structure

// Check your JSON syntax
try {
    JSON.parse(yourSample);
} catch (e) {
    console.log("JSON parsing error:", e.message);
}

Issue: "Unexpected property 'fieldName'"

Cause: Data contains fields not defined in schema
Solutions:

Add missing field to sample with appropriate annotation
Remove unexpected field from data
Use the optional annotation (*) if the field is only sometimes present

Issue: "Missing required property"

Cause: Required field is missing from data
Solutions:

Add the missing field to your data
Make field optional using * annotation
If null is acceptable, make the field nullable using the ? annotation and send an explicit null; a nullable field must still be present

Issue: "Type mismatch" errors

Cause: Data type doesn't match inferred schema type
Solutions:

Ensure sample data represents actual expected types
Use union types by making field nullable if needed
Verify data transformation logic

Performance Troubleshooting

Large Object Performance

Issue: Slow validation on large nested objects
Solution: Consider flattening structure or validating in chunks

Memory Usage

Issue: High memory usage with complex schemas
Solution: Implement schema sharing and caching strategies

Validation Speed

Issue: Slow validation performance
Solutions:
1. Cache validation results for identical data
2. Implement early validation failure returns
3. Optimize path string building

Conclusion

This schema generation and validation system provides a powerful way to define and validate data structures using intuitive JSON samples. By understanding the annotation system and validation rules, you can create robust APIs with clear data contracts and comprehensive validation.

The system's flexibility in handling required, optional, and nullable fields makes it suitable for a wide range of use cases, from simple API responses to complex nested data structures. Regular use of the troubleshooting techniques will help you quickly resolve any validation issues and maintain high-quality data processing in your applications.


