Schema Generation & Field Validation
Overview
This documentation covers our automatic schema generation system that converts JSON samples into validation schemas, and the subsequent field validation process. The system is designed to simplify API design by allowing developers to define data structures using annotated JSON samples rather than writing complex schema definitions manually.

Table of Contents
Overview
Schema Generation Process
Field Annotation System
Generated Schema Structure
Field Validation Engine
Error Handling & Messages
Best Practices
Examples & Use Cases
Troubleshooting
Conclusion
Schema Generation Process
How It Works
The schema generation process transforms annotated JSON samples into structured validation schemas through the following steps:
1. JSON Parsing: The input sample is parsed to extract the basic structure and data types
2. Annotation Processing: Field names are analyzed for special markers (?, *, and their combinations)
3. Type Inference: Data types are automatically detected from sample values
4. Schema Construction: A comprehensive schema object is built with validation rules
5. Metadata Addition: Additional properties such as __field_required are added for validation logic
Core Algorithm
The system uses recursive parsing to handle nested objects and arrays:
const generateSchema = (obj) => {
if (Array.isArray(obj)) {
return { type: 'array', items: generateSchema(obj[0] || {}) };
}
if (obj === null) {
return { type: 'null' };
}
if (typeof obj === 'object') {
// Process object properties with annotation handling
return processObjectSchema(obj);
}
return { type: typeof obj };
};
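For primitives, arrays, and null the recursion above bottoms out immediately; the calls below (results shown as comments) illustrate that behaviour. Object samples are delegated to processObjectSchema, whose annotation handling is described in the next section, with a sketch of it following that discussion.

generateSchema(42);
// → { type: 'number' }

generateSchema(["url1.jpg", "url2.jpg"]);
// → { type: 'array', items: { type: 'string' } }  (only the first element is inspected)

generateSchema(null);
// → { type: 'null' }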
Type Detection
The system automatically detects the following JavaScript types:
Primitive Types: string, number, boolean
Complex Types: object, array
Special Values: null, undefined
Arrays are handled by analyzing the first element, and the resulting schema is applied to every item in the array. Validation will fail if an array contains elements of multiple types.
Field Annotation System
Annotation Markers
Our system uses special suffix markers on field names to define validation behavior:
| Marker | Meaning | Required | Nullable | Example |
| --- | --- | --- | --- | --- |
| (none) | Standard field | ✅ Yes | ❌ No | "name": "John" |
| ? | Nullable field | ✅ Yes | ✅ Yes | "nickname?": "Johnny" |
| * | Optional field | ❌ No | ❌ No | "metadata*": {...} |
| ?* or *? | Optional + Nullable | ❌ No | ✅ Yes | "notes?*": null |
Annotation Processing Logic
The system processes annotations in the following priority order:
1. Combined annotations (?* or *?) are checked first
2. Single annotations (? or *) are processed next
3. Fields with no annotation default to required and non-nullable
Field Name Cleaning
After processing annotations, the system:
Removes suffix markers from field names
Stores the clean field name in the schema
Preserves the original validation requirements
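Combining the priority order and the field-name cleaning above, a minimal sketch of what processObjectSchema could look like follows. It reuses the generateSchema function from the core algorithm; the function body is an illustrative assumption rather than the exact implementation, and only the type, nullable, and __field_required properties mirror the documented schema structure.

const processObjectSchema = (obj) => {
  const properties = {};
  for (const [rawKey, value] of Object.entries(obj)) {
    let key = rawKey;
    let optional = false;
    let nullable = false;
    // Combined annotations are checked first, then single annotations
    if (key.endsWith('?*') || key.endsWith('*?')) {
      optional = true;
      nullable = true;
      key = key.slice(0, -2);
    } else if (key.endsWith('?')) {
      nullable = true;              // nullable, but still required
      key = key.slice(0, -1);
    } else if (key.endsWith('*')) {
      optional = true;              // optional, not nullable
      key = key.slice(0, -1);
    }
    // Infer the base schema from the sample value, then apply annotation metadata
    const fieldSchema = generateSchema(value);
    if (nullable) {
      fieldSchema.nullable = true;
      if (fieldSchema.type !== 'null') {
        fieldSchema.type = [fieldSchema.type, 'null'];   // union type with null
      }
    }
    fieldSchema.__field_required = !optional;
    properties[key] = fieldSchema;                       // clean field name is stored
  }
  return { type: 'object', properties };
};

Checking the two-character suffixes before the single-character ones matters because a key like "notes?*" also ends with "*"; stripping only one character would leave a stray marker in the field name.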
Generated Schema Structure
Schema Properties
Each generated schema contains the following structure:
{
"type": "object",
"properties": {
"fieldName": {
"type": ["string", "null"],
"nullable": true,
"__field_required": false
}
}
}
Property Schema Fields
Each field schema includes:
type: The expected data type(s); can be a string or an array of strings
nullable: Boolean indicating whether null values are allowed
__field_required: Custom property indicating whether the field must be present
items: For arrays, contains the schema for array elements
Type Representations
Single Type: "type": "string"
Union Types: "type": ["string", "null"]
Arrays: "type": "array" with "items": {...}
Objects: "type": "object" with "properties": {...}
Field Validation Engine
Validation Process
The validation engine processes data through multiple phases:
Phase 1: Structural Validation
Checks for undefined data or schema
Validates basic data structure integrity
Phase 2: Type Validation
Compares actual data types against schema expectations
Handles union types (multiple allowed types)
Validates array and object structures
Phase 3: Field Presence Validation
Checks required fields are present
Allows optional fields to be missing
Validates field name casing
Phase 4: Recursive Validation
Validates nested objects and arrays
Maintains path information for error reporting
Handles deep object structures
Validation Rules
Required Field Rules
// Field is required if __field_required === true
if (fieldSchema.__field_required === true) {
// Field must be present in data
// Can be null only if schema allows it
}
Optional Field Rules
// Field is optional if __field_required === false
if (fieldSchema.__field_required === false) {
// Field can be completely missing from data
// If present, must match schema type requirements
}
Null Value Handling
// Null is allowed if any of these conditions are true:
const nullAllowed =
  schema.nullable === true ||
  schema.type === 'null' ||
  (Array.isArray(schema.type) && schema.type.includes('null'));
Path Tracking
The validator maintains a path string for precise error location:
Root level: "" or "root"
Object properties: "user.name"
Array elements: "items[0]"
Nested structures: "user.addresses[0].street"
Error Handling & Messages
Error Types
The system provides detailed error messages for various scenarios:
Type Mismatch Errors
"Type mismatch at path 'user.age': expected number but got string"
Missing Required Field Errors
"Missing required property 'email' at path 'user'"
Unexpected Property Errors
"Unexpected property 'extra_field' at path 'user'"
Case Sensitivity Errors
"Key case mismatch at path 'user': expected 'firstName' but found 'firstname'"
Null Value Errors
"Null value not allowed at path 'user.name'"
Error Object Structure
{
valid: boolean, // Always false for errors
error: string // Descriptive error message with path
}
Success Response Structure
{
valid: true // No error property for successful validation
}
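Tying the phases, field rules, null handling, path tracking, and result shapes together, a condensed validator along these lines might look like the sketch below. The function and helper names (validate, typeOf, fail) are assumptions made for illustration, and details of the real engine such as key-case checking are omitted.

// typeOf: distinguish null and array from plain typeof results
const typeOf = (value) => {
  if (value === null) return 'null';
  if (Array.isArray(value)) return 'array';
  return typeof value;
};

// fail: build the { valid, error } shape described above
const fail = (path, message) => ({
  valid: false,
  error: `${message} at path '${path || 'root'}'`
});

const validate = (data, schema, path = '') => {
  // Phase 1: structural validation
  if (schema === undefined || data === undefined) {
    return fail(path, 'Data or schema is undefined');
  }

  // Null handling: allowed only when the schema says so
  if (data === null) {
    const nullAllowed = schema.nullable === true || schema.type === 'null' ||
      (Array.isArray(schema.type) && schema.type.includes('null'));
    return nullAllowed ? { valid: true } : fail(path, 'Null value not allowed');
  }

  // Phase 2: type validation, including union types
  const actual = typeOf(data);
  const allowed = Array.isArray(schema.type) ? schema.type : [schema.type];
  if (!allowed.includes(actual)) {
    return {
      valid: false,
      error: `Type mismatch at path '${path || 'root'}': expected ${allowed.join(' or ')} but got ${actual}`
    };
  }

  // Phase 4 (arrays): validate every element against the items schema
  if (actual === 'array' && schema.items) {
    for (let i = 0; i < data.length; i++) {
      const result = validate(data[i], schema.items, `${path}[${i}]`);
      if (!result.valid) return result;
    }
  }

  // Phases 3 and 4 (objects): field presence, then recursion
  if (actual === 'object' && schema.properties) {
    for (const [key, fieldSchema] of Object.entries(schema.properties)) {
      if (!Object.prototype.hasOwnProperty.call(data, key)) {
        if (fieldSchema.__field_required === true) {
          return fail(path, `Missing required property '${key}'`);
        }
        continue; // optional fields may be missing entirely
      }
      const result = validate(data[key], fieldSchema, path ? `${path}.${key}` : key);
      if (!result.valid) return result;
    }
    for (const key of Object.keys(data)) {
      if (!(key in schema.properties)) {
        return fail(path, `Unexpected property '${key}'`);
      }
    }
  }

  return { valid: true };
};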
Best Practices
Schema Design
Use Clear Field Names: Choose descriptive names before adding annotations
// Good "user_email?": "[email protected]" // Avoid "e?": "[email protected]"
Consistent Annotation Usage: Apply the same patterns across your API
{ "required_field": "value", "optional_field*": "value", "nullable_field?": null, "flexible_field?*": "value or null or missing" }
Meaningful Sample Data: Use realistic sample values
// Good - shows expected format
"created_at": "2024-01-15T10:30:00Z"

// Avoid - unclear format
"created_at": "some date"
Validation Integration
Early Validation: Validate data as early as possible in your pipeline
Error Propagation: Pass validation errors with full path information
Graceful Degradation: Handle validation failures appropriately
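As an illustration of early validation and error propagation, validation can sit directly at the request boundary. The sketch below assumes an Express-style handler together with a validate(data, schema) helper and a userProfileSchema like the one generated in the examples section; none of these names are part of this system's API.

const express = require('express');   // assumption: Express is used here purely for illustration
const app = express();
app.use(express.json());

app.post('/users', (req, res) => {
  // validate as early as possible, before any business logic runs
  const result = validate(req.body, userProfileSchema);   // hypothetical helper and schema
  if (!result.valid) {
    // propagate the error message, which already carries the full path
    return res.status(400).json({ error: result.error });
  }
  res.status(201).json({ success: true });
});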
Performance Considerations
Schema Caching: Cache generated schemas to avoid repeated parsing
Validation Batching: Group related validations together
Path Optimization: Use efficient string building for deep paths
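For the schema-caching point, one straightforward approach (an illustrative sketch, not an API provided by the system) is to key a cache on the serialized sample so each distinct sample is only parsed once:

// cache generated schemas keyed by the stringified sample
const schemaCache = new Map();

const getSchema = (sample) => {
  const key = JSON.stringify(sample);
  if (!schemaCache.has(key)) {
    schemaCache.set(key, generateSchema(sample));
  }
  return schemaCache.get(key);
};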
Examples & Use Cases
Basic User Profile
Sample Input
{
"id": 123,
"username": "johndoe",
"email": "[email protected]",
"display_name?": "John Doe",
"bio*": "Software developer",
"avatar_url?*": null
}
Generated Schema
{
"type": "object",
"properties": {
"id": {
"type": "number",
"__field_required": true
},
"username": {
"type": "string",
"__field_required": true
},
"email": {
"type": "string",
"__field_required": true
},
"display_name": {
"type": ["string", "null"],
"nullable": true,
"__field_required": true
},
"bio": {
"type": "string",
"__field_required": false
},
"avatar_url": {
"type": ["string", "null"],
"nullable": true,
"__field_required": false
}
}
}
Valid Data Examples
// All required fields present
{
"id": 456,
"username": "janedoe",
"email": "[email protected]",
"display_name": "Jane Doe"
}
// bio omitted (optional), display_name null (nullable), avatar_url provided
{
"id": 789,
"username": "bobsmith",
"email": "[email protected]",
"display_name": null,
"avatar_url": "https://example.com/avatar.jpg"
}
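For contrast, a hypothetical payload that omits the required email field would be rejected; based on the error formats described earlier, the result would be along the lines of the comment shown.

// Invalid: the required "email" field is missing
{
  "id": 321,
  "username": "alicedoe",
  "display_name": "Alice Doe"
}
// Expected outcome: valid is false, with an error such as
// "Missing required property 'email'" reported at the root path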
E-commerce Product
Sample Input
{
"product_id": "SKU-001",
"name": "Wireless Headphones",
"price": 99.99,
"description?": "High-quality wireless headphones",
"images": ["url1.jpg", "url2.jpg"],
"categories*": ["electronics", "audio"],
"metadata?*": {
"weight": "200g",
"color": "black"
}
}
Generated Schema Features
Array Handling: the images array is typed with string items
Nested Objects: the metadata object gets its own properties schema
Mixed Requirements: required, optional, and nullable fields combined in a single schema
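Applying the annotation rules above, the relevant parts of the generated schema would look roughly like this (abridged to the images, categories, and metadata fields; the exact output may differ in detail):

{
  "images": {
    "type": "array",
    "items": { "type": "string" },
    "__field_required": true
  },
  "categories": {
    "type": "array",
    "items": { "type": "string" },
    "__field_required": false
  },
  "metadata": {
    "type": ["object", "null"],
    "nullable": true,
    "__field_required": false,
    "properties": {
      "weight": { "type": "string", "__field_required": true },
      "color": { "type": "string", "__field_required": true }
    }
  }
}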
API Response Wrapper
Sample Input
{
"success": true,
"data": {
"items": [],
"total_count": 0
},
"error_message?*": null,
"pagination*": {
"page": 1,
"per_page": 20,
"total_pages": 1
}
}
This example shows how to handle:
Conditional Fields: Error messages are only present on failures
Optional Nested Objects: Pagination may be omitted but is fully structured when present
Empty Arrays: Schema generation for empty collections (see the note below)
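On the empty-array point: the core algorithm falls back to obj[0] || {} when there is no first element to inspect, so the items schema is generated from an empty object. Assuming the object handler emits an empty properties map, the result would be roughly:

generateSchema([]);
// → { type: 'array', items: { type: 'object', properties: {} } }
// Include one representative element in the sample if the array's items are not objects.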
Troubleshooting
Common Issues
Issue: "Schema is null or undefined"
Cause: Invalid JSON input or parsing failure
Solution: Verify JSON syntax and structure
// Check your JSON syntax
try {
JSON.parse(yourSample);
} catch (e) {
console.log("JSON parsing error:", e.message);
}
Issue: "Unexpected property 'fieldName'"
Cause: Data contains fields not defined in the schema
Solutions:
Add the missing field to the sample with an appropriate annotation
Remove the unexpected field from the data
Mark the field optional (*) in the sample if it may or may not be present
Issue: "Missing required property"
Cause: A required field is missing from the data
Solutions:
Add the missing field to your data
Make the field optional using the * annotation
Make the field nullable using the ? annotation if an explicit null is acceptable (the field must still be present)
Issue: "Type mismatch" errors
Cause: Data type doesn't match the inferred schema type
Solutions:
Ensure sample data represents actual expected types
Use union types by making field nullable if needed
Verify data transformation logic
Performance Troubleshooting
Large Object Performance
Issue: Slow validation on large nested objects
Solution: Consider flattening structure or validating in chunks
Memory Usage
Issue: High memory usage with complex schemas
Solution: Implement schema sharing and caching strategies
Validation Speed
Issue: Slow validation performance
Solutions:
Cache validation results for identical data
Implement early validation failure returns
Optimize path string building
Conclusion
This schema generation and validation system provides a powerful way to define and validate data structures using intuitive JSON samples. By understanding the annotation system and validation rules, you can create robust APIs with clear data contracts and comprehensive validation.
The system's flexibility in handling required, optional, and nullable fields makes it suitable for a wide range of use cases, from simple API responses to complex nested data structures. Regular use of the troubleshooting techniques will help you quickly resolve any validation issues and maintain high-quality data processing in your applications.