Input and output schemas
Overview
Sandgarden workflows consist of steps executed in sequence, so each step must receive the input it expects from the step before it. That's where input/output schemas come in: they let each step declare the data it requires (input) and the data it emits (output). Schemas are optional, but they are highly recommended for production workflows because they add resilience and let the workflow catch data errors immediately.
Format
Input and output schemas are defined the same way: as JSON documents that follow the JSON Schema standard. A very simple schema might specify just a single type:
{
  "type": "string"
}
More likely, however, a schema will define a full object:
{
  "type": "object",
  "properties": {
    "a_number": {
      "type": "number",
      "description": "This is a_number, which is a number"
    },
    "a_string": {
      "type": "string",
      "description": "This is a_string, which is a string"
    }
  },
  "required": ["a_number", "a_string"]
}
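To make the meaning of `type`, `properties`, and `required` concrete, here is a minimal Python sketch that checks a payload against the object schema above. It is an illustration only: the `check` function is a hypothetical helper, it covers just these three keywords with a single type name per schema, and it is not Sandgarden's validator or a full JSON Schema implementation.

```python
import json

def _is_number(value):
    # Python treats True/False as ints, so booleans are excluded explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)

def _is_integer(value):
    if isinstance(value, bool):
        return False
    return isinstance(value, int) or (isinstance(value, float) and value.is_integer())

# Maps JSON Schema type names to Python checks.
TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "number": _is_number,
    "integer": _is_integer,
    "boolean": lambda v: isinstance(v, bool),
    "object": lambda v: isinstance(v, dict),
    "array": lambda v: isinstance(v, list),
    "null": lambda v: v is None,
}

def check(data, schema, path="$"):
    """Return a list of mismatch messages; an empty list means the data matches.

    Simplified on purpose: only `type` (as a single string), `properties`,
    and `required` are understood.
    """
    expected = schema.get("type")
    if expected is not None and not TYPE_CHECKS[expected](data):
        return [f"{path}: expected {expected}"]
    errors = []
    if isinstance(data, dict):
        for name in schema.get("required", []):
            if name not in data:
                errors.append(f"{path}: missing required property '{name}'")
        for name, subschema in schema.get("properties", {}).items():
            if name in data:
                errors.extend(check(data[name], subschema, f"{path}.{name}"))
    return errors

schema = json.loads("""
{
  "type": "object",
  "properties": {
    "a_number": {"type": "number"},
    "a_string": {"type": "string"}
  },
  "required": ["a_number", "a_string"]
}
""")

ok = check({"a_number": 1.5, "a_string": "hello"}, schema)
bad = check({"a_number": "1.5"}, schema)
print(ok)   # []
print(bad)  # one message for the missing a_string, one for the wrong a_number type
```

Collecting every mismatch rather than stopping at the first one is a design choice made here so that a single run reports all problems in a payload.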
Example
The following is an example of an input schema. Three properties are defined, but only one, question, is required, as the required array at the end of the schema shows.
{
  "type": "object",
  "properties": {
    "question": {
      "type": "string",
      "description": "The question to put to the LLM"
    },
    "lie": {
      "type": "boolean",
      "description": "Whether or not the LLM should lie in its response"
    },
    "lie_count": {
      "type": "integer",
      "description": "If `lie` is true, how many lies to put into the response"
    }
  },
  "required": ["question"]
}
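Because `lie` and `lie_count` are optional in this schema, step code that consumes the input has to cope with their absence. The sketch below shows one way to do that in Python. The `build_prompt` function and the default values are hypothetical, chosen for illustration; they are not part of Sandgarden or of the schema itself.

```python
def build_prompt(input_data: dict) -> str:
    """Hypothetical helper: turn a schema-valid input into an LLM prompt."""
    # `question` is required by the schema, so it is safe to index directly.
    question = input_data["question"]
    # `lie` and `lie_count` are optional; the defaults here are assumptions.
    lie = input_data.get("lie", False)
    lie_count = input_data.get("lie_count", 1)
    if not lie:
        return question
    return f"{question}\n(Include {lie_count} false statement(s) in the answer.)"

plain = build_prompt({"question": "Why is the sky blue?"})
with_lies = build_prompt({"question": "Why is the sky blue?", "lie": True, "lie_count": 2})
print(plain)      # Why is the sky blue?
print(with_lies)
```

Note that the schema's description ties `lie_count` to `lie` only in words. As written, a payload with `lie_count` set and `lie` omitted still matches the schema, so the step code decides how to treat it (here, `lie_count` is ignored).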
Applying schemas
By convention, the input schema is saved as input.json and the output schema as output.json, both in the same directory as the step code. Schemas are applied to a specific step or workflow when it is pushed, as follows:
sand steps push --name sampleStep --file step.py --inputSchema input.json --outputSchema output.json
Schemas can also be specified inline (e.g. --inputSchema '{json-here}'), but it's best practice to save them in separate files so they can be tracked and managed in version control.
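One benefit of keeping schemas in files is that they can be checked before a push. The following Python sketch is a hypothetical lint, not a Sandgarden feature: it confirms that a schema file parses as JSON and flags any name in `required` that is not declared under `properties`. JSON Schema itself permits such names, so this is only a guard against likely typos, and the sketch assumes the schema is a JSON object at the top level.

```python
import json
import tempfile
from pathlib import Path

def lint_schema_file(path: Path) -> list:
    """Return a list of problems found in one schema file (empty if none)."""
    try:
        schema = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"{path.name}: not valid JSON ({exc.msg}, line {exc.lineno})"]
    if not isinstance(schema, dict):
        # Assumption of this sketch: schemas are written as JSON objects.
        return [f"{path.name}: expected a JSON object at the top level"]
    declared = schema.get("properties", {})
    problems = []
    for name in schema.get("required", []):
        if name not in declared:
            problems.append(
                f"{path.name}: required property '{name}' is not declared in properties"
            )
    return problems

# Demo with two throwaway files; the second has a misspelled required name.
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "input.json"
    good.write_text(
        '{"type": "object", "properties": {"question": {"type": "string"}},'
        ' "required": ["question"]}',
        encoding="utf-8",
    )
    bad = Path(tmp) / "output.json"
    bad.write_text(
        '{"type": "object", "properties": {"answer": {"type": "string"}},'
        ' "required": ["anwser"]}',
        encoding="utf-8",
    )
    good_problems = lint_schema_file(good)
    bad_problems = lint_schema_file(bad)

print(good_problems)  # []
print(bad_problems)
```

A check like this could run in a pre-commit hook or CI job alongside the schema files it guards.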
When a schema is applied, it is enforced at runtime. If the data passed to a step or workflow doesn't match its input schema, the run fails with a schema mismatch error. Similarly, if the data returned by a step or workflow doesn't match its output schema, the run fails with an error. With input/output schemas enforced, longer and more complex workflows can be built with confidence that a data error early in the workflow will not cascade into issues further down the line.
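The fail-fast behaviour described above can be imitated locally to see why it helps. The sketch below is a simplified model, not Sandgarden's runtime: the `run` function, the `SchemaMismatch` exception, and both steps are hypothetical, and the check looks only at each schema's `required` list.

```python
class SchemaMismatch(Exception):
    """Raised when a payload is missing a property its schema requires."""

def require(data: dict, schema: dict, where: str) -> None:
    # Simplified: only the `required` list is checked here.
    missing = [name for name in schema.get("required", []) if name not in data]
    if missing:
        raise SchemaMismatch(f"{where}: missing {missing}")

def run(steps, payload):
    """Run (name, func, input_schema, output_schema) tuples in sequence."""
    for name, func, input_schema, output_schema in steps:
        require(payload, input_schema, f"{name} input")
        payload = func(payload)
        require(payload, output_schema, f"{name} output")
    return payload

calls = []  # records which steps actually ran

def ask(data):
    calls.append("ask")
    return {"anwser": "42"}  # deliberate bug: misspelled key

def summarize(data):
    calls.append("summarize")
    return {"summary": data["answer"][:10]}

steps = [
    ("ask", ask, {"required": ["question"]}, {"required": ["answer"]}),
    ("summarize", summarize, {"required": ["answer"]}, {"required": ["summary"]}),
]

try:
    run(steps, {"question": "What is 6 x 7?"})
    error = None
except SchemaMismatch as exc:
    error = str(exc)

print(error)  # ask output: missing ['answer']
print(calls)  # ['ask']
```

The run stops at the output boundary of the faulty step, and the error names that step. Without the check, the failure would surface later as a `KeyError` inside `summarize`, one step removed from its cause.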