Contineo Platform

Get JSON from string

The Get JSON from String component cleans up a given string (for example, by removing code block markers) and converts it into either a JSON object or an array. You specify the input string and the desired output format, and the component returns structured, machine-readable data.

Overview

The Get JSON from String component transforms string representations of JSON into actual JSON objects or arrays. It addresses a common challenge in GenAI pipelines: when an LLM generates JSON output, it is returned as a text string (often wrapped in code markers such as ```json) rather than as an actual JSON object. This component bridges the gap by cleaning and parsing the string, so that you can directly access JSON properties in subsequent pipeline steps. It handles input with or without code block markers and parses nested JSON structures, so the result can be used directly in your Pipeline Builder workflows.

The component converts LLM-generated JSON strings into properly formatted JSON objects or arrays by cleaning up the formatting artifacts, code block markers, and syntax inconsistencies that commonly occur in Large Language Model responses. LLMs often wrap JSON content in markdown code blocks (```json ... ```), so the response is a text string rather than a JSON object and cannot be used directly for downstream processing. The component removes this decorative formatting, converts Python-style boolean values (True/False) to the JSON standard (true/false), and parses nested JSON structures while preserving the data.

CRITICAL USAGE RULE: Use this component only after Generate Text components that request structured JSON or array output from an LLM. It is NOT needed for plain-text responses and should be used only when you need to manipulate structured data.

The component first removes the code block markers (```json and ```) that LLMs frequently add to JSON responses, then converts boolean values to standard JSON, and finally parses the string, recursively parsing any nested JSON strings within the structure.

The output_format parameter determines the structure of the result:
- 'json' returns the parsed data as-is, preserving its original structure.
- 'array' extracts array values: if the input is already an array, it is returned directly; if it is an object containing arrays, the first array value found is returned; if no array exists, an error is raised.

Input configuration:
- string_to_be_converted accepts the LLM response string from a session variable (typically the output of a Generate Text component that requested structured data).
- output_format is a hardcoded value: 'json' to preserve the object, or 'array' to extract an array.

The component handles complex inputs, including deeply nested JSON structures, arrays with mixed data types, and escaped characters. It cleans up common formatting artifacts such as code block markers and Python-style booleans, but it cannot repair severely malformed JSON.

Common integration patterns:
- Post-LLM structured output processing: Generate Text with JSON prompt → Get JSON from String → use the structured data.
- Array extraction for iteration workflows: Generate Text with array request → Get JSON from String with array format → Loop processing.
- Data validation pipelines, where the JSON structure must be verified before downstream consumption.

The component returns detailed error messages for malformed JSON, unsupported format conversions, and parsing failures. It is essential for reliable structured data workflows, reducing parsing errors and ensuring consistent JSON format compliance in AI-driven automation pipelines.

How to use:

Key Terms

Term	Definition
JSON	JavaScript Object Notation is a lightweight data-interchange format that is easy for humans to read and write and easy for machines to parse and generate.
Code Block Markers	Delimiters like ```json or ``` that often wrap JSON content in markdown or text outputs from LLMs.
Nested JSON	JSON objects or arrays that contain other JSON objects or arrays as values.

When to Use

Use when you need to process string outputs from LLMs that contain JSON data.
Ideal for cleaning up JSON strings that have code block markers or formatting characters.
Helpful when you need to extract JSON from text content for further processing.
Use when converting between different JSON representations (object or array).

Component Configuration

Required Input

Input	Description	Data Type	Example
String to convert into JSON	The string representation of JSON that needs to be converted into a valid JSON object or array. The component automatically handles strings with code block markers (```json and ```) and cleans them up.	String	"```json\n{\n \"name\": \"Example\",\n \"properties\": {\n \"id\": 123,\n \"active\": true\n }\n}\n```"
Output Format	Specifies the desired format of the output. Choose either "json" or "array".	Json, Array	`json`

How It Works

The component removes any code block markers (```json and ```) from the input string.
It replaces Python-style boolean values (True/False) with JSON boolean values (true/false).
It then parses the cleaned string into a JSON value (an object or an array).
Any nested JSON strings within the structure are parsed recursively as well.
Based on the specified output format:
- If "json" is selected, it returns the parsed JSON as is.
- If "array" is selected, it returns the parsed data as an array. If the input is already an array, it is returned directly; if it is an object with an array value, the first such array is extracted and returned. If no array is found, an error is raised.
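The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the component's actual implementation: the helper names (`strip_code_fences`, `normalize_booleans`, `parse_nested`, `get_json_from_string`) are hypothetical, and the regular expressions are one plausible way to perform the marker and boolean clean-up.

```python
import json
import re
from typing import Any

# Hypothetical helpers sketching the documented steps; not Contineo's actual code.
# Matches an opening ```json (or bare ```) fence at the very start of the string,
# or a closing ``` fence at the very end.
_FENCE_RE = re.compile(r"^\s*```(?:json)?\s*|\s*```\s*$", re.IGNORECASE)


def strip_code_fences(text: str) -> str:
    """Step 1: remove leading/trailing code block markers."""
    return _FENCE_RE.sub("", text)


def normalize_booleans(text: str) -> str:
    """Step 2: rewrite Python-style True/False as JSON true/false.

    Naive whole-word replacement: it also rewrites True/False
    if those words appear inside string values.
    """
    text = re.sub(r"\bTrue\b", "true", text)
    return re.sub(r"\bFalse\b", "false", text)


def parse_nested(value: Any) -> Any:
    """Step 4: recursively decode string values that contain JSON objects or arrays."""
    if isinstance(value, dict):
        return {key: parse_nested(item) for key, item in value.items()}
    if isinstance(value, list):
        return [parse_nested(item) for item in value]
    if isinstance(value, str) and value.strip()[:1] in ("{", "["):
        try:
            return parse_nested(json.loads(value))
        except json.JSONDecodeError:
            return value  # looked like JSON but is not; keep the original string
    return value


def get_json_from_string(text: str, output_format: str = "json") -> Any:
    """All steps combined; output_format is "json" or "array"."""
    if output_format not in ("json", "array"):
        raise ValueError(f"Unsupported output format: {output_format!r}")
    try:
        data = json.loads(normalize_booleans(strip_code_fences(text)))  # Step 3
    except json.JSONDecodeError as exc:
        raise ValueError(f"Error converting string to json/array: {exc}") from exc
    data = parse_nested(data)

    # Step 5: shape the result according to output_format.
    if output_format == "json" or isinstance(data, list):
        return data
    if isinstance(data, dict):
        for item in data.values():
            if isinstance(item, list):
                return item  # first array value found
    raise ValueError("Cannot convert to array: no array found in input")


if __name__ == "__main__":
    llm_output = '```json\n{"name": "Example", "active": True, "meta": "{\\"id\\": 123}", "items": [1, 2]}\n```'
    print(get_json_from_string(llm_output))
    # {'name': 'Example', 'active': True, 'meta': {'id': 123}, 'items': [1, 2]}
    print(get_json_from_string(llm_output, "array"))
    # [1, 2]
```

In this sketch the "meta" value arrives as a string containing JSON and comes back as a nested object, which is the recursive step in action. A whole-word True/False replacement keeps the sketch short, at the cost of also changing those words inside string values.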

Example Use Case

Scenario: Processing LLM-generated component metadata

Configuration:

String to convert into JSON: "```json\n{\n \"collection_id\": \"contineo_widget_collection\",\n \"document_column_names\": [\n \"component_name\", \"component_id\", \n \"component_type\", \"component_definition\", \n \"description\", \"created_on\", \n \"last_updated_on\", \"properties\", \n \"version\", \"version_of_id\", \n \"category\"\n ]\n}\n```"
Output Format: json

Process:

The component removes the code block markers ```json and ```.
It parses the remaining string into a proper JSON object.
Since the output format is set to "json", it returns the complete object.

Output:

{"collection_id":"contineo_widget_collection","document_column_names":["component_name","component_id","component_type","component_definition","description","created_on","last_updated_on","properties","version","version_of_id","category"]}
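To check the expected output independently, the same transformation can be reproduced with Python's standard `json` module (Python 3.9+ for `str.removeprefix`). This sketch only handles the exact ```json / ``` wrapping used in this example; it is not the component's implementation.

```python
import json

# The example input string from the configuration above, as a Python literal.
llm_output = (
    '```json\n{\n "collection_id": "contineo_widget_collection",\n'
    ' "document_column_names": [\n "component_name", "component_id", \n'
    ' "component_type", "component_definition", \n'
    ' "description", "created_on", \n'
    ' "last_updated_on", "properties", \n'
    ' "version", "version_of_id", \n'
    ' "category"\n ]\n}\n```'
)

# Simple prefix/suffix removal of the fences.
body = llm_output.removeprefix("```json").removesuffix("```")
data = json.loads(body)  # surrounding newlines are ignored by the parser

# Compact separators reproduce the single-line output shown above.
print(json.dumps(data, separators=(",", ":")))
```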


Output Format

The output is a properly formatted JSON object or array, depending on the selected output format and the structure of the input data.

For the "json" output format:

Input (with ```json markers): "```json\n{\n \"name\": \"Dashboard Widget\",\n \"id\": 123,\n \"settings\": {\n \"width\": 300,\n \"height\": 200,\n \"visible\": true\n }\n}\n```"

Output:

{"name":"Dashboard Widget","id":123,"settings":{"width":300,"height":200,"visible":true}}

For "array" output format:

Input (plain string with newlines): "{\n \"components\": [\n {\n \"type\": \"button\",\n \"label\": \"Submit\"\n },\n {\n \"type\": \"input\",\n \"label\": \"Email\"\n },\n {\n \"type\": \"checkbox\",\n \"label\": \"Remember me\"\n }\n ]\n}"

Output:

[{"type":"button","label":"Submit"},{"type":"input","label":"Email"},{"type":"checkbox","label":"Remember me"}]
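The array extraction in this example can be sketched in Python with only the standard `json` module. The extraction rule mirrors the documented behavior; the error message and the final loop (standing in for a downstream Loop step in a pipeline) are illustrative assumptions.

```python
import json

# The "array" example input from above, as a Python literal.
raw = (
    '{\n "components": [\n {\n "type": "button",\n "label": "Submit"\n },\n'
    ' {\n "type": "input",\n "label": "Email"\n },\n'
    ' {\n "type": "checkbox",\n "label": "Remember me"\n }\n ]\n}'
)
data = json.loads(raw)

# Documented "array" rule: return a list unchanged; otherwise take the first
# list-valued property of the object; otherwise fail.
if isinstance(data, list):
    components = data
else:
    components = next((v for v in data.values() if isinstance(v, list)), None)
    if components is None:
        raise ValueError("Cannot convert to array: no array found in input")

print(json.dumps(components, separators=(",", ":")))

# A downstream loop step can then iterate over the extracted items.
for component in components:
    print(f'{component["type"]}: {component["label"]}')
```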

Best Practices

If you are extracting JSON from LLM outputs, consider using this component immediately after your LLM component to ensure clean JSON for downstream processing.
When in doubt about the structure of your data, use the "json" output format first to inspect the complete object before deciding to extract arrays.
Remember that this component handles nested JSON automatically, so there is no need to perform multiple passes for complex structures.

Troubleshooting

Issue	Possible Cause	Solution
"Error converting string to json/array"	Malformed JSON in the input string	Check for missing commas, brackets, or quotes in your JSON input. Validate your JSON using a JSON validator tool. Ensure that all property names and string values are enclosed in double quotes.
"Cannot convert to array: Input is not an array or does not contain an 'array' key"	Trying to convert a non-array JSON object to an array format	Verify that your input contains an array structure when using the "array" output format. Try using the "json" output format instead if your data is an object. Check whether your object has any property that contains an array value.
"Error occurred while loading the JSON from input string"	The input might contain special characters or invalid syntax.	Check for special characters that might need to be escaped. Ensure that boolean values are properly formatted (true/false, not True/False). Look for trailing commas, which are not allowed in JSON.
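If no validator tool is at hand, Python's standard `json` module reports where parsing failed through the `msg`, `lineno`, and `colno` attributes of `json.JSONDecodeError`. The helper name `locate_json_error` is hypothetical, and the exact wording of the messages varies between Python versions.

```python
import json
from typing import Optional


def locate_json_error(text: str) -> Optional[str]:
    """Return where the first JSON syntax error is, or None if the text parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        # msg, lineno and colno are attributes of json.JSONDecodeError.
        return f"{exc.msg} (line {exc.lineno}, column {exc.colno})"
    return None


print(locate_json_error('{"a": 1, "b": 2}'))        # None
print(locate_json_error('{\n "a": 1\n "b": 2\n}'))  # missing comma between properties
print(locate_json_error('{"a": 1,}'))               # trailing comma
```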

Limitations and Considerations

Format restrictions - The component only supports "json" and "array" as output formats.
Automatic conversion - The component automatically converts Python-style boolean values (True/False) to JSON format (true/false), but other Python-specific syntax might cause errors.
Error handling - The component returns detailed error messages, but it cannot repair severely malformed JSON.
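The automatic conversion limitation can be illustrated with a small Python sketch. It assumes a simplified, hypothetical `normalize_booleans` helper (plain string replacement) in place of the component's real conversion, and shows that other Python literals, such as `None` or single-quoted strings, still fail to parse as JSON.

```python
import json


def normalize_booleans(text: str) -> str:
    # Simplified, hypothetical stand-in for the documented True/False conversion.
    return text.replace("True", "true").replace("False", "false")


def parses_after_normalizing(text: str) -> bool:
    """True if the text is valid JSON once Python-style booleans are converted."""
    try:
        json.loads(normalize_booleans(text))
        return True
    except json.JSONDecodeError:
        return False


samples = {
    "Python booleans": '{"ok": True}',
    "Python None": '{"value": None}',
    "single-quoted strings": "{'name': 'widget'}",
}
for label, text in samples.items():
    print(f"{label}: {'parsed' if parses_after_normalizing(text) else 'JSONDecodeError'}")
# Python booleans: parsed
# Python None: JSONDecodeError
# single-quoted strings: JSONDecodeError
```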