The Start Configuration is always the first component in your Creao pipeline. It imports external data (e.g., CSV files or submitted content) into your workflow and declares the global parameters. This component forms the basis for creating synthetic data or preparing information for evaluation and model fine-tuning.
The Start Configuration comes pre-included in your pipeline. It handles the crucial task of loading and integrating your input dataset, setting the stage for further processing.
<aside> ❗
Each pipeline can currently be configured with only one dataset.
</aside>
Accessing the Component: It is automatically added to your pipeline.
Configuring the Input Data: Click the "Start" component, which supports two kinds of parameters: Global Variables and Input Field.
Global Variables are constant values that you can repeatedly use across all components.
Input Field entries are dataset placeholders, each tied to a column of the uploaded dataset, that you can repeatedly use across all components.
<aside> 📌
For every Input Field you define, upload a dataset that contains a column with the same name. (Input Field ↔ Column Name)
</aside>
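To make the difference between the two parameter kinds concrete, here is a minimal Python sketch. It is only a conceptual model, not Creao's implementation or API: the function `build_row_contexts`, the global variable `language`, and the Input Field `document` are all hypothetical names chosen for illustration. It assumes that a component sees the same Global Variables for every row, while each Input Field takes its value from the matching column of the current row.

```python
import csv
import io


def build_row_contexts(csv_text, global_variables, input_fields):
    """Yield one dict per dataset row: the constants plus that row's field values.

    Hypothetical helper for illustration only; not part of Creao.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        context = dict(global_variables)  # identical for every row
        for field in input_fields:
            context[field] = row[field]  # changes from row to row
        yield context


# Illustrative data: one Input Field ("document") and one Global Variable.
csv_text = "document\nFirst chunk\nSecond chunk\n"
contexts = list(
    build_row_contexts(csv_text, {"language": "English"}, ["document"])
)
for context in contexts:
    print(context)
```

In this model, adding a row to the dataset adds one more context, while changing a Global Variable changes every context at once.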
Uploading the Dataset: When setting up a pipeline, you can easily upload a local CSV file as the input dataset.
As noted above, each Input Field you have defined must have a corresponding column with a matching name in the uploaded dataset.
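Before uploading, it can help to check the CSV locally for missing columns. The following Python sketch does that under stated assumptions: `missing_input_fields` is a hypothetical helper (not a Creao function), the field names are made up, and it assumes the match between Input Field and column name is exact and case-sensitive, which the product may or may not require.

```python
import csv
import io


def missing_input_fields(csv_text, input_fields):
    """Return the Input Field names that have no matching column in the CSV header.

    Hypothetical pre-upload check; assumes exact, case-sensitive name matching.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # first row holds the column names
    columns = {name.strip() for name in header}
    return [field for field in input_fields if field not in columns]


# Illustrative dataset with two columns and one declared field that is absent.
sample_csv = "document,source\nSome text,manual.pdf\n"
missing = missing_input_fields(sample_csv, ["document", "question"])
print(missing)
```

To check a real file, read it with `open(path, newline="", encoding="utf-8").read()` and pass the resulting string as `csv_text`. An empty list means every declared field has a column.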
When evaluating a Retrieval-Augmented Generation (RAG) system or improving its retrieval capability, you need a dataset that pairs chunks of text with related questions. Generating the questions that users might ask about a document is an efficient, low-cost way to build such a dataset. The figure below illustrates a segment of a pipeline designed to generate questions from a document. In this process, the Start Configuration is crucial: it loads the document into the pipeline and declares the global parameters, laying the foundation for subsequent processing and question generation.