🎒 Understanding the Start Configuration

The Start Configuration is always the first component in your Creao pipeline. It imports external data (e.g., CSV files or submitted content) into your workflow and declares the global parameters. It forms the basis for creating synthetic data or for preparing information for evaluation and model fine-tuning.

🚀 Key Features


📘 How to Use the Start Configuration

The Start Configuration is included in every pipeline by default. It loads your input dataset and makes it available to the components that follow.

<aside> ❗

Each pipeline can currently be configured with only one dataset.

</aside>

🗄️ Configuring the Input Data

  1. Accessing the Component: The Start component is automatically added to your pipeline, so there is nothing to add manually.

  2. Configuring the Input Data: Click the "Start" component, which supports two parameters: Global Variables and Input Field.

  3. Uploading the Dataset: When setting up a pipeline, you can upload a local CSV file as the input dataset.

    image.png

    For each Input Field you have defined, make sure the uploaded dataset contains a column with a matching name.

    image.png
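
Before uploading, it can help to check locally that the CSV header contains a column for every Input Field. The sketch below is a minimal example of such a check, not part of Creao itself: the function name `missing_input_fields` and the field names `document` and `topic` are hypothetical, and it assumes that column names must match Input Field names exactly (including case).

```python
import csv
import io


def missing_input_fields(csv_file, input_fields):
    """Return the Input Field names that have no matching column in the CSV header.

    Assumption: names must match exactly, including case and whitespace.
    """
    reader = csv.reader(csv_file)
    header = next(reader, [])  # an empty file yields an empty header
    columns = set(header)
    return [field for field in input_fields if field not in columns]


# Hypothetical dataset with two columns; "topic" is not among them.
sample = io.StringIO("document,source\nSome text,manual.pdf\n")
print(missing_input_fields(sample, ["document", "topic"]))  # ['topic']
```

For a file on disk, pass an open handle instead, for example `open("dataset.csv", newline="", encoding="utf-8-sig")`, where the `utf-8-sig` encoding drops a byte-order mark that would otherwise become part of the first column name.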


📖 Scenarios

Example Pipeline: Question Generation

When evaluating a Retrieval-Augmented Generation (RAG) system or improving its retrieval capability, you need a dataset containing chunks of text and related questions. Generating questions that users might ask about a document is an efficient, low-cost way to build such a dataset. The figure below shows a segment of a pipeline designed to generate questions from a document. In this pipeline, the Start Configuration loads the document and declares the global parameters, which lays the foundation for the subsequent processing and question generation steps.

image.png
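
To prepare a document for a pipeline like this one, the text first has to be stored in a CSV file that the Start Configuration can load. The sketch below shows one possible way to do that, assuming a simple fixed-size word split and a single column named `document`. Both the chunking strategy and the column name are illustrative assumptions, not something Creao prescribes, and the column name must match the Input Field you define.

```python
import csv
import io


def chunk_text(text, max_words=50):
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def write_chunks_csv(chunks, out_file, column="document"):
    """Write one chunk per row under a single header column.

    The column name "document" is a hypothetical Input Field name.
    """
    writer = csv.writer(out_file)
    writer.writerow([column])
    for chunk in chunks:
        writer.writerow([chunk])


# Usage: chunk a short text and write it to an in-memory CSV.
buffer = io.StringIO()
write_chunks_csv(chunk_text("one two three four five", max_words=2), buffer)
print(buffer.getvalue())
```

To write to disk instead, replace the buffer with `open("chunks.csv", "w", newline="", encoding="utf-8")` and upload the resulting file as the input dataset.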