Start Configuration

🎒Understanding the Start Configuration

The Start Configuration is always the first component in your Creao pipeline. It imports external data (e.g., CSV files or submitted content) into your workflow and declares the global parameters. This component forms the basis for creating synthetic data or preparing information for evaluation and model fine-tuning.

🚀 Key Features

Data Import: Loads datasets from CSV files or pipeline-submitted content.
Preprocessing: Prepares data for use in later pipeline stages.
AI Workflow Initialization: Initiates data generation and processing in the pipeline.

📘 How to Use the Start Configuration

The Start Configuration comes pre-included in your pipeline. It handles the crucial task of loading and integrating your input dataset, setting the stage for further processing.

<aside> ❗

Each pipeline can currently be configured with only one dataset.

</aside>

🗄️ Configuring the Input Data

Accessing the Component: It is automatically added to your pipeline.
Configuring the Input Data: Click on the "Start" component, which supports two parameters: Global Variables and Input Fields.
- Global Variables are constant values that you can reuse across all components.
- Input Fields are dataset placeholders that you can reuse across all components.
  
  <aside> 📌
  
  For every Input Field you define, make sure the dataset you upload has a column with the same name. (Input Field ↔ Column Name)
  
  </aside>
Uploading the Dataset: When setting up a pipeline, you can easily upload a local CSV file as the input dataset.

Once again, for each Input Field you have defined, make sure the uploaded dataset contains a column with a matching name.
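
Before uploading, it can help to check locally that every Input Field has a matching column. The snippet below is a minimal sketch using only Python's standard library; it is not part of Creao. The function name `missing_input_fields`, the sample column names, and the assumption that names must match exactly (case-sensitive, ignoring surrounding spaces) are all illustrative.

```python
import csv
import io


def missing_input_fields(csv_file, input_fields):
    """Return the Input Field names that have no matching column in the CSV header."""
    reader = csv.reader(csv_file)
    header = next(reader, [])  # the first row holds the column names
    columns = {name.strip() for name in header}
    # Assumption: an Input Field matches a column only if the names are identical.
    return [field for field in input_fields if field not in columns]


# Hypothetical dataset and Input Field names, for illustration only.
sample = io.StringIO("document,topic\nSome text,finance\n")
print(missing_input_fields(sample, ["document", "language"]))  # ['language']
```

For a real file, pass an open file handle instead of the in-memory sample, for example `open("dataset.csv", newline="")`. An empty result means every Input Field has a corresponding column.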

📖 Scenarios

Example Pipeline: Question Generation ❓

When evaluating a Retrieval-Augmented Generation (RAG) system or enhancing its retrieval capability, you need a dataset containing chunks of text and related questions. By creating questions that users might ask about a document, we can generate such datasets efficiently and at low cost. The figure below illustrates a segment of a pipeline designed to generate questions from a document. In this process, the Start Configuration is crucial: it loads the document into the pipeline and declares the global parameters. This initial step lays the foundation for subsequent processing and question generation.
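
As an illustration of the kind of input this scenario needs, the sketch below turns a document into a one-column CSV with one text chunk per row. It is only an assumption about how you might prepare the file: the column name `chunk`, the paragraph-based splitting, and the helper `chunks_to_csv` are hypothetical, and the column name must match whatever Input Field you define in the Start component.

```python
import csv
import io


def chunks_to_csv(text, column="chunk"):
    """Split text on blank lines and return CSV content with one chunk per row."""
    # Assumption: a blank line separates chunks; real chunking may differ.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([column])  # header row: must match the Input Field name
    for paragraph in paragraphs:
        writer.writerow([paragraph])
    return buffer.getvalue()


# Hypothetical document text, for illustration only.
document = "First paragraph of the document.\n\nSecond paragraph of the document.\n"
print(chunks_to_csv(document))
```

Save the returned string to a `.csv` file and upload it as the input dataset when setting up the pipeline.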