# Quickstart
CLI toolkit for evaluating Gemini Enterprise Connector RAG pipelines. Compares actual responses against a golden dataset, producing pass/fail grades, root cause analysis, and an interactive HTML dashboard.
## Installation
### Using uv (recommended)
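The install command for this method isn't shown on this page; a minimal sketch, assuming the package is published under the same name as its CLI entry point (`ge-eval`, a hypothetical package name):

```bash
# Assumed package name -- substitute the actual published name if it differs
uv tool install ge-eval
```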
### Using pip
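Under the same package-name assumption:

```bash
# Assumed package name -- substitute the actual published name if it differs
pip install ge-eval
```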
### From source
```bash
git clone https://github.com/yapweiyih/gemini-enterprise-connector-eval.git
cd gemini-enterprise-connector-eval
uv sync
```
## Quick Start
```bash
# Step 1: Scaffold a working directory with sample inputs
ge-eval init
# Step 2: Edit .env with your Google Cloud project settings
# Step 3: Validate configuration
ge-eval check
# Step 4: Query the API (sends questions, gets responses)
ge-eval run
# Step 5: Run LLM-judge evaluation
ge-eval eval
# Step 6: View results in the browser
ge-eval serve
# Then open http://localhost:8080
```
> **Tip:** Run `ge-eval init` to generate an `INSTRUCTION.md` file with a comprehensive setup guide covering authentication, input structure, and command details.
## What the Pipeline Does
`ge-eval eval` orchestrates six steps in sequence:
| Step | Action | Description |
|---|---|---|
| 1 | Validate Inputs | Checks that golden dataset, CSV, and HTML folder exist; validates question alignment |
| 2 | Generate Summaries | Extracts agent trajectories from dolphin debug HTML files → outputs/summaries/ |
| 3 | Run LLM Judge | Calls Gemini to evaluate all questions |
| 4 | Enrich Golden Source | Adds expected citations from golden dataset to the CSV |
| 5 | Normalize Columns | Reorders CSV to final 15-column schema |
| 6 | Generate Report | Creates outputs/G_REPORT.md with stats and detailed RCA |
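Based only on the artifacts named in the table above, a successful run should leave at least the following in `outputs/` (other files may also be written):

```text
outputs/
├── summaries/      # agent trajectory summaries extracted in step 2
│   └── ...
└── G_REPORT.md     # stats and detailed RCA generated in step 6
```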
## CLI Commands
| Command | Description |
|---|---|
| `ge-eval init` | Scaffold a working directory with sample inputs, `.env`, and `INSTRUCTION.md` |
| `ge-eval check` | Validate config, env vars, and input file alignment |
| `ge-eval run` | Batch-query the GE streamAssist API |
| `ge-eval eval` | Run the full LLM-judge evaluation pipeline |
| `ge-eval serve` | Start a local HTTP server for the evaluation viewer |
| `ge-eval clean` | Remove all files from `inputs/` and `outputs/`, and delete `INSTRUCTION.md` |
## Configuration (.env)

The `.env` file must be placed in the project root (the same directory where you run `ge-eval`). It contains connection settings for the GE streamAssist API.
| Variable | Required | Description |
|---|---|---|
| `PROJECT_ID` | ✅ Yes | Your Google Cloud project ID (e.g. `my-project-418507`) |
| `PROJECT_NUMBER` | Optional | The numeric project number |
| `APP_ID` | ✅ Yes | The Agentspace / Discovery Engine app ID |
| `LOCATION` | Optional | API location (default: `global`) |
| `MODEL_ID` | Optional | Gemini model for LLM-judge evaluation (default: `gemini-2.5-pro`) |
| `GDRIVE_DATASTORE_ID` | Optional | Google Drive datastore ID (if using the GDrive connector) |
| `CONFLUENCE_DATASTORE_ID` | Optional | Confluence datastore ID (if using the Confluence connector) |
| `SHAREPOINT_COLLECTION_ID` | Optional | SharePoint collection ID (if using the SharePoint connector) |
Example `.env` file:

```bash
PROJECT_ID="hello-world-418507"
PROJECT_NUMBER="671247654914"
APP_ID="agentspace-dev_1744685873939"
LOCATION="global"
MODEL_ID="gemini-2.5-pro"
SHAREPOINT_COLLECTION_ID="ge-sharepoint-connector_1774024549320"
```
> **Tip:** Run `ge-eval check` to verify all required variables are set correctly.
## Input Structure (inputs/ directory)

All input files go in the `inputs/` folder. The CLI auto-detects files by extension, so you can use any filename; just keep the files in `inputs/`.
```text
inputs/
├── sample_results.csv        # Questions CSV (input for ge-eval run)
├── golden_dataset.json       # Golden dataset with expected answers
│   (or golden_dataset.yaml)  #   (YAML or JSON format supported)
└── fed_token/                # (Optional) Dolphin debug HTML folder
    ├── id_1_dolphin_debug.html
    ├── id_2_dolphin_debug.html
    └── ...
```
### Questions CSV (*.csv)

The input CSV must contain at minimum these columns:
| Column | Description |
|---|---|
| `id` | Unique question identifier (must match `file_number` in the golden dataset) |
| `question` | The question text to send to the API |
| `question_type` | Category of question (e.g. "Simple fact extraction", "Summarization") |
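A minimal example with the three required columns (the rows are illustrative; additional columns may be present):

```csv
id,question,question_type
1,What is the processing time?,Simple fact extraction
2,Summarize the access policy,Summarization
```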
### Golden Dataset (*.yaml, *.yml, or *.json)
The golden dataset defines expected answers, evaluation criteria, and expected source citations for each question.
Required structure per test case:
```json
{
  "total_count": 2,
  "test_cases": [
    {
      "file_number": "1",
      "category": "page",
      "conversation": {
        "turns": [
          {
            "turn": 1,
            "question": "What is the processing time?",
            "expected_response_contains": [
              "5 business days"
            ],
            "evaluation_criteria": {
              "fact_extraction_accuracy": {
                "description": "Must state 5 business days",
                "threshold": 0.9,
                "required_elements": ["5 business days"]
              }
            }
          }
        ]
      },
      "expected_citations": [
        "access-policy.aspx",
        "https://sharepoint.com/..."
      ]
    }
  ]
}
```
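Since YAML is also accepted, the same test case translates directly:

```yaml
total_count: 2
test_cases:
  - file_number: "1"
    category: "page"
    conversation:
      turns:
        - turn: 1
          question: "What is the processing time?"
          expected_response_contains:
            - "5 business days"
          evaluation_criteria:
            fact_extraction_accuracy:
              description: "Must state 5 business days"
              threshold: 0.9
              required_elements: ["5 business days"]
    expected_citations:
      - "access-policy.aspx"
      - "https://sharepoint.com/..."
```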
### Dolphin Debug Folder (optional)

If you have dolphin debug HTML files from the GE connector, place them in a subfolder inside `inputs/`. The CLI auto-detects folders containing files named `id_*_dolphin_debug.html` or `id_*_no_output.html`.
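For example, a folder the CLI would detect (filenames follow the patterns above; `fed_token/` is the sample folder name used earlier):

```text
inputs/fed_token/
├── id_1_dolphin_debug.html
├── id_2_no_output.html
└── id_3_dolphin_debug.html
```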
## Contributing
### Development Setup
```bash
# Clone and install dev dependencies
git clone https://github.com/yapweiyih/gemini-enterprise-connector-eval.git
cd gemini-enterprise-connector-eval
uv sync
# Run tests
uv run pytest tests/ -v
# Build documentation locally
uv run mkdocs serve
```
### Running Tests
```bash
# Run all tests
uv run pytest tests/ -v
# Run specific test module
uv run pytest tests/test_ge_eval_cli.py -v
# Run with coverage
uv run pytest tests/ --cov=ge_eval -v
```
## License
This project is licensed under the Apache License 2.0 — see the LICENSE file for details.