Quickstart#

ge-eval is a CLI toolkit for evaluating Gemini Enterprise Connector RAG pipelines. It compares actual responses against a golden dataset, producing pass/fail grades, root-cause analysis, and an interactive HTML dashboard.


Installation#

Using uv#

Bash
uv pip install ge-eval

Using pip#

Bash
pip install ge-eval

From source#

Bash
git clone https://github.com/yapweiyih/gemini-enterprise-connector-eval.git
cd gemini-enterprise-connector-eval
uv sync

Quick Start#

Bash
# Step 1: Scaffold a working directory with sample inputs
ge-eval init

# Step 2: Edit .env with your Google Cloud project settings

# Step 3: Validate configuration
ge-eval check

# Step 4: Query the API (sends questions, gets responses)
ge-eval run

# Step 5: Run LLM-judge evaluation
ge-eval eval

# Step 6: View results in the browser
ge-eval serve
# Then open http://localhost:8080

Tip

Run ge-eval init to generate an INSTRUCTION.md file with a comprehensive setup guide covering authentication, input structure, and command details.


What the Pipeline Does#

ge-eval eval orchestrates 6 steps in sequence:

| Step | Action | Description |
|------|--------|-------------|
| 1 | Validate Inputs | Checks that golden dataset, CSV, and HTML folder exist; validates question alignment |
| 2 | Generate Summaries | Extracts agent trajectories from dolphin debug HTML files → outputs/summaries/ |
| 3 | Run LLM Judge | Calls Gemini to evaluate all questions |
| 4 | Enrich Golden Source | Adds expected citations from golden dataset to the CSV |
| 5 | Normalize Columns | Reorders CSV to final 15-column schema |
| 6 | Generate Report | Creates outputs/G_REPORT.md with stats and detailed RCA |
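
These artifacts land under outputs/. A sketch of the resulting layout (only summaries/ and G_REPORT.md are names confirmed above; the enriched CSV's name and location are assumptions):

Text Only
outputs/
├── summaries/        # Step 2: extracted agent trajectory summaries
├── results.csv       # Steps 3-5: judged, enriched, 15-column CSV (name assumed)
└── G_REPORT.md       # Step 6: stats and detailed RCA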

CLI Commands#

| Command | Description |
|---------|-------------|
| ge-eval init | Scaffold a working directory with sample inputs, .env, and INSTRUCTION.md |
| ge-eval check | Validate config, env vars, and input file alignment |
| ge-eval run | Batch-query the GE streamAssist API |
| ge-eval eval | Run the full LLM-judge evaluation pipeline |
| ge-eval serve | Start a local HTTP server for the evaluation viewer |
| ge-eval clean | Remove all files from inputs/ and outputs/, and delete INSTRUCTION.md |

Configuration (.env)#

The .env file must be placed in the project root (same directory where you run ge-eval). It contains connection settings for the GE streamAssist API.

| Variable | Required | Description |
|----------|----------|-------------|
| PROJECT_ID | ✅ Yes | Your Google Cloud project ID (e.g. my-project-418507) |
| PROJECT_NUMBER | Optional | The numeric project number |
| APP_ID | ✅ Yes | The Agentspace / Discovery Engine app ID |
| LOCATION | Optional | API location (default: global) |
| MODEL_ID | Optional | Gemini model for LLM-judge evaluation (default: gemini-2.5-pro) |
| GDRIVE_DATASTORE_ID | Optional | Google Drive datastore ID (if using GDrive connector) |
| CONFLUENCE_DATASTORE_ID | Optional | Confluence datastore ID (if using Confluence connector) |
| SHAREPOINT_COLLECTION_ID | Optional | SharePoint collection ID (if using SharePoint connector) |

Example .env file:

Text Only
PROJECT_ID="hello-world-418507"
PROJECT_NUMBER="671247654914"
APP_ID="agentspace-dev_1744685873939"
LOCATION="global"
MODEL_ID="gemini-2.5-pro"
SHAREPOINT_COLLECTION_ID="ge-sharepoint-connector_1774024549320"

Tip

Run ge-eval check to verify all required variables are set correctly.
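
The GE API calls also need Google Cloud credentials. A common setup uses Application Default Credentials (an assumption; the INSTRUCTION.md generated by ge-eval init is the authoritative guide for authentication):

Bash
# Authenticate with Application Default Credentials (assumed auth method)
gcloud auth application-default login

# Point gcloud at the same project configured in .env
gcloud config set project hello-world-418507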


Input Structure (inputs/ directory)#

All input files go in the inputs/ folder. The CLI auto-detects files by extension, so you can use any filename — just keep them in inputs/.

Text Only
inputs/
├── sample_results.csv          # Questions CSV (input for ge-eval run)
├── golden_dataset.json         # Golden dataset with expected answers
│   (or golden_dataset.yaml)    #   (YAML or JSON format supported)
└── fed_token/                  # (Optional) Dolphin debug HTML folder
    ├── id_1_dolphin_debug.html
    ├── id_2_dolphin_debug.html
    └── ...

Questions CSV (*.csv)#

The input CSV must have at minimum these columns:

| Column | Description |
|--------|-------------|
| id | Unique question identifier (must match file_number in golden dataset) |
| question | The question text to send to the API |
| question_type | Category of question (e.g. "Simple fact extraction", "Summarization") |
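
A minimal example (rows are illustrative, mirroring the golden dataset sample below):

Text Only
id,question,question_type
1,What is the processing time?,Simple fact extraction
2,Summarize the access policy.,Summarization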

Golden Dataset (*.yaml, *.yml, or *.json)#

The golden dataset defines expected answers, evaluation criteria, and expected source citations for each question.

Required structure per test case:

JSON
{
  "total_count": 2,
  "test_cases": [
    {
      "file_number": "1",
      "category": "page",
      "conversation": {
        "turns": [
          {
            "turn": 1,
            "question": "What is the processing time?",
            "expected_response_contains": [
              "5 business days"
            ],
            "evaluation_criteria": {
              "fact_extraction_accuracy": {
                "description": "Must state 5 business days",
                "threshold": 0.9,
                "required_elements": ["5 business days"]
              }
            }
          }
        ]
      },
      "expected_citations": [
        "access-policy.aspx",
        "https://sharepoint.com/..."
      ]
    }
  ]
}
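
Since YAML is also accepted, the same test case can be written as follows (a direct translation of the JSON above; the one-to-one field mapping is assumed):

YAML
total_count: 2
test_cases:
  - file_number: "1"
    category: "page"
    conversation:
      turns:
        - turn: 1
          question: "What is the processing time?"
          expected_response_contains:
            - "5 business days"
          evaluation_criteria:
            fact_extraction_accuracy:
              description: "Must state 5 business days"
              threshold: 0.9
              required_elements:
                - "5 business days"
    expected_citations:
      - "access-policy.aspx"
      - "https://sharepoint.com/..."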

Dolphin Debug Folder (optional)#

If you have dolphin debug HTML files from the GE connector, place them in a subfolder inside inputs/. The CLI auto-detects folders containing files named id_*_dolphin_debug.html or id_*_no_output.html.
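
To sanity-check that a folder will be picked up, list the files matching those patterns (a plain shell check, not a ge-eval command):

Bash
# List files matching the auto-detected naming patterns
find inputs -type f \( -name 'id_*_dolphin_debug.html' -o -name 'id_*_no_output.html' \)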


Contributing#

Development Setup#

Bash
# Clone and install dev dependencies
git clone https://github.com/yapweiyih/gemini-enterprise-connector-eval.git
cd gemini-enterprise-connector-eval
uv sync

# Run tests
uv run pytest tests/ -v

# Build documentation locally
uv run mkdocs serve

Running Tests#

Bash
# Run all tests
uv run pytest tests/ -v

# Run specific test module
uv run pytest tests/test_ge_eval_cli.py -v

# Run with coverage
uv run pytest tests/ --cov=ge_eval -v

License#

This project is licensed under the Apache License 2.0 — see the LICENSE file for details.