# Quickstart
CLI toolkit for evaluating Gemini Enterprise Connector RAG pipelines. Compares actual responses against a golden dataset, producing pass/fail grades, root cause analysis, and an interactive HTML dashboard.
## Installation
### Using uv (recommended)
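The install command for this method isn't shown on this page; a minimal sketch, assuming the package is published under the same name as its CLI entry point (`ge-eval`, a hypothetical package name):

```bash
# Assumed package name -- substitute the actual published name if it differs
uv tool install ge-eval
```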
### Using pip
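Under the same package-name assumption:

```bash
# Assumed package name -- substitute the actual published name if it differs
pip install ge-eval
```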
### From source
```bash
git clone https://github.com/yapweiyih/gemini-enterprise-connector-eval.git
cd gemini-enterprise-connector-eval
uv sync
```
## Quick Start
```bash
# Step 1: Scaffold a working directory with sample inputs
ge-eval init
# Step 2: Edit .env with your Google Cloud project settings
# Step 3: Validate configuration
ge-eval check
# Step 4: Query the API (sends questions, gets responses)
ge-eval run
# Step 5: Run LLM-judge evaluation
ge-eval eval
# Step 6: View results in the browser
ge-eval serve
# Then open http://localhost:8080
```
> **Tip:** Run `ge-eval init` to generate an `INSTRUCTION.md` file with a comprehensive setup guide covering authentication, input structure, and command details.
## What the Pipeline Does
`ge-eval eval` orchestrates six steps in sequence:
| Step | Action | Description |
|---|---|---|
| 1 | Validate Inputs | Checks that golden dataset, CSV, and HTML folder exist; validates question alignment |
| 2 | Generate Summaries | Extracts agent trajectories from dolphin debug HTML files → outputs/summaries/ |
| 3 | Run LLM Judge | Calls Gemini to evaluate all questions |
| 4 | Enrich Golden Source | Adds expected citations from golden dataset to the CSV |
| 5 | Normalize Columns | Reorders CSV to final 15-column schema |
| 6 | Generate Report | Creates outputs/G_REPORT.md with stats and detailed RCA |
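Based only on the artifacts named in the table above, a successful run should leave at least the following in `outputs/` (other files may also be written):

```text
outputs/
├── summaries/      # agent trajectory summaries extracted in step 2
│   └── ...
└── G_REPORT.md     # stats and detailed RCA generated in step 6
```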
## CLI Commands
| Command | Description |
|---|---|
| `ge-eval init` | Scaffold a working directory with sample inputs, `.env`, and `INSTRUCTION.md` |
| `ge-eval check` | Validate config, env vars, and input file alignment |
| `ge-eval run` | Batch-query the GE streamAssist API |
| `ge-eval eval` | Run the full LLM-judge evaluation pipeline |
| `ge-eval serve` | Start a local HTTP server for the evaluation viewer |
| `ge-eval clean` | Remove all files from `inputs/` and `outputs/`, and delete `INSTRUCTION.md` |
## Configuration (.env)

The `.env` file must be placed in the project root (the same directory where you run `ge-eval`). It contains connection settings for the GE streamAssist API.
| Variable | Required | Description |
|---|---|---|
| `PROJECT_ID` | ✅ Yes | Your Google Cloud project ID (e.g. `my-project-418507`) |
| `PROJECT_NUMBER` | Optional | The numeric project number |
| `APP_ID` | ✅ Yes | The Agentspace / Discovery Engine app ID |
| `LOCATION` | Optional | API location (default: `global`) |
| `MODEL_ID` | Optional | Gemini model for LLM-judge evaluation (default: `gemini-2.5-pro`) |
| `GDRIVE_DATASTORE_ID` | Optional | Google Drive datastore ID (if using the GDrive connector) |
| `CONFLUENCE_DATASTORE_ID` | Optional | Confluence datastore ID (if using the Confluence connector) |
| `SHAREPOINT_COLLECTION_ID` | Optional | SharePoint collection ID (if using the SharePoint connector) |
Example `.env` file:

```bash
PROJECT_ID="hello-world-418507"
PROJECT_NUMBER="671247654914"
APP_ID="agentspace-dev_1744685873939"
LOCATION="global"
MODEL_ID="gemini-2.5-pro"
SHAREPOINT_COLLECTION_ID="ge-sharepoint-connector_1774024549320"
```
> **Tip:** Run `ge-eval check` to verify all required variables are set correctly.
## Input Structure (inputs/ directory)

All input files go in the `inputs/` folder. The CLI auto-detects files by extension, so you can use any filename; just keep the files in `inputs/`.
```text
inputs/
├── sample_results.csv        # Questions CSV (input for ge-eval run)
├── golden_dataset.json       # Golden dataset with expected answers
│   (or golden_dataset.yaml)  #   (YAML or JSON format supported)
└── fed_token/                # (Optional) Dolphin debug HTML folder
    ├── id_1_dolphin_debug.html
    ├── id_2_dolphin_debug.html
    └── ...
```
### Questions CSV (*.csv)

The input CSV must contain at minimum these columns:
| Column | Description |
|---|---|
| `id` | Unique question identifier (must match `file_number` in the golden dataset) |
| `question` | The question text to send to the API |
| `question_type` | Category of question (e.g. "Simple fact extraction", "Summarization") |
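A minimal example with the three required columns (the rows are illustrative; additional columns may be present):

```csv
id,question,question_type
1,What is the processing time?,Simple fact extraction
2,Summarize the access policy,Summarization
```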
### Golden Dataset (*.yaml, *.yml, or *.json)
The golden dataset defines expected answers, evaluation criteria, and expected source citations for each question.
Required structure per test case:
```json
{
  "total_count": 2,
  "test_cases": [
    {
      "file_number": "1",
      "category": "page",
      "conversation": {
        "turns": [
          {
            "turn": 1,
            "question": "What is the processing time?",
            "expected_response_contains": [
              "5 business days"
            ],
            "evaluation_criteria": {
              "fact_extraction_accuracy": {
                "description": "Must state 5 business days",
                "threshold": 0.9,
                "required_elements": ["5 business days"]
              }
            }
          }
        ]
      },
      "expected_citations": [
        "access-policy.aspx",
        "https://sharepoint.com/..."
      ]
    }
  ]
}
```
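Since YAML is also accepted, the same test case translates directly:

```yaml
total_count: 2
test_cases:
  - file_number: "1"
    category: "page"
    conversation:
      turns:
        - turn: 1
          question: "What is the processing time?"
          expected_response_contains:
            - "5 business days"
          evaluation_criteria:
            fact_extraction_accuracy:
              description: "Must state 5 business days"
              threshold: 0.9
              required_elements: ["5 business days"]
    expected_citations:
      - "access-policy.aspx"
      - "https://sharepoint.com/..."
```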
### Dolphin Debug Folder (optional)

If you have dolphin debug HTML files from the GE connector, place them in a subfolder inside `inputs/`. The CLI auto-detects folders containing files named `id_*_dolphin_debug.html` or `id_*_no_output.html`.
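For example, a folder the CLI would detect (filenames follow the patterns above; `fed_token/` is the sample folder name used earlier):

```text
inputs/fed_token/
├── id_1_dolphin_debug.html
├── id_2_no_output.html
└── id_3_dolphin_debug.html
```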
## Contributing
### Development Setup
```bash
# Clone and install dev dependencies
git clone https://github.com/yapweiyih/gemini-enterprise-connector-eval.git
cd gemini-enterprise-connector-eval
uv sync
# Run tests
uv run pytest tests/ -v
# Build documentation locally
uv run mkdocs serve
```
### Running Tests
```bash
# Run all tests
uv run pytest tests/ -v
# Run specific test module
uv run pytest tests/test_ge_eval_cli.py -v
# Run with coverage
uv run pytest tests/ --cov=ge_eval -v
```
## License
This project is licensed under the Apache License 2.0 — see the LICENSE file for details.