Prompt Results Evaluation System
At Mirai, we've developed a sophisticated system to provide the best possible answers to user requests. Our system dynamically selects the most appropriate Large Language Model (LLM), prompt, execution parameters, or even static responses based on the current context, including conversation flow, topic, and user data.
Our prompt results evaluation system consists of several interconnected components:
Prompt Engine
Function: Executes prompts
Role: Core component that interacts with LLMs to generate responses
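As a rough sketch of what this core loop might look like (assuming a simple template-plus-parameters model; `execute_prompt`, `ExecutionParams`, and `llm_call` are illustrative names, not Mirai's actual API):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ExecutionParams:
    model: str                 # whichever configured LLM was selected
    temperature: float = 0.7
    max_tokens: int = 512

def execute_prompt(llm_call: Callable[..., str],
                   template: str,
                   variables: dict,
                   params: ExecutionParams) -> str:
    """Render the template with its variables and send it to the chosen LLM."""
    rendered = template.format(**variables)
    return llm_call(prompt=rendered,
                    model=params.model,
                    temperature=params.temperature,
                    max_tokens=params.max_tokens)
```

Injecting `llm_call` keeps the engine itself agnostic to which provider or model is behind it, which is what makes dynamic model selection possible.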
Prompt Storage
Function: Stores prompts and execution statistics
Role: Maintains a database of prompts and their performance metrics
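A minimal sketch of such storage, assuming a relational layout with one table for prompt versions and one for execution records; the schema is hypothetical:

```python
import sqlite3

# Hypothetical schema: one row per prompt version, plus raw execution records
# that the performance metrics are aggregated from.
conn = sqlite3.connect("prompts.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS prompts (
    id INTEGER PRIMARY KEY,
    template TEXT NOT NULL,
    parent_id INTEGER REFERENCES prompts(id)  -- set when a prompt is forked
);
CREATE TABLE IF NOT EXISTS executions (
    prompt_id INTEGER REFERENCES prompts(id),
    model TEXT,
    latency_ms REAL,
    score REAL  -- filled in later by the evaluation step
);
""")
```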
Graph Engine
Function: Provides an abstraction layer for data representation
Key Features:
Creates nodes from user inputs, prompt executions, and user information
Assigns traits to nodes (e.g., topic, tone of voice)
Establishes relationships between nodes
Allows grouping of nodes based on various criteria (e.g., same chat, user, or topic)
Enables automatic prompt improvement through group metrics
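The sketch below illustrates the node-trait-group idea with plain data structures; `Node` and `group_by_trait` are illustrative names, not the engine's real interface:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    id: str
    kind: str                                     # "user_input", "prompt_execution", "user"
    traits: dict = field(default_factory=dict)    # e.g. {"topic": "billing", "tone": "formal"}
    edges: list = field(default_factory=list)     # ids of related nodes (same chat, same user, ...)

def group_by_trait(nodes: list[Node], trait: str) -> dict:
    """Group nodes that share a trait value (e.g. same chat, user, or topic)."""
    groups: dict = {}
    for node in nodes:
        groups.setdefault(node.traits.get(trait), []).append(node)
    return groups
```

Grouping by a shared trait is what makes group-level metrics possible, which in turn feed the automatic prompt improvement mentioned above.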
Trait Engine
Function: Calculates traits for nodes
Role: Analyzes content to determine characteristics like topic and sentiment
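Conceptually, trait calculation boils down to labelling content along a few axes. In this sketch, `classify` stands in for whatever model or heuristic actually performs the labelling, and the trait names and label sets are examples only:

```python
from typing import Callable

def calculate_traits(text: str, classify: Callable[..., str]) -> dict:
    """Derive traits for a node from its content."""
    return {
        "topic": classify(text, labels=["billing", "support", "sales"]),
        "sentiment": classify(text, labels=["positive", "neutral", "negative"]),
    }
```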
Rank Engine
Function: Analyzes the graph and assigns scores to nodes
Key Features:
Considers user context and current conversation state
Creates mappings of optimal LLMs and execution parameters for specific contexts
Facilitates selection of the best transition between conversation states
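One simple way to build such a mapping is to average past scores per (model, parameters) pair within a context and pick the best. This is an illustrative approach under that assumption, not necessarily the one the Rank Engine uses:

```python
def best_config(context_key: tuple, history: list[dict]) -> dict:
    """Pick the (model, temperature) pair with the highest mean score
    for this context, falling back to a default when the context is unseen."""
    candidates = [h for h in history if h["context"] == context_key]
    if not candidates:
        return {"model": "default", "temperature": 0.7}
    by_config: dict = {}
    for h in candidates:
        by_config.setdefault((h["model"], h["temperature"]), []).append(h["score"])
    (model, temp), _ = max(by_config.items(),
                           key=lambda kv: sum(kv[1]) / len(kv[1]))
    return {"model": model, "temperature": temp}
```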
Rewarding Engine
Function: Measures prompt performance
Role: Identifies when prompt results are suboptimal, signaling the need for modification
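A minimal version of this signal could compare a prompt's mean score against its group baseline; the threshold and function names here are assumptions:

```python
def needs_modification(scores: list[float],
                       baseline: float,
                       threshold: float = 0.8) -> bool:
    """Flag a prompt when its mean score falls well below its group
    baseline -- the signal the Evolution Engine reacts to."""
    if not scores:
        return False
    return (sum(scores) / len(scores)) < threshold * baseline
```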
Evolution Engine
Function: Modifies or forks prompts to improve performance
Role: Automatically adjusts prompts based on performance data
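In sketch form, forking might look like the following, where `store` and `rewrite` are placeholders for the real storage layer and the (possibly LLM-driven) rewriting step:

```python
def fork_prompt(store, prompt_id: int, rewrite) -> int:
    """Create a modified copy of an underperforming prompt and record
    its lineage, so the old and new versions can be compared."""
    original = store.get(prompt_id)
    candidate = rewrite(original.template)  # e.g. rephrase the instructions
    return store.insert(template=candidate, parent_id=prompt_id)
```

Keeping the parent link means the Rewarding Engine can later decide whether the fork actually outperforms its ancestor.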
Admin Dashboard
Function: Admin application for prompt management
Key Features:
Allows users to create and deploy prompts
Provides usage statistics and execution traces
API
Function: Provides programmatic access to the Mirai system
Role: Enables integration with external applications and services
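For illustration only (the endpoint path, payload shape, and authentication scheme below are invented, not Mirai's published API), a client call might look like this:

```python
import json
import urllib.request

def run_prompt(base_url: str, api_key: str, prompt_id: str, context: dict) -> dict:
    """Execute a stored prompt with the caller's context and return the result."""
    req = urllib.request.Request(
        f"{base_url}/v1/prompts/{prompt_id}/execute",
        data=json.dumps({"context": context}).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```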
Client Applications
Function: Consume Mirai's APIs
Examples: Both Mirai's own applications and third-party integrations
The system processes each request in five steps:
1. User input is processed through the Graph Engine, which creates nodes with specific traits.
2. The Rank Engine analyzes the graph to determine the best course of action.
3. The Prompt Engine executes the chosen prompt using the optimal LLM and parameters.
4. The Rewarding Engine evaluates the prompt's performance.
5. If necessary, the Evolution Engine modifies the prompt for future improvements.
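Putting the five steps together, a single pass through the pipeline might be wired up as follows; every argument is a stand-in for the corresponding component described above, and the method names are illustrative:

```python
def handle_request(user_input, graph, trait_engine, rank_engine,
                   prompt_engine, rewarding_engine, evolution_engine):
    """One pass through the pipeline: ingest, rank, execute, evaluate, evolve."""
    node = graph.add_node(user_input, traits=trait_engine.calculate(user_input))
    choice = rank_engine.best_action(graph, node)   # prompt + LLM + parameters
    result = prompt_engine.execute(choice)
    score = rewarding_engine.evaluate(result, node)
    if rewarding_engine.is_suboptimal(score):
        evolution_engine.improve(choice.prompt_id)  # fork or adjust for next time
    return result
```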
As we continue to develop our system, we're exploring several ideas to enhance its capabilities:
Custom Performance Metrics: Allowing users to provide their own data about prompt performance (e.g., for e-commerce product descriptions).
Extensive Context Integration: Encouraging users to pass as much contextual data as possible, similar to analytics systems.
Template Tagging: Implementing a system for users to add tags to their prompt templates.
Prompt Marketplace: Developing a GitHub-like platform where users can publish and share their prompts.
On-Premise Version: Evaluating the need for a self-hosted version of our system.
Mirai's prompt results evaluation system represents a significant advancement in AI-powered conversation and task completion. By leveraging a complex network of interconnected components, we're able to provide highly contextual, optimized responses that continually improve over time.