mirror of
https://github.com/foss42/apidash.git
synced 2025-12-03 19:39:25 +08:00
Update application_nideesh_bharath_kumar_ai_api_evaluator.md to support images
@@ -119,7 +119,7 @@ This project is to develop a Dart-centered evaluation framework designed to simp
**Architecture:**


|

|
||||||
|
|
||||||
- Frontend Layer: This layer will be the main API Dash app UI, built with Flutter/Dart. It will provide a UI for users to select the AI evaluation test specifications and enter details such as the API key, model name, and API URL. This layer will also display real-time charts of the evaluations and the final metrics.

@@ -142,23 +142,23 @@ This prototype contains a custom UI implementation of the AI evaluation layer, l
The top right corner has a new button for API evaluations, as shown in the picture below:


|

|
||||||
|
|
||||||
When selected, it prompts a selection of tests:


|

|
||||||
|
|
||||||
Hellaswag is currently the only implemented test. When selected, it prompts a menu asking for the model name, API URL, API key, and a limit on the number of dataset rows to test. I recommend setting the limit to 20 to reduce API usage.


|

|
||||||
|
|
||||||
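The menu fields map naturally onto a small request object. A minimal Python sketch of the parameters the menu collects (the class and field names are my own illustration, not the prototype's actual code):

```python
from dataclasses import dataclass


@dataclass
class EvalRequest:
    """Parameters collected by the evaluation menu (names are illustrative)."""
    model_name: str
    api_url: str
    api_key: str
    limit: int = 20  # cap on dataset rows tested, to reduce API usage


# Example request; the key is a placeholder, not a real credential.
req = EvalRequest(
    model_name="my-model",
    api_url="https://example.com/v1/completions",
    api_key="<YOUR_API_KEY>",
)
print(req.limit)  # defaults to the recommended cap of 20
```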
When Run is selected, a loading screen appears while lm-evaluation-harness processes the request through a custom implementation of the provided models.


|

|
||||||
|
|
||||||
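For context, a run like this corresponds to an lm-evaluation-harness invocation against an OpenAI-compatible endpoint. A sketch of how such a command line could be assembled (flag names follow lm-eval's `local-completions` backend; verify them against your installed version, as they may differ from what the prototype actually does internally):

```python
def build_lm_eval_cmd(model: str, base_url: str, limit: int) -> list[str]:
    """Assemble an lm-evaluation-harness CLI invocation for a Hellaswag run.

    Flag names are taken from lm-eval's OpenAI-compatible backend and are an
    assumption here, not confirmed from the prototype's source.
    """
    return [
        "lm_eval",
        "--model", "local-completions",
        "--model_args", f"model={model},base_url={base_url}",
        "--tasks", "hellaswag",
        "--limit", str(limit),  # mirrors the row limit from the menu
    ]


cmd = build_lm_eval_cmd("my-model", "https://example.com/v1/completions", 20)
print(" ".join(cmd))
```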
After the evaluation finishes, it reports a quick accuracy value. Since this is a simple prototype and the test is run on a limited number of rows, this metric should be taken with a grain of salt.


|

|
||||||
|
|
||||||
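The grain-of-salt caveat can be made concrete: accuracy over a small sample carries a large statistical uncertainty. A quick illustration of my own (not part of the prototype) using the binomial standard error:

```python
import math


def accuracy_with_stderr(correct: int, total: int) -> tuple[float, float]:
    """Accuracy plus its binomial standard error.

    With few rows, the standard error is large, so the reported
    accuracy is only a rough indicator.
    """
    acc = correct / total
    stderr = math.sqrt(acc * (1 - acc) / total)
    return acc, stderr


# Hypothetical result: 14 of 20 rows correct.
acc, se = accuracy_with_stderr(14, 20)
print(f"accuracy = {acc:.2f} +/- {se:.2f}")  # prints "accuracy = 0.70 +/- 0.10"
```

At 20 rows, the one-sigma error is about ten percentage points, which is why the limit should be raised for any serious comparison.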
Key changes are: