# Initial Idea Submission
**Full Name:** Harsh Panchal
**Email:** [harsh.panchal.0910@gmail.com](mailto:harsh.panchal.0910@gmail.com)
● [**GitHub**](https://github.com/GANGSTER0910)
● [**Website**](https://harshpanchal0910.netlify.app/)
● [**LinkedIn**](https://www.linkedin.com/in/harsh-panchal-902636255)
**University Name:** Ahmedabad University, Ahmedabad
**Program:** BTech in Computer Science and Engineering
**Year:** Junior, 3rd Year
**Expected Graduation Date:** May 2026
**Location:** Gujarat, India
**Timezone:** Asia/Kolkata (UTC+5:30)
## **Project Title: AI API Eval Framework**
## **Relevant Issues: [#618](https://github.com/foss42/apidash/issues/618)**
## **Idea Description**
The goal of this project is to build an AI API Evaluation Framework that provides an end-to-end solution for comparing AI models across different kinds of data: text, images, and videos. The core strategy is to compare a model's outputs against the predictions of established benchmark models. Users can also apply metrics such as BLEU, ROUGE, FID, and SSIM for an objective, quantitative evaluation of model performance.
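As a concrete illustration of the metric layer, the sketch below scores one text output with BLEU and ROUGE-L and one image output with SSIM, using common Python libraries (`nltk`, `rouge-score`, `scikit-image`). This is a minimal sketch under those library assumptions, not the framework's final API; the function names are illustrative only.

```python
# Minimal sketch: reference-based metrics for a single model output.
# The library choices (nltk, rouge-score, scikit-image) are assumptions,
# not a final design decision for the framework.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer
from skimage.metrics import structural_similarity
import numpy as np

def text_scores(reference: str, candidate: str) -> dict:
    """Score a candidate text against a single reference."""
    bleu = sentence_bleu([reference.split()], candidate.split(),
                         smoothing_function=SmoothingFunction().method1)
    scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
    rouge_l = scorer.score(reference, candidate)["rougeL"].fmeasure
    return {"bleu": bleu, "rougeL": rouge_l}

def image_score(reference: np.ndarray, candidate: np.ndarray) -> float:
    """SSIM between two grayscale images of the same shape."""
    data_range = float(candidate.max() - candidate.min())
    return structural_similarity(reference, candidate, data_range=data_range)

print(text_scores("the cat sat on the mat", "a cat sat on the mat"))
```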
To support both offline and online use, the platform will provide an adaptive assessment framework where users can specify their own evaluation criteria, giving flexibility across different use cases. A model version-control feature will let users compare different versions of a model and monitor performance over time. In offline mode, evaluations will be supported by LoRA (low-rank adaptation) models, which reduce resource consumption without compromising accuracy. The system will also integrate explainability via SHAP and LIME to show how input features influence model decisions.
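To give a sense of how user-defined criteria could plug in, here is a minimal sketch of a decorator-based metric registry. The design and all names (`register_metric`, `evaluate`, `exact_match`) are hypothetical assumptions for illustration, not existing API Dash code.

```python
# Hypothetical sketch of a custom-metric registry; names are illustrative only.
from typing import Callable, Dict, List

METRICS: Dict[str, Callable[[str, str], float]] = {}

def register_metric(name: str):
    """Decorator that registers a user-defined scoring function."""
    def wrapper(fn: Callable[[str, str], float]):
        METRICS[name] = fn
        return fn
    return wrapper

@register_metric("exact_match")
def exact_match(reference: str, candidate: str) -> float:
    """Example user-defined criterion: 1.0 on exact (stripped) match."""
    return 1.0 if reference.strip() == candidate.strip() else 0.0

def evaluate(reference: str, candidate: str, names: List[str]) -> dict:
    """Run the selected registered metrics on one output."""
    return {n: METRICS[n](reference, candidate) for n in names}

print(evaluate("42", "42 ", ["exact_match"]))  # {'exact_match': 1.0}
```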
The visualization dashboard, built with Flutter, will include real-time charts, error analysis, and result summarization, making it easy to analyze model performance. Whether running offline with cached models or online against API endpoints, the framework will offer end-to-end testing.
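One possible contract between the Python evaluation core and the Flutter dashboard is a plain JSON results payload. The schema below is an assumption sketched for illustration, not a finalized format.

```python
# Hypothetical JSON payload the Flutter dashboard could render;
# field names and structure are illustrative assumptions.
import json
from dataclasses import dataclass, asdict, field

@dataclass
class EvaluationResult:
    model_name: str
    model_version: str
    task: str                                   # e.g. "summarization"
    scores: dict = field(default_factory=dict)  # metric name -> value
    mode: str = "online"                        # "online" (API) or "offline" (cached/LoRA)

result = EvaluationResult(
    model_name="example-model",
    model_version="v2",
    task="summarization",
    scores={"bleu": 0.41, "rougeL": 0.57},
)

print(json.dumps(asdict(result), indent=2))  # serialized for the dashboard
```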
With benchmark-based ranking, model explainability, and configurable evaluation, this framework will be a powerful resource for researchers, developers, and organizations making data-driven decisions about AI model selection and deployment.
## Unique Features
1) **Benchmark-Based Ranking:**
   - Compare and rank model outputs against pre-trained benchmark models.
   - Determine how closely outputs resemble the reference predictions.
2) **Advanced Evaluation Metrics:**
   - Support metrics such as BLEU, ROUGE, FID, SSIM, and PSNR for in-depth analysis.
   - Allow users to define custom metrics.
3) **Model Version Control:**
   - Compare different versions of AI models.
   - Monitor performance improvements over time with side-by-side comparison.
4) **Explainability Integration:**
   - Employ SHAP and LIME to explain model decisions (see the sketch after this list).
   - Provide clear explanations of why some outputs rank higher.
5) **Custom Evaluation Criteria:**
   - Allow users to input custom evaluation criteria for domain-specific tasks.
6) **Offline Mode with LoRA Models:**
   - Achieve storage and execution efficiency with low-rank adaptation models.
   - Conduct offline evaluations with minimal hardware demands.
7) **Real-Time Visualization:**
   - Visualize evaluation results with interactive charts in Flutter.
   - Monitor performance trends and spot weaknesses visually.
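As a sketch of the explainability integration in point 4, the example below runs SHAP's `TreeExplainer` on a scikit-learn model. The synthetic data and placeholder model are assumptions for illustration; wiring the attributions into the ranking pipeline is left open.

```python
# Minimal SHAP sketch on a placeholder scikit-learn model; the dataset and
# model are stand-ins, not part of the proposed framework's real pipeline.
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))                  # 4 illustrative features
y = X[:, 0] * 2.0 + X[:, 1] - 0.5 * X[:, 2]    # synthetic target

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:10])    # per-feature attributions

# Mean absolute attribution per feature: which inputs drive the predictions.
print(np.abs(shap_values).mean(axis=0))
```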