LLM-Based Code Evaluator
GenAI · NLP
Built an automated code-evaluation system that uses LLMs to assess code quality, correctness, and adherence to requirements, replacing slow manual review workflows.
Key Highlights:
- 3× throughput improvement over the manual-evaluation baseline
- 90% reduction in manual review time
- Multi-threaded Streamlit UI enabling parallel evaluation of multiple submissions
Technical Approach:
- OpenAI API for LLM-based code assessment with structured prompt engineering
- Multi-threaded execution for parallel evaluation runs
- Configurable scoring rubrics covering multiple evaluation criteria (correctness, style, efficiency)
- Streamlit frontend for reviewers to inspect and override evaluations
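The evaluation pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `RUBRIC` weights, `build_prompt` wording, and the injected `ask_llm` callable (which would wrap an OpenAI chat-completion call in practice) are all assumptions made for the example.

```python
import json
from concurrent.futures import ThreadPoolExecutor

# Hypothetical rubric: criterion name -> weight (names and weights are illustrative).
RUBRIC = {"correctness": 0.5, "style": 0.2, "efficiency": 0.3}

def build_prompt(code: str, rubric: dict) -> str:
    """Assemble a structured prompt asking the LLM for JSON scores per criterion."""
    criteria = ", ".join(rubric)
    return (
        "You are a strict code reviewer. Score the submission below on each "
        f"criterion ({criteria}) from 0 to 10 and reply ONLY with a JSON "
        "object mapping criterion to score.\n\n```\n" + code + "\n```"
    )

def weighted_score(scores: dict, rubric: dict) -> float:
    """Collapse per-criterion scores into a single number using rubric weights."""
    return sum(scores[c] * w for c, w in rubric.items())

def evaluate_submission(code: str, ask_llm, rubric: dict = RUBRIC) -> float:
    """`ask_llm` is any callable prompt -> response text (e.g. a thin wrapper
    around an OpenAI chat call); injecting it keeps the pipeline testable offline."""
    raw = ask_llm(build_prompt(code, rubric))
    return weighted_score(json.loads(raw), rubric)

def evaluate_batch(submissions, ask_llm, max_workers: int = 4):
    """Score each submission in its own thread, mirroring the multi-threaded
    parallel-evaluation runs described above."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda s: evaluate_submission(s, ask_llm), submissions))
```

A Streamlit frontend would then render each submission's per-criterion scores and let reviewers override them; the dependency-injected `ask_llm` also makes it easy to swap models or stub the LLM during UI development.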
Tech Stack: Python, OpenAI API, Streamlit, Prompt Engineering, Multi-threading
Impact: Scaled code review capacity by 3× while cutting reviewer time by 90%, enabling faster hiring and assessment pipelines.