LLM-Based Code Evaluator
GenAI · NLP
Built an automated code-evaluation system that uses LLMs to assess code quality, correctness, and adherence to requirements, replacing slow manual review workflows.
Key Highlights:
- 3× throughput improvement over the manual-evaluation baseline
- 90% reduction in manual review time
- Multi-threaded Streamlit UI enabling parallel evaluation of multiple submissions
Technical Approach:
- OpenAI API for LLM-based code assessment with structured prompt engineering
- Multi-threaded execution for parallel evaluation runs
- Configurable scoring rubrics covering multiple evaluation criteria (correctness, style, efficiency)
- Streamlit frontend for reviewers to inspect and override evaluations
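The evaluation pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `RUBRIC` weights, `build_prompt` wording, and the injected `ask_llm` callable (which would wrap an OpenAI chat-completion call in practice) are all assumptions made for the example.

```python
import json
from concurrent.futures import ThreadPoolExecutor

# Hypothetical rubric: criterion name -> weight (names and weights are illustrative).
RUBRIC = {"correctness": 0.5, "style": 0.2, "efficiency": 0.3}

def build_prompt(code: str, rubric: dict) -> str:
    """Assemble a structured prompt asking the LLM for JSON scores per criterion."""
    criteria = ", ".join(rubric)
    return (
        "You are a strict code reviewer. Score the submission below on each "
        f"criterion ({criteria}) from 0 to 10 and reply ONLY with a JSON "
        "object mapping criterion to score.\n\n```\n" + code + "\n```"
    )

def weighted_score(scores: dict, rubric: dict) -> float:
    """Collapse per-criterion scores into a single number using rubric weights."""
    return sum(scores[c] * w for c, w in rubric.items())

def evaluate_submission(code: str, ask_llm, rubric: dict = RUBRIC) -> float:
    """`ask_llm` is any callable prompt -> response text (e.g. a thin wrapper
    around an OpenAI chat call); injecting it keeps the pipeline testable offline."""
    raw = ask_llm(build_prompt(code, rubric))
    return weighted_score(json.loads(raw), rubric)

def evaluate_batch(submissions, ask_llm, max_workers: int = 4):
    """Score each submission in its own thread, mirroring the multi-threaded
    parallel-evaluation runs described above."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda s: evaluate_submission(s, ask_llm), submissions))
```

A Streamlit frontend would then render each submission's per-criterion scores and let reviewers override them; the dependency-injected `ask_llm` also makes it easy to swap models or stub the LLM during UI development.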
Tech Stack: Python, OpenAI API, Streamlit, Prompt Engineering, Multi-threading
Impact: Scaled code review capacity by 3× while cutting reviewer time by 90%, enabling faster hiring and assessment pipelines.