Abstract
Professional developers, and especially students learning to program, often write poor documentation.
While automated assessment for programming is becoming more common in educational settings, often using unit tests for code functionality and static analysis for code quality, documentation assessment is typically limited to detecting the presence and the correct formatting of a docstring based on a specified style guide.
We investigate how machine learning can help automate the assessment of documentation quality.
We classify a large, publicly available set of human-annotated relevance scores between a natural language string and a code string using three families of approaches: traditional models, such as Logistic Regression and Random Forest; fine-tuned large language models, such as BERT; and Low-Rank Adaptation of large language models.
Our most accurate model was a $k$-nearest-neighbours model with an accuracy of 58%.
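As a rough illustration of the classification setup described above, and not the authors' implementation, the following sketch shows how a $k$-nearest-neighbours classifier might predict a relevance label for a pair of natural language and code strings. The token-overlap features, the binary labels (0 = irrelevant, 1 = relevant), the toy training pairs, and the function names `tokenize`, `overlap_features`, and `knn_predict` are all assumptions made for this example; none of them come from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split on non-alphanumeric characters,
    # so snake_case identifiers such as read_file become ["read", "file"].
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def overlap_features(doc, code):
    # Hypothetical features: Jaccard overlap of the two token sets, and
    # the fraction of documentation tokens that also occur in the code.
    d, c = set(tokenize(doc)), set(tokenize(code))
    if not d or not c:
        return (0.0, 0.0)
    shared = d & c
    return (len(shared) / len(d | c), len(shared) / len(d))


def knn_predict(train, doc, code, k=3):
    # Majority vote over the k training pairs closest in feature space.
    query = overlap_features(doc, code)
    ranked = sorted(
        train,
        key=lambda row: math.dist(query, overlap_features(row[0], row[1])),
    )
    votes = Counter(label for _, _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Made-up training pairs: (documentation string, code string, label).
TRAIN = [
    ("read file contents", "def read_file(path): return open(path).read()", 1),
    ("parse json string", "def parse_json(s): return json.loads(s)", 1),
    ("sort list of users", "def sort_users(users): return sorted(users)", 1),
    ("send email to user", "def parse_json(s): return json.loads(s)", 0),
    ("compute checksum", "def read_file(path): return open(path).read()", 0),
    ("close the database connection", "def sort_users(users): return sorted(users)", 0),
]

code = "def delete_file(path): os.remove(path)"
print(knn_predict(TRAIN, "delete temp file", code))
print(knn_predict(TRAIN, "open network socket", code))
```

Hand-built overlap features keep the sketch dependency-free. A study like the one summarised here would more plausibly use learned text representations and a graded relevance scale, so the example conveys only the shape of the task.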
Original language | English |
---|---|
Title of host publication | Artificial Intelligence in Education. AIED 2024 |
Subtitle of host publication | Lecture Notes in Computer Science |
Publisher | Springer |
Volume | 14829 |
ISBN (Electronic) | 978-3-031-64302-6 |
ISBN (Print) | 978-3-031-64301-9 |
Publication status | Published - 8 Jul 2024 |
Publication series
Name | Lecture Notes in Computer Science |
---|---|
Publisher | Springer |
Volume | 14829 |
Keywords
- Automated Grading
- Assessment
- Computer Science Education
- Large Language Models
- Documentation
- Programming Education