Abstract
The limited quality of technical documentation in existing software systems often makes it difficult to use a library to implement new features.
Professional developers, and especially students learning to program, often write poor documentation.
While automated assessment for programming is becoming more common in educational settings, often using unit tests for code functionality and static analysis for code quality, documentation assessment is typically limited to detecting the presence and the correct formatting of a docstring based on a specified style guide.
We investigate how machine learning can be utilised to help automate the assessment of documentation quality.
We classify a large set of publicly available human-annotated relevance scores between a natural language string and a code string, using traditional approaches such as Logistic Regression and Random Forest; fine-tuned large language models such as BERT and GPT; and Low-Rank Adaptation (LoRA) of large language models.
Our most accurate model was a fine-tuned CodeBERT model, achieving a test accuracy of 89%.
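As a rough illustration of the fine-tuned CodeBERT setup described above, the sketch below encodes a docstring and a code string as a sentence pair for binary relevance classification with Hugging Face Transformers. The training pairs, label scheme, and hyperparameters here are illustrative assumptions, not the paper's exact configuration.

```python
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# CodeBERT is pretrained on paired natural language and code, which makes it
# a natural fit for docstring-code relevance classification.
tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/codebert-base", num_labels=2
)

# Hypothetical training pairs: a docstring, a code string, and a human
# relevance label (1 = relevant, 0 = not relevant).
pairs = [
    {"doc": "Return the sum of two numbers.",
     "code": "def add(a, b):\n    return a + b",
     "label": 1},
    {"doc": "Sort the input list in place.",
     "code": "def add(a, b):\n    return a + b",
     "label": 0},
]

def tokenize(batch):
    # Encode the docstring and the code as a single sentence pair.
    return tokenizer(batch["doc"], batch["code"],
                     truncation=True, padding="max_length", max_length=256)

train_ds = Dataset.from_list(pairs).map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="codebert-relevance",
        num_train_epochs=3,
        per_device_train_batch_size=16,
    ),
    train_dataset=train_ds,
)
trainer.train()
```

A LoRA variant would wrap the same model in an adapter (for example, peft's LoraConfig and get_peft_model) so that only low-rank update matrices are trained rather than all model weights.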
Original language | English
---|---
Title of host publication | 25th International Conference on Artificial Intelligence in Education
Publisher | SpringerLink
Publication status | Accepted/In press - 2024
Keywords
- Automated Grading
- Assessment
- Computer Science Education
- Machine Learning
- Large Language Models
- Documentation
- Programming Education