ORA Feature Summary

The Open Response Assessment (ORA) feature is a component that supports long text-based submissions and file uploads. Once submitted, learner submissions are graded against a rubric by staff, by the submitting learners themselves, or by other learners in the course. In short, the feature is designed to cover what are typically found as three distinct problem types:

  • Staff-graded assessment

  • Peer-graded assessment

  • Self-assessment

Submissions can be text-based or made by file upload. Text-based submissions can be edited in plain text or with a WYSIWYG editor, which staff can enable when configuring the assessment. File submissions have an extension blacklist that can be configured at the platform level to help prevent learners from sharing malicious or damaging files. File uploads are limited to 500MB.
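
As a rough sketch of the kind of check this implies (the blacklisted extensions, names, and structure here are illustrative assumptions, not the actual platform configuration):

    import os

    # Illustrative values only; the real platform-level blacklist is configurable
    # and the actual setting names differ from these.
    BLACKLISTED_EXTENSIONS = {".exe", ".bat", ".msi"}
    MAX_UPLOAD_BYTES = 500 * 1024 * 1024  # the 500MB limit mentioned above, treated as 500 MiB here

    def is_upload_allowed(filename: str, size_bytes: int) -> bool:
        """Reject files with blacklisted extensions or files above the size limit."""
        extension = os.path.splitext(filename)[1].lower()
        return extension not in BLACKLISTED_EXTENSIONS and size_bytes <= MAX_UPLOAD_BYTES

    print(is_upload_allowed("essay.pdf", 2_000_000))  # True
    print(is_upload_allowed("payload.exe", 1_000))    # False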

Submissions always respond to a prompt written by staff, and an ORA can feature multiple prompts. All prompts share the same input settings, so if one prompt expects a text response using the WYSIWYG editor, every prompt will expect a text response using the WYSIWYG editor.

Grading is always tied to the rubric, enforcing grading best practice, with comments on each stage and on the overall assignment optionally configurable. There is also an optional “training” step, in which learners grade sample submissions entered by staff and cannot progress until they select the same options the staff member selected. No points are awarded for participating in training.
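
A minimal sketch of that matching check, using illustrative data shapes rather than the actual ORA implementation:

    def passes_training_example(staff_selection: dict, learner_selection: dict) -> bool:
        """The learner advances only when their chosen option matches staff on every criterion."""
        return all(
            learner_selection.get(criterion) == option
            for criterion, option in staff_selection.items()
        )

    staff = {"Taste": "Good", "Presentation": "Amazing"}
    print(passes_training_example(staff, {"Taste": "Good", "Presentation": "Amazing"}))  # True
    print(passes_training_example(staff, {"Taste": "Poor", "Presentation": "Amazing"}))  # False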

An ORA can take a submission through multiple steps of this process, meaning that if all steps were enabled, a learner would be trained to grade, participate in peer assessment, assess themselves, and then have their submission graded by staff. When set up this way, the staff-graded step overrides all other grades, making self-assessment and peer assessment formative in nature rather than actually determining the grade. The order of precedence is essentially:

Self Assessment < Peer Assessment < Most Recent Staff Assessment

If an assignment is graded by more than one member of staff, only the most recent staff grade counts, and any peer or staff grade will override a self-assessment grade.
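
A small sketch of how that precedence could be resolved (names and data shapes are illustrative, not the actual grading implementation):

    from typing import Optional, Sequence

    def effective_grade(
        self_grade: Optional[int],
        peer_grade: Optional[int],
        staff_grades: Sequence[int],
    ) -> Optional[int]:
        """Resolve the final grade: most recent staff grade beats the peer grade,
        which beats the self-assessment. staff_grades is ordered oldest to newest."""
        if staff_grades:
            return staff_grades[-1]
        if peer_grade is not None:
            return peer_grade
        return self_grade

    print(effective_grade(self_grade=10, peer_grade=7, staff_grades=[6, 9]))  # 9
    print(effective_grade(self_grade=10, peer_grade=7, staff_grades=[]))      # 7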

Grading is strictly tied to a specific rubric, which consists of Criteria, each made up of a name and a prompt asking a question about the response, and Options, which are predefined answers to those prompts. For example, a Criterion could ask “How good does the cake taste?” with options of “Awful, Poor, Good, Amazing”. Each option is assigned a number of points used to calculate the overall grade.
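
For illustration, such a rubric could be expressed as plain data along these lines (the point values are made up, and this is not the actual ORA schema):

    # Illustrative only: a rubric as plain data, not the actual ORA representation.
    rubric = {
        "criteria": [
            {
                "name": "Taste",
                "prompt": "How good does the cake taste?",
                "options": [
                    {"label": "Awful", "points": 0},
                    {"label": "Poor", "points": 1},
                    {"label": "Good", "points": 2},
                    {"label": "Amazing", "points": 3},
                ],
            },
        ],
    }

    # The maximum possible score is the sum of each criterion's highest-value option.
    max_score = sum(
        max(option["points"] for option in criterion["options"])
        for criterion in rubric["criteria"]
    )
    print(max_score)  # 3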

A learner’s score on each criterion is the median of the peer grades received for that criterion: a learner who receives a 7, an 8, and a 10 gets an 8, while a learner who receives a 10, a 4, and a 3 gets a 4. A learner’s final score in a peer assessment is the total of these per-criterion medians.
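
Reproducing that arithmetic as a small sketch (the data shapes are illustrative, not ORA internals):

    from statistics import median

    # Peer grades per criterion for one submission, using the examples above.
    peer_grades = {
        "Taste": [7, 8, 10],         # median -> 8
        "Presentation": [10, 4, 3],  # median -> 4
    }

    # Each criterion's score is the median of its peer grades; the final score is
    # their total. (Both examples above have three graders; how an even number of
    # grades is handled is not covered here.)
    final_score = sum(median(grades) for grades in peer_grades.values())
    print(final_score)  # 12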

With peer assessments, staff can configure the number of assignments each learner must grade before receiving feedback on their own work. This is designed to ensure learners actually participate in the peer assessment activity, rather than simply submitting their assignment and moving on. edX.org recommends setting “Must Grade” to 4 and “Graded By” to 3, to ensure there are more graders than assignments waiting to be graded, though lower numbers are typically needed at smaller scales.
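
A toy worked example (the cohort size is assumed) of why the recommendation keeps the supply of reviews ahead of demand:

    must_grade = 4   # reviews each learner must complete (recommended value)
    graded_by = 3    # reviews each submission needs before it can be scored
    learners = 100   # assumed cohort size, for illustration only

    reviews_supplied = learners * must_grade   # 400 reviews performed
    reviews_required = learners * graded_by    # 300 reviews needed
    print(reviews_supplied - reviews_required) # 100 spare reviews cover learners who stop early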

An additional optional feature for peer grading is “Flexible Peer Grade Averaging”. After 7 days, this reduces the required number of reviews (the “Graded By” number) for that assignment to 30% of its previous value, rounded down. An assignment configured as edX.org recommends will therefore decrease from 3 required reviews to 1, and take that single grade. The process repeats every 7 days, so as long as the original “Graded By” number is lower than 24, any assignment that at least one peer has assessed will receive a grade after 14 days (a requirement of 23 drops to 6 after 7 days and to 1 after 7 more, whereas 24 would go to 7, then 2).
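
Reproducing that arithmetic as a sketch (the minimum of 1 is an assumption drawn from the 3-to-1 example above; the feature’s actual edge-case behaviour may differ):

    import math

    def reduce_graded_by(current_required: int) -> int:
        """One 7-day reduction: 30% of the previous value, rounded down,
        assumed here never to fall below 1."""
        return max(1, math.floor(current_required * 0.3))

    for start in (3, 23, 24):
        requirement = start
        steps = []
        for _ in range(2):  # two 7-day cycles
            requirement = reduce_graded_by(requirement)
            steps.append(requirement)
        print(start, "->", steps)
    # 3 -> [1, 1]
    # 23 -> [6, 1]
    # 24 -> [7, 2]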

There is an optional scoreboard that, once a learner has submitted their response, can display the current top-rated responses: the best-scoring answers provided by other learners.

Finally, staff have a small range of tools for managing a running ORA within the LMS: grading submissions, deleting them, and overriding grades, as well as managing individual learners directly.

Pedagogical Assessment

Is there research to support the need for this feature in delivering pedagogical value and impact?

Staff-grading is a table-stakes feature that must exist in any LMS, and the idea of conducting thorough, academic evaluation of long-form work without the ability for a human to grade it is a little silly, so I won’t be including much pedagogic research into the value of this part of the feature.

What’s more interesting is the need for peer assessment and self-assessment, so that’s where I’ll focus my attention.

Peer assessment is a widespread and popular tool, but it requires a coexisting group of learners, which is not always possible on self-paced courses. Research has found that “a peer assessor with less skill at assessment but more time in which to do it can produce an assessment of equal reliability and validity to that of a teacher” (source). Despite this, at present it is almost impossible to make claims about what exactly constitutes effective peer assessment that delivers meaningful results, which makes it harder to determine what makes the ideal peer assessment tool, or what form peer assessment should actually take. The value of being able to receive feedback is indisputable, and the value of learning to provide good feedback, and of demonstrating the knowledge required to provide it, is widely acknowledged, but whether peer assessment is the best way to achieve that, and what the best form of that process is, is not backed by extensive research.

Many learners believe that peer assessment is valuable and supports their learning process, and the practice has been shown to have a positive effect on learning outcomes and attitudes, so the question should logically become “how can tools be built that best support quality peer assessment, and help non-academic course instructors ensure their peer assessments have pedagogic value?”. Unfortunately this isn’t a question that’s currently supported by much definitive research that I can access, as it’s fairly specific to the online, non-core academic use case.

What’s interesting is that it’s generally understood that learners need to be taught to conduct effective peer assessment for it to have value, particularly when it is used as a summative assessment rather than simply as a formative learning experience. The more freeform the assessment, the more training is necessary. Logically this would imply that a more strictly guided approach would produce more reliable grading results, but there’s also an argument that the more freeform and flexible peer grading is, the more value it has as a learning activity for the graders, in much the same way that a graded free-form assignment can be more beneficial for a learner’s growth and assessment than multiple choice questions. This is a fine line to walk.

Self-assessment is similar in that it has far more value if learners understand the reasoning, value, and methodology behind it before conducting self-assessment activities. This means it’s almost never used as a teaching technique outside of academia, despite the potential benefits, which are well-researched. This value is unfortunately not really seen by non-academic course authors, or by many learners themselves. If a self-assessment is not summative (i.e. it does not count towards course completion), then learners are liable to skip it entirely, while if it does count towards their final grade, even a small amount, learners will regularly inflate their own scores to increase their odds of passing. This doesn’t make the activity of self-assessing less valuable, but it does make the value of the resulting grade more questionable. When the activity is completely formative, completion rates are typically lower, but grades are typically much more in line with those of external assessors.

The more a learner understands the purpose of the assignment, and the clearer the grading guidelines, the less likely they are to overestimate their grade.

Links:

Subject-Matter Alignment

What types of courses/subjects does this feature support?

Staff-graded assignments are only feasibly usable in the private course use-case, where a member of staff can be assigned to a small group of learners of a known size. These courses are typically instructor-led, as it is difficult to fit the grading of an entirely self-paced course into a grader’s schedule.

Peer-graded assignments are a little more flexible, allowing a far higher volume of learners to be graded by their peers, but this in turn increases the need for content moderation, and grading remains a time-bound activity. Many platforms, including Open edX, have features that effectively let a learner bypass the majority of the peer grading requirements if nobody has graded their assignment after a certain period of time, which significantly undermines the value of the exercise for practical reasons. Peer grading therefore still functions best on a course with a coordinated group of learners starting and ending on preset dates.

Self-assessment is, as previously mentioned, only typically useful where learners are adequately taught its value, and where the course authors also understand how to use self-assessment. This means it’s seldom used outside of academic courses.

All three tools are useful in almost every type of course, with no huge bias towards any particular subject. Conceivably, courses where learners produce easily assessable artefacts such as pieces of writing, code, or media assets are likely to benefit the most. Most importantly, given the expertise required to construct rubrics, perform staff-grading, and teach learners to peer grade effectively, most activities supported by ORAs are typically unsuitable for non-academic courses such as customer product education, as well as for short, self-paced, low-stakes courses.