10 Problem Overview

10.1 Data Cleaning Pipeline

The problems dashboard draws on two sources: the xbundle.xml file stored on Google Cloud Storage, and Google BigQuery. Two SQL scripts generate the data for the dashboard. The first, which covers multiple-choice problems, retrieves data from the problem_analysis, course_problem, and person_course tables. The second, which covers Open Response Assessments (ORAs), retrieves submission data from the event logs. The xbundle is used to look up the question and response text that corresponds to each ID.
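To illustrate the ID lookup, here is a minimal Python sketch that attaches question and response text to rows returned by the SQL scripts. The row keys (problem_id, answer_id) and the shape of the lookup dictionary are assumptions for illustration, not the project's actual schema.

```python
# Minimal sketch: attach human-readable text to BigQuery result rows.
# ASSUMPTIONS (hypothetical, not the project's schema): each row is a dict
# with "problem_id" and "answer_id" keys, and `lookup` maps IDs to strings
# previously extracted from the xbundle.

def attach_text(rows, lookup):
    """Return copies of rows with "question" and "response" text added."""
    labelled = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        # Fall back to the raw ID when the xbundle has no entry for it.
        new_row["question"] = lookup.get(row["problem_id"], row["problem_id"])
        new_row["response"] = lookup.get(row["answer_id"], row["answer_id"])
        labelled.append(new_row)
    return labelled
```

Falling back to the raw ID keeps a row visible on the dashboard even when the xbundle and the BigQuery tables are slightly out of sync.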

The useful data in the xbundle is extracted into JSON files by xml_extraction.py. Processing the assessment data relies on regular expressions and is too slow to perform at run time, so the extracted assessment data is further processed ahead of time by wrangle_assessments.R.
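The extraction step can be sketched with Python's standard xml.etree.ElementTree module. This is not the contents of xml_extraction.py; the element and attribute names below follow a simplified, assumed course layout (chapter, problem, and choice elements with a correct flag), and the real xbundle structure may differ.

```python
import json
import xml.etree.ElementTree as ET

# Minimal sketch of an XML-to-JSON extraction step.
# ASSUMED, simplified layout (the real xbundle schema is richer):
#   <course><chapter display_name=...><problem url_name=... display_name=...>
#       <choicegroup><choice correct="true">text</choice>...</choicegroup>

def extract_problems(xml_text):
    """Return one dict per problem: its module, ID, name, and choices."""
    root = ET.fromstring(xml_text)
    problems = []
    for chapter in root.iter("chapter"):
        module = chapter.get("display_name", "")
        for problem in chapter.iter("problem"):
            choices = [
                {"text": (c.text or "").strip(),
                 "correct": c.get("correct", "false").lower() == "true"}
                for c in problem.iter("choice")
            ]
            problems.append({
                "module": module,
                "id": problem.get("url_name"),
                "name": problem.get("display_name"),
                "choices": choices,
            })
    return problems


SAMPLE = """<course>
  <chapter display_name="Week 1">
    <problem url_name="p1" display_name="Addition">
      <choicegroup type="MultipleChoice">
        <choice correct="true">4</choice>
        <choice correct="false">5</choice>
      </choicegroup>
    </problem>
  </chapter>
</course>"""

if __name__ == "__main__":
    # Serialize to JSON, as the dashboard's preprocessing step would.
    print(json.dumps(extract_problems(SAMPLE), indent=2))
```

Doing this once offline means the dashboard only has to load small JSON files instead of re-parsing the full course XML.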

10.2 The Structure of Problems and Assessments

Currently, the problems dashboard supports only two types of problems: choicegroup and checkboxgroup. Both are multiple choice, but a checkboxgroup problem can have several correct answers, all of which must be selected for full credit. Open Response Assessments can be graded by the student (self-assessment), by peers, or by a TA/instructor, and their rubrics can contain any number of items. For example, a programming assessment might assess the student on passing edge-case tests and on code style.
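The scoring rule for the two supported types can be sketched as follows. This assumes all-or-nothing grading in which a checkboxgroup earns credit only when the selected choices exactly match the correct ones; the function names are hypothetical.

```python
# Minimal sketch of all-or-nothing scoring for the two supported types.
# ASSUMPTION: a checkboxgroup earns full credit only when the selected set
# equals the set of correct choices (every correct choice selected and no
# incorrect ones); partial credit is not modelled.

def score_choicegroup(selected, correct):
    """selected: the chosen choice ID; correct: the single correct ID."""
    return 1.0 if selected == correct else 0.0


def score_checkboxgroup(selected, correct):
    """selected, correct: iterables of choice IDs (order is irrelevant)."""
    return 1.0 if set(selected) == set(correct) else 0.0
```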

Assessments with only one item in their rubric, and multiple-choice problems in which every response is correct, are filtered out of the dashboard. These typically indicate a survey question or an ungraded self-reflection.
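A minimal sketch of this filtering rule, assuming hypothetical record layouts: an assessment carries a list of rubric items, and a multiple-choice problem carries a list of choices, each with a boolean correct flag.

```python
# Minimal sketch of the dashboard filter. The record layouts are
# hypothetical: an assessment is a dict with a "rubric" list, and a
# multiple-choice problem is a dict with a "choices" list whose entries
# carry a boolean "correct" flag.

def keep_assessment(assessment):
    # A single-item rubric usually signals an ungraded reflection.
    return len(assessment["rubric"]) > 1


def keep_problem(problem):
    # If every choice is correct, the question is most likely a survey.
    return not all(choice["correct"] for choice in problem["choices"])


def filter_for_dashboard(assessments, problems):
    """Return (kept assessments, kept problems)."""
    return ([a for a in assessments if keep_assessment(a)],
            [p for p in problems if keep_problem(p)])
```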

10.3 Visualization Reasoning and Caveats

10.3.1 Module Overview

The module overview gives the instructor a view of the difficulty curve across the entire course. Every question in a module is scored out of 100, and the scores are averaged. Note that the relative weights of the problems are not taken into account. Modules appear in the same order as in the XML, which also determines their names; malformed XML can occasionally produce unhelpful module names.
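The module-overview metric can be sketched as below, assuming scores arrive as hypothetical (module, points earned, points possible) tuples and that the module order has already been read from the XML. Every score is rescaled to 0-100 and averaged with equal weight, mirroring the fact that problem weights are ignored.

```python
from collections import defaultdict

# Minimal sketch of the module-overview metric.
# ASSUMPTIONS: `scores` is a list of (module, points_earned, points_possible)
# tuples; `module_order` lists module names in XML order.

def module_averages(scores, module_order):
    """Return [(module, mean score out of 100)] in XML order."""
    buckets = defaultdict(list)
    for module, earned, possible in scores:
        if possible > 0:
            # Rescale to 0-100 so every problem counts equally.
            buckets[module].append(100.0 * earned / possible)
    # Keep the XML ordering and skip modules with no scored problems.
    return [(m, sum(buckets[m]) / len(buckets[m]))
            for m in module_order if buckets[m]]
```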

10.3.2 Hardest Problems

The hardest problems plot shows the three hardest problems, ordered from hardest to least hard, together with each student's response. The aim is to show not only which problems students are struggling with but also why they might be struggling. This plot works especially well with the module filter, which lets the instructor drill down to the toughest problems in each module.
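One way to compute this view is sketched below. It assumes attempts arrive as hypothetical (problem_id, response, score) tuples, defines "hardest" as lowest mean score, and models the module filter as an optional set of allowed problem IDs; the dashboard's actual ranking criterion may differ.

```python
from collections import Counter, defaultdict

# Minimal sketch of the "hardest problems" view.
# ASSUMPTIONS: `attempts` is a list of (problem_id, response, score_0_to_100)
# tuples, "hardest" means lowest mean score, and the module filter is an
# optional set of allowed problem IDs.

def hardest_problems(attempts, n=3, allowed=None):
    scores = defaultdict(list)
    responses = defaultdict(Counter)
    for problem_id, response, score in attempts:
        if allowed is not None and problem_id not in allowed:
            continue
        scores[problem_id].append(score)
        responses[problem_id][response] += 1
    ranked = sorted(scores, key=lambda p: sum(scores[p]) / len(scores[p]))
    # For each of the n hardest problems, report its mean score and the
    # distribution of student responses (most common first).
    return [(p, sum(scores[p]) / len(scores[p]), responses[p].most_common())
            for p in ranked[:n]]
```

Keeping the response counts alongside the mean is what lets the plot suggest why a problem is hard, for example when most students pick the same wrong answer.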

10.3.3 Assessments

The assessments plot is new to the edX dashboard. It lets the instructor see not only how students performed on ORAs overall but also how they performed in each rubric subcategory. Note that, unlike for multiple-choice questions, 100% grades are allowed for assessments.
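A sketch of the per-subcategory aggregation, assuming hypothetical (assessment_id, criterion, points, max_points) records. Each rubric criterion is averaged separately as a percentage, so weakness in one criterion is not hidden by a strong overall score.

```python
from collections import defaultdict

# Minimal sketch of per-criterion ORA averages.
# ASSUMPTION: each record is (assessment_id, criterion, points, max_points).

def criterion_averages(records):
    """Return {(assessment_id, criterion): mean percentage score}."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of %, count]
    for assessment_id, criterion, points, max_points in records:
        if max_points <= 0:
            continue  # skip criteria that carry no points
        entry = totals[(assessment_id, criterion)]
        entry[0] += 100.0 * points / max_points
        entry[1] += 1
    return {key: s / c for key, (s, c) in totals.items()}
```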