Useful CourseGraph Queries!
Help The Community! Whenever you come up with a new query, add it here. Even if you think the use case is obscure.
Table of Contents
- 1 Table of Contents
- 2 Orphans
- 3 XBlocks
- 3.1 What XBlocks do we have? (total count)
- 3.1.1 XBlock counts
- 3.2 What xBlocks do we have? (across all courses, with counts by type)
- 3.2.1 XBlock counts
- 3.3 Which courses have a specific XBlock field enabled?
- 3.3.1 XBlock field matches
- 3.4 Where are all the XBlock types used in my course? (with direct link to where they are used)
- 3.5 What courses contain deprecated advanced modules?
- 3.6 What courses contain specific advanced modules?
- 3.7 What courses are using known third-party discussion forums?
- 3.1 What XBlocks do we have? (total count)
- 4 Courses with creative-common licenses
- 5 Cohorts and differentiated content
- 6 Problems
- 7 Course Structure
- 7.1 Components that are direct children of subsections instead of units/verticals
- 7.2 Subsections that contain word clouds and peer instruction
- 7.3 Length of the longest Subsection title
- 7.4 Length of the longest Section title
- 7.5 How many pages contain nested XBlocks?
- 7.5.1 Nested XBlocks in Units
- 7.5.2 Units with Nested XBlocks
- 7.5.3 Searching path depth
- 7.6 Subsection settings from the Course Outline
- 7.7 Order of content on a page ([ :PRECEDES] relationship)
- 7.8 Length of the longest Section title
- 8 Videos
- 8.1 Where are all the video components in a course?
- 8.2 What are the video titles and Video IDs in a course?
- 8.3 Where are all in-use videos located in my course? (with direct link)
- 8.4 What courses have not enabled video upload?
- 8.5 Which courses allow video download?
- 8.6 Where are all my org's videos and what video source / URL are they being served from?
- 8.7 How many videos use start/end times (i.e. trim the video)?
- 8.8 How many creative-commons licensed videos are there?
- 9 ORA
- 10 Advanced settings
- 11 Exams
- 12 Proctored Exams
- 12.1 Which courses have practice proctored exams?
- 12.2 Which courses have official proctored exams?
- 12.3 Link and start date for the first proctored exam in a course
- 12.4 Which courses have currently available proctored exams?
- 12.5 What are the release and due dates of the proctored exams?
- 12.6 What is the proctoring backend in use by a course?
- 13 Course quality & integrity
- 14 Discussion forums
- 15 Course updates
- 16 LTI
- 17 Due dates
- 18 Content: HTML and Problems
- 18.1 Which courses contain a certain keyword anywhere in their course content?
- 18.2 What courses contain scripts with python code?
- 18.3 Where all the python-graded problems?
- 18.4 Which HTML modules have (illegal) self-closing <script> and <iframe> tags?
- 18.5 Which HTML modules use X iframe in them and where?
- 19 Age of Course Data
Orphans
All non-orphan problems in a course
All non-orphan problems in a course
MATCH
(course)-[:PARENT_OF*]->(p:problem)
WHERE
p.course_key = <course_key>
RETURN
p
All root orphans in a course
All root orphans in a course
match
(n:item)
where
NOT () -[:PARENT_OF]-> (n)
AND n.course_key ="<course_key>"
AND not n.detached
AND NOT n.block_type='course'
return
n.location
Number of Non-orphan Items in the modulestore
Number of non-orphan blocks
MATCH
(c:course) -[:PARENT_OF*]-> (n)
RETURN
count(distinct(c)) + count(distinct(n))
Note that this query isn't perfect. There are some "expected" orphans, like static tabs or about blocks. In a future version of neo4j, we'll include a "detached" parameter for these detached xblock classes.
Number of non-orphan blocks in each course
MATCH
(c:course)-[:PARENT_OF*]->(b)
RETURN
c.course_key as id,
count(c) as blocks
ORDER BY
blocks DESC
XBlocks
What XBlocks do we have? (total count)
XBlock counts
MATCH
(c:course) -[:PARENT_OF*]-> (n:item)
RETURN
distinct(n.block_type) as block_type,
count(n) as number
order by
number DESC
What xBlocks do we have? (across all courses, with counts by type)
This was used in 2016 and June 2020 for tracking / visualization.
XBlock counts
MATCH
(c:course) -[:PARENT_OF*]-> (n:item)
RETURN
distinct(n.block_type) as block_type,
count(distinct(c)) as coursecount,
count(n) as number
order by
number DESC
Which courses have a specific XBlock field enabled?
XBlock field matches
MATCH
(c:course)-[:PARENT_OF*]->(n:`block_name`)
WHERE
(n.field_name)
RETURN
c.org,
c.course_key
ORDER BY
c.org,
c.course_key
Where are all the XBlock types used in my course? (with direct link to where they are used)
MATCH
(c:course) -[:PARENT_OF*]-> (v:vertical) -[:PARENT_OF*]-> (n:item)
WHERE
c.course_key = 'courseID' // replace with the course ID
AND
n.block_type = 'problem' // insert desired block type here
RETURN
n.block_type,
n.display_name,
n.location as block_id,
v.display_name as unit_location,
replace('https://courses.edx.org/courses/course-id/jump_to_id/','course-id',c.course_key) + right(n.location, 32) as Unit_URL,
n.edited_on,
n.visible_to_staff_only
ORDER BY
n.edited_on;
What courses contain deprecated advanced modules?
MATCH
(c:course)
WHERE
ANY(module
IN
c.advanced_modules
WHERE
module = 'peergrading'
or module = 'rate'
or module = 'randomize'
or module = 'peergrading'
or module = 'mentoring'
or module = 'graphical_slider_tool'
or module = 'combinedopenended')
AND c.end <>"None"
RETURN
distinct c.course_key as course_key,
c.advanced_modules as modules,
c.end as course_end_date
order by
c.end desc
What courses contain specific advanced modules?
MATCH
(c:course)
WHERE
'officemix' IN c.advanced_modules
RETURN
distinct c.course_key as course_key,
c.end as course_end_date
order by
c.end desc
What courses are using known third-party discussion forums?
// NOTE: this query takes advantage of the fact that many (but not all) LTI components contain the resource name in the launch_url (e.g. Piazza LTIs contain 'piazza' in the URL
MATCH
(c:course)-[:PARENT_OF*]->(n:lti_consumer)
WHERE
n.launch_url CONTAINS 'piazza' OR n.launch_url CONTAINS 'inscribe' OR n.launch_url CONTAINS 'edstem' OR n.launch_url CONTAINS 'yellowdig' OR n.launch_url CONTAINS 'discourse'
RETURN
distinct(c.course_key) as courseruns,
c.org,
n.launch_url,
c.start,
c.end
Courses with creative-common licenses
Orgs with creative-commons courses, sorted
MATCH
(c:course)
WHERE
c.license CONTAINS "creative-commons"
RETURN
distinct(c.org) as org,
count(distinct(c.course)) as number_creative_common_courses
ORDER BY
number_creative_common_courses DESC
All creative-commons courses
MATCH
(c:course)
WHERE
c.license CONTAINS "creative-commons"
RETURN
distinct(c.course) as course,
c.display_name as title,
c.org as org
ORDER BY
course
Cohorts and differentiated content
Courses with cohorts
Courses with cohorts
match
(c:course)
where
size([x in c.user_partitions WHERE x contains "scheme=<class 'openedx.core.djangoapps.course_groups.partition_scheme.CohortPartitionScheme'>"]) >= 1
return
c.course_key
This query is a little complex, and only works because the string representation of a user partition contains a string ( scheme=<class 'openedx.core.djangoapps.course_groups.partition_scheme.CohortPartitionScheme'>
) that we can use to determine if a user partition is a cohort. The list comprehension,
[x in c.user_partitions WHERE x contains "scheme=<class 'openedx.core.djangoapps.course_groups.partition_scheme.CohortPartitionScheme'>"]
gives a filtered list of the cohorts in a course. We use the size
function to see if there's more than one of them in a course.
Which courses use group visibility settings (i.e. has differentiated content)?
Courses with cohorts
match
(c:course)-[:PARENT_OF*]->(v:vertical)-[:PARENT_OF*]->(i:item)
where
i.group_access <> '{}' // this attribute is {} when group visibility is not set
return
distinct(id(v)) as unit_id,
c.course_key as course_id,
c.display_name as course_title,
c.org as org,
v.display_name as unit_title,
("https://studio.edx.org/container/" + v.location) as url
Courses with graded differentiated content
match
(n:course) -[:PARENT_OF*]-> (c:item)
with
c.course_key as course_key,
count(c) - sum(case when c.group_access <> '{}'
and c.graded then 0 else 1 end) as diff_content
where
diff_content <> 0
return
course_key
Problems
What types of "problem" XBlocks do we have?
XBlock field matches
MATCH
(c:course) -[:PARENT_OF*]-> (n:problem)
RETURN
// n.data contains tags that indicate what type of problem is being used. This is likely a non-exhaustive list!
CASE
WHEN n.data CONTAINS "<choiceresponse"
THEN "checkbox"
WHEN n.data CONTAINS "<multiplechoiceresponse"
THEN "multiple choice"
WHEN n.data CONTAINS "<numericalresponse"
THEN "numerical input"
WHEN n.data CONTAINS "<stringresponse"
THEN "text input"
WHEN n.data CONTAINS "<optionresponse"
THEN "drop-down"
WHEN n.data CONTAINS "<imageresponse"
THEN "image response"
WHEN n.data CONTAINS "<annotationresponse"
THEN "annotation response"
WHEN n.data CONTAINS "<formularesponse"
THEN "formula response"
WHEN n.data CONTAINS "<coderesponse"
THEN "code response"
WHEN n.data CONTAINS "<customresponse"
THEN "custom"
ELSE "other"
END AS problem_type,
count(*) as number
All jsinput problems, counted by org
All jsinput problems, counted by org
MATCH
(course)-[:PARENT_OF*]->(p:problem)
WHERE
p.data
CONTAINS
'jsinput'
RETURN
distinct(p.org) as org,
count(distinct(p.course_key)) as numberOfCourses,
count(distinct(p)) as numberOfProblems
Courses with new drag and drop XBlocks
new drag and drop problems by course
MATCH
(c:course)-[:PARENT_OF*]->(n:`drag-and-drop-v2`)
RETURN
distinct(n.course_key),
count(distinct(n)),
c.start,
c.end
order by
c.start DESC
Courses with old drag and drop problems
old drag and drop problems by course
MATCH
(c:course)-[:PARENT_OF*]->(p:problem)
WHERE
p.data CONTAINS "drag_and_drop_input"
RETURN
distinct(p.course_key),
count(distinct(p)),
c.start,
c.end
order by
c.start desc
What courses use xqueue (code graders)?
We wanted to identify the open courses with code graders, and to find an example problem from each course so we could look up the queuename. Since code graders aren't a unique problem type, we looked for a phrase specific to code graders in the content of the problem.
match (c:course) -[:PARENT_OF*]->(n:problem) where n.data contains "coderesponse queuename" return c.org, c.course_key, count(n), collect(n.location)[..1] as example, c.start, c.end ORDER BY c.start DESC
What are the locations of code grader problems in a course? (Same as entry above but returns list of block + vertical locations that match
match (c:course) -[:PARENT_OF*]-> (v:vertical) -[:PARENT_OF*]-> (n:problem)
where n.data contains "coderesponse queuename"
and c.course_key = 'courseID' // replace with the course ID
return v.location, n.location
Where are a handful of modules that use MathJax?
MATCH
(c:course) -[:PARENT_OF*]-> (v:vertical) -[:PARENT_OF*]-> (n:item)
WHERE
n.block_type = 'problem'// MathJax may also exist in 'html' block types
AND
n.data CONTAINS '[mathjax]' // Some MathJax equations may also be found using '[mathjax inline]'
AND
n.visible_to_staff_only = false
AND
c.start <= '2021-06-29' and c.end > '2021-06-29' // insert today's date for live courses
RETURN
c.course_key,
n.block_type,
n.display_name,
n.location as block_id,
v.display_name as unit_location,
replace('https://courses.edx.org/courses/course-id/jump_to_id/','course-id',c.course_key) + right(n.location, 32) as Unit_URL
LIMIT 20;
Course Structure
Components that are direct children of subsections instead of units/verticals
Useful to find Components on old XML-authored courses that are children of the Subsection instead of the Unit (based on the Studio-enforced Course->Section->Subsection->Unit->Component
hierarchy).
Assumes that the course(s) follows Course->Section->Subsection
, though.
You can omit the c.course_key = "<course key>" and
line if you want to query across all courses.
match (c:course) -[:PARENT_OF]-> (ch:chapter) -[:PARENT_OF]-> (s:sequential) -[:PARENT_OF]-> (cmp)
where
c.course_key = "<course key>" and
cmp.block_type <> "vertical" and
cmp.block_type <> "unit"
return c.course_key as course_id, s.location as subsection_id, cmp.location as component_id
Subsections that contain word clouds and peer instruction
This query might be useful in when setting a bug priority - in this particular scenario, the two components would conflict when in the same subsection, but a query showed they were rarely used together.
match (w:word_cloud)<-[:PARENT_OF*]-(s:sequential) -[:PARENT_OF*]-> (p:`ubcpi`)
Return s.course_key,s.display_name, p.display_name, w.display_name
Length of the longest Subsection title
MATCH
(n:sequential)
RETURN
max(length(n.display_name))
Length of the longest Section title
Length of the longest Section title
MATCH
(n:chapter)
RETURN
max(length(n.display_name))
How many pages contain nested XBlocks?
Find and count the XBlocks (p) that contain other XBlocks ( n), that appear in units (v). For all these searches, you'll get more real results by adding constraints like "vertical in a recent course" or "is published to learners"
Nested XBlocks in Units
MATCH
(v:vertical) -[:PARENT_OF]-> (p) -[:PARENT_OF*]-> (n)
RETURN
distinct(p.block_type), count(p)
Note for the below that we're not eliminating cyclical relationships; no guarantee that n is a leaf node.
Units with Nested XBlocks
//Count all units that contain nested xblocks i.e. blocks that contain other blocks
MATCH (v:vertical) -[:PARENT_OF]-> (p) -[:PARENT_OF*]-> (n) return count(distinct(v))
Useful toolkit for searching path depth:
Searching path depth
//Find examples of problems that are nested 3 or more steps in a unit.
MATCH
(v:vertical) -[:PARENT_OF*3..]-> (p:problem)
WHERE NOT (p)-[:PARENT_OF]->()
return v.location, v.type, p.location
limit 10
//Other phrases
// (v:vertical) -[:PARENT_OF*3]-> (p:problem) = problem is exactly three steps from vertical
// (v:vertical) -[:PARENT_OF*..3]-> (p:problem) = problem is 3 or fewer steps from vertical
Subsection settings from the Course Outline
These queries relate to settings that can be configured from the Course Outline via the gear icon next to a subsection.
What courses have used the "hide after due" feature?
MATCH
(c:course) -[:PARENT_OF*]-> (n:sequential)
WHERE
n.hide_after_due
RETURN
distinct(n.course_key),
count(n)
How many active or upcoming courses use the "Limit Access" feature?
This refers to subsection prerequisites under the advanced tab.
MATCH
(c:course)-[:PARENT_OF*]->(s:sequential)
WHERE
c.enable_subsection_gating
AND
s.location CONTAINS "sequential+block"
AND
c.end > "2021-01-01"
RETURN
count(distinct(c.course_key))
How many gated subsections are there in active or upcoming courses using the “Limit Access” feature?
MATCH
(c:course)-[:PARENT_OF*]->(s:sequential)
WHERE
c.enable_subsection_gating
AND
s.location CONTAINS "sequential+block"
AND
c.end > "2021-01-01"
RETURN
distinct(c.course_key),
count(c)
ORDER BY
c.course_key asc
Order of content on a page ([ :PRECEDES] relationship)
Particular layouts of a page might be of interest - for instance: What courses interleave exercises and text on a page? Does any page start with a discussion prompt before offering more guidance on how to participate? -[:PRECEDES]->
will help search for a specific order of children.
Search for pages that contain graded problem-text-problem on a page.
match (c:course)-[:PARENT_OF*]->(v:vertical)-[:PARENT_OF*]->(a:problem)-[:PRECEDES]->(h:html)-[:PRECEDES]->(b:problem)
where a.graded = True
return c.course_key, v.location
Length of the longest Section title
MATCH
(n:chapter)
RETURN
max(length(n.display_name))
Videos
Where are all the video components in a course?
match
(c:course)-[:PARENT_OF]->(h:chapter)-[:PARENT_OF]->(s:sequential)-[:PARENT_OF]->(v:vertical)-[:PARENT_OF]->(i:video)
// where c.course_key = 'courseID' // uncomment this line to specify course id!
return
i.location as location,
c.course_key as course_id,
h.index as chapter_index,
s.index as sequential_index,
v.index as vertical_index,
i.index as video_index
order by
c.course_key,
h.index,
s.index,
v.index,
i.index
What are the video titles and Video IDs in a course?
match
(c:course)-[:PARENT_OF]->(h:chapter)-[:PARENT_OF]->(s:sequential)-[:PARENT_OF]->(v:vertical)-[:PARENT_OF]->(i:video)
where c.course_key = 'courseID' // replace with the course ID
return
i.location as location,
c.course_key as course_id,
i.index as video_index,
i.display_name as video_title,
i.edx_video_id
order by
c.course_key,
h.index,
s.index,
v.index,
i.index
Where are all in-use videos located in my course? (with direct link)
MATCH
(c:course) -[:PARENT_OF*]-> (v:vertical) -[:PARENT_OF]-> (n:item)
WHERE
c.course_key = 'courseID' // replace with the course ID
AND
n.block_type = 'video'
RETURN
n.edx_video_id as video_id,
n.location as block_id,
v.display_name as unit_name,
replace('https://courses.edx.org/courses/course-id/jump_to_id/','course-id',c.course_key) + right(n.location, 32) as Unit_URL,
n.edited_on,
n.visible_to_staff_only
What courses have not enabled video upload?
MATCH
(c:course)
WHERE
c.video_upload_pipeline = "{}"
RETURN
distinct c.course_key as course_id
Which courses allow video download?
MATCH
(c:course)-[:PARENT_OF*]->(v:video)
WHERE
v.download_video = true
RETURN
distinct c.course_key as course_id,
count(v) as num_downloadable_videos
Where are all my org's videos and what video source / URL are they being served from?
MATCH
(c:course)-[:PARENT_OF]->(h:chapter)-[:PARENT_OF]->(s:sequential)-[:PARENT_OF]->(v:vertical)-[:PARENT_OF]->(i:video)
WHERE
c.course_key contains 'edX'
RETURN
i.location as location,
c.course_key as course_id,
i.display_name as video_title,
i.html5_sources as video_source
ORDER BY
c.course_key,
i.display_name
How many videos use start/end times (i.e. trim the video)?
Simple query:
MATCH
(c:course)-[:PARENT_OF*]->(n:`video`)
WHERE
(n.start_time <> "0:00:00" or n.end_time <> "0:00:00")
RETURN
count(n.location)
Note that this is an over-estimate–many videos that have an explicit end_time
actually specify that end time to be the same as what the normal end of the video would be anyway (so there is effectively no trimming happening). Unfortunately, data around total video duration lives in edx-val
, and is not available from CourseGraph.
How many creative-commons licensed videos are there?
MATCH
(c:course)-[:PARENT_OF*]->(v:video)
WHERE
v.license contains "creative-commons"
RETURN
count(v.location) as num_videos
ORA
What courses have ORA with misconfigured dates?
For course teams, ORA due dates are set in a slightly different workflow than other assignments (in Unit as opposed to Subsection). On occasion, teams will forget to update ORA due dates in a course rerun, which blocks learners from accessing the assessment.
MATCH
(c:course)-[:PARENT_OF*]->(o:openassessment)
WHERE
o.submission_due < c.start
AND c.start <= '2017-02-13' and c.end > '2017-02-13' // or whatever the current dates are, to match only ongoing courses
AND o.visible_to_staff_only = False
RETURN
c.org as org,
c.course_key as course_id,
c.display_name as course_title,
o.display_name as ora_title,
o.location as location
ORDER BY
c.course_key
Which courses use peer assessment ORAs?
ORAs can use a combination of peer-assessment, self-assessment, and staff-assessment.
MATCH
(c:course)-[:PARENT_OF*]->(n:openassessment)
WHERE
c.org contains 'edX'
AND ANY(rubric
IN n.rubric_assessments
WHERE rubric contains 'peer' // rubric contains a peer assessment
)
RETURN
c.course_key,
count(n.location) as num_peer_oras
What settings are used for file uploads in ORA?
ORAs can be configured to allow file uploads, allowing for a preset or set of custom file types.
MATCH
(c:course) -[:PARENT_OF*] -> (o:openassessment)
WHERE
c.start < '2020-09-01' and c.end > '2020-09-01' // or whatever the current dates are, to match only ongoing courses
RETURN
c.course_key,
o.file_upload_type_raw, // an ORA setting we're interested in
o.white_listed_file_types; // an ORA setting we're interested in
Advanced settings
Which courses use teams?
MATCH
(c:course)
WHERE
c.teams_configuration <> "{}" // where teams_configuration is not null
RETURN
c.course_key,
c.org,
c.teams_configuration
Exams
Which courses have timed exams?
MATCH
(c:course)-[:PARENT_OF*]->(s:sequential)
WHERE
NOT s.is_proctored_enabled // not proctored exam
AND NOT s.is_practice_exam // not practice exam
AND s.is_time_limited // is timed
AND s.default_time_limit_minutes <> 'None' AND s.default_time_limit_minutes > 0 // has time set
RETURN
distinct c.course_key,
c.start,
c.end,
c.self_paced as is_self_paced,
c.display_name,
c.org,
count(distinct s) as num_exams
When are all the upcoming finals?
MATCH
(c:course) -[:PARENT_OF*]-> (n:sequential)
WHERE
not n.is_practice_exam
and n.due >= "2018-03-26" // current date you are considering
and n.due <= "2018-04-10" // after above date
and n.display_name =~ '(?i).*final.*' // Use regex instead of CONTAINS to get case-insensitivity.
// FYI, for now spanish and english final exam sequentials contain the
// word "final". This will change once we add more languages.
RETURN
c.course_key as course_key,
c.start as course_start_date,
c.end as course_end_date,
("https://courses.edx.org/courses/" + c.course_key + "/jump_to/" + n.location) as exam_link,
n.start as exam_start_date,
n.due as exam_due_date,
n.is_time_limited as is_time_limited,
n.is_proctored_enabled as is_proctored_enabled,
n.display_name as display_name
Which timed exams are upwards of 24 hours?
MATCH
(c:course)-[:PARENT_OF*]->(s:sequential)
WHERE
NOT s.is_proctored_enabled // not proctored exam
AND NOT s.is_practice_exam // not practice exam
AND s.is_time_limited // is timed
AND s.default_time_limit_minutes <> 'None' AND s.default_time_limit_minutes > 1439 // has time set upwards of 24 hrs
RETURN
c.course_key as course_id,
c.start as course_start_date,
c.end as course_end_date,
s.display_name as exam_title,
s.default_time_limit_minutes as allotted_time_in_minutes,
s.start as exam_start_date,
s.due as exam_due_date
order by s.default_time_limit_minutes desc
Proctored Exams
Which courses have practice proctored exams?
MATCH
(c:course) -[:PARENT_OF*]-> (n:sequential)
WHERE
n.is_proctored_enabled
and n.is_practice_exam
RETURN
c.course_key as course_key,
c.start as course_start_date,
c.end as course_end_date,
count(distinct(n)) as number_practice_exams
Which courses have official proctored exams?
MATCH
(c:course) -[:PARENT_OF*]-> (n:sequential)
WHERE
n.is_proctored_enabled
AND not n.is_practice_exam
RETURN
c.course_key as course_key,
c.start as course_start_date,
c.end as course_end_date,
count(distinct(n)) as number_actual_exams
Link and start date for the first proctored exam in a course
MATCH
(c:course) -[:PARENT_OF*]-> (n:sequential)
WHERE
n.is_proctored_enabled
AND not n.is_practice_exam
AND n.start >= "2018-01-01" // or current date
AND n.start <= "2018-03-01" // or future date
WITH
min(n.start) as first_exam_start_date,
c as c,
n as n
RETURN
c.org as organization,
c.course_key as course_key,
c.start as course_start_date,
c.end as course_end_date,
first_exam_start_date,
head([x in collect(n) where x.start = first_exam_start_date | "https://courses.edx.org/courses/" + x.course_key + "/jump_to/" + x.location]) as link
Which courses have currently available proctored exams?
MATCH
(c:course) -[:PARENT_OF*]-> (n:sequential)
WHERE
n.is_proctored_enabled
AND not n.is_practice_exam
AND n.start <= "2017-01-09" // or whatever the current date is
AND n.due >= "2017-01-09" // or whatever the current date is
RETURN
c.course_key as course_key,
c.start as course_start_date,
c.end as course_end_date,
count(distinct(n)) as number_currently_open_actual_exams
What are the release and due dates of the proctored exams?
MATCH
(c:course) -[:PARENT_OF*]-> (n:sequential)
WHERE
n.is_proctored_enabled
RETURN
c.course_key as course_key,
c.start as course_start_date,
c.end as course_end_date,
n.location as exam_location,
n.start as exam_start_date,
n.due as exam_due_date,
n.is_practice_exam as is_practice_exam,
(n.is_time_limited and not n.is_practice_exam) as is_timed_exam
What is the proctoring backend in use by a course?
MATCH
(c:course) -[:PARENT_OF*]-> (n:sequential)
WHERE
n.is_proctored_enabled
RETURN
c.course_key as course_key,
c.start as course_start_date,
c.end as course_end_date,
n.location as exam_location,
c.proctoring_provider
Course quality & integrity
What courses have randomized problem banks?
MATCH (c:course)-[:PARENT_OF*]->(n:library_content) // this takes advantage of the fact that all library_content items have mode: random
WHERE NOT n.visible_to_staff_only
RETURN distinct c.course_key, c.start, c.end, c.self_paced as is_self_paced, c.display_name, c.org, count(distinct c) as num_exams
What are the locations for randomized content blocks in a course?
MATCH
(c:course)-[:PARENT_OF*]->(v:library_content)
WHERE c.course_key = <course_key>
RETURN
v.location, v.display_name
Which courses use discussion blackout dates?
match (c:course)
where c.discussion_blackouts <> [] // non null blackout dates
return c.org, c.course_key, c.display_name, c.start, c.end, c.discussion_blackouts
Discussion forums
What courses don't use the built-in discussion forums?
MATCH (c:course) WHERE LENGTH([x IN c.tabs WHERE x CONTAINS "DiscussionTab"]) = 0 RETURN c.course_key
Which courses have in-context discussion blocks?
MATCH
(c:course) -[:PARENT_OF*]-> (n:item)
WHERE n.block_type = "discussion"
RETURN
distinct(n.block_type) as block_type,
c.course_key as course_id,
c.org as organization,
count(distinct(n)) as discussion_blocks,
count(n) as number
order by
number DESC
Course updates
Number of courses with visible updates, by pacing type
match (ci:course_info), (c:course)
where size([item in ci.items where item CONTAINS "u'status': u'visible'"]) > 1
and (ci.location ENDS WITH "/updates" or ci.location ENDS WITH "@updates") // this ensures we're just looking at updates, not handouts
and c.course_key = ci.course_key
return c.self_paced, count(c.course_key)
Note: we have to "join" on the course
block, because course_info
blocks aren't children of course
blocks and so don't inherit the self_paced
attribute from them.
LTI
Which courses have LTI components?
match (c:course)-[:PARENT_OF*]->(lti_consumer)
return distinct c.course_key, c.org, c.display_name
Where are graded LTI components - that are embedded in the course / that open an external link?
match
(c:course)-[:PARENT_OF*]->(b:lti_consumer)
WHERE b.graded = true
and b.launch_target = "iframe" // or "new_window"
return
b.location as location, c.course_key as course_id
Which courses use X LTI?
// NOTE: this query takes advantage of the fact that many (but not all) LTI components contain the resource name in the launch_url (e.g. Piazza LTIs contain 'piazza' in the URL
MATCH
(c:course)-[:PARENT_OF*]->(n:lti_consumer)
WHERE
n.launch_url CONTAINS 'piazza' // launch_url is where the LTI 'points' to. Most are lowercase
RETURN
n.org,
count(distinct n.course_key)
What are all the LTIs used on edX and in what volume are they being used?
MATCH
(c:course) -[:PARENT_OF*]-> (v:vertical) -[:PARENT_OF*]-> (n:item)
RETURN
distinct(n.lti_id) as lti_id,
count(n) as number
order by
number DESC
Due dates
Which courses had assignments due between X and X dates?
This query might be useful in situations where learners are unable to access the platform for a period of time (e.g. platform outage, natural disaster) which affects their ability to submit key assignments.
// NOTE: this does *not* include ORA due dates. ORA due dates are set on the component level, rather than the sequential level.
MATCH
(c:course)-[:PARENT_OF*]->(s:sequential)
WHERE
s.due >= '2017-08-25' // replace with outage start date in 'YYYY-MM-DD' format
AND s.due <= '2017-09-02' // replace with outage end date in 'YYYY-MM-DD' format
RETURN
c.course_key,
c.org,
c.display_name as `course title`,
s.due,
count(distinct s) as `number of assignments due`
ORDER BY
c.course_key
Content: HTML and Problems
Which courses contain a certain keyword anywhere in their course content?
Example: we might be curious which courses mention the g2crowd tool anywhere in their HTML content.
MATCH
(c:course)-[:PARENT_OF*]->(h:html)
WHERE
h.data CONTAINS 'g2crowd' // replace with the keyword you are interested in
RETURN
h.course_key,
h.data,
"https://studio.edx.org/container/" + h.location as link
Example: we want to know whether a course has links to content on other websites:
MATCH
(c:course)-[:PARENT_OF*]->(h:html)
WHERE
h.data CONTAINS 'http'
AND c.course_key = 'courseID' // replace with the course ID
RETURN
h.course_key,
h.data,
"https://studio.edx.org/container/" + h.location as link
What courses contain scripts with python code?
Example: An org would like to know all their courses that require review for the python3 upgrade
MATCH
(c:course)-[:PARENT_OF*]->(p:problem)
Where
p.data CONTAINS 'loncapa/python' // this phrase identifies a python-graded problem
AND
c.org =~ '(?i)edx.*' // case-insensitive search to include all edX-variants
RETURN
p.org as org,
c.display_name, c.course_key, c.start, c.end,
count(distinct(p)) as numberOfProblems
Where all the python-graded problems?
Example: A partner needs guidance on what content to review for the python3 upgrade - give them links to all the content.
MATCH
(c:course)-[:PARENT_OF*]->(p:problem)
Where
p.data CONTAINS 'loncapa/python'
AND
c.course_key = 'courseID' // replace with the course ID
RETURN
c.org as org,
c.display_name, c.course_key, c.start, c.end, ("https://courses.edx.org/courses/" + c.course_key + "/jump_to/" + p.location) as problem_location
Which HTML modules have (illegal) self-closing <script> and <iframe> tags?
Example: We need to find which HTML modules and courses use <script /> or <iframe /> when they should use <script></script> or <iframe></iframe> instead. Using the self-closing version is invalid HTML, but worked because of some data munging we did in the old courseware view. It does not work in the new Courseware MFE.
MATCH
(v:vertical)-[:PARENT_OF*]->(h:html)
Where
h.data =~ '(?i)<iframe [^>]*\\/>' or
h.data =~ '(?i)<script [^>]*\\/>'
RETURN
h.course_key, v.location, 'https://courses.edx.org/xblock/'+ v.location, h.location, h.data
order by h.course_key
Which HTML modules use X iframe in them and where?
MATCH
(c:course) -[:PARENT_OF*]-> (v:vertical) -[:PARENT_OF*]-> (n:item)
WHERE
c.start < '2021-05-12' // insert today's date for live courses
AND
c.end > '2021-05-12' // insert today's date for live courses
AND
n.block_type = 'html'
AND
n.data CONTAINS 'geogebra'// insert a specfic iframe identifier
RETURN
n.course_key,
replace('https://courses.edx.org/courses/course-id/jump_to_id/','course-id',c.course_key) + right(n.location, 32) as Unit_URL;
Age of Course Data
When was the course last updated in Coursegraph?
This won't capture courses added since the last update, but this query can be used to see when Coursegraph itself was last updated, based on the last update date.
match (c:course)
return c.course_key, c.time_last_dumped_to_neo4j ORDER BY c.time_last_dumped_to_neo4j desc
Constructing Specific URLs from Your Results
If you're looking for specific blocks and want to quickly generate the url you can go to to see the block in action in the LMS, add this statement to your return values:
return ("https://courses.edx.org/courses/" + ITEM.course_key + "/jump_to/" + ITEM.location) as link
In this case, ITEM
would be the alias of the block you're looking for
Filtering a specific list
If you're looking for results in a specific list of items (courses), you can use the list syntax
where ITEM in [VALUE1, VALUE2, VALUE3]
where c.course_key in
["course-v1:edX+DemoX.1+2T2019",
"course-v1:edX+DemoX.1+3T2018",
"course-v1:edX+DemoX.1+2T2017"]