note

Help The Community! Whenever you come up with a new query, add it here. Even if you think the use case is obscure.

Help The Community! Whenever you come up with a new query, add it here. Even if you think the use case is obscure.

Table of Contents

Orphans

All non-orphan problems in a course

All non-orphan problems in a course
MATCH 
	(course)-[:PARENT_OF*]->(p:problem) 
WHERE 
	p.course_key = <course_key> 
RETURN 
	p

All root orphans in a course

All root orphans in a course
match 
	(n:item) 
where 
	NOT () -[:PARENT_OF]-> (n) 
	AND n.course_key ="<course_key>" 
	AND not n.detached  
	AND NOT n.block_type='course' 
return 
	n.location

Number of Non-orphan Items in the modulestore

Number of non-orphan blocks
MATCH 
	(c:course) -[:PARENT_OF*]-> (n) 
RETURN 
	count(distinct(c)) + count(distinct(n))

Note that this query isn't perfect. There are some "expected" orphans, like static tabs or about blocks. In a future version of neo4j, we'll include a "detached" parameter for these detached xblock classes.

Number of non-orphan blocks in each course

MATCH 
	(c:course)-[:PARENT_OF*]->(b) 
RETURN 
	c.course_key as id, 
	count(c) as blocks 
ORDER BY 
	blocks DESC

XBlocks

What XBlocks do we have? (total count)

XBlock counts
MATCH
    (c:course) -[:PARENT_OF*]-> (n:item)
RETURN
    distinct(n.block_type) as block_type,
    count(n) as number
order by
    number DESC

What xBlocks do we have? (across all courses, with counts by type)

This was used in 2016 and June 2020 for tracking / visualization.

XBlock counts
MATCH
    (c:course) -[:PARENT_OF*]-> (n:item)
RETURN
    distinct(n.block_type) as block_type,
    count(distinct(c)) as coursecount,
    count(n) as number
order by
    number DESC

Which courses have a specific XBlock field enabled?

XBlock field matches
MATCH 
	(c:course)-[:PARENT_OF*]->(n:`block_name`)
WHERE 	
	(n.field_name)
RETURN 
	c.org, 
	c.course_key
ORDER BY 
	c.org, 
	c.course_key

Where are all the XBlock types used in my course? (with direct link to where they are used)

MATCH
    (c:course) -[:PARENT_OF*]-> (v:vertical) -[:PARENT_OF*]-> (n:item)
WHERE
    c.course_key = 'courseID' // replace with the course ID
    AND
    n.block_type = 'problem' // insert desired block type here
RETURN  
    n.block_type,
    n.display_name,
    n.location as block_id,
    v.display_name as unit_location,
    replace('https://courses.edx.org/courses/course-id/jump_to_id/','course-id',c.course_key) + right(n.location, 32) as Unit_URL,
    n.edited_on, 
    n.visible_to_staff_only
ORDER BY
  n.edited_on;

What courses contain deprecated advanced modules? 

MATCH 
	(c:course) 
WHERE 
	ANY(module 
		IN 
			c.advanced_modules 
		WHERE 
			module = 'peergrading' 
			or module = 'rate' 
			or module = 'randomize' 
			or module = 'peergrading' 
			or module = 'mentoring' 
			or module = 'graphical_slider_tool' 
			or module = 'combinedopenended') 
	AND c.end <>"None"
RETURN 
	distinct c.course_key as course_key, 
	c.advanced_modules as modules, 
	c.end as course_end_date 
order by 
	c.end desc

What courses contain specific advanced modules? 

MATCH
    (c:course)
WHERE
'officemix' IN  c.advanced_modules
RETURN
    distinct c.course_key as course_key,
    c.end as course_end_date
order by
    c.end desc

What courses are using known third-party discussion forums?

// NOTE: this query takes advantage of the fact that many (but not all) LTI components contain the resource name in the launch_url (e.g. Piazza LTIs contain 'piazza' in the URL
MATCH 
	(c:course)-[:PARENT_OF*]->(n:lti_consumer) 
WHERE 
	n.launch_url CONTAINS 'piazza' OR  n.launch_url CONTAINS 'inscribe' OR n.launch_url CONTAINS 'edstem' OR n.launch_url CONTAINS 'yellowdig' OR n.launch_url CONTAINS 'discourse'
RETURN 
	distinct(c.course_key) as courseruns,
    c.org,
    n.launch_url,
    c.start,
    c.end

Courses with creative-common licenses

Orgs with creative-commons courses, sorted

MATCH 
	(c:course)
WHERE 
	c.license CONTAINS "creative-commons"
RETURN 
	distinct(c.org) as org, 
	count(distinct(c.course)) as number_creative_common_courses
ORDER BY 
	number_creative_common_courses DESC

All creative-commons courses

MATCH 
	(c:course)
WHERE 
	c.license CONTAINS "creative-commons"
RETURN 
	distinct(c.course) as course, 
	c.display_name as title, 
	c.org as org
ORDER BY 
	course

Cohorts and differentiated content

Courses with cohorts

Courses with cohorts
match 
	(c:course)
where 
	size([x in c.user_partitions WHERE x contains "scheme=<class 'openedx.core.djangoapps.course_groups.partition_scheme.CohortPartitionScheme'>"]) >= 1
return 
	c.course_key

This query is a little complex, and only works because the string representation of a user partition contains a string ( scheme=<class 'openedx.core.djangoapps.course_groups.partition_scheme.CohortPartitionScheme'> ) that we can use to determine if a user partition is a cohort. The list comprehension, 

[x in c.user_partitions WHERE x contains "scheme=<class 'openedx.core.djangoapps.course_groups.partition_scheme.CohortPartitionScheme'>"]

gives a filtered list of the cohorts in a course. We use the size function to see if there's more than one of them in a course.

Which courses use group visibility settings (i.e. has differentiated content)?

Courses with cohorts
match 
	(c:course)-[:PARENT_OF*]->(v:vertical)-[:PARENT_OF*]->(i:item) 
where 
	i.group_access <> '{}' // this attribute is {} when group visibility is not set
return 
	distinct(id(v)) as unit_id,
	c.course_key as course_id,
	c.display_name as course_title,
	c.org as org,
	v.display_name as unit_title,
	("https://studio.edx.org/container/" + v.location) as url

Courses with graded differentiated content

match 
	(n:course) -[:PARENT_OF*]-> (c:item)
with 
	c.course_key as course_key, 
	count(c) - sum(case when c.group_access <> '{}' 
	and c.graded then 0 else 1 end) as diff_content
where 
	diff_content <> 0
return 
	course_key

Problems

What types of "problem" XBlocks do we have?

XBlock field matches
MATCH 
	(c:course) -[:PARENT_OF*]-> (n:problem)
RETURN 
	// n.data contains tags that indicate what type of problem is being used. This is likely a non-exhaustive list!
	CASE
      WHEN n.data CONTAINS "<choiceresponse"
      	THEN "checkbox"
      WHEN n.data CONTAINS "<multiplechoiceresponse"
      	THEN "multiple choice"
      WHEN n.data CONTAINS "<numericalresponse" 
      	THEN "numerical input" 
      WHEN n.data CONTAINS "<stringresponse"
      	THEN "text input"
      WHEN n.data CONTAINS "<optionresponse"
      	THEN "drop-down"
      WHEN n.data CONTAINS "<imageresponse"
      	THEN "image response"
      WHEN n.data CONTAINS "<annotationresponse"
      	THEN "annotation response"
      WHEN n.data CONTAINS "<formularesponse"
      	THEN "formula response"
      WHEN n.data CONTAINS "<coderesponse"
      	THEN "code response"
      WHEN n.data CONTAINS "<customresponse"
      	THEN "custom"
	ELSE "other" 
    END AS problem_type, 
    count(*) as number

All jsinput problems, counted by org

All jsinput problems, counted by org
MATCH 
	(course)-[:PARENT_OF*]->(p:problem) 
WHERE 
	p.data 
CONTAINS 
	'jsinput' 
RETURN 
	distinct(p.org) as org, 
	count(distinct(p.course_key)) as numberOfCourses, 
	count(distinct(p)) as numberOfProblems

Courses with new drag and drop XBlocks

new drag and drop problems by course
MATCH 
	(c:course)-[:PARENT_OF*]->(n:`drag-and-drop-v2`)
RETURN 
	distinct(n.course_key), 
	count(distinct(n)), 
	c.start, 
	c.end 
order by 
	c.start DESC

Courses with old drag and drop problems

old drag and drop problems by course
MATCH 
	(c:course)-[:PARENT_OF*]->(p:problem)
WHERE 
	p.data CONTAINS "drag_and_drop_input"
RETURN 
	distinct(p.course_key), 
	count(distinct(p)), 
	c.start, 
	c.end 
order by 
	c.start desc

What courses use xqueue (code graders)?

We wanted to identify the open courses with code graders, and to find an example problem from each course so we could look up the queuename. Since code graders aren't a unique problem type, we looked for a phrase specific to code graders in the content of the problem.

match (c:course) -[:PARENT_OF*]->(n:problem) where n.data contains "coderesponse queuename" return c.org, c.course_key, count(n), collect(n.location)[..1] as example, c.start, c.end ORDER BY c.start DESC

What are the locations of code grader problems in a course? (Same as entry above but returns list of block + vertical locations that match

match (c:course) -[:PARENT_OF*]-> (v:vertical) -[:PARENT_OF*]-> (n:problem) 
where n.data contains "coderesponse queuename"
and c.course_key = 'courseID' // replace with the course ID
return v.location, n.location

Where are a handful of modules that use MathJax?

MATCH
    (c:course) -[:PARENT_OF*]-> (v:vertical) -[:PARENT_OF*]-> (n:item)
WHERE
n.block_type = 'problem'// MathJax may also exist in 'html' block types
AND
n.data CONTAINS '[mathjax]' // Some MathJax equations may also be found using '[mathjax inline]'
    AND
    n.visible_to_staff_only = false 
    AND
    c.start <= '2021-06-29' and c.end > '2021-06-29' // insert today's date for live courses
RETURN  
    c.course_key,
    n.block_type,
    n.display_name,
    n.location as block_id,
    v.display_name as unit_location,
    replace('https://courses.edx.org/courses/course-id/jump_to_id/','course-id',c.course_key) + right(n.location, 32) as Unit_URL
    LIMIT 20;

Course Structure

Components that are direct children of subsections instead of units/verticals

Useful to find Components on old XML-authored courses that are children of the Subsection instead of the Unit (based on the Studio-enforced Course->Section->Subsection->Unit->Component hierarchy).

Assumes that the course(s) follows Course->Section->Subsection, though.

You can omit the c.course_key = "<course key>" and line if you want to query across all courses.

match (c:course) -[:PARENT_OF]-> (ch:chapter) -[:PARENT_OF]-> (s:sequential) -[:PARENT_OF]-> (cmp)
where
  c.course_key = "<course key>" and
  cmp.block_type <> "vertical" and
  cmp.block_type <> "unit"
return c.course_key as course_id, s.location as subsection_id, cmp.location as component_id

Subsections that contain word clouds and peer instruction

This query might be useful in when setting a bug priority - in this particular scenario, the two components would conflict when in the same subsection, but a query showed they were rarely used together.

match (w:word_cloud)<-[:PARENT_OF*]-(s:sequential) -[:PARENT_OF*]-> (p:`ubcpi`)
Return s.course_key,s.display_name, p.display_name, w.display_name

Length of the longest Subsection title

MATCH
    (n:sequential)
RETURN
    max(length(n.display_name))

Length of the longest Section title

Length of the longest Section title
MATCH
    (n:chapter)
RETURN
    max(length(n.display_name))

How many pages contain nested XBlocks? 

Find and count the XBlocks (p) that contain other XBlocks ( n), that appear in units (v). For all these searches, you'll get more real results by adding  constraints like "vertical in a recent course" or "is published to learners" 

Nested XBlocks in Units
MATCH
    (v:vertical) -[:PARENT_OF]-> (p) -[:PARENT_OF*]-> (n)
RETURN
    distinct(p.block_type), count(p)

Note for the below that we're not eliminating cyclical relationships; no guarantee that n is a leaf node.

Units with Nested XBlocks
//Count all units that contain nested xblocks i.e. blocks that contain other blocks
MATCH (v:vertical) -[:PARENT_OF]-> (p) -[:PARENT_OF*]-> (n) return count(distinct(v))

Useful toolkit for searching path depth:  

Searching path depth
//Find examples of problems that are nested 3 or more steps in a unit. 
MATCH
    (v:vertical) -[:PARENT_OF*3..]-> (p:problem)
WHERE NOT (p)-[:PARENT_OF]->()
return v.location, v.type, p.location
limit 10

//Other phrases
// (v:vertical) -[:PARENT_OF*3]-> (p:problem) = problem is exactly three steps from vertical
// (v:vertical) -[:PARENT_OF*..3]-> (p:problem) = problem is 3 or fewer steps from vertical

Subsection settings from the Course Outline

These queries relate to settings that can be configured from the Course Outline via the gear icon next to a subsection.

What courses have used the "hide after due" feature?
MATCH 
	(c:course) -[:PARENT_OF*]-> (n:sequential) 
WHERE 
	n.hide_after_due 
RETURN 
	distinct(n.course_key), 
	count(n)
How many active or upcoming courses use the "Limit Access" feature?

This refers to subsection prerequisites under the advanced tab.

MATCH 
	(c:course)-[:PARENT_OF*]->(s:sequential)
WHERE 
	c.enable_subsection_gating
AND
    s.location CONTAINS "sequential+block"
AND
    c.end > "2021-01-01"    
RETURN 
	count(distinct(c.course_key))
How many gated subsections are there in active or upcoming courses using the “Limit Access” feature?
MATCH 
	(c:course)-[:PARENT_OF*]->(s:sequential) 
WHERE 
	c.enable_subsection_gating
AND
    s.location CONTAINS "sequential+block"
AND
    c.end > "2021-01-01"    
RETURN 
	distinct(c.course_key), 
	count(c)
ORDER BY
    c.course_key asc

Order of content on a page ([ :PRECEDES] relationship)

Particular layouts of a page might be of interest - for instance: What courses interleave exercises and text on a page? Does any page start with a discussion prompt before offering more guidance on how to participate? -[:PRECEDES]-> will help search for a specific order of children.

Search for pages that contain graded problem-text-problem on a page.

match (c:course)-[:PARENT_OF*]->(v:vertical)-[:PARENT_OF*]->(a:problem)-[:PRECEDES]->(h:html)-[:PRECEDES]->(b:problem)
where a.graded = True
return c.course_key, v.location

Length of the longest Section title

MATCH
    (n:chapter)
RETURN
    max(length(n.display_name))

Videos

Where are all the video components in a course?

match 
	(c:course)-[:PARENT_OF]->(h:chapter)-[:PARENT_OF]->(s:sequential)-[:PARENT_OF]->(v:vertical)-[:PARENT_OF]->(i:video) 
// where c.course_key = 'courseID' // uncomment this line to specify course id!
return 
	i.location as location, 
	c.course_key as course_id, 
	h.index as chapter_index, 
	s.index as sequential_index, 
	v.index as vertical_index, 
	i.index as video_index
order by 
	c.course_key, 
	h.index, 
	s.index, 
	v.index, 
	i.index

What are the video titles and Video IDs in a course?

match
    (c:course)-[:PARENT_OF]->(h:chapter)-[:PARENT_OF]->(s:sequential)-[:PARENT_OF]->(v:vertical)-[:PARENT_OF]->(i:video)
where c.course_key = 'courseID' // replace with the course ID
return
    i.location as location,
    c.course_key as course_id,
    i.index as video_index,
    i.display_name as video_title,
    i.edx_video_id
order by
    c.course_key,
    h.index,
    s.index,
    v.index,
    i.index


Where are all in-use videos located in my course? (with direct link)

MATCH
    (c:course) -[:PARENT_OF*]-> (v:vertical) -[:PARENT_OF]-> (n:item)
WHERE
    c.course_key = 'courseID' // replace with the course ID
    AND
    n.block_type = 'video' 
RETURN
    n.edx_video_id as video_id,
    n.location as block_id,
    v.display_name as unit_name,
   replace('https://courses.edx.org/courses/course-id/jump_to_id/','course-id',c.course_key) + right(n.location, 32) as Unit_URL,
    n.edited_on, 
    n.visible_to_staff_only

What courses have not enabled video upload? 

MATCH 
	(c:course) 
WHERE 
	c.video_upload_pipeline = "{}" 
RETURN 
	distinct c.course_key as course_id 

Which courses allow video download?

MATCH 
	(c:course)-[:PARENT_OF*]->(v:video)
WHERE 
	v.download_video = true
RETURN 
	distinct c.course_key as course_id,
	count(v) as num_downloadable_videos

Where are all my org's videos and what video source / URL are they being served from?

MATCH
	(c:course)-[:PARENT_OF]->(h:chapter)-[:PARENT_OF]->(s:sequential)-[:PARENT_OF]->(v:vertical)-[:PARENT_OF]->(i:video)
WHERE 
	c.course_key contains 'edX'
RETURN
	i.location as location,
	c.course_key as course_id,
	i.display_name as video_title,
	i.html5_sources as video_source
ORDER BY
	c.course_key,
	i.display_name

How many videos use start/end times (i.e. trim the video)?

Simple query:

MATCH 
	(c:course)-[:PARENT_OF*]->(n:`video`)
WHERE 	
	(n.start_time <> "0:00:00" or n.end_time <> "0:00:00")
RETURN 
	count(n.location)

Note that this is an over-estimate–many videos that have an explicit end_time actually specify that end time to be the same as what the normal end of the video would be anyway (so there is effectively no trimming happening). Unfortunately, data around total video duration lives in edx-val, and is not available from CourseGraph.

How many creative-commons licensed videos are there?

MATCH 
	(c:course)-[:PARENT_OF*]->(v:video)
WHERE 
	v.license contains "creative-commons"
RETURN 
	count(v.location) as num_videos

ORA

What courses have ORA with misconfigured dates?

For course teams, ORA due dates are set in a slightly different workflow than other assignments (in Unit as opposed to Subsection). On occasion, teams will forget to update ORA due dates in a course rerun, which blocks learners from accessing the assessment. 

MATCH 
	(c:course)-[:PARENT_OF*]->(o:openassessment) 
WHERE  
	o.submission_due < c.start
	AND c.start <= '2017-02-13' and c.end > '2017-02-13' // or whatever the current dates are, to match only ongoing courses
	AND o.visible_to_staff_only = False
RETURN 
	c.org as org, 
	c.course_key as course_id, 
	c.display_name as course_title, 
	o.display_name as ora_title, 
	o.location as location
ORDER BY 
	c.course_key

Which courses use peer assessment ORAs?

ORAs can use a combination of peer-assessment, self-assessment, and staff-assessment.

MATCH 
	(c:course)-[:PARENT_OF*]->(n:openassessment) 
WHERE 
	c.org contains 'edX'
	AND ANY(rubric
      IN n.rubric_assessments
      WHERE rubric contains 'peer' // rubric contains a peer assessment
    )
RETURN 
	c.course_key, 
	count(n.location) as num_peer_oras

What settings are used for file uploads in ORA?

ORAs can be configured to allow file uploads, allowing for a preset or set of custom file types.

MATCH
    (c:course) -[:PARENT_OF*] -> (o:openassessment)
WHERE
    c.start < '2020-09-01' and c.end > '2020-09-01'  // or whatever the current dates are, to match only ongoing courses
RETURN
    c.course_key,
    o.file_upload_type_raw,		// an ORA setting we're interested in 
    o.white_listed_file_types;	// an ORA setting we're interested in 

Advanced settings

Which courses use teams?

MATCH 
	(c:course) 
WHERE 
	c.teams_configuration <> "{}" // where teams_configuration is not null
RETURN 
	c.course_key,
    c.org,
    c.teams_configuration

Exams

Which courses have timed exams?

MATCH 
	(c:course)-[:PARENT_OF*]->(s:sequential) 
WHERE 
	NOT s.is_proctored_enabled // not proctored exam
	AND NOT s.is_practice_exam // not practice exam
	AND s.is_time_limited // is timed
	AND s.default_time_limit_minutes <> 'None' AND s.default_time_limit_minutes > 0 // has time set
RETURN 
	distinct c.course_key, 
	c.start, 
	c.end, 
	c.self_paced as is_self_paced, 
	c.display_name, 
	c.org, 
	count(distinct s) as num_exams

When are all the upcoming finals?

MATCH
    (c:course) -[:PARENT_OF*]-> (n:sequential)
WHERE
    not n.is_practice_exam
    and n.due >= "2018-03-26"  // current date you are considering
    and n.due <= "2018-04-10"  // after above date
    and n.display_name =~ '(?i).*final.*'  // Use regex instead of CONTAINS to get case-insensitivity.
    // FYI, for now spanish and english final exam sequentials contain the
    // word "final".  This will change once we add more languages.
RETURN
    c.course_key as course_key,
    c.start as course_start_date,
    c.end as course_end_date,
    ("https://courses.edx.org/courses/" + c.course_key + "/jump_to/" + n.location) as exam_link,
    n.start as exam_start_date,
    n.due as exam_due_date,
    n.is_time_limited as is_time_limited,
    n.is_proctored_enabled as is_proctored_enabled,
    n.display_name as display_name

Which timed exams are upwards of 24 hours?

MATCH 
	(c:course)-[:PARENT_OF*]->(s:sequential) 
WHERE 
	NOT s.is_proctored_enabled // not proctored exam
	AND NOT s.is_practice_exam // not practice exam
	AND s.is_time_limited // is timed
	AND s.default_time_limit_minutes <> 'None' AND s.default_time_limit_minutes > 1439 // has time set upwards of 24 hrs
RETURN 
	c.course_key as course_id,
    c.start as course_start_date,
    c.end as course_end_date,
    s.display_name as exam_title,
    s.default_time_limit_minutes as allotted_time_in_minutes,    
	s.start as exam_start_date, 
	s.due as exam_due_date
order by s.default_time_limit_minutes desc

Proctored Exams

Which courses have practice proctored exams?

MATCH 
	(c:course) -[:PARENT_OF*]-> (n:sequential) 
WHERE 
	n.is_proctored_enabled 
	and n.is_practice_exam 
RETURN 
	c.course_key as course_key, 
	c.start as course_start_date, 
	c.end as course_end_date, 
	count(distinct(n)) as number_practice_exams

Which courses have official proctored exams?

MATCH 
	(c:course) -[:PARENT_OF*]-> (n:sequential)
WHERE 
	n.is_proctored_enabled
	AND not n.is_practice_exam
RETURN 
	c.course_key as course_key, 
	c.start as course_start_date, 
	c.end as course_end_date, 
	count(distinct(n)) as number_actual_exams

Link and start date for the first proctored exam in a course

MATCH
    (c:course) -[:PARENT_OF*]-> (n:sequential)
WHERE
    n.is_proctored_enabled
    AND not n.is_practice_exam
    AND n.start >= "2018-01-01" // or current date
    AND n.start <= "2018-03-01" // or future date
WITH
	min(n.start) as first_exam_start_date,
    c as c,
    n as n
RETURN
	c.org as organization,
    c.course_key as course_key,
    c.start as course_start_date,
    c.end as course_end_date,
    first_exam_start_date,
    head([x in collect(n) where x.start = first_exam_start_date | "https://courses.edx.org/courses/" + x.course_key + "/jump_to/" + x.location]) as link

Which courses have currently available proctored exams?

MATCH 
	(c:course) -[:PARENT_OF*]-> (n:sequential)
WHERE 
	n.is_proctored_enabled
	AND not n.is_practice_exam
	AND n.start <= "2017-01-09" // or whatever the current date is
	AND n.due >= "2017-01-09" // or whatever the current date is
RETURN 
	c.course_key as course_key, 
	c.start as course_start_date, 
	c.end as course_end_date, 
	count(distinct(n)) as number_currently_open_actual_exams

What are the release and due dates of the proctored exams?

MATCH 
	(c:course) -[:PARENT_OF*]-> (n:sequential)
WHERE 
	n.is_proctored_enabled
RETURN 
	c.course_key as course_key, 
	c.start as course_start_date, 
	c.end as course_end_date, 
	n.location as exam_location, 
	n.start as exam_start_date, 
	n.due as exam_due_date, 
	n.is_practice_exam as is_practice_exam, 
	(n.is_time_limited and not n.is_practice_exam) as is_timed_exam	

What is the proctoring backend in use by a course?

MATCH
    (c:course) -[:PARENT_OF*]-> (n:sequential)
WHERE
    n.is_proctored_enabled
RETURN
    c.course_key as course_key,
    c.start as course_start_date,
    c.end as course_end_date,
    n.location as exam_location,
    c.proctoring_provider

Course quality & integrity

What courses have randomized problem banks?

MATCH (c:course)-[:PARENT_OF*]->(n:library_content) // this takes advantage of the fact that all library_content items have mode: random
WHERE NOT n.visible_to_staff_only 
RETURN distinct c.course_key, c.start, c.end, c.self_paced as is_self_paced, c.display_name, c.org, count(distinct c) as num_exams

What are the locations for randomized content blocks in a course?

MATCH
    (c:course)-[:PARENT_OF*]->(v:library_content)
WHERE c.course_key = <course_key> 
RETURN
    v.location, v.display_name

Which courses use discussion blackout dates?

match (c:course) 
where c.discussion_blackouts <> [] // non null blackout dates
return c.org, c.course_key, c.display_name, c.start, c.end, c.discussion_blackouts

Discussion forums

What courses don't use the built-in discussion forums?

MATCH (c:course) WHERE LENGTH([x IN c.tabs WHERE x CONTAINS "DiscussionTab"]) = 0 RETURN c.course_key

Which courses have in-context discussion blocks? 

MATCH
    (c:course) -[:PARENT_OF*]-> (n:item)
WHERE n.block_type = "discussion"
RETURN
    distinct(n.block_type) as block_type,
    c.course_key as course_id,
    c.org as organization,
    count(distinct(n)) as discussion_blocks,
    count(n) as number
order by
    number DESC

Course updates

Number of courses with visible updates, by pacing type

match (ci:course_info), (c:course)
where size([item in ci.items where item CONTAINS "u'status': u'visible'"]) > 1
and (ci.location ENDS WITH "/updates" or ci.location ENDS WITH "@updates") // this ensures we're just looking at updates, not handouts
and c.course_key = ci.course_key
return c.self_paced, count(c.course_key)

Note: we have to "join" on the course block, because course_info blocks aren't children of course blocks and so don't inherit the self_paced attribute from them.

LTI

Which courses have LTI components?

match (c:course)-[:PARENT_OF*]->(lti_consumer)
return distinct c.course_key, c.org, c.display_name

Where are graded LTI components - that are embedded in the course / that open an external link?

match
(c:course)-[:PARENT_OF*]->(b:lti_consumer)
WHERE b.graded = true
and b.launch_target = "iframe" // or "new_window"
return
    b.location as location, c.course_key as course_id

Which courses use X LTI? 

// NOTE: this query takes advantage of the fact that many (but not all) LTI components contain the resource name in the launch_url (e.g. Piazza LTIs contain 'piazza' in the URL

MATCH 
	(c:course)-[:PARENT_OF*]->(n:lti_consumer) 
WHERE 
	n.launch_url CONTAINS 'piazza' // launch_url is where the LTI 'points' to. Most are lowercase
RETURN 
	n.org, 
    count(distinct n.course_key)

What are all the LTIs used on edX and in what volume are they being used?

MATCH	
(c:course) -[:PARENT_OF*]-> (v:vertical) -[:PARENT_OF*]-> (n:item)	
RETURN	
distinct(n.lti_id) as lti_id,	
count(n) as number	
order by	
number DESC	

Due dates

Which courses had assignments due between X and X dates?

This query might be useful in situations where learners are unable to access the platform for a period of time (e.g. platform outage, natural disaster) which affects their ability to submit key assignments.

// NOTE: this does *not* include ORA due dates. ORA due dates are set on the component level, rather than the sequential level.
MATCH
	(c:course)-[:PARENT_OF*]->(s:sequential)
WHERE
	s.due >= '2017-08-25' // replace with outage start date in 'YYYY-MM-DD' format
    AND s.due <= '2017-09-02' // replace with outage end date in 'YYYY-MM-DD' format
RETURN
    c.course_key,
    c.org,
    c.display_name as `course title`,
    s.due,
    count(distinct s) as `number of assignments due`
ORDER BY
	c.course_key

Content: HTML and Problems

Which courses contain a certain keyword anywhere in their course content?

Example: we might be curious which courses mention the g2crowd tool anywhere in their HTML content.

MATCH 
	(c:course)-[:PARENT_OF*]->(h:html) 
WHERE 
	h.data CONTAINS 'g2crowd' // replace with the keyword you are interested in
RETURN 
	h.course_key, 
	h.data, 
	"https://studio.edx.org/container/" + h.location as link

Example: we want to know whether a course has links to content on other websites:

MATCH 
	(c:course)-[:PARENT_OF*]->(h:html) 
WHERE 
	h.data CONTAINS 'http'
	AND c.course_key = 'courseID' // replace with the course ID
RETURN 
	h.course_key, 
	h.data, 
	"https://studio.edx.org/container/" + h.location as link

What courses contain scripts with python code?

Example: An org would like to know all their courses that require review for the python3 upgrade

MATCH
	(c:course)-[:PARENT_OF*]->(p:problem)

Where
	p.data CONTAINS 'loncapa/python' // this phrase identifies a python-graded problem
AND
	c.org =~ '(?i)edx.*' // case-insensitive search to include all edX-variants
RETURN
	p.org as org,
	c.display_name, c.course_key, c.start, c.end,
	count(distinct(p)) as numberOfProblems

Where all the python-graded problems?

Example: A partner needs guidance on what content to review for the python3 upgrade - give them links to all the content.

MATCH
(c:course)-[:PARENT_OF*]->(p:problem)

Where
p.data CONTAINS 'loncapa/python'
AND
c.course_key = 'courseID' // replace with the course ID
RETURN
c.org as org,
c.display_name, c.course_key, c.start, c.end, ("https://courses.edx.org/courses/" + c.course_key + "/jump_to/" + p.location) as problem_location

Which HTML modules have (illegal) self-closing <script> and <iframe> tags?

Example: We need to find which HTML modules and courses use <script /> or <iframe /> when they should use <script></script> or <iframe></iframe> instead. Using the self-closing version is invalid HTML, but worked because of some data munging we did in the old courseware view. It does not work in the new Courseware MFE.

MATCH
    (v:vertical)-[:PARENT_OF*]->(h:html)
Where
    h.data =~ '(?i)<iframe [^>]*\\/>' or
    h.data =~ '(?i)<script [^>]*\\/>'
RETURN
	h.course_key, v.location, 'https://courses.edx.org/xblock/'+ v.location,  h.location, h.data
order by h.course_key


Which HTML modules use X iframe in them and where?

MATCH
(c:course) -[:PARENT_OF*]-> (v:vertical) -[:PARENT_OF*]-> (n:item)
WHERE
c.start < '2021-05-12' // insert today's date for live courses
AND
c.end > '2021-05-12' // insert today's date for live courses
AND
n.block_type = 'html'
AND
n.data CONTAINS 'geogebra'// insert a specfic iframe identifier
RETURN
n.course_key,
replace('https://courses.edx.org/courses/course-id/jump_to_id/','course-id',c.course_key) + right(n.location, 32) as Unit_URL;

Age of Course Data

When was the course last updated in Coursegraph?

This won't capture courses added since the last update, but this query can be used to see when Coursegraph itself was last updated, based on the last update date.

match (c:course)
return c.course_key, c.time_last_dumped_to_neo4j ORDER BY c.time_last_dumped_to_neo4j desc


Constructing Specific URLs from Your Results

If you're looking for specific blocks and want to quickly generate the url you can go to to see the block in action in the LMS, add this statement to your return values:

return ("https://courses.edx.org/courses/" + ITEM.course_key + "/jump_to/" + ITEM.location) as link

In this case, ITEM would be the alias of the block you're looking for

Filtering a specific list

If you're looking for results in a specific list of items (courses), you can use the list syntax

where ITEM in [VALUE1, VALUE2, VALUE3]

where c.course_key in 
["course-v1:edX+DemoX.1+2T2019",
"course-v1:edX+DemoX.1+3T2018",
"course-v1:edX+DemoX.1+2T2017"]