13 Video Dashboard Functions
13.1 Wrangling Functions
13.1.1 wrangle_video
13.1.1.1 Main Documentation
Generates cleaned video data as a csv within a specified course
directory
Description:
This function will automatically read files named
'generalized_video_heat.csv' and 'generalized_video_axis.csv' from
the specified course directory and output a csv named
'wrangled_video_heat.csv' in the same directory
Usage:
wrangle_video(input_course, testing = FALSE)
Arguments:
input_course: String of short name of course directory
testing: For developer use only. Boolean used to indicate to use
testing data.
Value:
No value returned
Examples:
wrangle_video(input_course = 'psyc1')
13.1.1.2 Additional Notes:
- In order for this function to execute properly, there must be two files in the course directory named “generalized_video_heat.csv” and “generalized_video_axis.csv”. These files are obtained from Google BigQuery. Typically, these files are automatically obtained through the “populate_courses.py” script within the “exec” directory.
- input_course corresponds to the “short name” within the “.config.json” file.
- The following are descriptions of the columns within the output csv file:
  - video_id: Video ID hash string
  - video_name: Name of the video
  - username: Username of the learner
  - min_into_video: Minute into video of the segment that the learner has watched
  - count: Number of times the learner has watched the segment
  - mode: Whether the learner is auditing or is a verified student
  - certified: Whether or not the student has been certified
  - gender: Gender of the learner
  - activity_level: Length of time that the student has spent on the course
  - max_stop_position: The mode time at which video_stop events occur. The mode is used instead of the maximum because some videos have video_stop events that occur at incorrect times, such as 3 days.
  - course_order: Order in which the video appears in the course
  - index_chapter: Index of the chapter in which the video appears
  - chapter: Name of the chapter
- Each video segment is 20 seconds in length. This can be adjusted by changing the global constant SEGMENT_SIZE in the video_wrangling.R file.
- In order for a segment to be counted as being “viewed”, the user would have to watch the segment for at least 1 second before carrying out another event such as video_pause, video_seek, page_close, etc. This threshold of 1 second can be adjusted via the global constant MIN_DURATION in the video_wrangling.R file.
- The largest length of a video is set to be 1 hour. Any segments past 1 hour will simply be ignored/truncated. This can be adjusted by changing the global constant MAX_DURATION in the video_wrangling.R file.
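The interaction of the three constants can be illustrated with a minimal sketch. This is not the code in video_wrangling.R: the helper segments_watched() is hypothetical, and both the units (seconds) and the rule that a segment needs at least MIN_DURATION seconds of overlap to count are assumptions drawn from the notes above.

```r
# Assumed values, taken from the notes above (all in seconds).
SEGMENT_SIZE <- 20    # length of one video segment
MIN_DURATION <- 1     # minimum overlap for a segment to count as viewed
MAX_DURATION <- 3600  # anything past one hour is ignored

# Hypothetical helper: 0-based indices of the segments covered by one
# watch interval [start, end], both given in seconds into the video.
segments_watched <- function(start, end) {
  end <- min(end, MAX_DURATION)  # truncate anything past the cap
  if (is.na(start) || is.na(end) || end <= start) return(integer(0))
  first <- floor(start / SEGMENT_SIZE)
  last <- floor((end - 1e-9) / SEGMENT_SIZE)
  idx <- seq(first, last)
  # Seconds of overlap between the interval and each candidate segment.
  seg_start <- idx * SEGMENT_SIZE
  overlap <- pmin(end, seg_start + SEGMENT_SIZE) - pmax(start, seg_start)
  idx[overlap >= MIN_DURATION]
}

segments_watched(5, 45)     # covers segments 0, 1 and 2
segments_watched(19.5, 41)  # segment 0 gets only 0.5 s, so it is dropped
```

Under these assumptions, raising MIN_DURATION makes the count stricter, while changing SEGMENT_SIZE changes the resolution of min_into_video.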
13.1.2 obtain_raw_video_data
13.1.2.1 Main Documentation
Reads raw uncleaned .csv into a dataframe
Description:
Reads the raw generalized_video_heat.csv obtained through rbq.py
into a dataframe.
Usage:
obtain_raw_video_data(input_course, testing = FALSE)
Arguments:
input_course: Name of course directory (ex. psyc1, spd1, marketing,
etc)
testing: For developer use only. Boolean used to indicate to use
testing data.
Value:
'data': Dataframe containing raw student track log information
Examples:
obtain_raw_video_data(input_course = 'psyc1')
13.1.3 obtain_video_axis_data
13.1.3.1 Main Documentation
Reads video_axis.csv file
Description:
Reads the video_axis csv obtained through rbq.py into a dataframe.
For documentation on how to use rbq.py, please see
www.temporaryreferencelink.com
Usage:
obtain_video_axis_data(input_course, testing = FALSE)
Arguments:
input_course: Name of course directory (ex. psyc1, spd1, marketing,
etc)
testing: For developer use only. Boolean used to indicate to use
testing data.
Value:
'video_axis': Dataframe containing video course structure
information
Examples:
obtain_video_axis_data(input_course = 'psyc1')
13.1.4 write_wrangled_video_data
13.1.4.1 Main Documentation
Outputs cleaned data as csv
Description:
Writes cleaned data as a csv into the correct course directory
Usage:
write_wrangled_video_data(input_course, cleaned_data, testing = FALSE)
Arguments:
input_course: Name of course directory (ex. psyc1, spd1, marketing,
etc)
cleaned_data: Dataframe containing cleaned data. This cleaned data is
typically obtained through 'make_tidy_segments()'
testing: For developer use only. Boolean used to indicate to use
testing data.
Value:
No return value
Examples:
write_wrangled_video_data(input_course = 'psyc1', cleaned_data=start_end_df)
13.1.5 prepare_video_data
13.1.5.1 Main Documentation
Converts columns into proper variable types and adds additional columns
with video information
Description:
Additional columns added:
- 'max_stop_times': proxy for video length
- 'course_order': occurrence of video within course structure
- 'index_chapter': occurrence of chapter within course structure
- 'chapter_name': name of chapter
Usage:
prepare_video_data(video_data, video_axis)
Arguments:
video_axis: A dataframe containing course structure information.
Contains columns course_order, index_chapter, chapter_name
video_data: Raw input dataframe to be transformed. This is obtained
through 'obtain_raw_video_data()'
Value:
'prepared_data': The prepared data with converted variable types
and extra columns
Examples:
prepare_video_data(video_data = data, video_axis = video_axis)
13.1.6 get_start_end_df
13.1.6.1 Main Documentation
Obtains start and end times for video events
Description:
Parses dataframe and adds columns 'start' and 'end' showing the
start and end time that a user watched a video
Usage:
get_start_end_df(data)
Arguments:
data: Dataframe containing tracklog data of students. This is
obtained typically through 'prepare_video_data()'
Value:
'start_end_df': Original dataframe with 'start' and 'end' columns
Examples:
get_start_end_df(data = data)
13.1.7 get_watched_segments
13.1.7.1 Main Documentation
Returns original dataframe with segment columns
Description:
Returns original dataframe with segment columns. Segment columns
are 0 if the segment is not located within the start and end
values and 1 otherwise.
Usage:
get_watched_segments(data)
Arguments:
data: Dataframe containing start and end columns. This dataframe is
typically obtained through 'get_start_end_df()'
Value:
'data': Original input dataframe with new segment columns
Examples:
get_watched_segments(data = start_end_df)
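As a rough illustration of the 0/1 segment columns described above, the following sketch builds them for a toy dataframe. The column names (seg_0, seg_1, ...), the fixed number of segments, and the 20-second segment length are assumptions made for the example; the real function may name and size its columns differently.

```r
SEGMENT_SIZE <- 20  # seconds per segment (assumed, see wrangle_video notes)

# Toy input: one row per watch event, times in seconds into the video.
start_end_df <- data.frame(
  username = c("a", "b"),
  start = c(0, 30),
  end = c(45, 50)
)

# One column per segment: 1 if the segment overlaps the watched interval,
# 0 otherwise.
n_segments <- 3
for (i in seq_len(n_segments) - 1) {
  seg_start <- i * SEGMENT_SIZE
  seg_end <- seg_start + SEGMENT_SIZE
  watched <- start_end_df$start < seg_end & start_end_df$end > seg_start
  start_end_df[[paste0("seg_", i)]] <- as.integer(watched)
}
start_end_df
```

Learner "a" ends up with a 1 in all three segment columns, while learner "b" has a 0 in seg_0 because their interval starts at 30 seconds.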
13.1.8 make_tidy_segments
13.1.8.1 Main Documentation
Returns tidy (more usable) version of input dataframe
Description:
Returns a tidy, more usable, version of the input dataframe.
Segment information is converted into a single column using
'gather()'
Usage:
make_tidy_segments(data)
Arguments:
data: Dataframe containing segment information. This dataframe is
typically obtained through 'get_watched_segments()'
Value:
'data': Tidy version of input dataframe.
Examples:
make_tidy_segments(data = start_end_df)
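The wide-to-long conversion that the description attributes to 'gather()' can be sketched with tidyr on toy data. The column names seg_0 and seg_1 and the key/value names "segment" and "watched" are hypothetical; only the use of gather() itself comes from the description above.

```r
library(tidyr)

# Toy wide data: one 0/1 column per segment (column names are hypothetical).
wide <- data.frame(
  username = c("a", "b"),
  video_id = c("v1", "v1"),
  seg_0 = c(1, 0),
  seg_1 = c(1, 1)
)

# Collapse the segment columns into key/value pairs, giving one row per
# learner, video and segment.
tidy <- gather(wide, key = "segment", value = "watched", seg_0, seg_1)
tidy
```

The result has four rows (two learners times two segments), which is the shape that later per-segment aggregation expects.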
13.1.9 check_integrity
13.1.9.1 Main Documentation
Checks to make sure start and end data passes sanity checks
Description:
Returns a boolean of whether or not start and end data makes
sense. This checks for NA values, end times that are past the
maximum length of the video, and extremely long and short watch
durations. The threshold for watch durations can be adjusted in
the global constants: 'MIN_DURATION' and 'MAX_DURATION'
Usage:
check_integrity(start, end, max_stop_position)
Arguments:
start: Time into video that the user has started watching the video
end: Time into the video that the user has stopped watching the
video
max_stop_position: Length of the video being watched
Value:
'integrity': Boolean of whether or not the data passes integrity
checks
Examples:
check_integrity(start, end, max_stop_position)
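The checks listed in the description can be sketched as below. check_integrity_sketch() is a hypothetical re-implementation for illustration only; the constant values and the assumption that all quantities are in seconds come from the wrangle_video notes, not from the source code.

```r
MIN_DURATION <- 1     # seconds (assumed from the wrangle_video notes)
MAX_DURATION <- 3600  # seconds (assumed: the one-hour cap)

# Hypothetical re-implementation for illustration only.
check_integrity_sketch <- function(start, end, max_stop_position) {
  # Any missing value fails the check.
  if (anyNA(c(start, end, max_stop_position))) return(FALSE)
  # The end time cannot lie beyond the length of the video.
  if (end > max_stop_position) return(FALSE)
  # The watch duration must fall inside the configured thresholds.
  duration <- end - start
  duration >= MIN_DURATION && duration <= MAX_DURATION
}

check_integrity_sketch(10, 70, 300)    # TRUE
check_integrity_sketch(10, 400, 300)   # FALSE: ends after the video does
check_integrity_sketch(10, 10.2, 300)  # FALSE: shorter than MIN_DURATION
check_integrity_sketch(NA, 70, 300)    # FALSE: missing start
```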
13.1.10 get_end_time
13.1.10.1 Main Documentation
Calculates video end time for non-video events using time stamps
Description:
Calculates video end time for non-video events using time stamps
Usage:
get_end_time(start, time, time_ahead, latest_speed)
Arguments:
start: Time into video that the user has started watching the video
time: Time stamp of when the user started watching the video
time_ahead: Time stamp of next event following the play event
latest_speed: The speed at which the user was watching the video
Value:
'end': Time into video that the user has stopped watching
Examples:
get_end_time(start, time, time_ahead, latest_speed)
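One plausible reading of this calculation is that the end position equals the start position plus the wall-clock time elapsed until the next event, scaled by playback speed. The sketch below encodes that reading; the formula and the helper name get_end_time_sketch() are assumptions, and the real function may handle edge cases differently.

```r
# Hypothetical re-implementation; the real function may differ.
get_end_time_sketch <- function(start, time, time_ahead, latest_speed) {
  # Wall-clock seconds between the play event and the next event.
  elapsed <- as.numeric(difftime(time_ahead, time, units = "secs"))
  # At 2x speed, 10 seconds of wall-clock time covers 20 seconds of video.
  start + elapsed * latest_speed
}

play_time <- as.POSIXct("2017-01-01 12:00:00", tz = "UTC")
next_event <- as.POSIXct("2017-01-01 12:00:30", tz = "UTC")
end <- get_end_time_sketch(start = 60, time = play_time,
                           time_ahead = next_event, latest_speed = 1.5)
end  # 60 + 30 * 1.5 = 105 seconds into the video
```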
13.1.11 get_mode
13.1.11.1 Main Documentation
Obtain most common value from list
Description:
Obtain most common value from list
Usage:
get_mode(x)
Arguments:
x: List containing integer values
Value:
'mode': The most common value within the list
Examples:
get_mode(x=c(0,1,2,2,2,3))
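Base R has no built-in function for the statistical mode, so a helper like this is usually written by hand. A minimal sketch, assuming that ties resolve to the value seen first (the real get_mode may break ties differently):

```r
# Hypothetical re-implementation; ties resolve to the value seen first.
get_mode_sketch <- function(x) {
  values <- unique(x)
  counts <- tabulate(match(x, values))  # occurrences of each unique value
  values[which.max(counts)]
}

get_mode_sketch(c(0, 1, 2, 2, 2, 3))  # 2
```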
13.2 Server Functions
13.2.1 get_aggregated_df
13.2.1.1 Main Documentation
Aggregates dataframe by video and segment
Description:
Aggregates input dataframe by video (video_id) and segment
(min_into_video). Additionally, adds columns:
- 'unique_views'/'`Students`' (number of learners who started the
video),
- 'watch_rate'/'`Views per Student`' (number of students who have
watched the segment divided by unique_views),
- 'avg_watch_rate' (average of watch_rate per video)
- 'high_low' ('High Watch Rate', 'Low Watch Rate', or 'Normal')
- 'up_until' (1 if the average learner had watched up until the
particular min_into_video, 0 if they had not)
Usage:
get_aggregated_df(filt_segs, top_selection)
Arguments:
filt_segs: Dataframe containing students that have been filtered by
selected demographics. Typically obtained via
'filter_demographics()'
top_selection: Value of the number of top segments to highlight.
Value:
'aggregate_segment_df': Aggregated dataframe with additional
columns
Examples:
get_aggregated_df(filt_segs, 25)
13.2.1.2 Additional Notes:
- This function will read the filtered data frame version of the output csv file from wrangle_video. As an example, this function can be used in the following way:
  tidy_segment_df <- read_csv("path/to/course/wrangled_video_heat.csv")
  filt_segs <- filter_demographics(tidy_segment_df)
  aggregated_df <- get_aggregated_df(filt_segs, 10)
- The high_low segment classification is based off a linear model (using lm) using the following features:
  - course_order: Index of the video arranged by course structure
  - min_into_video: How far into the video the segment is
- The up_until variable is simply obtained by looking at the maximum time that a video_stop event had occurred. As a consequence, if many students frequently skip to the end of the video without watching anything in between, this statistic may be misinterpreted. There are plans to change this in the future, as it is very doable.
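The residual-based high_low classification can be sketched on simulated data. Only the use of lm with course_order and min_into_video as features comes from the notes above; the toy data, the use of standardized residuals, and the cutoff of 2 are assumptions, and the actual rule in get_aggregated_df may differ.

```r
set.seed(1)

# Toy aggregated data: one row per (video, segment); values are made up.
agg <- expand.grid(course_order = 1:5, min_into_video = 0:9)
agg$watch_rate <- 1.2 - 0.02 * agg$course_order -
  0.03 * agg$min_into_video + rnorm(nrow(agg), sd = 0.02)
agg$watch_rate[7] <- agg$watch_rate[7] + 0.5  # plant one unusual segment

# Expected watch rate given position in the course and in the video.
fit <- lm(watch_rate ~ course_order + min_into_video, data = agg)

# Assumed rule: flag segments whose standardized residual exceeds 2.
z <- rstandard(fit)
agg$high_low <- ifelse(z > 2, "High Watch Rate",
                ifelse(z < -2, "Low Watch Rate", "Normal"))
table(agg$high_low)
```

The planted segment in row 7 is watched far more than its position predicts, so it is labelled as a high watch rate segment; segments that follow the overall trend stay "Normal".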
13.2.2 get_ch_markers
13.2.2.1 Main Documentation
Obtains locations of chapter lines to be placed on visualizations
Description:
Obtains locations of chapter lines to be placed on visualizations
Usage:
get_ch_markers(filt_segs)
Arguments:
filt_segs: Dataframe containing students that have been filtered by
selected demographics. Typically obtained via
'filter_demographics()'
Value:
'ch_markers': List of values of where to place chapter lines on
visualizations
Examples:
get_ch_markers(filt_segs)
13.2.3 get_video_lengths
13.2.3.1 Main Documentation
Obtains dataframe with length of videos
Description:
Obtains dataframe with length of videos
Usage:
get_video_lengths(filt_segs)
Arguments:
filt_segs: Dataframe containing students that have been filtered by
selected demographics. Typically obtained via
'filter_demographics()'
Value:
'vid_lengths': Dataframe with the video lengths associated with
each video ID.
Examples:
get_video_lengths(filt_segs)
13.2.4 get_summary_table
13.2.4.1 Main Documentation
Obtains locations of chapter lines to be placed on visualizations
Description:
Obtains locations of chapter lines to be placed on visualizations
Usage:
get_ch_markers(filt_segs)
Arguments:
filt_segs: Dataframe containing students that have been filtered by
selected demographics. Typically obtained via
'filter_demographics()'
Value:
'ch_markers': List of values of where to place chapter lines on
visualizations
Examples:
get_ch_markers(filt_segs)
13.2.5 get_video_comparison_plot
13.2.5.1 Main Documentation
Obtains heatmap plot comparing videos against each other
Description:
Obtains heatmap plot comparing videos against each other
Usage:
get_video_comparison_plot(filtered_segments, module, filtered_ch_markers)
Arguments:
filtered_segments: Dataframe of segments and corresponding watch counts
filtered by demographics
module: String of module (chapter) name to display
filtered_ch_markers: List of values containing locations of where to
put chapter markers
Value:
'g': ggplot heatmap object
Examples:
get_video_comparison_plot(filtered_segments, module, filtered_ch_markers)
13.2.6 get_segment_comparison_plot
13.2.6.1 Main Documentation
Obtains heatmap plot comparing segments against each other
Description:
Obtains heatmap plot comparing segments against each other
Usage:
get_segment_comparison_plot(filtered_segments, module, filtered_ch_markers)
Arguments:
filtered_segments: Dataframe of segments and corresponding watch counts
filtered by demographics
module: String of module (chapter) name to display
filtered_ch_markers: List of values containing locations of where to
put chapter markers
Value:
'g': ggplot heatmap object
Examples:
get_segment_comparison_plot(filtered_segments, module, filtered_ch_markers)
13.2.7 get_top_hotspots_plot
13.2.7.1 Main Documentation
Obtains heatmap with segments of highest watch rate highlighted
Description:
Obtains heatmap with segments of highest watch rate highlighted
Usage:
get_top_hotspots_plot(filtered_segments, module, filtered_ch_markers)
Arguments:
filtered_segments: Dataframe of segments and corresponding watch counts
filtered by demographics
module: String of module (chapter) name to display
filtered_ch_markers: List of values containing locations of where to
put chapter markers
Value:
'g': ggplot heatmap object
Examples:
get_top_hotspots_plot(filtered_segments, module, filtered_ch_markers)
13.2.7.2 Additional Notes:
- This function is no longer used; the plot was discarded after usability testing.
13.2.8 get_high_low_plot
13.2.8.1 Main Documentation
Obtains heatmap plot highlighting which segments have abnormally high
or low watch rates
Description:
Obtains heatmap plot highlighting which segments have abnormally
high or low watch rates
Usage:
get_high_low_plot(filtered_segments, module, filtered_ch_markers)
Arguments:
filtered_segments: Dataframe of segments and corresponding watch counts
filtered by demographics
module: String of module (chapter) name to display
filtered_ch_markers: List of values containing locations of where to
put chapter markers
Value:
'g': ggplot heatmap object
Examples:
get_high_low_plot(filtered_segments, module, filtered_ch_markers)
13.2.8.2 Additional Notes:
- This function returns a plot where segments with abnormally high and low watch rates are highlighted.
- “High” and “low” watch rates are determined by the residuals from a linear model obtained via lm.
- Please see source code and documentation for get_aggregated_df for more details.
13.2.9 get_up_until_plot
13.2.9.1 Main Documentation
Obtains heatmap plot highlighting which segment has been watched up
until on average
Description:
Obtains heatmap plot highlighting which segment has been watched
up until on average
Usage:
get_up_until_plot(filtered_segments, module, filtered_ch_markers)
Arguments:
filtered_segments: Dataframe of segments and corresponding watch counts
filtered by demographics
module: String of module (chapter) name to display
filtered_ch_markers: List of values containing locations of where to
put chapter markers
Value:
'g': ggplot heatmap object
Examples:
get_up_until_plot(filtered_segments, module, filtered_ch_markers)
13.2.9.2 Additional Notes:
- This function returns a plot in which segments are highlighted up until the average maximum stop time per student.
- It should be noted that this diagram may be misleading. Please see documentation for get_aggregated_df for more details.
13.2.10 get_rank
13.2.10.1 Main Documentation
Returns the ranking of a vector x
Description:
Returns the ranking of a vector x
Usage:
get_rank(x)
Arguments:
x: A vector of numeric values
Value:
'g': The ranking of the values within x
Examples:
get_rank(c(10, 20, 20, 22, 5))
13.2.10.2 Additional Notes:
- This function returns a data frame in which the duration watched per minute of video is calculated.
- This is calculated by (average time spent on video (minutes) by all learners who have started the video)/(length of video (minutes))
- It should be noted that the average time spent on video is calculated as the count of times the segment has been watched multiplied by the segment length. As a result, if users are consistently only watching 3 seconds of a 20 second segment, this number may be artificially inflated. This is because if a student watches more than 1 second of a segment, it will count as a “view”/“count” of the segment. This 1 second threshold can be adjusted via the global constant MIN_DURATION found in the video_wrangling.R file.
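The ratio described in these notes can be worked through on toy data. The dataframe, the two-minute video length, and the 20-second segment size are assumptions made for the example; only the formula (average minutes watched divided by video length in minutes) comes from the notes.

```r
SEGMENT_SIZE <- 20  # seconds per segment (assumed default)

# Toy tidy data for one video: each row is one learner/segment pair.
views <- data.frame(
  username = c("a", "a", "a", "b", "b"),
  min_into_video = c(0, 1/3, 2/3, 0, 1/3),
  count = c(1, 1, 2, 1, 1)
)
video_length_min <- 2  # hypothetical video length in minutes

# Every counted view is credited with a full segment, even if the learner
# only watched slightly more than MIN_DURATION of it.
minutes_per_learner <- tapply(views$count, views$username, sum) *
  SEGMENT_SIZE / 60
avg_minutes <- mean(minutes_per_learner)

# Duration watched per minute of video, for learners who started it.
ratio <- avg_minutes / video_length_min
ratio
```

Here learner "a" is credited with 4 segment views (80 seconds) and learner "b" with 2 (40 seconds), so the average is one minute and the ratio is 0.5. Because each view is credited as a full 20 seconds, the same counts would be produced by learners who watched only a few seconds of each segment, which is the inflation the last note warns about.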