edx-video-pipeline for your Sandbox

STOP. This page is deprecated. Go Here

Configuring Video Uploads1

  1. Start a sandbox build.
    1. Make sure to build with basic auth off.

  2. You'll need an AWS account:
    1. If you're not an admin, you will need to request EC2 access (full), S3 access (full), and a functioning ssh key pair.
    2. Log in to AWS and go to the S3 storage service.
    3. Create four buckets.
      1. All S3 bucket names must be globally unique, but theme them roughly as storage, ingest, images, and delivery.
      2. Note the bucket names.
    4. Under the ingest and delivery buckets' Properties > Permissions > Edit CORS Configuration, add the following CORS configuration, making sure to change the AllowedOrigin field to your sandbox's Studio URL (if you'd rather use the AWS CLI, there's a sketch at the end of this step):

      <?xml version="1.0" encoding="UTF-8"?>
      <CORSConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
          <CORSRule>
              <AllowedOrigin>https://${SANDBOX STUDIO URL}</AllowedOrigin>
              <AllowedMethod>GET</AllowedMethod>
              <AllowedMethod>POST</AllowedMethod>
              <AllowedMethod>PUT</AllowedMethod>
              <AllowedMethod>HEAD</AllowedMethod>
              <MaxAgeSeconds>3000</MaxAgeSeconds>
              <AllowedHeader>*</AllowedHeader>
          </CORSRule>
      </CORSConfiguration>

    5. In your delivery bucket, under 'Properties' > 'Permissions', add Grantee: 'Everyone' and check 'List'

    6. If you have created a new AWS account, or wish to limit access, the following steps are recommended:
      1. In the AWS IAM security service, add a new user.
      2. Add the new user to a new Group with (at least) the AmazonS3FullAccess and AmazonEC2FullAccess policies attached.

      3. Retrieve the IAM user's access key ID and secret access key, and note them for later.
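
      If you'd rather set the CORS rules from the command line, the AWS CLI takes the same rules as JSON rather than XML. A minimal sketch, run once for the ingest bucket and once for the delivery bucket (the bucket name and origin below are placeholders):

        aws s3api put-bucket-cors --bucket my-ingest-bucket --cors-configuration '{
          "CORSRules": [{
            "AllowedOrigins": ["https://studio.my-sandbox.example.com"],
            "AllowedMethods": ["GET", "POST", "PUT", "HEAD"],
            "AllowedHeaders": ["*"],
            "MaxAgeSeconds": 3000
          }]
        }'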

  3. Once the sandbox has finished building, access your sandbox via ssh.
    1. Change these settings in /edx/app/edxapp/cms.env.json
      "FEATURES": {
          ...
          "ENABLE_VIDEO_UPLOAD_PIPELINE": true,
          ...
      },

      and

      ...
      "VIDEO_UPLOAD_PIPELINE": {
          "BUCKET": "${S3 ingest bucket name}",
          "ROOT_PATH": ""
      },
      ...

      (leave "ROOT_PATH" blank; cms.env.json is plain JSON and cannot carry inline comments)

      and

      ...
      "VIDEO_IMAGE_SETTINGS": {
              "DIRECTORY_PREFIX": "video-images/",
              ...
              "STORAGE_KWARGS": {
                  "bucket": "${S3 bucket name}",
                  "custom_domain": "s3.amazonaws.com/${S3 bucket name}",
                  ...
              },
              ...
          },
          ...
      }
      ...
    2. (Removed: this step previously involved changes to /edx/app/edxapp/cms.auth.json and is no longer needed.)

    3. While you're at it, create a superuser (making the stock 'staff' user a superuser is just fine; see the sketch after this list).
    4. Restart your servers and workers.
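
      The exact commands vary by release, but on a standard Ansible-provisioned sandbox something like the following should work (the 'www-data' user and the '--settings=aws' module are assumptions; newer releases use '--settings=production'):

      # confirm your cms.env.json edits are still valid JSON
      sudo python -m json.tool /edx/app/edxapp/cms.env.json > /dev/null && echo "cms.env.json OK"

      # promote the stock 'staff' user to superuser
      echo "from django.contrib.auth.models import User; User.objects.filter(username='staff').update(is_staff=True, is_superuser=True)" |
        sudo -u www-data /edx/bin/python.edxapp /edx/bin/manage.edxapp lms --settings=aws shell

      # restart the app servers and workers
      sudo /edx/bin/supervisorctl restart edxapp: edxapp_worker: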

  4. Now log in to your sandbox's Studio interface via a web browser, using a staff-access account.

    1. Either create a unique course or navigate to one of the stock courses.
    2. In the course you wish to use, navigate to Settings > Advanced Settings 
    3. At the bottom of the page, in the "Video Upload Credentials" field, add the following configuration (don't neglect the brackets).2

      {
          "course_video_upload_token": "xxxx"
      }

You should now be able to see the Content > Video Upload option in your sandbox CMS, and see the uploaded videos in your S3 bucket.

  1. If you haven't, upload a video now. The upload tool accepts only *.mov and *.mp4 files, so you're going to want to use one of those.
    1. If you don't have a video file, you can use this one. You're going to need it in a moment.
  2. Check that an object showed up in your ingest (upload) bucket; a quick CLI check is sketched below. Good? Good.
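
  If you'd rather check from the command line and have the AWS CLI configured, listing the ingest bucket (bucket name here is a placeholder) should show the new object:

    aws s3 ls s3://my-ingest-bucket/ --recursive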

Configuring VAL Access3

  1. If you haven't, access your sandbox via ssh and create a superuser.

  2. Log in to the django admin (usually ${YOUR_SANDBOX_URL}/admin)
    1. Go to Oauth2 > Clients


    2. Click 'Add Clients' (rounded button, upper right hand corner)

    3. In the window, add the following information:
      1. User: Staff (usually pk=5, but the magnifying glass icon can help)
      2. Name: ${ANY RANDOM STRING} (e.g. 'veda_sandbox')
      3. URL: ${YOUR_SANDBOX_URL}/api/val/v0
      4. Redirect uris: ${YOUR_SANDBOX_URL}
      5. Client ID: autofilled, make note of this
      6. Client Secret: autofilled, make note of this
      7. Client Type: Confidential (Web Application)
      8. Logout URI: (can leave blank)

    4. Make note of the Client ID and Client Secret, as these will be needed later (a quick curl check using them is sketched after this list).

    5. Remember to Save!
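
    To sanity-check the new client you can request a token directly with curl; this assumes the sandbox exposes the password grant at the same endpoint used as the VAL Token URL below (substitute your own values):

      curl -X POST "https://${YOUR_SANDBOX_URL}/oauth2/access_token" \
           -d "grant_type=password" \
           -d "client_id=${CLIENT_ID}" \
           -d "client_secret=${CLIENT_SECRET}" \
           -d "username=staff" \
           -d "password=${STAFF_PASSWORD}"

      # a JSON response containing "access_token" means the client is configured correctly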

Configuring openveda

We're going to run a lightweight version of edx-video-pipeline, with HLS support, on a dedicated EC2 instance, using a subdirectory of the S3 ingest bucket as a kind of "watch folder". This assumes some basic level of competency with AWS EC2 and is in no way exhaustive.

  1. Launch an EC2 instance. 
    1. Any Linux AMI will do; we recommend the free-tier-eligible Amazon Linux AMI (ami-0b33d91d). 
      1. Be sure to have more than 6 GB or so of storage. The max upload size for a file is 5 GB, so you just need a li'l extra to deal with the various unmentionables in the codebase.
    2. Any size instance is acceptable, just be prepared to sit around and wait a little longer for your completed encodes if you go small. 
      1. For production VEDA we use g2.2xlarge GPU nodes (many $$$), but for the purposes of a sandbox a t2.micro is just fine. 
    3. Click through until you get to "Step 6: Configure Security Group".
      1. Allow ssh access from your IP
    4. Launch!
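
    A rough CLI sketch of the launch above, if you'd rather script it (every value is a placeholder; substitute your own key pair, security group, and a current free-tier AMI for your region):

      aws ec2 run-instances \
          --image-id ami-0b33d91d \
          --instance-type t2.micro \
          --key-name my-ssh-key \
          --security-group-ids sg-0123456789abcdef0 \
          --block-device-mappings '[{"DeviceName":"/dev/xvda","Ebs":{"VolumeSize":10}}]'
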
  2. ssh into your newly launched instance.
    1. Get the machine ready and clone the necessary repos:

      sudo yum -y update
      sudo yum -y install gcc
      sudo yum -y install git
      git clone https://github.com/yro/openveda
      git clone https://github.com/yro/v_videocompile
      git clone https://github.com/yro/vhls
    2. Download and compile a static build of ffmpeg:

      cd v_videocompile
      sudo python setup.py install
      v_videocompile
    3. Install the HLS dependencies:

      cd ~/vhls
      sudo python setup.py install
    4. Finally, install openveda:

      cd ~/openveda
      # we're going to be altering the configurations, so this modified command:
      sudo python setup.py develop
      
    5. Now we're ready to configure. Remember all of those keys, secret keys, passwords, and access tokens I asked you to take a note of? Now is where they shine. Openveda has a config wizard that should walk you through it:

      openveda -config

      Follow the prompts. You will need the following information:

      • VEDA Working Directory (leave blank)
      • AWS S3 Storage Bucket Name
      • Studio Ingest Bucket Name (this says 'optional', but for you it is not)
      • AWS S3 Deliver Bucket Name (streaming bucket)
      • AWS S3 Images Bucket Name
      • VAL Token URL (this is also not optional) < "https://{YOUR_SANDBOX_URL}/oauth2/access_token"
      • VAL API URL < "https://{YOUR_SANDBOX_URL}/api/val/v0/videos"
      • VAL Username (This will be 'staff' if you have set staff as the authorized VAL user)
      • VAL Password (Your authorized VAL user's studio password)
      • VAL Client ID
      • VAL Secret Key

You should be able to run (and receive):

python ~/openveda/openveda/pipeline/pipeline_val.py
# And then you'll get:
>${AN_ACCESS_TOKEN}

Now run (while in session):

openveda -i

You should see a couple of terminal progress bars, as the video encodes to its various endpoints.

Once it's done (and if you're using a t2.micro instance, maybe go get a snack or something while you're waiting), check back in Studio's Video Uploads page: the video's status should show that its encodes have completed.

You might need to refresh the page. That's a functional video pipeline!

Running edx-video-pipeline in the Background4

While ssh'd in to your EC2 instance, run:

nohup openveda -i &> openveda.out &

Now you can log out and walk away. Don't forget to terminate your instance when you're done.
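
To confirm the ingest loop is still alive later, a couple of quick checks against the process and log file created by the nohup command above:

pgrep -f "openveda -i"    # prints a PID if the ingest loop is running
tail -n 50 ~/openveda.out # recent output from the background process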

The ingest loop is fragile, and isn't defensive against connectivity issues, but you don't expect to handle a lot of traffic, right? Right.

This has no provisions for monitoring, but should be fairly straightforward (and verbose) if something goes wrong.

Not working?

Let's check the logs.

Logs are accessible here:

ssh -i {{your_aws_ssh_key}} ec2-user@{{video pipeline sandbox's ec2 public IP}}
cat ~/openveda.out

Cleanup

Terminate your EC2 instance, delete your buckets, and you're done!
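
If you'd rather do the cleanup with the AWS CLI, something like this works (the instance ID and bucket names are placeholders; --force deletes a bucket even if it still contains objects):

aws ec2 terminate-instances --instance-ids i-0123456789abcdef0
aws s3 rb s3://my-storage-bucket --force
aws s3 rb s3://my-ingest-bucket --force
aws s3 rb s3://my-images-bucket --force
aws s3 rb s3://my-delivery-bucket --force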

Errors

Openveda should have some limited provisions for basic user errors. If you see something that doesn't seem right, don't be shy about pointing it out. 




Notes:

  1. Adapted from the edx-platform wiki
  2. The course_video_upload_token can be any non-null string. It matters in production, where we need to differentiate between thousands of video workflows; on the sandbox we're using a single workflow, so any non-null string will do.
  3. This is extremely bad and shameful. Do NOT use this in a production setting. You should follow the instructions of your friendly devops engineer.
  4. Don't forget to terminate your EC2 instance!