Init search engine.

Description

This is the first iteration of search capabilities in Blockstore. It adds the following:

  • An index and document mapping that can store the data needed for the first version of LX search.

  • A REST API that can be used to create, edit, and delete Elasticsearch (ES) documents in the default index, and to query them.

Author Comments, Concerns, and Open Questions

  • This has the minimum amount of functionality needed for the first prototype of the LX search interface. A lot more will be added once it is clear what the data sources will be, what shape the ownership, authorship, and permissions data will take, and which query patterns need to be supported.

  • ES queries are described in JSON, and ES [expects](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-body.html) them to be sent as a GET request with a body (it also supports a query-parameter format, but that is very limited). I tried to integrate [django-elasticsearch-dsl-drf](https://django-elasticsearch-dsl-drf.readthedocs.io/en/0.16.3/filtering_usage_examples.html), and also to come up with a query-parameter format for the ES Query DSL. However, the DSL is complex and has many capabilities, and at this point it is not clear which subset the REST API should be restricted to. So I decided to go with a POST request to `/index/<index_name>/search/`, which forwards the query to ES and returns the response. At a later stage, once the query patterns are clearer, we can look into whether all of them can be served with a GET request.

  • Changing the mapping of a field in an index requires creating a new index and re-indexing the data into it. Index aliases are used to make this work seamlessly. Support will need to be added for migrations. One option is to use [`django-elastic-migrations`](https://github.com/HBS-HBX/django-elastic-migrations). In any case, some changes will need to be made to the index and document classes.

  • ES is not supposed to be [primary storage](https://www.elastic.co/guide/en/elasticsearch/resiliency/current/index.html). We will need to figure out how to re-acquire all the data held in an index in case it is lost.
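
The alias-based re-indexing mentioned above could look roughly like the sketch below. It only builds the request body for Elasticsearch's `_aliases` endpoint, which applies the `remove` and `add` actions of a single request atomically. The alias and index names are hypothetical, and none of this is part of the PR.

```python
import json


def build_alias_swap(alias, old_index, new_index):
    """Build an `_aliases` request body that repoints `alias` in one step."""
    return {
        "actions": [
            # Detach the alias from the index with the outdated mapping.
            {"remove": {"index": old_index, "alias": alias}},
            # Attach it to the freshly re-indexed copy.
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical names, for illustration only.
body = build_alias_swap("blockstore", "blockstore_v1", "blockstore_v2")
print(json.dumps(body, indent=2))
```

Clients that always query through the alias would not need to know which concrete index currently backs it.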

Test Instructions

To play around with the API:

Examples:

This will return documents whose `summary.title` or `summary.description` fields contain the search term, sorted in descending order by `_score` (a numeric relevance value that ES assigns to each result). `from` is the zero-based offset of the first result to return, and `size` is the maximum number of results to return in the response.

```
{
  "from": 0, "size": 5,
  "sort": [
    { "_score": "desc" }
  ],
  "query": {
    "bool": {
      "should": [
        { "match": { "summary.title": "<query>" }},
        { "match": { "summary.description": "<query>" }}
      ]
    }
  }
}
```
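
As a rough illustration of how such a query might be sent to the search endpoint, the sketch below builds the same body in Python and prepares a POST request for `/index/<index_name>/search/`. The base URL and the index name `default` are assumptions, and the helper functions are hypothetical rather than part of Blockstore.

```python
import json
from urllib.request import Request


def build_search_body(term, offset=0, size=5):
    """Match `term` against title or description, best matches first."""
    return {
        "from": offset,
        "size": size,
        "sort": [{"_score": "desc"}],
        "query": {
            "bool": {
                "should": [
                    {"match": {"summary.title": term}},
                    {"match": {"summary.description": term}},
                ]
            }
        },
    }


def build_search_request(base_url, index_name, body):
    """Prepare (but do not send) a POST to the search endpoint."""
    url = f"{base_url}/index/{index_name}/search/"
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Both the base URL and the index name are assumptions.
request = build_search_request(
    "http://localhost:8000/api/v1", "default", build_search_body("physics")
)
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` would send it; that step is left out so the sketch runs without a server.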

To include counts for the facets, we can add aggregations:

```
{
  "from": 0, "size": 5,
  "sort": [
    { "_score": "desc" }
  ],
  "query": {
    "bool": {
      "should": [
        { "match": { "summary.title": "<query>" }},
        { "match": { "summary.description": "<query>" }}
      ]
    }
  },
  "aggs": {
    "ownership": {
      "terms": { "field": "ownership.org_id" }
    },
    "tags": {
      "terms": { "field": "tags.paths" }
    }
  }
}
```

The response will then contain an `aggregations` field with the data for the buckets:
```
"aggregations": {
  "ownership": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      {
        "key": "edX",
        "doc_count": 3
      },
      {
        "key": "OpenCraft",
        "doc_count": 1
      }
    ]
  },
  "tags": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      {
        "key": "chemistry",
        "doc_count": 2
      },
      {
        "key": "green",
        "doc_count": 2
      },
      {
        "key": "physics",
        "doc_count": 2
      },
      {
        "key": "astronomy",
        "doc_count": 1
      },
      {
        "key": "red",
        "doc_count": 1
      }
    ]
  }
},
```
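
A client that renders facet counts would probably want to flatten these buckets. The sketch below shows one way to do that; `facet_counts` is a hypothetical helper, and the sample is a trimmed copy of the response fragment above.

```python
def facet_counts(response):
    """Map each aggregation name to a {bucket key: doc_count} dict."""
    return {
        name: {b["key"]: b["doc_count"] for b in agg.get("buckets", [])}
        for name, agg in response.get("aggregations", {}).items()
    }


# Trimmed copy of the response fragment shown above.
sample = {
    "aggregations": {
        "ownership": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "edX", "doc_count": 3},
                {"key": "OpenCraft", "doc_count": 1},
            ],
        }
    }
}
counts = facet_counts(sample)
print(counts)
```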

Done

Assignee

Unassigned

Reporter

Open Source Pull Request Bot

Labels

None

Contributor Name

Usman Khalid

Repo

edx/blockstore

Customer

Epic Link

None

OSCM Assignee

None

Platform Map Area (Levels 1 & 2)

None

Platform Map Area (Levels 3 & 4)

None

Blended Hour Utilization Percentage

None

edX Theme

None

edX Squad

None

Github Lines Added

871

Github Lines Deleted

3

Priority

Unset