Grouping documents in Elasticsearch

Elasticsearch is a search engine built on the Lucene library. It provides distributed, multitenant-capable full-text search over schema-free JSON documents through an HTTP interface. This post shows how to group documents by a field using aggregations.

I run the queries below in the Kibana Console. First, create an index called “posts”:

PUT posts

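Once the index exists, define a mapping for its documents. The request below uses the Elasticsearch 6.x syntax, which includes the mapping type (_doc) in the path; from 7.0 on, mapping types are removed and the request is PUT posts/_mapping. For the “posts” index, the mapping has the following properties:

PUT posts/_mapping/_doc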
{
  "properties": {
    "title": {
      "type": "keyword"
    },
    "body": {
      "type": "text"
    },
    "category": {
      "properties": {
        "id":   { "type": "integer"  },
        "name": { "type": "keyword"  }
      }
    }
  }
}

Let’s add some documents to our index.

PUT /posts/_doc/1
{
  "title": "Hello world!",
  "body": "Lorem ipsum dolor sit amet",
  "category": {
    "id": 1,
    "name": "Development"
  }
}
 
PUT /posts/_doc/2
{
  "title": "Good bye world!",
  "body": "Lorem ipsum dolor sit amet",
  "category": {
    "id": 2,
    "name": "Travel"
  }
}
 
PUT /posts/_doc/3
{
  "title": "Ruby makes me happy!",
  "body": "Lorem ipsum dolor sit amet",
  "category": {
    "id": 1,
    "name": "Development"
  }
}
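
The three PUT requests above can also be sent in a single round trip through the bulk API (POST posts/_doc/_bulk on a 6.x cluster, with Content-Type: application/x-ndjson). Below is a minimal Python sketch that only builds the newline-delimited request body; it does not talk to a cluster. The helper name build_bulk_body is hypothetical and introduced here for illustration.

```python
import json

# The three sample posts from this article, keyed by document id.
posts = {
    "1": {"title": "Hello world!", "body": "Lorem ipsum dolor sit amet",
          "category": {"id": 1, "name": "Development"}},
    "2": {"title": "Good bye world!", "body": "Lorem ipsum dolor sit amet",
          "category": {"id": 2, "name": "Travel"}},
    "3": {"title": "Ruby makes me happy!", "body": "Lorem ipsum dolor sit amet",
          "category": {"id": 1, "name": "Development"}},
}


def build_bulk_body(docs):
    """Build an NDJSON bulk body: one action line, then one source line per document.

    Hypothetical helper, not part of any Elasticsearch client library.
    """
    lines = []
    for doc_id, source in docs.items():
        # Action line: index the document under the given id.
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        # Source line: the document itself, on a single line.
        lines.append(json.dumps(source))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


bulk_body = build_bulk_body(posts)
print(bulk_body)
```

The printed body can be pasted into the Console under the bulk request line. For three documents the gain is negligible, but the same shape scales to larger batches.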

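Now we can group the documents by category with a terms aggregation on category.id. The nested top_hits sub-aggregation (named point_of_sales here; the name is arbitrary) returns the matching documents for each bucket, and "size": 0 suppresses the regular search hits so that only the aggregation is returned.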

GET /posts/_search
{
  "aggs": {
    "group_by_category": {
      "terms": {
        "field": "category.id",
        "min_doc_count": 1
      },
      "aggs": {
        "point_of_sales": {
          "top_hits": {}
        }
      }
    }
  },
  "size": 0
}
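
To make the grouping logic concrete, here is a rough in-memory Python sketch of what the request above asks for: one bucket per distinct category.id, a document count per bucket, and a few documents inside each bucket. It is an illustration of the idea, not how Elasticsearch implements it. The function group_by_category and its parameters are hypothetical; the limit of 3 hits per bucket and the ordering by document count mirror what I understand to be the defaults of top_hits and terms.

```python
from collections import defaultdict

# The three sample posts from this article (body omitted for brevity).
posts = [
    {"_id": "1", "title": "Hello world!",
     "category": {"id": 1, "name": "Development"}},
    {"_id": "2", "title": "Good bye world!",
     "category": {"id": 2, "name": "Travel"}},
    {"_id": "3", "title": "Ruby makes me happy!",
     "category": {"id": 1, "name": "Development"}},
]


def group_by_category(docs, min_doc_count=1, top_hits_size=3):
    """Emulate terms + top_hits in memory (hypothetical helper)."""
    groups = defaultdict(list)
    for doc in docs:
        # Bucket key: the value of category.id, as in the terms aggregation.
        groups[doc["category"]["id"]].append(doc)

    buckets = [
        {
            "key": key,
            "doc_count": len(members),
            # Keep only the first few documents per bucket, like top_hits.
            "hits": members[:top_hits_size],
        }
        for key, members in groups.items()
        if len(members) >= min_doc_count
    ]
    # Largest buckets first; ties are broken by ascending key.
    buckets.sort(key=lambda b: (-b["doc_count"], b["key"]))
    return buckets


for bucket in group_by_category(posts):
    titles = [doc["title"] for doc in bucket["hits"]]
    print(bucket["key"], bucket["doc_count"], titles)
# 1 2 ['Hello world!', 'Ruby makes me happy!']
# 2 1 ['Good bye world!']
```

The rest of this post goes back to the Console request above, run against the real index.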

Here are the results:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "group_by_category" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : 1,
          "doc_count" : 2,
          "point_of_sales" : {
            "hits" : {
              "total" : 2,
              "max_score" : 1.0,
              "hits" : [
                {
                  "_index" : "posts",
                  "_type" : "_doc",
                  "_id" : "1",
                  "_score" : 1.0,
                  "_source" : {
                    "title" : "Hello world!",
                    "body" : "Lorem ipsum dolor sit amet",
                    "category" : {
                      "id" : 1,
                      "name" : "Development"
                    }
                  }
                },
                {
                  "_index" : "posts",
                  "_type" : "_doc",
                  "_id" : "3",
                  "_score" : 1.0,
                  "_source" : {
                    "title" : "Ruby makes me happy!",
                    "body" : "Lorem ipsum dolor sit amet",
                    "category" : {
                      "id" : 1,
                      "name" : "Development"
                    }
                  }
                }
              ]
            }
          }
        },
        {
          "key" : 2,
          "doc_count" : 1,
          "point_of_sales" : {
            "hits" : {
              "total" : 1,
              "max_score" : 1.0,
              "hits" : [
                {
                  "_index" : "posts",
                  "_type" : "_doc",
                  "_id" : "2",
                  "_score" : 1.0,
                  "_source" : {
                    "title" : "Good bye world!",
                    "body" : "Lorem ipsum dolor sit amet",
                    "category" : {
                      "id" : 2,
                      "name" : "Travel"
                    }
                  }
                }
              ]
            }
          }
        }
      ]
    }
  }
}
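
The grouped documents sit fairly deep in that response: aggregations, then the aggregation name, then buckets, then the sub-aggregation name, then hits.hits. As a sketch of how client code might flatten this, the Python below walks a trimmed copy of the response above and collects the post titles per category id. The function titles_by_category is a hypothetical helper, and the response dictionary keeps only the fields the function reads.

```python
# Trimmed copy of the response shown above: only the fields used below.
response = {
    "aggregations": {
        "group_by_category": {
            "buckets": [
                {
                    "key": 1,
                    "doc_count": 2,
                    "point_of_sales": {"hits": {"hits": [
                        {"_id": "1", "_source": {"title": "Hello world!"}},
                        {"_id": "3", "_source": {"title": "Ruby makes me happy!"}},
                    ]}},
                },
                {
                    "key": 2,
                    "doc_count": 1,
                    "point_of_sales": {"hits": {"hits": [
                        {"_id": "2", "_source": {"title": "Good bye world!"}},
                    ]}},
                },
            ]
        }
    }
}


def titles_by_category(resp, agg="group_by_category", sub_agg="point_of_sales"):
    """Map each bucket key to the titles of its top hits (hypothetical helper)."""
    result = {}
    for bucket in resp["aggregations"][agg]["buckets"]:
        # The top_hits sub-aggregation nests its documents under hits.hits.
        hits = bucket[sub_agg]["hits"]["hits"]
        result[bucket["key"]] = [hit["_source"]["title"] for hit in hits]
    return result


grouped = titles_by_category(response)
print(grouped)
# {1: ['Hello world!', 'Ruby makes me happy!'], 2: ['Good bye world!']}
```

The aggregation names are passed as parameters because they are chosen freely in the request; they must match whatever names the query used.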

Thanks for reading this post. More posts about Elasticsearch are coming soon. If you liked this one, please share it on social networks.
