Challenge #3 – These are not the droids we’re looking for.

I’m pretty excited about challenge 3 of our coding challenge. It let me geek out in two ways:

  • Star Wars! We are creating a basic search tool to search the movie scripts. #ThisIsTheWay
  • I’ve spent the last few years on a project building an intelligent enterprise search tool for a client. Much of what I did here was shaped by how we did it on that project. It was fun to connect the dots (although this is far from what we built).

What are the requirements? Let’s check them out:


What should it do?

It should allow the user to find content in the provided movie script. It should accept a keyword and return that keyword in the context of its surrounding text. 

Acceptance Criteria

  1. Download the script(s) from Star Wars
    1. Ideally, your app should handle the download, but you could download it and include it in your app.
      1. EpisodeIV
      2. EpisodeV
      3. EpisodeVI
  2. Accept a single keyword to search against the script
  3. The results returned should include the keyword in the context of the surrounding text
  4. Highlight or otherwise indicate which word in the result is the keyword searched for
  5. Results should only return the first 10 results and include a count of all matching

Extra Credit!

  1. How far can you take this search app? Include paging, larger context blocks, sorting, etc.
  2. Add relevancy, as you want to define it: words nearer the start of the paragraph are weighted higher; words spoken by Luke are highest, C3PO lowest; spoken words are higher than descriptions; etc.
  3. Include the scripts from all of the Star Wars movies, give the user the option to select which movie to search against, or all.

And yes, I went all in on this challenge. I completed all 5 ACs and 3 ECs! As I mentioned, I really enjoyed this one. When your passions and fandom align, it can be a lot of fun.

What I Built

I built one search endpoint that allows the consumer to search against Star Wars Episodes 4, 5, and 6. The results include:

  • the keyword in context of a couple of other words
  • optional filters to filter by movie or character speaking
  • paging through the results
  • and the results are returned in a relevant order (the dark side is strong)

Watch a walkthrough of it working and the code:

AC 1: Download the script

I handle this at execution time: every time an API call is made, I pull down the movie scripts. Terrible idea, you say? I agree. Ideally, this would be done once, loaded into an index, and then searched against. However, the goal here wasn’t to create a search engine; it was to allow a user to search against the scripts. This is a very, VERY finite scope (i.e. 3 movies vs. all movies), so I kept it simple.

Files are downloaded as txt files and saved in a folder. AC 1, boom, done.
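
For the curious, here is a minimal sketch of the "download once" idea mentioned above. It is not the Download class from my repo: ScriptCache and its load method are hypothetical names, and the fetcher parameter stands in for whatever HTTP call actually retrieves the script. The point is only that the fetch happens when the file is missing from the folder, and every later call reads from disk.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Supplier;

// Hypothetical helper for illustration; not the Download class from the repo.
class ScriptCache {
  private final Path folder;

  ScriptCache(Path folder) {
    this.folder = folder;
  }

  // Returns the script text. The fetcher (a stand-in for the real download)
  // is only called when the file is not already saved in the folder.
  String load(String fileName, Supplier<String> fetcher) throws IOException {
    Path file = folder.resolve(fileName);
    if (Files.notExists(file)) {
      Files.createDirectories(folder);
      Files.write(file, fetcher.get().getBytes(StandardCharsets.UTF_8));
    }
    return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
  }
}
```

With something like this, the first API call pays for the download and the rest only read the saved txt file.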

AC 2: Search by keyword

This is where it starts getting fun. I created an API where the user can send in a keyword, and get back results:

/search?keyword=droid

{
    "keyword": "droid",
    "resultCount": 37,
    "startIndex": 0,
    "pageSize": 10,
    "results": [
        {
            "text": "find any <strong>droid</strong>s",
            "rank": 130,
            "data": {
                "character": "VADER",
                "dialog": "Did you find any droids?",
                "movie": "SW_EpisodeIV.txt",
                "position": 519
            }
        },
        {
            "text": "these two <strong>droid</strong>s.",
            "rank": 110,
            "data": {
                "character": "LUKE",
                "dialog": "Greetings, Exalted One. Allow me to introduce myself. I am Luke Skywalker, Jedi Knight and friend to Captain Solo. I know that you are powerful, mighty Jabba, and that your anger with Solo must be equally powerful. I seek an audience with Your Greatness to bargain for Solo's life.  With your wisdom, I'm sure that we can work out an arrangement which will be mutually beneficial and enable us to avoid any unpleasant confrontation. As a token of my goodwill, I present to you a gift: these two droids.",
                "movie": "SW_EpisodeVI.txt",
                "position": 40
            }
        },
        {
            "text": "forget the <strong>droid</strong>s.",
            "rank": 110,
            "data": {
                "character": "LUKE",
                "dialog": "Let's go! And don't forget the droids.",
                "movie": "SW_EpisodeVI.txt",
                "position": 177
            }
        },
        {
            "text": "with you <strong>droid</strong>",
            "rank": 100,
            "data": {
                "character": "LANDO",
                "dialog": "Having trouble with you droid?",
                "movie": "SW_EpisodeV.txt",
                "position": 666
            }
        },
        {
            "text": "a protocol <strong>droid</strong>, are you",
            "rank": 90,
            "data": {
                "character": "NINEDENINE",
                "dialog": "Ah, good. New acquisitions. You are a protocol droid, are you not?",
                "movie": "SW_EpisodeVI.txt",
                "position": 49
            }
        },
        {
            "text": "last protocol <strong>droid</strong> and disintegrated",
            "rank": 90,
            "data": {
                "character": "NINEDENINE",
                "dialog": "Splendid! We have been without an interpreter since our master got angry with our last protocol droid and disintegrated him.",
                "movie": "SW_EpisodeVI.txt",
                "position": 55
            }
        },
        {
            "text": "This protocol <strong>droid</strong> might be",
            "rank": 90,
            "data": {
                "character": "NINEDENINE",
                "dialog": "Guard! This protocol droid might be useful. Fit him with a restraining bolt and take him back to His Excellency's main audience chamber.",
                "movie": "SW_EpisodeVI.txt",
                "position": 57
            }
        },
        {
            "text": "Droid of some",
            "rank": 90,
            "data": {
                "character": "HAN",
                "dialog": "Droid of some kind. I didn't hit it that hard. It  must have had a self-destruct.",
                "movie": "SW_EpisodeV.txt",
                "position": 131
            }
        },
        {
            "text": "Imperial probe <strong>droid</strong>.",
            "rank": 80,
            "data": {
                "character": "LEIA",
                "dialog": "An Imperial probe droid.",
                "movie": "SW_EpisodeV.txt",
                "position": 132
            }
        },
        {
            "text": "safe for <strong>droid</strong>s.",
            "rank": 80,
            "data": {
                "character": "LUKE",
                "dialog": "Yes, I'm sure it's perfectly safe for droids.",
                "movie": "SW_EpisodeV.txt",
                "position": 322
            }
        }
    ],
    "filters": [
        {
            "filterName": "movie",
            "values": [
                {
                    "value": "SW_EpisodeIV.txt",
                    "count": 27
                },
                {
                    "value": "SW_EpisodeVI.txt",
                    "count": 5
                },
                {
                    "value": "SW_EpisodeV.txt",
                    "count": 5
                }
            ]
        },
        {
            "filterName": "character",
            "values": [
                {
                    "value": "LUKE",
                    "count": 12
                },
                {
                    "value": "BEN",
                    "count": 4
                },
                {
                    "value": "OWEN",
                    "count": 4
                },
                {
                    "value": "THREEPIO",
                    "count": 3
                },
                {
                    "value": "NINEDENINE",
                    "count": 3
                },
                {
                    "value": "TROOPER",
                    "count": 2
                },
                {
                    "value": "LEIA",
                    "count": 2
                },
                {
                    "value": "HAN",
                    "count": 2
                },
                {
                    "value": "SECOND TROOPER",
                    "count": 1
                },
                {
                    "value": "LANDO",
                    "count": 1
                },
                {
                    "value": "BARTENDER",
                    "count": 1
                },
                {
                    "value": "OZZEL",
                    "count": 1
                },
                {
                    "value": "VADER",
                    "count": 1
                }
            ]
        }
    ],
    "_links": {
        "self": {
            "href": "http://localhost:8080/search?keyword=droid&startIndex=0&pageSize=10&movie=&character="
        },
        "nextPage": {
            "href": "http://localhost:8080/search?keyword=droid&startIndex=10&pageSize=10&movie=&character="
        },
        "allResults": {
            "href": "http://localhost:8080/search?keyword=droid&startIndex=0&pageSize=37&movie=&character="
        }
    }
}

There is a lot more than results in there, so let’s keep going with the ACs and extras.

AC 3&4: Highlighted keyword in context

Highlighting is really a UI capability, and since this is an API returning plain text, it’s a challenge. Instead, as you’ll see above, I wrapped the keyword in some basic HTML so it shows up highlighted if it goes to an HTML front end: a protocol <strong>droid</strong>, are you.

You’ll also see the keyword is included with other words around it: I selected the two words before and the two words after the keyword to provide some context.
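
Here is a rough sketch of how a keyword-in-context snippet like that can be produced. It is an illustration under my own assumptions, not the code from the repo: KeywordContext and snippets are hypothetical names, the dialog is split on whitespace, and matching is case-insensitive (the sample response above suggests the real endpoint does not highlight a capitalized "Droid").

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; names are not from the repo.
class KeywordContext {
  // Returns one snippet per word that contains the keyword: up to `window`
  // words on each side, with the matched text wrapped in <strong> tags.
  static List<String> snippets(String dialog, String keyword, int window) {
    String[] words = dialog.trim().split("\\s+");
    // Pattern.quote keeps regex characters in the keyword from being interpreted.
    Pattern pattern = Pattern.compile(Pattern.quote(keyword), Pattern.CASE_INSENSITIVE);
    List<String> out = new ArrayList<>();
    for (int i = 0; i < words.length; i++) {
      Matcher m = pattern.matcher(words[i]);
      if (!m.find()) {
        continue;
      }
      // $0 is the matched text, so "droids?" becomes "<strong>droid</strong>s?".
      String highlighted = m.replaceFirst("<strong>$0</strong>");
      int from = Math.max(0, i - window);
      int to = Math.min(words.length, i + window + 1);
      StringBuilder sb = new StringBuilder();
      for (int j = from; j < to; j++) {
        if (sb.length() > 0) {
          sb.append(' ');
        }
        sb.append(j == i ? highlighted : words[j]);
      }
      out.add(sb.toString());
    }
    return out;
  }
}
```

Calling KeywordContext.snippets("Ah, good. New acquisitions. You are a protocol droid, are you not?", "droid", 2) gives a single snippet: a protocol <strong>droid</strong>, are you.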

AC 5: Return 10 results

Finally, you’ll see only 10 results are shown above, even though there are 37 matches. How do we get to the other 27? Read on.

Extra Credit: Taking it further

How far can I take this? Further for sure, and this is where some of my fun came out in this challenge. I did have to temper my excitement, as I could continue to spend too much time on making this a really FUNctional endpoint…

  • AC 5 requires showing the first 10 items, so to get to the next 10 we need to page through the results. I added paging: /search?keyword=droid&startIndex=10&pageSize=10.
    • startIndex is the zero-based index of the first result to return, and pageSize is how many results to return from there. So page 1 is startIndex=0&pageSize=10, page 2 is 10 and 10, page 3 is 20 and 10. If the page size changes, the same pattern holds: with a page size of 20, page 1 is 0 and 20; page 2 is 20 and 20; page 3 is 40 and 20. The consumer can, of course, mix and match to their heart’s desire.
  • I added filtering to allow the consumer to narrow down their results. The available filters are the movie name and the character speaking. These are sent as additional parameters: /search?keyword=droid&movie=SW_EpisodeVI&character=LUKE.
    • To make sure you get data back, look at the filters returned by the initial search call. They show which filter values are available in this result set, with a count of how many items match each.
  • I added relevancy! Nothing too fancy. I decided to make the relevancy favor the dark side. If Darth Vader or the Emperor said it, it gets a higher ranking than if Han or Obi-Wan said it. C3PO gets a negative rating because he’s my least favorite character in the series (of the OG movies; Jar Jar wins overall). I also made the movies more relevant in the order of my favorites: VI, V, then IV. Check out the relevancy.json file for more details.
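
To make the relevancy idea concrete, here is a small sketch of weight-based ranking. The class, the Hit type, and every number in it are made up for illustration; the real weights live in relevancy.json, and the real ranking may take more into account than character and movie.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; weights are invented, not copied from relevancy.json.
class Relevancy {
  static final Map<String, Integer> CHARACTER_WEIGHTS =
      Map.of("VADER", 50, "EMPEROR", 50, "HAN", 10, "THREEPIO", -20);
  static final Map<String, Integer> MOVIE_WEIGHTS =
      Map.of("SW_EpisodeVI.txt", 30, "SW_EpisodeV.txt", 20, "SW_EpisodeIV.txt", 10);

  // Minimal stand-in for a search hit: who spoke and in which movie.
  static class Hit {
    final String character;
    final String movie;

    Hit(String character, String movie) {
      this.character = character;
      this.movie = movie;
    }
  }

  // Rank is the sum of the character weight and the movie weight;
  // anything without a configured weight contributes 0.
  static int rank(Hit hit) {
    return CHARACTER_WEIGHTS.getOrDefault(hit.character, 0)
        + MOVIE_WEIGHTS.getOrDefault(hit.movie, 0);
  }

  // Returns a new list ordered from highest rank to lowest.
  static List<Hit> sortByRank(List<Hit> hits) {
    List<Hit> sorted = new ArrayList<>(hits);
    sorted.sort((a, b) -> Integer.compare(rank(b), rank(a)));
    return sorted;
  }
}
```

With these invented numbers, a Vader line from Episode IV (50 + 10) still outranks a Threepio line from Episode VI (-20 + 30), which is the dark-side bias I was after.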

I wanted to include all of the movies, but I couldn’t find the scripts in this same format. This goes back to my earlier comment: if this were a real-life search application, I would create a database and an indexer to properly build the data store to search against. Since I’m using a finite data source, I’m sticking with the defined format of these files.

How I improved over last challenge

Challenge 2 had some great growth points, and I continued them forward here. I also learned some new things like how to use regex, sort a list, work with a HashMap, and more!

Improved Error Handling

The custom error handling came out nice, I think. I have an ApiException object that extends Exception, and I added the HttpStatus to this object. Anywhere I want to throw and return a specific error message with status code, I can throw this object. In my controller, when an error occurs, I check for that exception and will return my ApiErrorResponse object with the specific parameters that I want. I could’ve just returned the ApiException but it comes with many more fields I didn’t want to deal with. My error responses are short and sweet:

{
    "message": "java.lang.ArrayIndexOutOfBoundsException: arraycopy: length -296 is negative",
    "httpStatusCode": 500
}


{
    "message": "keyword is missing",
    "httpStatusCode": 400
}
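
For anyone who wants to see the shape of this without opening the repo, here is a stripped-down sketch of the pattern. It reuses the class names from above but is not the actual code: to keep it runnable without Spring, a plain int stands in for HttpStatus, and ErrorMapper is a hypothetical helper that plays the role of the catch blocks in my controller.

```java
// Simplified stand-ins for illustration; an int replaces Spring's HttpStatus.
class ApiException extends Exception {
  private final int httpStatusCode;

  ApiException(String message, int httpStatusCode) {
    super(message);
    this.httpStatusCode = httpStatusCode;
  }

  int getHttpStatusCode() {
    return httpStatusCode;
  }
}

// The short-and-sweet body returned to the consumer.
class ApiErrorResponse {
  private final String message;
  private final int httpStatusCode;

  ApiErrorResponse(String message, int httpStatusCode) {
    this.message = message;
    this.httpStatusCode = httpStatusCode;
  }

  String getMessage() {
    return message;
  }

  int getHttpStatusCode() {
    return httpStatusCode;
  }
}

// Hypothetical helper: known errors keep their own status code,
// anything unexpected becomes a 500 carrying the exception text.
class ErrorMapper {
  static ApiErrorResponse toResponse(Exception ex) {
    if (ex instanceof ApiException) {
      return new ApiErrorResponse(ex.getMessage(), ((ApiException) ex).getHttpStatusCode());
    }
    return new ApiErrorResponse(ex.toString(), 500);
  }
}
```

Throwing new ApiException("keyword is missing", 400) anywhere in the call chain ends up as the second response shown above, while an unexpected runtime exception ends up in the 500 shape.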

Lombok works now

In the last challenge, I mentioned using Lombok, and saw some objects didn’t work right. Well, I’m not sure what I was doing then, because it all works now 😊 Lombok is really nice and streamlines creating objects while allowing for custom properties as needed:

@Getter
@Setter
@NoArgsConstructor
public class SearchResult extends RepresentationModel<SearchResult> {
  private String keyword;
  private Integer resultCount;
  private Integer startIndex;
  private Integer pageSize;
  private List<ResultItem> results;
  private List<Filter> filters;
}

Beautifully simple.

How can I make this better

There are always ways we can improve, especially as we’re learning something new. What do you think? How can I do this better? I’m asking specifically about the Java, not necessarily about how I built this challenge.

I have one spot I seriously question, and that is the getValueByName method on DatEntry:

  @JsonIgnore
  public String getValueByName(String fieldName) {
    switch (fieldName) {
      case "movie":
        return this.movie;
      case "character":
        return this.character;
      default:
        return null;
    }
  }

I need to dynamically grab values from my object, as I loop through them, and wasn’t sure how best to do this. In JavaScript I’d do something like data["movie"] and data["character"], but Java doesn’t appear to support that. The above little function works, but is it the best way? Let me know!
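
One alternative I could imagine, offered as a sketch and not as the right answer: keep a map from field name to getter, so the lookup reads a bit like data["movie"]. DataEntrySketch is a hypothetical class written for this example; it only mirrors the two fields used above.

```java
import java.util.Map;
import java.util.function.Function;

// Hypothetical class for illustration; it mirrors only the two fields above.
class DataEntrySketch {
  // Field name -> getter. Supporting a new field means adding one entry here.
  private static final Map<String, Function<DataEntrySketch, String>> GETTERS = Map.of(
      "movie", DataEntrySketch::getMovie,
      "character", DataEntrySketch::getCharacter);

  private final String movie;
  private final String character;

  DataEntrySketch(String movie, String character) {
    this.movie = movie;
    this.character = character;
  }

  String getMovie() {
    return movie;
  }

  String getCharacter() {
    return character;
  }

  // Same contract as the switch version: unknown names return null.
  String getValueByName(String fieldName) {
    if (fieldName == null) {
      return null; // Map.of maps reject null keys, so guard before the lookup
    }
    Function<DataEntrySketch, String> getter = GETTERS.get(fieldName);
    return getter == null ? null : getter.apply(this);
  }
}
```

For two fields the switch is arguably just as readable; the map starts to pay off when the list of filterable fields grows, or when the set of names needs to be iterated. Reflection would also work, but it trades compile-time checking for flexibility.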

What else

I think I want to explore some patterns, find something that fits well in here. Patterns will vary per the project, but just to get my hands on some that I can rely on would be good. What do you recommend?

I’m not sure what more I could do here, other than building a real search engine. Let me know what you think! I don’t know what I don’t know ;)

Now for some code

As always, the code is up in the same repo as the other challenges on GitHub. Check it out here: https://github.com/DavidLozzi/SlalomCodingChallenge/tree/main/3_search.

Here’s my primary controller with the search endpoint, code located at /3_search/src/main/java/com/davidlozzi/search/SearchApplication.java

  @GetMapping("/search")
  @ResponseBody
  public ResponseEntity search(@RequestParam(value = "keyword", defaultValue = "") String keyword,
      @RequestParam(value = "startIndex", defaultValue = "0") Integer startIndex,
      @RequestParam(value = "pageSize", defaultValue = "10") Integer pageSize,
      @RequestParam(value = "movie", defaultValue = "") String movie,
      @RequestParam(value = "character", defaultValue = "") String character) {
    try {
      if (keyword == null || keyword.isEmpty()) {
        throw new ApiException("keyword is missing", HttpStatus.BAD_REQUEST);
      }
      Download.files();
      SearchResult results = new SearchResult();
      results.setKeyword(keyword);
      results.setStartIndex(startIndex);
      results.setPageSize(pageSize);

      List<ResultItem> resultsList = SearchFiles.search(keyword, movie, character);
      results.setResultCount(resultsList.size());

      results.add(linkTo(methodOn(SearchApplication.class).search(keyword, startIndex, pageSize, movie, character))
          .withSelfRel());

      if (resultsList.size() > 0) {
        ResultItem[] pageresults = Arrays.copyOfRange(resultsList.toArray(), startIndex, startIndex + pageSize,
            ResultItem[].class);
        pageresults = Arrays.stream(pageresults).filter(i -> i != null).toArray(ResultItem[]::new);
        results.setResults(Arrays.asList(pageresults));

        List<Filter> filters = SearchFiles.getFilters(resultsList);
        results.setFilters(filters);

        results.add(
            linkTo(methodOn(SearchApplication.class).search(keyword, startIndex + pageSize, pageSize, movie, character))
                .withRel("nextPage"));
        results.add(
            linkTo(methodOn(SearchApplication.class).search(keyword, 0, results.getResultCount(), movie, character))
                .withRel("allResults"));
        return new ResponseEntity<>(results, HttpStatus.OK);
      } else {
        return new ResponseEntity<>(HttpStatus.NO_CONTENT);
      }
    } catch (ApiException ex) {
      System.out.print(ex);
      ex.printStackTrace();
      ApiErrorResponse apiError = new ApiErrorResponse(ex.getMessage(), ex.getHttpStatus().value());
      return new ResponseEntity<>(apiError, ex.getHttpStatus());
    } catch (Exception ex) {
      System.out.print(ex);
      ex.printStackTrace();
      ApiErrorResponse apiError = new ApiErrorResponse(ex.toString(), HttpStatus.INTERNAL_SERVER_ERROR.value());
      return new ResponseEntity<>(apiError, HttpStatus.INTERNAL_SERVER_ERROR);
    }
  }

There is a lot going on in here. Here’s a quick overview:

  • There’s a handful of parameters coming in, but only a keyword is actually required, so I check for that and throw my new custom ApiException!
  • I then download the files (yes, bad form, but it works for the challenge).
  • Then I start to create the SearchResult object, the primary object returned to the consumer.
  • We then perform the search against the data. This function creates our index from the downloaded text files, and then uses regex to find matching results. This function also applies relevancy.
  • From there, we take the results, do some HATEOAS, and check if there are results. If there aren’t, we just return a 204 No Content status. If there are, we continue.
  • To get the paging to work, I am leveraging Arrays.copyOfRange, which copies out a subset of items from an array.
  • I then get the filters by looping through the result set and getting the distinct count of values.
  • A few more HATEOAS links and then we return the fully loaded SearchResult!
  • I mentioned before, if an error occurs we have new exception handling. How does it look? My methods in the other classes may or may not throw my ApiException. If they do, then I create an ApiErrorResponse from that value and return it with its specified HttpStatus code. Otherwise, I create an ApiErrorResponse and return it with a 500 HttpStatus.
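
The paging and filter-count steps from the overview above can also be sketched with List.subList and a map of counts. This is an alternative illustration, not the code in the repo: Paging, page, and countBy are hypothetical names. One thing it handles differently is a startIndex past the end of the results; Arrays.copyOfRange throws in that case, which may be what the 500 example earlier in this post is showing, whereas this version returns an empty page.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helpers for illustration; names are not from the repo.
class Paging {
  // Returns the items from startIndex up to startIndex + pageSize, clamped
  // to the list bounds, so an out-of-range page is simply empty.
  static <T> List<T> page(List<T> all, int startIndex, int pageSize) {
    if (startIndex < 0 || pageSize < 0) {
      throw new IllegalArgumentException("startIndex and pageSize must not be negative");
    }
    int from = Math.min(startIndex, all.size());
    // long arithmetic avoids int overflow for very large page sizes
    int to = (int) Math.min((long) from + pageSize, all.size());
    return new ArrayList<>(all.subList(from, to));
  }

  // Counts how many items share each value of the given field,
  // keeping the order in which values are first seen.
  static <T> Map<String, Integer> countBy(List<T> all, Function<T, String> field) {
    Map<String, Integer> counts = new LinkedHashMap<>();
    for (T item : all) {
      counts.merge(field.apply(item), 1, Integer::sum);
    }
    return counts;
  }
}
```

With 37 results, page(all, 30, 10) returns the last 7 items and page(all, 333, 10) returns an empty list. countBy would be called once per filter, for example with a function that returns the movie and another that returns the character.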

Let me know what you think!

Until next time, happy coding!
