Jina 101

From Micro to Macro

You can design at the micro-level and scale up to the macro-level. YAML files become algorithms, threads become processes, and Pods become Flows. The patterns and logic always remain the same. This is the beauty of Jina.

Document & Chunk

A Document is anything you want to search for, as well as the input query you use when searching.

Documents can be huge, though. How can we search for the right part? We do this by breaking a Document into Chunks. A Chunk is a small semantic unit of a Document, such as a sentence, a 64x64-pixel image patch, or a pair of coordinates.

You can think of a Document like a chocolate bar. Documents come in different formats and with different ingredients, and you can break them into chunks any way you like. Eventually, what you buy and store are the chocolate bars, and what you eat and digest are the chunks. You don’t want to swallow the whole bar, but you don’t want to grind it into powder either; by doing that, you lose the flavor (i.e. the semantics).
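
A minimal sketch of the Document/Chunk relationship, assuming plain-text Documents and sentence-level Chunks. The `Document` and `Chunk` classes and the `segment_into_sentences` helper are hypothetical illustrations, not Jina's own types, and the regex split is deliberately naive.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class Chunk:
    """A small semantic unit of a Document (here: one sentence)."""
    text: str
    parent_id: str  # which Document this Chunk came from
    offset: int     # position of the Chunk inside its parent Document


@dataclass
class Document:
    """Anything you index or query with; here, plain text (hypothetical type)."""
    id: str
    text: str
    chunks: List[Chunk] = field(default_factory=list)


def segment_into_sentences(doc: Document) -> Document:
    """Break a Document into sentence-level Chunks with a naive regex split."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", doc.text) if s.strip()]
    doc.chunks = [Chunk(text=s, parent_id=doc.id, offset=i) for i, s in enumerate(sentences)]
    return doc


doc = segment_into_sentences(
    Document(id="d1", text="Jina is a search framework. It breaks Documents into Chunks! Does it scale?")
)
for chunk in doc.chunks:
    print(chunk.offset, chunk.text)
```

Each Chunk keeps a pointer to its parent, so a match on one sentence can still be traced back to the whole Document.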

YAML Config

Every part of Jina is configured with YAML files. YAML files offer customization, allowing you to change the behavior of an object without touching its code. Jina can build a very complicated object directly from a simple YAML file, or save an object into a YAML file.
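
To make "build an object from a YAML file, or save it back" concrete, here is a stdlib-only sketch. The config layout (`cls`, `with`), `REGISTRY`, `from_config`, `to_config`, and `SentenceCrafter` are hypothetical names, not Jina's YAML schema; to avoid a YAML parser dependency, the code starts from the dictionary a YAML loader would return.

```python
from typing import Any, Dict, List, Type

# A YAML config of this kind might look like (hypothetical layout):
#
#   cls: SentenceCrafter
#   with:
#     min_length: 3
#
# To stay stdlib-only, we start from the dict a YAML loader would produce.

REGISTRY: Dict[str, Type] = {}


def register(cls: Type) -> Type:
    """Make a class buildable from a config by its name."""
    REGISTRY[cls.__name__] = cls
    return cls


@register
class SentenceCrafter:
    def __init__(self, min_length: int = 1):
        self.min_length = min_length

    def craft(self, text: str) -> List[str]:
        """Keep only sentences with at least `min_length` words."""
        sentences = [s.strip() for s in text.split(".") if s.strip()]
        return [s for s in sentences if len(s.split()) >= self.min_length]

    def to_config(self) -> Dict[str, Any]:
        """Save: dump the object's properties back into a config dict."""
        return {"cls": type(self).__name__, "with": {"min_length": self.min_length}}


def from_config(config: Dict[str, Any]) -> Any:
    """Build: look up the class by name and pass the `with` block as kwargs."""
    cls = REGISTRY[config["cls"]]
    return cls(**config.get("with", {}))


config = {"cls": "SentenceCrafter", "with": {"min_length": 3}}
crafter = from_config(config)
print(crafter.craft("Hi. Jina builds search systems. Ok."))
print(crafter.to_config() == config)
```

Changing `min_length` in the config changes what the crafter keeps without editing the class, which is the kind of customization the paragraph describes.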

Executors

How do we break down a Document into Chunks, and what happens next? Executors do all of this hard work, and each one represents an algorithmic unit. They do things like encoding images into vectors, storing vectors on disk, and ranking results. Each one has a simple interface, letting you concentrate on the algorithm without getting lost in the weeds. They feature persistence, scheduling, chaining, grouping, and parallelization out of the box. The properties of an Executor are stored in a YAML file; an Executor and its YAML file always go hand in hand.
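
The "simple interface" idea can be sketched as a base class where a subclass implements a single method and inherits the plumbing; the only plumbing shown here is chaining. `BaseExecutor`, `apply`, and the `|` chaining operator are hypothetical choices for illustration, not Jina's Executor API.

```python
from abc import ABC, abstractmethod
from typing import Any


class BaseExecutor(ABC):
    """Hypothetical base class: subclasses implement only `apply`."""

    @abstractmethod
    def apply(self, item: Any) -> Any:
        """The algorithm itself; everything else is inherited plumbing."""

    def __or__(self, other: "BaseExecutor") -> "BaseExecutor":
        # Chaining: `a | b` builds an Executor that runs a, then b.
        return ChainedExecutor(self, other)


class ChainedExecutor(BaseExecutor):
    def __init__(self, first: BaseExecutor, second: BaseExecutor):
        self.first, self.second = first, second

    def apply(self, item: Any) -> Any:
        return self.second.apply(self.first.apply(item))


class Strip(BaseExecutor):
    def apply(self, item: str) -> str:
        return item.strip()


class Lowercase(BaseExecutor):
    def apply(self, item: str) -> str:
        return item.lower()


pipeline = Strip() | Lowercase()
print([pipeline.apply(s) for s in ["  Hello ", "WORLD  ", " Jina"]])
# ['hello', 'world', 'jina']
```

Because `BaseExecutor` is abstract, forgetting to implement `apply` fails at construction time rather than deep inside a search run.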

The Executors are a big family. Each family member focuses on one important aspect of the search system. Let’s meet:

  • Crafter: for crafting/segmenting/transforming the Documents and Chunks;
  • Encoder: for representing the Chunk as a vector;
  • Indexer: for saving and retrieving vectors and key-value information from storage;
  • Ranker: for sorting results.
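
The four family members can be wired into a toy text search to show how their responsibilities divide. This sketch rests on loud assumptions: Chunks are words, the Encoder counts letters instead of running a learned model, the Indexer is an in-memory list scored by cosine similarity, and the Ranker scores each Document by its best-matching Chunk. All class and method names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class Crafter:
    """Segments a Document's text into word-level Chunks."""
    def craft(self, text: str) -> List[str]:
        words = (w.strip(".,!?").lower() for w in text.split())
        return [w for w in words if w]


class Encoder:
    """Represents a Chunk as 26 letter counts (toy stand-in for a real model)."""
    def encode(self, chunk: str) -> Vector:
        vec = [0.0] * 26
        for ch in chunk:
            if "a" <= ch <= "z":
                vec[ord(ch) - ord("a")] += 1.0
        return vec


class Indexer:
    """Saves Chunk vectors with their parent Document id; retrieves by similarity."""
    def __init__(self) -> None:
        self.entries: List[Tuple[str, Vector]] = []

    def add(self, doc_id: str, vectors: List[Vector]) -> None:
        self.entries.extend((doc_id, v) for v in vectors)

    def query(self, vec: Vector) -> List[Tuple[str, float]]:
        return [(doc_id, cosine(vec, v)) for doc_id, v in self.entries]


class Ranker:
    """Sorts results; a Document scores as its best-matching Chunk."""
    def rank(self, matches: List[Tuple[str, float]], top_k: int = 1) -> List[Tuple[str, float]]:
        best: Dict[str, float] = defaultdict(float)
        for doc_id, score in matches:
            best[doc_id] = max(best[doc_id], score)
        return sorted(best.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


crafter, encoder, indexer, ranker = Crafter(), Encoder(), Indexer(), Ranker()
docs = {"d1": "Chocolate bars are sweet.", "d2": "Neural search finds vectors."}
for doc_id, text in docs.items():
    indexer.add(doc_id, [encoder.encode(c) for c in crafter.craft(text)])

query_vec = encoder.encode(crafter.craft("Chocolate!")[0])
top = ranker.rank(indexer.query(query_vec))
print(top[0][0])  # d1
```

Swapping the letter-count Encoder for a real model would not touch the other three classes, which is the point of splitting the work into separate family members.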

Got a new algorithm in mind? No problem, this family always welcomes new members!

Drivers

Executors do all the hard work, but they're not great at talking to each other. A Driver helps them do this by defining how an Executor responds to network requests. It translates network traffic into a format the Executor can understand, for example converting a Protobuf message into a NumPy array.
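
A sketch of the Driver's translating role, with two stand-ins named plainly: a Python dict holding a raw float32 byte buffer and a shape plays the part of the Protobuf message, and nested Python lists built with the stdlib `array` module play the part of the NumPy array. `make_message`, `Driver`, `decode`, and `handle` are hypothetical names.

```python
import array
from typing import Callable, Dict, List

Matrix = List[List[float]]


def make_message(rows: Matrix) -> Dict:
    """Hypothetical wire message: float32 bytes (native byte order) plus a shape."""
    flat = array.array("f", [x for row in rows for x in row])
    return {"buffer": flat.tobytes(), "shape": (len(rows), len(rows[0]))}


class Driver:
    """Translates wire messages into what an Executor expects, and back."""

    def __init__(self, executor_fn: Callable[[Matrix], Matrix]):
        self.executor_fn = executor_fn

    def decode(self, msg: Dict) -> Matrix:
        flat = array.array("f")
        flat.frombytes(msg["buffer"])
        rows, cols = msg["shape"]
        return [list(flat[r * cols:(r + 1) * cols]) for r in range(rows)]

    def handle(self, msg: Dict) -> Dict:
        matrix = self.decode(msg)          # wire format -> Executor format
        result = self.executor_fn(matrix)  # the Executor only sees plain rows
        return make_message(result)        # Executor format -> wire format


def double(matrix: Matrix) -> Matrix:
    """A stand-in Executor that knows nothing about bytes or messages."""
    return [[2 * x for x in row] for row in matrix]


driver = Driver(double)
reply = driver.handle(make_message([[1.0, 2.0], [3.0, 4.0]]))
print(driver.decode(reply))  # [[2.0, 4.0], [6.0, 8.0]]
```

Keeping serialization in the Driver means the Executor function stays a pure algorithm that can be tested without any networking.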

Peas

All healthy families need to communicate, and the Executor clan is no different. They talk to each other via Peas.
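
One way to picture Peas, assuming each Pea is a thread that wraps one Executor and relays messages through queues, so that the upstream Pea's outbox is the downstream Pea's inbox. The `Pea` class and `STOP` sentinel are hypothetical, and in-process queues stand in for the network.

```python
import queue
import threading
from typing import Any, Callable

STOP = object()  # sentinel: tells a Pea to shut down and forward the signal


class Pea(threading.Thread):
    """Hypothetical Pea: wraps one Executor and relays its inputs and outputs."""

    def __init__(self, executor_fn: Callable[[Any], Any],
                 inbox: queue.Queue, outbox: queue.Queue):
        super().__init__(daemon=True)
        self.executor_fn = executor_fn
        self.inbox, self.outbox = inbox, outbox

    def run(self) -> None:
        while (msg := self.inbox.get()) is not STOP:
            self.outbox.put(self.executor_fn(msg))  # hand the result downstream
        self.outbox.put(STOP)  # let the next Pea shut down as well


q_in, q_mid, q_out = queue.Queue(), queue.Queue(), queue.Queue()
crafter_pea = Pea(str.split, q_in, q_mid)  # Executor 1: split text into words
encoder_pea = Pea(len, q_mid, q_out)       # Executor 2: toy "encoding" (word count)
crafter_pea.start()
encoder_pea.start()

q_in.put("peas talk to each other")
q_in.put(STOP)

results = []
while (item := q_out.get()) is not STOP:
    results.append(item)
crafter_pea.join()
encoder_pea.join()
print(results)  # [5]
```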

Pods


So now you've got lots of Peas talking to each other and rolling all over the place. How can you organize them? Nature uses Pods, and so do we.

A Pod is a group of Peas with the same property, running in parallel on a local host or over the network. A Pod provides a single network interface for its Peas, making them look like one single Pea from the outside. Beyond that, a Pod adds further control, scheduling, and context management to the Peas.
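
A Pod can be sketched as several identical Peas (here, threads running the same Executor function) behind one `process` call that dispatches work and returns results in request order, so callers never see the individual replicas. `Pod`, `replicas`, `process`, and `close` are hypothetical names, and the "scheduling" shown is plain round-robin.

```python
import queue
import threading
from typing import Any, Callable, List

STOP = object()  # sentinel that shuts a replica down


class Pod:
    """Hypothetical Pod: `replicas` identical Peas behind one interface."""

    def __init__(self, executor_fn: Callable[[Any], Any], replicas: int = 3):
        self.inboxes = [queue.Queue() for _ in range(replicas)]
        self.outbox: queue.Queue = queue.Queue()
        self.workers = [
            threading.Thread(target=self._pea_loop, args=(executor_fn, inbox), daemon=True)
            for inbox in self.inboxes
        ]
        for worker in self.workers:
            worker.start()

    def _pea_loop(self, executor_fn: Callable[[Any], Any], inbox: queue.Queue) -> None:
        # Each Pea runs the same Executor on whatever lands in its own inbox.
        while (msg := inbox.get()) is not STOP:
            req_id, payload = msg
            self.outbox.put((req_id, executor_fn(payload)))

    def process(self, payloads: List[Any]) -> List[Any]:
        # Round-robin dispatch; results are reordered by request id, so from
        # the outside the Pod behaves like a single Pea.
        for req_id, payload in enumerate(payloads):
            self.inboxes[req_id % len(self.inboxes)].put((req_id, payload))
        results = dict(self.outbox.get() for _ in payloads)
        return [results[i] for i in range(len(payloads))]

    def close(self) -> None:
        for inbox in self.inboxes:
            inbox.put(STOP)
        for worker in self.workers:
            worker.join()


pod = Pod(lambda text: text.upper(), replicas=3)
print(pod.process(["a", "b", "c", "d"]))  # ['A', 'B', 'C', 'D']
pod.close()
```

Reordering by request id matters because parallel replicas can finish in any order; without it, the single-Pea illusion would break.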

Flow

Now we've got a garden full of Pods, with each Pod full of Peas. That's a lot to manage! Say hello to Flow! Flow is like a Pea plant. Just as a plant manages nutrient flow and growth rate for its branches, Flow manages the states and context of a group of Pods, orchestrating them to accomplish one task. Whether a Pod is remote or running in Docker, one Flow rules them all!
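
A Flow can be sketched as an ordered list of Pods plus lifecycle management: entering a `with` block starts them, leaving it shuts them down, and each request passes through the Pods in order. To keep the sketch short, the Pods are plain functions; `Flow`, `add`, and `run` are hypothetical names, not a description of Jina's own Flow API.

```python
from typing import Any, Callable, List, Tuple


class Flow:
    """Hypothetical Flow: orders Pods and manages their lifecycle as one unit."""

    def __init__(self) -> None:
        self.pods: List[Tuple[str, Callable[[Any], Any]]] = []
        self.running = False

    def add(self, name: str, pod_fn: Callable[[Any], Any]) -> "Flow":
        self.pods.append((name, pod_fn))
        return self  # allows chaining: Flow().add(...).add(...)

    def __enter__(self) -> "Flow":
        self.running = True   # stand-in for starting every Pod, local or remote
        return self

    def __exit__(self, *exc: Any) -> None:
        self.running = False  # stand-in for shutting all Pods down together

    def run(self, doc: Any) -> Any:
        if not self.running:
            raise RuntimeError("start the Flow with a `with` block first")
        for _name, pod_fn in self.pods:
            doc = pod_fn(doc)  # each Pod's output feeds the next Pod
        return doc


flow = (Flow()
        .add("crafter", str.split)
        .add("encoder", lambda words: [len(w) for w in words]))

with flow:
    print(flow.run("one flow rules them all"))  # [3, 4, 5, 4, 3]
```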