Converging Technologies Driving Billion-scale Elasticsearch

Article By : Mark Wright

As the convergence of technologies of different scales is just beginning, the embedded space stands to benefit as well.

Digital convergence is happening all around us as technologies that were originally unrelated come together in exciting new ways. The iPhone is a perfect example, combining a phone with a computer, a camera and sensors to deliver an outstanding experience.

Convergence is not new in the embedded space. Embedded devices have traditionally operated under tighter memory and processing constraints and have embraced convergence as a way to get the best of many technological areas. So, as seemingly disparate technologies of different scales begin to converge, promising to disrupt existing industries and usher in compelling new opportunities, the embedded space will also benefit. One such convergence is the pairing of k-nearest neighbor (k-NN) search with in-memory acceleration to provide near-real-time responses for billion-scale Elasticsearch operations.

Elasticsearch is a search engine that takes JSON requests for document searches and delivers JSON data as results. The Elasticsearch data format is a document with structured data encoded in JSON. Elasticsearch started as a search engine for text, but the database can cover any type of data, with each document having a unique ID and a data type.

Because the structure is “schema-free,” documents can be defined however the user needs. Examples of documents in Elasticsearch databases include:

  • Pictures used to identify consumer search requests.
  • Network data logs used to identify network intrusion, anomalies, or load imbalances.
  • Product receipts used to identify customer purchasing patterns and improve inventory management.
  • Network architecture used for automatic sharing and replication.
  • Text documents used to find specific literary instances.
  • Text documents with one-to-many mappings used for computer assisted translation.

Elasticsearch was designed to be distributed. It is scalable in infrastructure and flexible enough for local server, remote server, or cloud-based operation. Thanks to its open, RESTful API, the search engine can be extended easily with plugins. One such plugin, from GSI Technology, offers a number of benefits, including hardware-accelerated k-NN, the use of vectors for multi-modal search, and the merging of score results.

Elasticsearch relies on its distributed computing support for scalability, delivering search speeds on the order of seconds for million-scale databases. Because of its distributed nature and sharding support, Elasticsearch allows data to be replicated, parallelizing searches and speeding them up for larger databases. Because commands are posted over HTTP, an embedded device can also issue multiple searches of differing resolution – one on local resources, and one sent to upstream resources.

Core Elasticsearch uses a computationally heavy exhaustive match (match all), which slows it down, or makes it quite expensive in duplicated hardware, when supporting large-scale database search. One technique that can be used to increase the practical database size is k-NN search. It works by first looking for similarities among coarse groupings, then doing the final search within one or more of those groupings. This technique also allows large database searches to be done on edge-scale servers instead of cloud-based compute farms for very latency-sensitive applications.
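The group-then-search idea can be sketched in a few lines of Python. This is a minimal, self-contained illustration – random 2-D vectors and randomly sampled centroids stand in for trained embeddings and a real clustering step – not a description of any particular plugin's implementation:

```python
import math
import random

random.seed(0)

def dist(a, b):
    # Euclidean distance between two points (math.dist requires Python 3.8+).
    return math.dist(a, b)

# Stand-in "document vectors"; at billion scale these would be
# high-dimensional embeddings sharded across many nodes.
vectors = [(random.random(), random.random()) for _ in range(1000)]

# Step 1: choose coarse group centroids and assign each vector to its
# nearest centroid. (A real system would train centroids with k-means.)
centroids = random.sample(vectors, 8)
groups = {i: [] for i in range(len(centroids))}
for v in vectors:
    nearest = min(range(len(centroids)), key=lambda i: dist(v, centroids[i]))
    groups[nearest].append(v)

def knn_search(query, k=5, n_probe=2):
    # Step 2: instead of an exhaustive match over all vectors, probe only
    # the n_probe groups whose centroids are closest to the query...
    probe = sorted(range(len(centroids)),
                   key=lambda i: dist(query, centroids[i]))[:n_probe]
    candidates = [v for i in probe for v in groups[i]]
    # ...and do the final k-NN ranking within those candidates only.
    return sorted(candidates, key=lambda v: dist(query, v))[:k]

results = knn_search((0.5, 0.5))
```

Probing only a few groups trades a small amount of recall for a large reduction in distance computations, which is what makes very large-scale search tractable on modest hardware.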

Computationally challenging approach

While k-NN provides a methodology for Elasticsearch to support very large databases, such as those at billion-scale entries and above, it is computationally intensive. As a result, k-NN has been a challenge to accelerate because of the cost of moving the database between memory and GPU or CPU cores.

One of the biggest limitations to workload acceleration is the data exchange required between processors and memory. A major drawback of the Von Neumann architecture used in modern processors is the overhead of transferring data between processors and storage: the CPU must go out and fetch data for every operation it performs.

This architecture is even more inefficient in an offload acceleration environment. The performance of such systems is limited by the speed at which data can be exchanged via memory by the host requesting the operations and also by compute engines performing the operations.

Architectures that reduce the flow of data from memory are being studied to help alleviate the Von Neumann bottleneck. The bottleneck is particularly egregious when dealing with memory-intensive artificial intelligence applications, whose operation depends on the fast and efficient movement of massive amounts of data in memory. Trained databases and vectorized input queries need to be loaded into working memory, then processed and staged so that comparison functions can operate.

One proven technology already having an impact in the market is the Associative Processing Unit (APU). The beauty of in-memory acceleration is that storage itself becomes the processor. This is not a massive array of processing cores with cache memory close by, but rather a memory array with compute units built into the read-line architecture.

Thus, the APU is differentiated by a memory array that is itself capable of accelerating compute. This type of “accelerated” processor has been shown to improve performance by orders of magnitude while reducing the workload power consumption of standard servers.

The convergence of Elasticsearch, k-NN, and APU acceleration provides lower latency and more queries per second. It also makes it possible to support billion-scale database search at lower power than traditional CPU-only or GPU-accelerated systems. In the embedded space, Elasticsearch can provide a means of doing a local search on an edge device while simultaneously sending an HTTP request for a deeper search up the network. The varying results can either be stitched together for an increasingly sharp answer, or only new exceptions can be incorporated.
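The stitching of local and upstream results described above might look like the following sketch; the result shape (document ID plus relevance score) and the merge policy are assumptions for illustration, not a specific Elasticsearch API:

```python
def merge_results(local, upstream, k=3):
    """Stitch two ranked result lists, keeping the best score per document ID.

    Each result is a (doc_id, score) pair; higher scores rank first.
    """
    best = {}
    for doc_id, score in local + upstream:
        if doc_id not in best or score > best[doc_id]:
            best[doc_id] = score
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)[:k]

# A fast, shallow local search and a deeper upstream search may overlap;
# the merge keeps each document once, at its best score.
local = [("a", 0.9), ("b", 0.6)]
upstream = [("b", 0.8), ("c", 0.7), ("d", 0.4)]
merged = merge_results(local, upstream)
# merged == [("a", 0.9), ("b", 0.8), ("c", 0.7)]
```

An edge device could act on the local list immediately and refine its answer when the upstream list arrives, which matches the latency-sensitive use cases described below.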

An extreme edge device could apply its CPU resources to search a locally pertinent database for speed, while the APU's density multiplier allows Elasticsearch network requests to be run efficiently on an edge server or aggregator instead of being sent to the cloud. Consider robots that make autonomous decisions but still get backup validation or course corrections from an upstream deeper search. Consider automated vehicles that make immediate decisions based on a ruleset and local conditions while sending information via highway-sign gateways and receiving upstream road information and driving instructions in return.

Going forward, it will be exciting to see what new opportunities this convergence will enable.

This article was originally published on Embedded.

Mark Wright is the Director of Product Marketing for AI Processors with GSI Technology.
