The most advanced search capabilities for legal and investigative purposes


The ZyLAB search engine is optimized for maximum recall. In other words, instead of only finding the “best” search result, a search engine optimized for recall will find anything that could be relevant. This level of thoroughness is particularly important to intensive, high-level investigations.
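To make the trade-off concrete, recall and precision can be computed for any result set. The sketch below is purely illustrative: the function name and the document IDs are made up for this example and are not part of any ZyLAB product.

```python
def precision_recall(retrieved: set, relevant: set) -> tuple:
    """Return (precision, recall) for one query.

    precision = share of retrieved documents that are relevant
    recall    = share of relevant documents that were retrieved
    """
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical collection: four documents are truly relevant to the case.
relevant_docs = {"d1", "d2", "d3", "d4"}

# A "best results only" engine returns few documents, all of them relevant.
narrow = precision_recall({"d1", "d2"}, relevant_docs)

# A recall-oriented engine returns everything relevant, plus some noise.
broad = precision_recall({"d1", "d2", "d3", "d4", "d5", "d6"}, relevant_docs)
```

In this made-up case the narrow result set misses half of the relevant documents, while the broad one finds them all at the cost of two irrelevant hits. In an investigation, the missed documents are usually the more expensive error.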

ZyLAB’s 25 years of intensive R&D into search techniques have yielded a system that helps clients get the most from their data. Other products either fail to find some of your information or require a costly data normalization process, manual key fielding, or other forms of data clean-up.

Find more with ZyLAB – without the burden of too much “noise”

  • ZyLAB can full-text search textual data in more than 400 languages and more than 700 different electronic file formats, as well as e-mail, multimedia, and digitized paper.
  • In addition to searching and extracting textual content of documents, ZyLAB also searches and extracts hidden file properties of the documents. This provides a wealth of additional information on which to search.
  • With ZyLAB’s award-winning, robust and scalable fuzzy search, you can find words despite misspellings, scanning errors, Optical Character Recognition (OCR) errors, translation variations when a name is converted from another alphabet, spelling variations in pharmaceutical or chemical names, typos, and negotiated Booleans in legal discoveries.
  • E-mail is a very complex format, with e-mails nested within other e-mails that contain still more e-mails, documents, calendar items, and tasks. When you use ZyLAB’s E-mail Archiving Module, you can search every component of an e-mail message, even deeply embedded objects. This is made possible by the Microsoft Exchange Connector and the E-mail conversion tool (PST, NSF or GroupWise to XML) included with the module.
  • E-mail threads and more than 100 e-mail properties can be extracted, searched, and visualized as well.
  • ZyLAB can identify a variety of bitmaps and OCR them in 200 languages (even with automatic language recognition) to make them searchable. These bitmaps can be searched whether they are stored on a file system or attached to an e-mail message.
  • Even when a word is not present in a document or when it is hidden in the document or file properties, ZyLAB’s Analytics Server can extract the hidden data or run the text of a document against a concept extractor to find semantic notions, create a summary, extract entities or complex patterns, or reveal unknown connections between persons, companies, locations and events. All of this extracted data can then be used to search, organize and rank documents more effectively (see also the Text Analytics section).
  • ZyLAB offers manual tagging and tools to organize documents, such as: static tables of contents, dynamic search folders, hierarchical concept trees, annotations, redactions, stamps, hyperlinks, categorization, database integrations, and manual key fielding.
  • ZyLAB provides several tools to customize the search engine behavior, such as: noise words, token identification, character mappings, code page and Unicode support, translatable operators, punctuation, and hyphen and apostrophe processing. Unrecognized and encrypted file formats can be detected automatically and moved to special locations for further processing. In addition, all aspects of the index, extraction and search processes have extended logging and audit functions.

Finding information is critical, but so is the capability to manage that information after it has been found. No other vendor offers the array of tools that ZyLAB does to help you manage and control all types of information, regardless of format.

Wide variety of search techniques

ZyLAB offers the following search techniques to help you find what you are looking for: Boolean, (directed) proximity, phrase, fuzzy, wildcard, concept, date, key field, file property, document property, entity, progressive, quorum, transliteration, and numeric range search.
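Two of these techniques, proximity search and quorum search, are easy to illustrate. The sketch below shows the general idea in plain Python under a common reading of the terms: a proximity query requires two terms within a given number of words of each other (a directed one also fixes their order), and a quorum query requires at least N of a list of terms to be present. The function names and the sample sentence are invented for this example; this is not ZyLAB's query syntax or implementation.

```python
import re


def tokenize(text: str) -> list:
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def proximity_match(tokens, first, second, max_distance, directed=False):
    """True if `first` and `second` occur within `max_distance` words.

    With directed=True, `second` must come after `first`.
    """
    pos_first = [i for i, t in enumerate(tokens) if t == first]
    pos_second = [i for i, t in enumerate(tokens) if t == second]
    for i in pos_first:
        for j in pos_second:
            gap = j - i
            if directed and 0 < gap <= max_distance:
                return True
            if not directed and 0 < abs(gap) <= max_distance:
                return True
    return False


def quorum_match(tokens, terms, minimum):
    """True if at least `minimum` of the distinct `terms` occur."""
    present = set(tokens)
    return sum(1 for t in set(terms) if t in present) >= minimum


doc = tokenize("The merger agreement was signed before the audit "
               "of the fraud claims began")

near = proximity_match(doc, "audit", "fraud", 3)
ordered = proximity_match(doc, "fraud", "audit", 3, directed=True)
three_of_four = quorum_match(doc, ["merger", "fraud", "bribe", "audit"], 3)
```

Here "audit" and "fraud" are three words apart, so the undirected query matches, while the directed query "fraud followed by audit" does not. Three of the four quorum terms occur, so a "3 of 4" quorum is met.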

In addition, various relevance ranking, search aids and navigational tools are available, such as hit density, sorting by key field, hit highlighting, hit navigation, vocabulary, search history, synonyms, taxonomy support, key word in context (KWIC) view, refine results (a.k.a. faceted search), find similar, advanced result list visualization (treemaps, hyperbolic trees and integration with Google Maps), federation (both federating and federated, with the OpenSearch Atom standard), Internet search engine integration (Sitemaps.org support), automatic alerting, and lookup of documents in a table of contents.

Extractors are available for entities, facts, events, file properties, document properties, key fields, HTML and XML tags, automatic language recognition, automatic summaries, machine translation, document category, etc. More information can be found in the Text Analytics section.

Fuzzy Search and wild-card searching

A fuzzy search locates all occurrences of a word, together with all other words that are close in spelling to the original word. The degree of fuzziness specifies how close a word must be to the original and helps control the quantity of results returned. ZyLAB’s fuzzy search is optimized to detect Optical Character Recognition (OCR) errors, misspellings and spelling variations in names that are derived from non-Roman scripts such as Cyrillic, Arabic, Farsi, Hindi, Hebrew, Chinese and Japanese.

A main advantage of ZyLAB’s fuzzy algorithms is that they are language and application independent and do not need to be “trained” like many competitors’ products. ZyLAB’s fuzzy search retains excellent precision, even at high fuzzy degrees, and the difference in performance between large data sets and smaller data sets is negligible. Unlike most other products, ZyLAB’s fuzzy search even picks up a word when its first character differs from that of the query word.
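The idea of a "degree of fuzziness" can be sketched with plain edit distance (Levenshtein distance), where the degree is the maximum number of single-character insertions, deletions or substitutions allowed. This is a generic textbook technique used here only for illustration; ZyLAB's actual fuzzy algorithm is not described in this text, and the function names and vocabulary below are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def fuzzy_lookup(query: str, vocabulary: list, max_edits: int) -> list:
    """Return vocabulary words within `max_edits` edits of the query."""
    q = query.lower()
    return sorted(w for w in vocabulary
                  if edit_distance(q, w.lower()) <= max_edits)


# Hypothetical index vocabulary. "rnodem" mimics an OCR error ("m" read
# as "rn"); "nodem" differs from the query in its first character.
VOCAB = ["modem", "rnodem", "nodem", "model", "modern", "mode", "random"]

degree_1 = fuzzy_lookup("modem", VOCAB, 1)
degree_2 = fuzzy_lookup("modem", VOCAB, 2)
```

At degree 1 the lookup returns "mode", "model", "modem" and "nodem"; raising the degree to 2 also brings in "modern" and the OCR variant "rnodem". Note that "nodem" is found even though its first character differs from the query, a case that engines relying on a fixed prefix would miss. Comparing the query against every vocabulary word, as done here, is the simplest possible approach and says nothing about how a production engine achieves its speed.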

In addition to the fuzzy search, ZyLAB can also search with a variety of wildcards without search-speed degradation: ABC*, *ABC, A*C, and even *ABC* are among the possibilities. Many other engines cannot do this, especially not the leading wildcard *ABC. For law enforcement and discovery, these search techniques are very important, since it is virtually impossible to enumerate all variations caused by misrecognized or misspelled words, or by prefixes and suffixes in concatenated and inflected words.
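Why is the leading wildcard the hard case? A sorted term list lets an engine jump straight to all words that start with a prefix, but offers no shortcut for words that end with a suffix. One well-known workaround, sketched below, is to keep a second sorted list of reversed terms so that *ABC becomes a prefix lookup on CBA. This is a generic illustration with a hypothetical class and made-up terms; it is not a description of how ZyLAB implements wildcards.

```python
from bisect import bisect_left
from fnmatch import fnmatchcase


class WildcardIndex:
    """Toy term index supporting abc*, *abc, a*c and *abc* patterns."""

    def __init__(self, terms):
        self.terms = sorted(set(terms))
        # Reversed copy: a suffix query becomes a prefix query here.
        self.reversed_terms = sorted(t[::-1] for t in self.terms)

    @staticmethod
    def _prefix_range(sorted_terms, prefix):
        """Binary-search to the first candidate, then scan while it matches."""
        out = []
        for term in sorted_terms[bisect_left(sorted_terms, prefix):]:
            if not term.startswith(prefix):
                break
            out.append(term)
        return out

    def search(self, pattern):
        body = pattern.strip("*")
        leading = pattern.startswith("*")
        trailing = pattern.endswith("*")
        if "*" not in body and trailing and not leading:   # abc*
            return self._prefix_range(self.terms, body)
        if "*" not in body and leading and not trailing:   # *abc
            hits = self._prefix_range(self.reversed_terms, body[::-1])
            return sorted(h[::-1] for h in hits)
        # a*c, *abc* and everything else: fall back to a full scan.
        # (fnmatch also gives "?" and "[...]" a special meaning.)
        return [t for t in self.terms if fnmatchcase(t, pattern)]


index = WildcardIndex(["contract", "contractor", "subcontract",
                       "contact", "tract", "subcontractor"])

prefix_hits = index.search("contract*")
suffix_hits = index.search("*contract")
infix_hits = index.search("*contract*")
middle_hits = index.search("c*t")
```

In this toy index, only the first two pattern shapes avoid a scan of the whole vocabulary; the fallback for a*c and *abc* is exactly the kind of slow path that a production engine would need to replace with a smarter structure.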

Legal search demands a defensible search process:

  • Support for large and nested complex Booleans, proximity and quorum search
  • Fast fuzzy (supporting first character changes) and advanced wildcard search (a*, *a, a*a, and *a*)
  • Hit-highlighting and hit-navigation
  • Reproducible and reliable relevance ranking
  • Forensic indexing of file and document properties
  • Automatic language recognition
  • Indexing capabilities for compound objects such as nested e-mails, compressed files, e-mail collections, Microsoft SharePoint, databases, and more
  • Extended index and search process auditing and reporting
  • Incremental indexing of live network data
  • Integration with records management, legal hold, identification, collection, legal review, (TIFF) productions and redaction processes
  • Advanced text analytics, automatic document categorization and machine translation
  • A search engine mentioned in existing case law