This update includes two significant new features: redundant email detection and apparent duplicate image detection. We’ll post more about these features later. In the meantime, here is a summary of the new features and improvements in this update.
New features
- Redundant email detection – automatic detection of emails in a thread (chain) whose text is incorporated in a later reply in the same thread.
- Apparent duplicate image detection – finds substantively duplicate images, both native image files and images embedded in PDFs.
- Added a search option for finding documents that contain embedded documents.
Improvements
- Improved email thread detection for native emails (Outlook MSG files and emails extracted from Outlook PST files).
- Improved text duplicate detection: the system now attempts to detect discovery number stamps applied to other parties’ documents and to ignore them when checking for text duplicates. This can significantly improve text duplicate detection for documents received from other parties.
- Improved similar document detection algorithm.
- Better organisation of per-repository sub-categories on the main Documents page.
- ‘Open in new tab’ link added to related documents.
- Improved search box for selecting authors & recipients on the Discovery tab.
- Improved detection of hidden content in Excel files.
- Option to limit the length of the recipients column when browsing documents. This prevents a very large number of recipients from causing layout issues (e.g. pushing other content off the screen).
- Adding documents to, and removing them from, custom lists is now logged in the document history.
- More information on folder hide/delete pages.
- Improved performance of apparent duplicate email detection.
- Improved scaling of native images when converting to PDF format.
- Improved performance of “go to document” function.
- Project export bundle now includes a hyperlinked index.
- Improved ability to extract text from some email files with non-standard/malformed HTML content.
- Various performance improvements & fixes.