After a series of rolling updates, here is a selection of new features and improvements:
- Look-through redactions: when you create a redacted version of a document in LawFlow, the system now additionally creates a separate look-through version with translucent redaction boxes, so you can see behind the redaction. This makes it easier to review redactions, and provides a convenient way to show permitted parties what has been redacted.
- Redaction page tracking: when you create a redacted version of a document in LawFlow, the system now records which pages have had redactions made on them. This can help with reviewing redactions, and with advising other parties which pages in a document have been redacted.
- Faster redactions: the redaction process now only processes pages that have one or more redactions. This provides a major performance improvement for longer documents where only a few pages are being redacted.
- Smaller redactions: due to an improved process, redacted documents now typically have a significantly smaller file size than previously.
- Translation support: this release adds the ability to attach a translation (either in plain-text, Word or PDF format) to a document. For documents with attached translations, you are able to view the translation together with the original document, and search on the translation text. Further planned enhancements will allow the translated versions to be included in bundles.
- When files are extracted from a zip or 7-zip archive, the File Created and Last Modified dates are recorded against the extracted documents.
- Significant overall performance improvements for large projects.
- Searching: the ability to exclude documents matching one or more word lists (previously, searches could only include documents matching one or more word lists).
- Searching: improved highlighting of search results.
- Improved performance of the Discovery tab’s “Document type” box with a very large number of document types.
- Improved performance for bulk linking and unlinking of documents to email addresses.
- Improved performance and integrity checking of zip file extraction.
- Significantly improved performance of 7-zip file extraction (especially for very large, multi-GB 7z files).
- Automatic removal of Mac OS “resource fork” junk files from uploaded zip & 7-zip archives.
- Deleting pages from a PDF now removes all bookmarks (outline entries) from the PDF.
- Processing of barcode-separated batch scans (automatic splitting of scanned PDFs with separator sheets) now supports image-compressed PDFs.
- Better handling of zero-byte files (zero-byte files are usually the result of incorrectly or incompletely processed files).
- Parent production number column added to Excel discovery list with “extra columns” enabled.
- Improved validation of date values when importing discovery information.
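The redaction page tracking and faster redaction items above can be pictured with a minimal sketch. This is not LawFlow's implementation: the `Redaction` record and the `pages_to_process` helper are hypothetical names used only for illustration, and the sketch assumes each redaction box records the page it sits on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Redaction:
    """A hypothetical redaction box: 1-based page number plus a rectangle."""
    page: int
    x: float
    y: float
    width: float
    height: float


def pages_to_process(redactions):
    """Return the sorted, de-duplicated page numbers that carry a redaction."""
    return sorted({r.page for r in redactions})


redactions = [
    Redaction(page=3, x=72, y=100, width=200, height=14),
    Redaction(page=3, x=72, y=300, width=150, height=14),
    Redaction(page=118, x=90, y=640, width=220, height=28),
]

redacted_pages = pages_to_process(redactions)
print(redacted_pages)  # [3, 118]
```

In a long document with these three boxes, only pages 3 and 118 would need to be rewritten, and the same list is what could be passed on to other parties.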
Happy new year! We have a big to-do list of new features and improvements that we are already busy working on for 2021. In the meantime, here is a summary of recent new features and improvements.
Thank you as always to our great customers!
- Virtual folders that a document is in are now shown on the document’s Info tab.
- Hidden duplicates of a particular document can be viewed via the Related tab.
- The search results page shows the number of duplicates excluded by the “exclude duplicates” option.
- Inclusion of custom fields on “export to Excel” and discovery list reports.
- The “Discoverable” workflow category is now separated into “Discoverable – partially reviewed” and “Discoverable – fully reviewed”.
- Improved ability to redact PDFs containing some internal errors.
- Improved ability to OCR PDFs containing some internal errors.
- Improved ability to detect OCR requirement for PDFs containing numerous small images.
- Improved handling of trivial OCR results.
- Improved handling of blank pages and non-text documents in compare documents tool.
- Added part-privileged redaction check to Discovery Review Checklist.
- Importing notes excludes duplicate notes.
- Improved handling of Zip files that otherwise cannot be opened due to over-length internal file paths.
- Better layout of discovery review tasks and notes.
- Incomplete tasks are displayed before completed tasks.
- Separation of incomplete and complete tasks on Discovery page.
- Various performance improvements.
Sunday 13 December 2020, 10:20am
We are pleased to report that the technical problems that arose at our hosted environment last week have been resolved. This was our first such outage in 8 years.
We apologise for any inconvenience this has caused, especially at this busy time of year. We will be reviewing our hosting, technical and supplier arrangements to take steps against this happening again.
The LawFlow Team
We’ve been busy labouring this Labour Day weekend to carry out a major upgrade of the LawFlow servers. This will allow us to keep up with the significant growth in our user base, and the ever-increasing average size of discovery projects. It will also enable us to roll out numerous new features that we have planned for the coming months and into 2021.
Thank you as always to our awesome customers! Keep an eye on this blog for news about further updates.
LawFlow – New Zealand’s leading e-discovery solution
A feature of our September update of LawFlow that we are particularly excited about is our new OCR system. While LawFlow has always provided OCR capability, the September update implements a new custom system that we have been developing for some time, incorporating leading OCR technology, and tailored specifically for e-discovery.
Key improvements of the new OCR system include:
- Significantly reduced lead time for uploaded documents to be OCRd.
- More robust processing due to improved identification and handling of corrupt or malformed PDFs.
- The ability to OCR more DRM-protected PDF files (some DRM restrictions may still prevent specific PDFs being processed).
- The ability to perform OCR on detected image-based pages within an otherwise text-based PDF. This can occur where text images are inserted into a natively-generated PDF, or where text-based and image-based PDFs are merged into one.
- Improved detection of document number stamps (frequently applied in e-discovery) that otherwise prevent a PDF (or certain pages of it) from being a candidate for OCR.
- Confidence scoring of OCR-processed documents.
- Detection of specific pages with low confidence scores.
- Separate processing of longer documents in order to reduce delays in processing smaller, faster-processable documents.
As with our previous OCR system, the new system is not cloud-based but is fully hosted on our hardware right here in New Zealand. This means we do not send project data to a third-party or overseas for OCR processing.
As always with OCR, accuracy depends heavily on the quality and characteristics of the input. In general terms, well-scanned clean black-and-white block text with standard fonts & font sizing is likely to produce a relatively accurate OCR result. Conversely, lower-quality scans, non-standard fonts, stylised/coloured layout, marks on the image, etc will likely result in lower accuracy.
However, even with high-quality input, there can still be inaccuracies – a “good” OCR accuracy rate is considered to be around 95-99%. There can also be complications and inaccuracies in reconstructing the OCR text into sentences or paragraphs. This should be taken into consideration when searching or otherwise using OCR-generated text.
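To see why this matters for searching, here is a rough back-of-the-envelope calculation. It assumes the quoted rate is a per-character accuracy and that character errors are independent of each other; neither assumption is stated above, and neither is strictly true in practice. `chance_term_is_intact` is a hypothetical helper name.

```python
def chance_term_is_intact(char_accuracy, term_length):
    """Probability that every character of a term was recognised correctly,
    assuming independent per-character errors (a simplification)."""
    return char_accuracy ** term_length


for accuracy in (0.95, 0.99):
    p = chance_term_is_intact(accuracy, 10)
    print(f"{accuracy:.0%} accuracy, 10-character term: {p:.0%} chance of an exact match")
```

Under these assumptions, a 10-character search term survives intact roughly 60% of the time at 95% accuracy and roughly 90% of the time at 99% accuracy, so an exact-match search on OCR text can miss genuine occurrences.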
The outline of the new OCR system’s basic processing stages for each document in a project (which remains similar to the previous system) is as follows:
- Determine whether the document is of a type suitable for OCR (PDF or supported image files). If not, do not attempt OCR.
- For PDF documents, if every page of this PDF file already contains detectable text above a de minimis level (after attempting to exclude any detected document number stamps) then do not attempt OCR.
- Run OCR process on the document (for PDFs, do this only for pages excluding any with detectable text above the de minimis level).
- If the OCR process detected any text, convert the document to a searchable PDF with the OCR text applied.
- Index the OCR detected text (for use in searching).
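The first three stages above amount to deciding which pages, if any, are sent for OCR. A minimal sketch of that decision follows. It is an illustration only, not the actual system: the `Document` class, the list of suitable file types, the 20-character threshold and the treatment of an image file as a single page are all assumptions.

```python
from dataclasses import dataclass, field

# Assumed list; the post says only "PDF or supported image files".
OCR_SUITABLE_TYPES = {"pdf", "png", "jpg", "tiff"}
# Assumed de minimis threshold; the real value is not stated.
DE_MINIMIS_CHARS = 20


@dataclass
class Document:
    file_type: str
    # Existing text per page, after removing any detected number stamps.
    page_text: list = field(default_factory=list)


def pages_needing_ocr(doc):
    """Return 0-based page indexes to OCR, or [] if OCR should be skipped."""
    if doc.file_type not in OCR_SUITABLE_TYPES:
        return []  # stage 1: unsuitable type, do not attempt OCR
    if doc.file_type != "pdf":
        return [0]  # assumption: an image file is treated as a single page
    # stages 2 and 3: OCR only pages without text above the de minimis level
    return [i for i, text in enumerate(doc.page_text)
            if len(text.strip()) <= DE_MINIIMIS_CHARS] if False else [
        i for i, text in enumerate(doc.page_text)
        if len(text.strip()) <= DE_MINIMIS_CHARS]


mixed = Document("pdf", ["A long natively generated page of text ...", "", "ABC"])
print(pages_needing_ocr(mixed))  # [1, 2]
```

An empty result covers both "wrong type" and "every page already has text". The later stages (building the searchable PDF and indexing the text) are not sketched here.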
If you have any questions about our new OCR system or how to handle OCR text in your discovery project, get in touch with us and we’ll be happy to help!
We are continuing to work on a lot of new features & improvements to our LawFlow e-discovery system, and will continue rolling them out progressively. Here are highlights of the latest update.
- New OCR system to improve the performance, robustness and usefulness of the OCR process.
- Hide emails by address tool (similar to “hide emails by domain sender tool” in the previous update). This allows hiding (or deleting) of all emails sent from selected email addresses.
- Ability to exclude a saved search from within another search. This makes it easier to create searches that exclude documents meeting specific criteria. So if you want to do a search such as “All documents in criteria A, but excluding any in criteria B”, you can create a search for criteria B, save it as, say, “Criteria B”, and then create another search with criteria A that also excludes the “Criteria B” saved search.
- Option to toggle additional columns (author, recipient, etc) in the left-hand slide-out pane in Details view.
- Ability to link multiple documents to chronology events at the same time via the tray.
- Usability improvements to “link email addresses to parties” tool.
- A quick link for adding users has been added to the home page.
- Improved detection of specific watermark text on PDFs.
- Improved no-content detection for vector-based PDFs.
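Excluding a saved search from another search, as described in the list above, is in effect a set difference. A minimal sketch, assuming documents are identified by ID and a search is a simple predicate (`run_search`, the sample documents and the criteria are all hypothetical):

```python
def run_search(documents, predicate):
    """Return the set of document IDs whose metadata satisfies the predicate."""
    return {doc_id for doc_id, meta in documents.items() if predicate(meta)}


documents = {
    1: {"author": "Alice", "type": "Email"},
    2: {"author": "Alice", "type": "Invoice"},
    3: {"author": "Bob", "type": "Email"},
}

# A saved search is just a named, reusable set of criteria.
saved_searches = {
    "Criteria B": lambda meta: meta["type"] == "Invoice",
}

criteria_a = run_search(documents, lambda meta: meta["author"] == "Alice")
criteria_b = run_search(documents, saved_searches["Criteria B"])

# "All documents in criteria A, but excluding any in criteria B"
results = criteria_a - criteria_b
print(sorted(results))  # [1]
```

Because the exclusion refers to the saved search by name, changing the “Criteria B” definition would change the results of every search that excludes it.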
Thanks as always to our great customers for your support and feedback.
This update includes two significant new features: redundant email detection, and duplicate image detection. We’ll post more on these features later. In the meantime, here is a summary of the new features and improvements in this update.
- Redundant email detection – automatic detection of whether the text of an email in a thread (chain) is incorporated in a later reply in the thread.
- Apparent duplicate image detection – for finding substantively duplicate images (in native form and embedded in PDFs).
- Added a search option for finding documents with embedded documents.
- Improved email thread detection for native emails (Outlook MSG files and extracted Outlook PSTs).
- Improved text duplicates detection: the system now attempts to detect discovery number stamps applied on other parties’ documents, and ignore them when checking for text duplicates. This can significantly improve text duplicate detection with documents received from other parties.
- Improved similar document detection algorithm.
- Better organisation of per-repository sub-categories in main Documents page.
- ‘Open in new tab’ link added to related documents.
- Improved search box for selecting authors & recipients on Discovery tab.
- Improved detection of hidden content in Excel files.
- Option to limit the length of the recipients column when browsing documents. This prevents a very long list of recipients from causing layout issues (e.g. pushing other content off the screen).
- Additions and removals of documents on custom lists are now logged in document history.
- More information on folder hide/delete pages.
- Improved performance of apparent duplicate email detection.
- Improved scaling of native images when converting to PDF format.
- Improved performance of “go to document” function.
- Project export bundle now includes a hyperlinked index.
- Improved ability to extract text from some email files with non-standard/malformed HTML content.
- Various performance improvements & fixes.
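The improved text duplicates detection mentioned above (ignoring other parties' discovery number stamps) can be illustrated with a minimal sketch. It is not the actual detection logic: the stamp format matched by `STAMP_PATTERN` is an assumption made up for the example, and `text_fingerprint` is a hypothetical helper.

```python
import hashlib
import re

# Assumed stamp format for illustration only, e.g. "ABC.001.0042".
STAMP_PATTERN = re.compile(r"\b[A-Z]{2,5}\.\d{3}\.\d{4}\b")


def text_fingerprint(text):
    """Hash the text after dropping stamps and normalising whitespace and case."""
    without_stamps = STAMP_PATTERN.sub(" ", text)
    normalised = " ".join(without_stamps.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


ours = "Minutes of the board meeting held on 3 March."
theirs = "XYZ.002.0117  Minutes of the board meeting\nheld on 3 March."

print(text_fingerprint(ours) == text_fingerprint(theirs))  # True
```

Without the stamp removal step, the two texts would hash differently even though their substantive content is the same.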
We hope everyone is doing well during the COVID-19 lockdown. Highlights of this update:
- Saved searches: you can now save a search configuration to re-use in multiple searches.
- Support for Outlook for Mac (OLM) files (the Mac equivalent of a PST file).
- Support for 7-zip (7z) compressed archive files. Uploaded 7z files will extract in the same manner as Zip files.
- Ability to set custom “badge” text for issues. This is useful for showing a shorter badge (e.g. an abbreviation or acronym) for issues with longer names.
- Option to tag all search results directly from the search results page (without having to use the document tray function).
- Improved detection of email dates from non-native emails.
- Improved performance of the Analysis and Parties pages.
- Easier merging of document types.
- Easier editing of issues.
- The “add documents to tray” function now shows more detail about matched documents.
- Improved performance when searching multiple lists.