Sunday 13 December 2020, 10:20am
We are pleased to report that the technical problems that arose in our hosted environment last week have been resolved. This was our first such outage in 8 years.
We apologise for any inconvenience this has caused, especially at this busy time of year. We will be reviewing our hosting, technical and supplier arrangements and taking steps to prevent this happening again.
The LawFlow Team
We’ve been busy labouring this Labour Day weekend to carry out a major upgrade of the LawFlow servers. This will allow us to keep up with the significant growth in our user base, and the ever-increasing average size of discovery projects. It will also enable us to roll out numerous new features that we have planned for the coming months and into 2021.
Thank you as always to our awesome customers! Keep an eye on this blog for news about further updates.
LawFlow – New Zealand’s leading e-discovery solution
A feature of our September update of LawFlow that we are particularly excited about is our new OCR system. While LawFlow has always provided OCR capability, the September update implements a new custom system that we have been developing for some time, incorporating leading OCR technology, and tailored specifically for e-discovery.
Key improvements of the new OCR system include:
- Significantly reduced lead time for uploaded documents to be OCR’d.
- More robust processing due to improved identification and handling of corrupt or malformed PDFs.
- The ability to OCR more DRM-protected PDF files (some DRM restrictions may still prevent specific PDFs being processed).
- The ability to perform OCR on detected image-based pages within an otherwise text-based PDF. This can occur where text images are inserted into a natively-generated PDF, or where text-based and image-based PDFs are merged into one.
- Improved detection of document number stamps (frequently applied in e-discovery) that otherwise prevent a PDF (or certain pages of it) from being a candidate for OCR.
- Confidence scoring of OCR-processed documents.
- Detection of specific pages with low confidence scores.
- Separate processing of longer documents in order to reduce delays in processing smaller, faster-processable documents.
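To illustrate how per-page confidence scores can be used to flag problem pages, here is a minimal sketch. It is not LawFlow's implementation: the function name, the 0 to 1 score scale, the 0.80 threshold and the use of a simple average for the document score are all assumptions made for the example.

```python
def summarise_confidence(page_scores, threshold=0.80):
    """Return (document_score, low_confidence_pages) for one document.

    page_scores: OCR confidence per page on an assumed 0.0-1.0 scale.
    threshold:   hypothetical cut-off below which a page is flagged.
    """
    if not page_scores:
        return None, []
    # Assumption: the document score is the plain average of its page scores.
    document_score = sum(page_scores) / len(page_scores)
    # Page numbers are reported 1-based, as a reviewer would see them.
    low_pages = [n for n, score in enumerate(page_scores, start=1)
                 if score < threshold]
    return document_score, low_pages


score, low_pages = summarise_confidence([0.98, 0.62, 0.91])
print(f"Document score: {score:.2f}, pages to check: {low_pages}")
```

A reviewer could use a list like `low_pages` to decide which pages to check by eye before relying on text searches.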
As with our previous OCR system, the new system is not cloud-based but is fully hosted on our hardware right here in New Zealand. This means we do not send project data to a third-party or overseas for OCR processing.
As always with OCR, accuracy depends heavily on the quality and characteristics of the input. In general terms, well-scanned clean black-and-white block text with standard fonts & font sizing is likely to produce a relatively accurate OCR result. Conversely, lower-quality scans, non-standard fonts, stylised/coloured layout, marks on the image, etc. will likely result in lower accuracy.
However, even with high-quality input, there can still be inaccuracies – a “good” OCR accuracy rate is considered to be around 95-99%. There can also be complications and inaccuracies in reconstructing the OCR text into sentences or paragraphs. This should be taken into consideration when searching or otherwise using OCR-generated text.
The outline of the new OCR system’s basic processing stages for each document in a project (which remains similar to the previous system) is as follows:
- Determine whether the document is of a type suitable for OCR (PDF or supported image files). If not, do not attempt OCR.
- For PDF documents, if every page of this PDF file already contains detectable text above a de minimis level (after attempting to exclude any detected document number stamps) then do not attempt OCR.
- Run OCR process on the document (for PDFs, do this only for pages excluding any with detectable text above the de minimis level).
- If the OCR process detected any text, convert the document to a searchable PDF with the OCR text applied.
- Index the OCR detected text (for use in searching).
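The first three stages above amount to a decision about which pages (if any) go to OCR. The sketch below shows that decision in simplified form. It is illustrative only: the class and function names, the list of supported image types and the 20-character de minimis threshold are assumptions, not details of the actual system.

```python
from dataclasses import dataclass, field

# Hypothetical values: the supported types and threshold are assumed here.
OCR_SUITABLE_TYPES = {"pdf", "png", "jpg", "tiff"}
DE_MINIMIS_CHARS = 20


@dataclass
class Page:
    text: str = ""        # text already detectable on the page
    stamp_text: str = ""  # detected document number stamp, if any


@dataclass
class Document:
    file_type: str
    pages: list = field(default_factory=list)


def has_real_text(page: Page) -> bool:
    """True if the page has text above the de minimis level, ignoring any stamp."""
    remaining = page.text
    if page.stamp_text:
        remaining = remaining.replace(page.stamp_text, "")
    return len(remaining.strip()) > DE_MINIMIS_CHARS


def pages_to_ocr(doc: Document) -> list:
    """Return the indexes of the pages that should be sent to OCR."""
    file_type = doc.file_type.lower()
    if file_type not in OCR_SUITABLE_TYPES:
        return []  # unsuitable type: do not attempt OCR
    if file_type != "pdf":
        return list(range(len(doc.pages)))  # image files: OCR every page
    # PDFs: skip pages that already have detectable text. If every page
    # has text, the result is empty and no OCR is attempted.
    return [i for i, page in enumerate(doc.pages) if not has_real_text(page)]


mixed = Document("pdf", [
    Page(text="A" * 200),                              # native text page
    Page(text="ABC.0001", stamp_text="ABC.0001"),      # stamp only
    Page(),                                            # image-only page
])
print(pages_to_ocr(mixed))
```

In this example only the second and third pages are selected, because the stamp on the second page is excluded before the text check.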
If you have any questions about our new OCR system or how to handle OCR text in your discovery project, get in touch with us and we’ll be happy to help!
We are continuing to work on a lot of new features & improvements to our LawFlow e-discovery system, and will continue rolling them out progressively. Here are highlights of the latest update.
- New OCR system to improve the performance, robustness and usefulness of the OCR process.
- Hide emails by address tool (similar to “hide emails by domain sender tool” in the previous update). This allows hiding (or deleting) of all emails sent from selected email addresses.
- Ability to exclude a saved search from within another search. This makes it easier to create searches that exclude documents meeting specific criteria. So if you want to do a search such as “All documents in criteria A, but excluding any in criteria B”, you can create a search for criteria B, save it as, say, “Criteria B”, and then create another search with criteria A that also excludes the “Criteria B” saved search.
- Option to toggle additional columns (author, recipient, etc) in the left-hand slide-out pane in Details view.
- Ability to link multiple documents to chronology events at the same time via the tray.
- Usability improvements to “link email addresses to parties” tool.
- A quick link for adding users has been added to the home page.
- Improved detection of specific watermark text on PDFs.
- Improved no-content detection for vector-based PDFs.
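For readers who like to think of searches as sets, the saved-search exclusion described above is a set difference: the results of criteria A minus the results of the saved “Criteria B” search. The following sketch shows the idea. The function name, the dictionary-based documents and the sample criteria are hypothetical and are not part of LawFlow.

```python
def run_search(documents, criteria, exclude_saved=(), saved_searches=None):
    """Return ids of documents matching `criteria`, minus any documents
    matched by the named saved searches."""
    saved_searches = saved_searches or {}
    excluded_ids = set()
    for name in exclude_saved:
        saved_criteria = saved_searches[name]
        excluded_ids |= {d["id"] for d in documents if saved_criteria(d)}
    return [d["id"] for d in documents
            if criteria(d) and d["id"] not in excluded_ids]


documents = [
    {"id": 1, "type": "email", "privileged": False},
    {"id": 2, "type": "email", "privileged": True},
    {"id": 3, "type": "letter", "privileged": False},
]

# "Criteria B" is saved once and can then be excluded from other searches.
saved = {"Criteria B": lambda d: d["privileged"]}

result = run_search(
    documents,
    criteria=lambda d: d["type"] == "email",   # criteria A
    exclude_saved=["Criteria B"],
    saved_searches=saved,
)
print(result)
```

Here the search returns only document 1: document 2 matches criteria A but is removed because it also matches the saved search.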
Thanks as always to our great customers for your support and feedback.
This update includes two significant new features: redundant email detection, and duplicate image detection. We’ll post more on these features later. In the meantime, here is a summary of the new features and improvements in this update.
- Redundant email detection – automatic detection of whether the text of an email in a thread (chain) is incorporated in a later reply in the thread.
- Apparent duplicate image detection – for finding substantively duplicate images (in native form and embedded in PDFs).
- Added a search option for finding documents with embedded documents.
- Improved email thread detection for native emails (Outlook MSG files and extracted Outlook PSTs).
- Improved text duplicates detection: the system now attempts to detect discovery number stamps applied on other parties’ documents, and ignore them when checking for text duplicates. This can significantly improve text duplicate detection with documents received from other parties.
- Improved similar document detection algorithm.
- Better organisation of per-repository sub-categories in main Documents page.
- ‘Open in new tab’ link added to related documents.
- Improved search box for selecting authors & recipients on Discovery tab.
- Improved detection of hidden content in Excel files.
- Option to limit the length of the recipients column when browsing documents. This prevents a very long list of recipients from causing layout issues (e.g. pushing other content off the screen).
- Additions and removals of documents on custom lists are now logged in document history.
- More information on folder hide/delete pages.
- Improved performance of apparent duplicate email detection.
- Improved scaling of native images when converting to PDF format.
- Improved performance of “go to document” function.
- Project export bundle now includes a hyperlinked index.
- Improved ability to extract text from some email files with non-standard/malformed HTML content.
- Various performance improvements & fixes.
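As a simplified illustration of the redundant email idea in this list, the sketch below treats an email as redundant when its normalised text appears in full inside a later email in the same thread. This is only one possible approach and is not a description of LawFlow's detection logic. The function names and the normalisation rules (stripping leading “>” quote markers, collapsing whitespace, ignoring case) are assumptions.

```python
import re


def normalise(text: str) -> str:
    """Strip leading quote markers, collapse whitespace and lower-case."""
    text = re.sub(r"^[>\s]+", "", text, flags=re.MULTILINE)
    return re.sub(r"\s+", " ", text).strip().lower()


def redundant_emails(thread):
    """thread: list of (email_id, body) tuples in chronological order.

    Returns the ids of emails whose text is wholly contained in a later email.
    """
    bodies = [(email_id, normalise(body)) for email_id, body in thread]
    redundant = []
    for i, (email_id, body) in enumerate(bodies):
        if body and any(body in later for _, later in bodies[i + 1:]):
            redundant.append(email_id)
    return redundant


thread = [
    ("e1", "Can we meet on Friday?"),
    ("e2", "Yes, 2pm works.\n\n> Can we meet\n> on Friday?"),
    ("e3", "Great, see you then."),
]
print(redundant_emails(thread))
```

In this thread only the first email is flagged, because the reply quotes it in full. Real emails add signatures, disclaimers and reformatted quotes, so exact containment like this would miss many cases.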
We hope everyone is doing well during the COVID-19 lockdown. Highlights of this update:
- Saved searches: you can now save a search configuration to re-use in multiple searches.
- Support for Outlook for Mac (OLM) files (the Mac equivalent of a PST file).
- Support for 7-zip (7z) compressed archive files. Uploaded 7z files will extract in the same manner as Zip files.
- Ability to set custom “badge” text for issues. This is useful for showing a shorter badge (e.g. an abbreviation or acronym) for issues with longer names.
- Option to tag all search results directly from the search results page (without having to use the document tray function).
- Improved detection of email dates from non-native emails.
- Improved performance of the Analysis and Parties pages.
- Easier merging of document types.
- Easier editing of issues.
- The “add documents to tray” function now shows more detail about matched documents.
- Improved performance when searching multiple lists.
The LawFlow team will be maintaining full support and services during the Level 4 COVID-19 restrictions. We have full work-from-home capabilities to continue on a “business as usual” basis.
Please continue to send support queries to email@example.com and contact us as usual during this time.
We wish our customers all the best during these challenging times.
The LawFlow Team
We are still working very hard on exciting new features. In the meantime, here is a summary of new features and improvements since the last update:
- Addition of a new list export mode: “Native format with sensitive document exceptions”. This mode exports list documents in native format except if the native document has one or more “sensitive” attachments (being a visible or hidden attachment that is set as privileged, confidential, or redacted, or unreviewed for privilege or confidentiality and therefore potentially sensitive). Those documents are exported in PDF format instead.
- List bundles: option to skip the generation of a placeholder PDF for native file exceptions when generating a PDF bundle.
- Option to limit by repository when adding documents to the tray by production number. This makes it easier when the same production number has been used in multiple repositories.
- Option to ignore mismatched documents when adding documents to the tray by ID.
- Print option added to document preview pane.
- Barcode scanning: you can now use a special separator sheet to indicate that the next document is an attachment of the previous top-level document. This allows a bulk-scanned PDF to be automatically split into separate top-level documents and attachments.
- Category for browsing privileged documents that have no privilege category.
- Scanned PDF detection: ability to detect in some cases whether a PDF is a scanned document, as opposed to a text-based PDF.
- Vector-based PDF detection: ability to detect in some cases whether a PDF includes vector content, as opposed to text or image-based content.
- Faster extraction of PST email archives.
- Better performance of apparent duplicate detection.
- Better performance of zipping document bundles.
- Some improvements to descriptions in statistics reports.
- Warning before displaying very large text and XML files.
- Better handling of extremely large history exports to Excel (now spans across multiple tabs).
- Better detection of poor-quality barcode separator pages in bulk scans.
- Better performance for bulk-linking a large number of documents to an issue.
- Various general performance improvements.
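The rule behind the “Native format with sensitive document exceptions” export mode in this list can be written as a short decision function. The sketch below is an illustration of the rule as described, not LawFlow's code. The `Attachment` class, its field names and the `"pdf"`/`"native"` return values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Attachment:
    privileged: bool = False
    confidential: bool = False
    redacted: bool = False
    privilege_reviewed: bool = True
    confidentiality_reviewed: bool = True


def is_sensitive(att: Attachment) -> bool:
    """Sensitive = privileged, confidential or redacted, or not yet
    reviewed for privilege or confidentiality (so potentially sensitive)."""
    return (att.privileged or att.confidential or att.redacted
            or not att.privilege_reviewed
            or not att.confidentiality_reviewed)


def export_format(attachments) -> str:
    """Export natively unless any visible or hidden attachment is sensitive."""
    return "pdf" if any(is_sensitive(a) for a in attachments) else "native"


print(export_format([Attachment(), Attachment()]))
print(export_format([Attachment(), Attachment(privilege_reviewed=False)]))
```

The second call falls back to PDF because one attachment has not yet been reviewed for privilege.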
Thanks to all of our great users for their support and feedback.
This update includes some useful new functionality in the search tool, and significant performance improvements particularly for large projects.
- New search options:
  - You can now search documents by party role (i.e. author and/or recipient), specify multiple authors/recipients/both on an “all” or “any” basis, and specify separate “must include” and “must exclude” criteria. This makes it easier to carry out searches such as “all documents authored by A or B, and received by X and Y”.
  - Added the ability to search for multiple tags on an “all” or “any” basis.
  - There are now separate options for searching documents explicitly marked as “undated”, or documents that have not had any date information set.
  - Added the ability to search documents by discovery workflow status – partially reviewed documents and fully reviewed documents.
- Support for digitally signed & encrypted emails (P7M/P7S S/MIME).
- Automatic extraction of Outlook OST cache files (not to be confused with regular Outlook PST email archives, which LawFlow has always been able to automatically extract). LawFlow could previously extract Outlook OST files with helpdesk support. This update includes automatic processing of uploaded OST files.
- Ability to view workflow status information for tray documents.
- Significant performance improvements for very large projects, in particular browsing & viewing documents and generating long discovery lists.
- Added ability to easily view hidden attachments from a parent document.
- Improved layout of search options.
- More search tips.
- More robust process for expiring (i.e. refreshing) browser-cached versions of live-preview list document PDFs.
- Date range searching now supports partial dates (e.g. 0/11/2019 instead of a specific date in November 2019).
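To make the “all”/“any” party search options above concrete, here is a small sketch of how such matching could work. It is an illustration only: the function names, the parameter names and the dictionary-based document format are assumptions and do not reflect LawFlow's internals.

```python
def _match(selected, actual, mode):
    """Match `selected` parties against `actual` on an "all" or "any" basis."""
    if not selected:
        return True  # nothing selected means no restriction
    selected, actual = set(selected), set(actual)
    if mode == "all":
        return selected <= actual
    return bool(selected & actual)


def matches(doc, authors=(), author_mode="any",
            recipients=(), recipient_mode="any", exclude_parties=()):
    """True if the document meets the "must include" criteria for each role
    and involves none of the "must exclude" parties in any role."""
    involved = set(doc["authors"]) | set(doc["recipients"])
    return (_match(authors, doc["authors"], author_mode)
            and _match(recipients, doc["recipients"], recipient_mode)
            and not (set(exclude_parties) & involved))


documents = [
    {"id": 1, "authors": ["A"], "recipients": ["X", "Y"]},
    {"id": 2, "authors": ["B"], "recipients": ["X"]},
    {"id": 3, "authors": ["C"], "recipients": ["X", "Y"]},
    {"id": 4, "authors": ["B"], "recipients": ["X", "Y", "Z"]},
]

# "All documents authored by A or B, and received by X and Y"
hits = [d["id"] for d in documents
        if matches(d, authors=["A", "B"], author_mode="any",
                   recipients=["X", "Y"], recipient_mode="all")]
print(hits)
```

Adding `exclude_parties=["Z"]` to the same search would drop document 4, which shows how a separate “must exclude” list combines with the “must include” criteria.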