PDF to Sortable Excel for Privilege Logs, Discovery Indexes, etc.
Over the weekend we got a call. An associate was under the gun preparing for trial this week and had an extremely large privilege log (detailing 1900+ documents) that was part of a court filing. The log sat inside a large PDF, but it would be far more useful in Excel, where he could filter and sort the various columns. Luckily, he had access to the Clear Guidance Partners Helpdesk. Fifteen minutes after calling, he was working through the privilege log in Excel.
A frequent request from our client firms is to convert data into a usable format. Oftentimes in the course of litigation, you will receive a file that would be very useful in Excel, but instead it is a PDF. If you have Clear Guidance Partners in your corner, our support team can take care of it for you as part of your monthly services. If not, keep reading! (Then give us a call, and we’ll handle it next time!)
First off, you will need full-fledged PDF software for the following tasks (most of the free tools will struggle with, or simply cannot complete, some of the steps). Need recommendations? Check out our Legal Software Guide.
Step 1: OCR the document
OCR (optical character recognition) is the process of making text in a PDF selectable and searchable. Even if the document appears to have been OCR’d already when you received it, I always recommend running the OCR process one more time to make sure every piece of text is captured.
In Acrobat: see here, step #4
In Nuance: under the Edit tab, there is a button for ‘Convert to Editable’
Step 2: Export the document to XLS
In Acrobat: Go to the file menu, then choose ‘Export to’.
In Nuance: Go to the file menu, then Save As. Under Save As Type, you can select Excel.
In Nitro: In the Convert tab, there is a ‘to Excel’ button.
Step 3: Filter, sort & search!
You should be able to apply filters and sort your table at this point.
Potential Issues with the Excel document
If sorting fails, one common cause is merged cells, which are a frequent side effect of the conversion process. To remove them, press CTRL + A on the keyboard to select the entire worksheet (if only the data around your cursor is selected, press it a second time). Then look on the Home tab to see if the “Merge & Center” button is highlighted - this indicates there are merged cells. Simply toggle it off, then try sorting again.
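If you are comfortable with a little scripting, you can also check for merged cells without opening Excel. The following is only a rough Python sketch, not part of the steps above. It assumes the export was saved as a standard .xlsx file (not the older .xls format), and the function name `find_merged_ranges` is made up for illustration. It relies on the fact that an .xlsx file is a zip archive in which each worksheet records its merged ranges as `mergeCell` elements.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by worksheet XML in standard .xlsx files (an assumption:
# files saved in the rarer "Strict Open XML" format use a different one).
MAIN_NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def find_merged_ranges(xlsx_file):
    """Return {worksheet part name: [merged ranges such as 'A1:B2']}.

    xlsx_file may be a path or a file-like object. Worksheets that
    contain no merged cells are left out of the result.
    """
    merged = {}
    with zipfile.ZipFile(xlsx_file) as archive:
        for name in sorted(archive.namelist()):
            # Worksheets are stored as xl/worksheets/sheet1.xml, sheet2.xml, ...
            if not re.fullmatch(r"xl/worksheets/[^/]+\.xml", name):
                continue
            root = ET.fromstring(archive.read(name))
            refs = [el.get("ref") for el in root.iter(MAIN_NS + "mergeCell")]
            if refs:
                merged[name] = refs
    return merged
```

Calling `find_merged_ranges("privilege_log.xlsx")` (a hypothetical file name) returns an empty dictionary when nothing is merged. Note that the keys are the internal part names inside the archive, which may not match the tab names you see in Excel.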
Another issue is that the Excel document may be created with lots of text boxes. Typically they’re blank (your PDF tool created them to correct formatting issues), and there is an easy way to remove them all. First, be confident that the text boxes do not contain any data. In Excel, press F5 on the keyboard to bring up the ‘Go To’ window. Then click the ‘Special’ button and select ‘Objects’. Click OK, then click OK again. At this point all the text boxes will be selected, so hit Backspace to delete them.
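With hundreds of text boxes, confirming by eye that none of them holds data can be tedious. As a rough aid, here is a Python sketch that lists any non-blank text stored in the workbook's drawing parts, which is where modern .xlsx files keep text boxes. It assumes a standard .xlsx file, the function name `text_in_drawings` is made up for illustration, and it will not see older-style objects stored outside those drawing parts, so treat an empty result as a good sign rather than a guarantee.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# DrawingML namespace; text inside shapes is stored in <a:t> elements.
A_NS = "{http://schemas.openxmlformats.org/drawingml/2006/main}"


def text_in_drawings(xlsx_file):
    """Return (drawing part name, text) pairs for every non-blank text run.

    xlsx_file may be a path or a file-like object. An empty list means
    no text was found in the drawing parts that were scanned.
    """
    found = []
    with zipfile.ZipFile(xlsx_file) as archive:
        for name in sorted(archive.namelist()):
            # Drawings are stored as xl/drawings/drawing1.xml, drawing2.xml, ...
            if not re.fullmatch(r"xl/drawings/[^/]+\.xml", name):
                continue
            root = ET.fromstring(archive.read(name))
            for run in root.iter(A_NS + "t"):
                text = (run.text or "").strip()
                if text:
                    found.append((name, text))
    return found
```

If the list comes back with entries, open the workbook and copy that text into cells before deleting the text boxes.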