OIDA receives and locates documents from a variety of sources for inclusion in the archive.
Some documents have had certain personally identifying information (PII) and protected health information (PHI) redacted. Most were redacted before OIDA received them, though in a few cases the OIDA team arranged for redaction before making the documents public. For more information, see the Industry Documents Library’s policies. Redacted documents are available only in Excel or PDF format.
Some slide decks (such as PowerPoint documents) are available both in PowerPoint format and as a PDF created by the OIDA team, so that users can browse the slides through the Industry Documents Library’s web interface without downloading the original file.
Documents vary in the completeness of their metadata. The following fields, when present, will help you trace the past and current file formats of documents in OIDA:
- Filename includes the name of the file as represented in metadata in the file received by OIDA. For example, a PDF of a Word or PowerPoint document might include metadata showing the filename of the original Word or PowerPoint file.
- Filepath includes a path where this document – or a file it was derived from – was located at one point during the process of legal discovery. The filename and format mentioned here might not match what is available in OIDA.
- Originalformat includes a value based on the file extension noted by software used in legal discovery. Please note that this field is shown only in the web interface and is not included in the metadata index (accessible through the Solr API and through copies in the OIDA Toolbox).
- Type includes one or more values from a controlled vocabulary of document types in OIDA (see the values for the field dt in the metadata documentation). These values were determined by e-discovery software, assigned manually, or derived from terms found in the Genre field. The value “Document” is used liberally, applying to scanned documents of various types and to packets of documents.
- Genre includes one or more values related either to the subject or type of document.
To locate documents in OIDA in a particular file format other than PDF, the surest method is to query the filename field, download the documents, and discard any that are available solely in PDF format. For example, to get Excel documents, you might use filename:(*.xl*) to download files ending in .xls, .xlsx, .xlm, .xlsm, etc.
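As a rough illustration of this workflow, the Python sketch below builds the filename query, encodes it as URL parameters, and filters a list of downloaded filenames to drop PDF-only results. The helper names (filename_query, solr_params, keep_non_pdf) are hypothetical. The parameters q, wt, and rows are standard Solr select parameters, assumed here rather than confirmed for OIDA's API. The endpoint itself is deliberately left out; consult the API documentation for it.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlencode


def filename_query(pattern: str) -> str:
    """Build a wildcard query on the filename field, e.g. filename:(*.xl*)."""
    return f"filename:({pattern})"


def solr_params(query: str, rows: int = 100) -> str:
    """URL-encode the query using standard Solr select parameters.

    Assumption: the API accepts q, wt and rows like a stock Solr handler.
    """
    return urlencode({"q": query, "wt": "json", "rows": rows})


def keep_non_pdf(filenames, pattern):
    """Keep names that match the pattern and are not PDF files.

    Matching is case-insensitive; the original names are returned unchanged.
    """
    kept = []
    for name in filenames:
        lowered = name.lower()
        if fnmatchcase(lowered, pattern.lower()) and not lowered.endswith(".pdf"):
            kept.append(name)
    return kept


if __name__ == "__main__":
    query = filename_query("*.xl*")
    print(query)
    print(solr_params(query))
    # Hypothetical filenames standing in for a set of downloaded documents.
    print(keep_non_pdf(["a.xlsx", "b.pdf", "c.xls.pdf", "D.XLSM"], "*.xl*"))
```

The local filter matters because a filename recorded in metadata may name an original spreadsheet even when only a PDF rendition is available in the archive.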
To locate documents in OIDA that were originally of a particular type (like a spreadsheet or a slide deck) and might now be available in either the original format or only in PDF format, the best method is to query type:value, where “value” is one of the values for the field dt in the metadata documentation.
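A type query can be assembled in the same way. The sketch below is a minimal example with a hypothetical helper name (type_query). The sample values "spreadsheet" and "slide deck" are placeholders only; the actual controlled-vocabulary values should be taken from the metadata documentation. Quoting multi-word values as a phrase follows general Solr/Lucene query syntax and is an assumption about how the OIDA index treats this field.

```python
def type_query(value: str) -> str:
    """Build a query on the type field, e.g. type:spreadsheet.

    Values containing whitespace are wrapped in double quotes so that
    they are treated as a single phrase (standard Lucene query syntax).
    """
    # Escape backslashes and double quotes so the phrase stays well-formed.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    if any(ch.isspace() for ch in value):
        return f'type:"{escaped}"'
    return f"type:{escaped}"


if __name__ == "__main__":
    # Placeholder values; substitute real ones from the dt vocabulary.
    print(type_query("spreadsheet"))
    print(type_query("slide deck"))
```

Because this query matches on the document type rather than on the file extension, its results can include documents that are available only as PDF.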