How to Import and Store Documents


This "How to" describes how to import and store documents (objects) using yuuvis® Ultimate.

Importing and Storing a Single Document

Import using multipart requests

To import and store a document (object), you need at least its metadata. The content file is optional by default. In the schema, you define per document type whether a content file is required, optional, or not allowed (contentStreamAllowed, see "Document Object Type Definitions").

Format for the metadata file:

metaData.json
{
    "objects": [{
        "properties": {
            "system:objectTypeId": {
                "value": "document"
            },
            "Name": {
                "value": "test import"
            }
        },
        "contentStreams": [{
            "cid": "cid_63apple"
        }]
    }]
}

In this example, the schema contains an object type document with the Name property. Objects of this type may or must have content.
The content is referenced in the contentStreams object by specifying a cid (multipart content ID). In the example, the cid references a multipart content with content ID cid_63apple.
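
If you prefer to generate the metadata in code rather than keep it in a file, the structure above can be assembled as a string. The following is a minimal sketch: the class MetadataJson and its methods are hypothetical helpers, not part of any yuuvis® Ultimate library, and the escaping only covers backslashes and double quotes.

```java
/** Hypothetical helper that builds the metadata JSON for one object with one content stream. */
final class MetadataJson {

    /** Escapes backslashes and double quotes; control characters are not handled in this sketch. */
    static String escape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Returns metadata in the same shape as metaData.json above. */
    static String singleObject(String objectTypeId, String name, String cid) {
        return "{\n"
             + "    \"objects\": [{\n"
             + "        \"properties\": {\n"
             + "            \"system:objectTypeId\": { \"value\": \"" + escape(objectTypeId) + "\" },\n"
             + "            \"Name\": { \"value\": \"" + escape(name) + "\" }\n"
             + "        },\n"
             + "        \"contentStreams\": [{ \"cid\": \"" + escape(cid) + "\" }]\n"
             + "    }]\n"
             + "}";
    }
}
```

Calling MetadataJson.singleObject("document", "test import", "cid_63apple") yields metadata equivalent to the file shown above. For anything beyond a quick test, a JSON library is the safer choice.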

A content file can have any file format. We recommend specifying the format correctly both in the metadata and in the multipart request. If the content type is not specified, it is determined automatically during content analysis. If the content type cannot be determined unambiguously or content analysis is switched off, the content type application/octet-stream is used.

In the example, we have chosen a text file (Content-Type: text/plain).
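
To follow the recommendation of always stating a content type, a client can pick one from the file name and fall back to application/octet-stream. The sketch below is only an illustration on the client side: the class ContentTypes and its small extension table are assumptions, and it does not reproduce the content analysis performed by the system.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical client-side helper that chooses a content type for a multipart part. */
final class ContentTypes {

    static final String FALLBACK = "application/octet-stream";

    // Deliberately small table; extend it with the formats you actually import.
    private static final Map<String, String> BY_EXTENSION = Map.of(
            "txt", "text/plain",
            "json", "application/json",
            "pdf", "application/pdf");

    /** Returns the content type for the file extension, or the fallback if it is unknown. */
    static String forFileName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return FALLBACK; // no extension at all
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return BY_EXTENSION.getOrDefault(extension, FALLBACK);
    }
}
```

The returned string can be passed to MediaType.parse(...) when the content part is created.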

Request

To import and store a document (object) in the system, you send a POST request to the URL /dms-core/objects with a multipart body consisting of metadata and, if applicable, a content file to be stored ("POST store one or more documents" endpoint). To construct such a request, use a MultipartBody.Builder(), which allows you to build the request body from several FORM parts as follows.

Building the Multipart Body with OkHttp3
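import java.io.File;
import okhttp3.MediaType;
import okhttp3.MultipartBody;
import okhttp3.RequestBody;

// The "data" part carries the metadata; the content part is named after the cid used in the metadata.
RequestBody requestBody = new MultipartBody.Builder()
        .setType(MultipartBody.FORM)
        .addFormDataPart("data", "metaData.json",
                RequestBody.create(MediaType.parse("application/json; charset=utf-8"),
                        new File("./src/main/resources/metaData.json")))
        .addFormDataPart("cid_63apple", "test.txt",
                RequestBody.create(MediaType.parse("text/plain; charset=utf-8"),
                        new File("./src/main/resources/test.txt")))
        .build();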

Use a Request.Builder() to create a request object with the multipart body, headers, and the URL.
The header Ocp-Apim-Subscription-Key is necessary because it contains the user information to access the endpoint.

Building a POST Request for an Import
Request request = new Request.Builder()
        .header("Ocp-Apim-Subscription-Key", key)
        .url(baseUrl + "/dms-core/objects")
        .post(requestBody)
        .build();

Response

To display the response of yuuvis® Ultimate to the console, create an associated response object when the request is executed. Please note that an IOException can be thrown by the OkHttpClient when creating the response object.

Handling any IOException
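OkHttpClient client = new OkHttpClient();
// try-with-resources closes the response body once it has been read
try (Response response = client.newCall(request).execute()) {
    System.out.println(response.body().string()); // print to console
} catch (IOException e) {
    e.printStackTrace();
}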

Status Code Meaning
200 OK
401 Unauthorized
404 Not Found
422 Invalid Metadata
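
In client code it can help to translate these status codes into readable messages instead of printing the raw response only. The following sketch assumes nothing beyond the four codes listed above; the class ImportStatus is a hypothetical helper.

```java
/** Hypothetical helper that maps the documented import status codes to their meaning. */
final class ImportStatus {

    /** Returns the documented meaning, or a generic message for any other code. */
    static String describe(int statusCode) {
        switch (statusCode) {
            case 200: return "OK";
            case 401: return "Unauthorized";
            case 404: return "Not Found";
            case 422: return "Invalid Metadata";
            default:  return "Unexpected status " + statusCode;
        }
    }

    /** Only 200 is documented as success for this endpoint. */
    static boolean isSuccess(int statusCode) {
        return statusCode == 200;
    }
}
```

With OkHttp, the numeric code is available as response.code(), so ImportStatus.describe(response.code()) can be logged next to the response body.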

Importing and Storing Multiple Documents in Batch Mode

If you would like to import and store multiple documents (objects) at the same time, you can use the same endpoint: "POST store one or more documents".
Instead of a single object, the objects list consists of several metadata records. The individual content files of the objects then each require a unique cid as the name of the FormDataParts in the multipart request. This cid is referenced in the associated metadata record in the contentStreams list, which allows metadata to be uniquely assigned to content.

metaDataBatch.json
{
    "objects": [{
        "properties": {
            "system:objectTypeId": {
                "value": "document"
            },
            "Name": {
                "value": "test import object 1"
            }
        },
        "contentStreams": [{
            "cid": "cid_63apple"
        }]
    },
    {
      "properties": {
            "system:objectTypeId": {
                "value": "document"
            },
            "Name": {
                "value": "test import object 2"
            }
        },
        "contentStreams": [{
            "cid": "cid_64apple"
        }]
    }]
}

Request

In the multipart body, you create a separate FormDataPart for the content of each object, whose first parameter is the content ID (cid).

Building a POST Request for a Batch Import
RequestBody batchImportRequestBody = new MultipartBody.Builder()
        .setType(MultipartBody.FORM)
        .addFormDataPart("data", "metaDataBatch.json",
                RequestBody.create(MediaType.parse("application/json; charset=utf-8"),
                        new File("./src/main/resources/metaDataBatch.json")))
        .addFormDataPart("cid_63apple", "test1.txt",
                RequestBody.create(MediaType.parse("text/plain; charset=utf-8"),
                        new File("./src/main/resources/test1.txt")))
        .addFormDataPart("cid_64apple", "test2.txt",
                RequestBody.create(MediaType.parse("text/plain; charset=utf-8"),
                        new File("./src/main/resources/test2.txt")))
        .build();

The assembly of the request object is identical to the normal import.
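
Because a batch import depends on every cid being unique and matching a multipart part, it can be worth checking this before the request is sent. The sketch below is one possible client-side check; the class CidCheck is hypothetical, and it expects the cids from the metadata and the part names as plain collections rather than parsing the JSON itself.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical pre-flight check for the cids of a batch import. */
final class CidCheck {

    /**
     * Returns one message per problem: a cid used by more than one metadata record,
     * or a cid without a multipart part of the same name. An empty list means no problem was found.
     */
    static List<String> problems(List<String> metadataCids, Set<String> partNames) {
        List<String> problems = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (String cid : metadataCids) {
            if (!seen.add(cid)) {
                problems.add("duplicate cid: " + cid);
            } else if (!partNames.contains(cid)) {
                problems.add("no multipart part for cid: " + cid);
            }
        }
        return problems;
    }
}
```

For the batch example above, the metadata cids are cid_63apple and cid_64apple, and the part names are data, cid_63apple, and cid_64apple, so the check returns an empty list.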

Response

If successful, the response object contains a multi-element objects list that contains the metadata records of all documents (objects) imported in this batch import.

Importing Compound Documents

For importing compound documents please refer to "Compound Documents". It describes what compound documents are and what to consider when importing compound documents.

Content Digest - Generation and Validation

For documents imported using the "Store one or more documents (POST)" endpoint, a content digest is automatically generated and stored (Secure Hash Algorithm, SHA-256).
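
To illustrate what such a digest is, the following sketch computes a SHA-256 digest of some content with the Java standard library. It is an illustration only: the class Sha256 is hypothetical, and the source does not state how the system encodes the stored digest, so the lowercase hexadecimal output here is an assumption.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper that computes a SHA-256 digest as lowercase hex. */
final class Sha256 {

    static String hex(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff)); // two hex digits per byte
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```

Any change to the content, even a single byte, produces a different digest, which is what makes the comparison during validation meaningful.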

To validate the content digest of a stored document, use the "Validate content digest by ID" endpoint. Send a request with the objectId. The system then generates a new content digest from the currently stored document and compares it with the digest that was generated and stored earlier.

To validate the content digest of a specific document (object) version, insert /versions/{versionNr} between the objectId and the suffix beginning with /actions.
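
The path rule for versions can be sketched as a small helper. Note the assumptions: the class DigestValidationPath is hypothetical, the objectId is assumed to follow the objects path directly, and the action suffix is passed in as a parameter because this page does not spell it out. Take the exact path from the endpoint documentation.

```java
/** Hypothetical helper that assembles the request path for a digest validation. */
final class DigestValidationPath {

    /**
     * @param objectsPath  path of the objects resource, for example "/dms-core/objects"
     * @param objectId     ID of the stored object
     * @param versionNr    version to validate, or null for the current version
     * @param actionSuffix suffix beginning with "/actions", as given in the endpoint documentation
     */
    static String build(String objectsPath, String objectId, Integer versionNr, String actionSuffix) {
        if (!actionSuffix.startsWith("/actions")) {
            throw new IllegalArgumentException("suffix must begin with /actions");
        }
        StringBuilder path = new StringBuilder(objectsPath).append('/').append(objectId);
        if (versionNr != null) {
            path.append("/versions/").append(versionNr); // only present for a specific version
        }
        return path.append(actionSuffix).toString();
    }
}
```

With a placeholder suffix "/actions/example", object ID "1234", and version 3, the helper returns /dms-core/objects/1234/versions/3/actions/example.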

Responses

Status Code Meaning
200 OK - The value of the content digest of the specified version stored in the index data is still correct.
404 Not Found - The document (object) with this objectId and this version number cannot be found.
409 Conflict - The generated content digest of the specified version does not match the value stored in the index data.

Retention - Protect Your Documents from Being Deleted or Changed

Retentions ensure that documents are not deleted or changed before a specified date. An object is under retention if it has the property system:rmExpirationDate and the value of this property lies in the future. The expiration date can be set as far in the future as your archiving needs require; there is no upper limit.

Precondition

There is a secondary object type system:rmDestructionRetention. To use retentions, the schema must contain an object type that uses this secondary object type ("Schema Structure"). Objects of this type can have retentions.

Properties of system:rmDestructionRetention
        <propertyDateTimeDefinition>
            <id>system:rmExpirationDate</id>
            <propertyType>datetime</propertyType>
            <cardinality>single</cardinality>
            <required>false</required>
        </propertyDateTimeDefinition>
        <propertyDateTimeDefinition>
            <id>system:rmStartOfRetention</id>
            <propertyType>datetime</propertyType>
            <cardinality>single</cardinality>
            <required>false</required>
        </propertyDateTimeDefinition>
        <propertyDateTimeDefinition>
            <id>system:rmDestructionDate</id>
            <propertyType>datetime</propertyType>
            <cardinality>single</cardinality>
            <required>false</required>
        </propertyDateTimeDefinition>

There are two ways to set an object under retention:

  1. the object can first be imported with an expiration date or
  2. an expiration date can be added to a previously existing object using a metadata update.

The values of the retention properties must meet the following conditions. If an object has an expiration date, its value must come after the creation date. If the expiration date is null, the start of retention and the destruction date must be null as well. If a destruction date is set, it must be equal to or later than the expiration date.
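
These conditions can be checked on the client before an import or update is sent. The sketch below is one possible reading of the rules, not the system's own validation: the class RetentionRules is hypothetical, the creation date is expected to be non-null, and the check assumes that the expiration date has to lie after the creation date.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical client-side check of the retention property conditions. */
final class RetentionRules {

    /** Returns one message per violated condition; an empty list means the values are consistent. */
    static List<String> violations(Instant creationDate, Instant expirationDate,
                                   Instant startOfRetention, Instant destructionDate) {
        List<String> violations = new ArrayList<>();
        if (expirationDate == null) {
            // Without an expiration date, the other two properties must be null as well.
            if (startOfRetention != null) {
                violations.add("start of retention requires an expiration date");
            }
            if (destructionDate != null) {
                violations.add("destruction date requires an expiration date");
            }
            return violations;
        }
        // Assumption: the expiration date has to lie after the creation date.
        if (!expirationDate.isAfter(creationDate)) {
            violations.add("expiration date must lie after the creation date");
        }
        if (destructionDate != null && destructionDate.isBefore(expirationDate)) {
            violations.add("destruction date must not lie before the expiration date");
        }
        return violations;
    }
}
```

The three date arguments correspond to system:rmExpirationDate, system:rmStartOfRetention, and system:rmDestructionDate.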

Effects of Retentions

If an expiration date has been set, it is not possible to change the object's content or to delete the object before that date. The other two properties have no effect provided by the system. The metadata of an object under retention can still be updated, but the expiration date cannot be removed or replaced by an earlier date.

An object under retention cannot be deleted or changed even by users that have a write or delete permission; retentions are stronger than write or delete permissions.
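
The effects described above can be summarized as a small decision helper, for example to warn a user before a request is rejected. This is a sketch under the rules stated in this section only; the class RetentionGuard is hypothetical and does not replace the checks performed by the system.

```java
import java.time.Instant;

/** Hypothetical client-side summary of the retention effects. */
final class RetentionGuard {

    /** An object is under retention if it has an expiration date that lies in the future. */
    static boolean isUnderRetention(Instant expirationDate, Instant now) {
        return expirationDate != null && expirationDate.isAfter(now);
    }

    /** Content changes and deletion are only possible while the object is not under retention. */
    static boolean mayDeleteOrChangeContent(Instant expirationDate, Instant now) {
        return !isUnderRetention(expirationDate, now);
    }

    /** Under retention, the expiration date may be kept or moved later, but not removed or moved earlier. */
    static boolean mayUpdateExpiration(Instant current, Instant proposed, Instant now) {
        if (!isUnderRetention(current, now)) {
            return true;
        }
        return proposed != null && !proposed.isBefore(current);
    }
}
```

Permissions are deliberately not part of this helper: as stated above, retentions take precedence over write and delete permissions.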