Interface BinaryUpload


  • @ProviderType
    public interface BinaryUpload
    Describes uploading a binary through HTTP requests in a single or multiple parts. This will be returned by JackrabbitValueFactory.initiateBinaryUpload(long, int). A high-level overview of the process can be found in JackrabbitValueFactory.

    Note that although the API allows URI schemes other than "http(s)", the upload functionality is currently only defined for HTTP.

    A caller usually needs to pass the information provided by this interface to a remote client that is in possession of the actual binary, which then has to upload the binary using HTTP according to the logic described below. If multiple URIs are returned, the remote client is expected to support multi-part uploads.

    Once a remote client finishes uploading the binary data, the application must be notified and must then call JackrabbitValueFactory.completeBinaryUpload(String) to complete the upload. This completion requires the exact upload token obtained from getUploadToken().

    Upload algorithm

    A remote client will have to follow this algorithm to upload a binary based on the information provided by this interface.

    Please be aware that if the size passed to JackrabbitValueFactory.initiateBinaryUpload(long, int) was an estimate and the actual binary is larger, there is no guarantee that the upload will be possible with the URIs from getUploadURIs() and the limit from getMaxPartSize(). In such cases, the application should restart the transaction using the correct size.

    Variables used

    • fileSize: the actual binary size (must be known at this point)
    • minPartSize: the value from getMinPartSize()
    • maxPartSize: the value from getMaxPartSize()
    • numUploadURIs: the number of entries in getUploadURIs()
    • uploadURIs: the entries in getUploadURIs()
    • partSize: the part size to be used in the upload (to be determined in the algorithm)

    Steps

    1. If (fileSize / maxPartSize) > numUploadURIs, that is, if the binary does not fit into numUploadURIs parts of at most maxPartSize bytes each, then the client cannot proceed and will have to request a new set of URIs with the right fileSize as maxSize.
    2. Calculate the partSize and the number of URIs to use.
      The easiest way to do this is to use maxPartSize as the value for partSize. As long as the actual binary is no larger than the size passed to JackrabbitValueFactory.initiateBinaryUpload(long, int), a non-null BinaryUpload object returned from that call guarantees that the binary can be uploaded using the provided uploadURIs, provided that partSize is set to maxPartSize. Note that it is not required to use all of the URIs provided in uploadURIs if fewer are sufficient to upload the entire binary with the selected partSize.
      However, there are some exceptions to consider:
      1. If fileSize < minPartSize, then take the first provided upload URI to upload the entire binary, with partSize = fileSize. Note that it is not required to use all of the URIs provided in uploadURIs.
      2. If fileSize / partSize == numUploadURIs, all part URIs must be used. The partSize to use for all parts except the last can be calculated using:
        partSize = (fileSize + numUploadURIs - 1) / numUploadURIs
        It is also possible to simply use maxPartSize as the value for partSize in this case, for every part except the last.
      Optionally, a client may select a different partSize, for example if the client has more information about the conditions of the network or other information that would make a different partSize preferable. In this case a different value may be chosen, under the condition that all of the following are true:
      1. partSize >= minPartSize
      2. partSize <= maxPartSize (unless maxPartSize = -1 meaning unlimited)
      3. partSize > (fileSize / numUploadURIs)
    3. Upload: split the binary into segments of partSize bytes. For each segment, take the next URI from uploadURIs (strictly in order) and upload the segment with a standard HTTP PUT. The last segment contains whatever bytes remain and may be smaller than partSize.
    4. If a segment fails during upload, retry (up to a certain timeout).
    5. After the upload has finished successfully, notify the application, for example through a complete request, passing the upload token, and the application will call JackrabbitValueFactory.completeBinaryUpload(String) with the token.
      The only timeout restrictions for calling JackrabbitValueFactory.completeBinaryUpload(String) are those imposed by the cloud blob storage service on uploaded blocks. Upload tokens themselves do not time out, which allows you to be very lenient in allowing uploads to complete, and very resilient in handling temporary network issues or other issues that might impact the uploading of one or more blocks.
      In the case that the upload cannot be finished (for example, one or more segments cannot be uploaded even after a reasonable number of retries), do not call JackrabbitValueFactory.completeBinaryUpload(String). Instead, simply restart the upload from the beginning by calling JackrabbitValueFactory.initiateBinaryUpload(long, int) when the situation preventing a successful upload has been resolved.
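    Steps 3 and 4 could be sketched in Java roughly as follows. This is only an illustration under stated assumptions and not part of the Jackrabbit API: PartUploader, PartSender, httpSender and uploadPart are hypothetical names, the part is assumed to be held in memory as a byte array, any 2xx status is assumed to mean success, and the retry limit is expressed as a maximum number of attempts rather than a timeout. The HTTP sender relies on java.net.http.HttpClient, which requires Java 11 or later.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical client-side helper; not part of the Jackrabbit API. */
final class PartUploader {

    /** Abstraction over "PUT these bytes to this URI", so the retry logic can run without a network. */
    interface PartSender {
        int put(URI uri, byte[] data, int offset, int length) throws IOException, InterruptedException;
    }

    /** A sender that issues a plain HTTP PUT and returns the response status code. */
    static PartSender httpSender(HttpClient client) {
        return (uri, data, offset, length) -> {
            HttpRequest request = HttpRequest.newBuilder(uri)
                    .PUT(HttpRequest.BodyPublishers.ofByteArray(data, offset, length))
                    .build();
            return client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
        };
    }

    /**
     * Uploads one part, retrying on I/O errors and non-2xx responses.
     * Returns true once a 2xx status is received, false if all attempts fail
     * (assumption: an attempt limit stands in for the timeout mentioned in step 4).
     */
    static boolean uploadPart(PartSender sender, URI uri, byte[] data,
                              int offset, int length, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                int status = sender.put(uri, data, offset, length);
                if (status >= 200 && status < 300) {
                    return true;
                }
            } catch (IOException e) {
                // Treated as a transient failure: fall through and try again.
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }
}
```

    If uploadPart returns false for any part, the client would report the failure instead of requesting completion, so that the application does not call JackrabbitValueFactory.completeBinaryUpload(String).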

    Example JSON view

    A JSON representation of this interface as passed back to a remote client might look like this:
     {
         "uploadToken": "aaaa-bbbb-cccc-dddd-eeee-ffff-gggg-hhhh",
         "minPartSize": 10485760,
         "maxPartSize": 104857600,
         "uploadURIs": [
             "http://server.com/upload/1",
             "http://server.com/upload/2",
             "http://server.com/upload/3",
             "http://server.com/upload/4"
         ]
     }
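    An application might produce such a view with a small helper like the one below. This is a sketch only: BinaryUploadJson and toJson are hypothetical names, the field names simply mirror the example above, and the escaping handles only quotes and backslashes, so a real application would more likely use a JSON library.

```java
import java.net.URI;

/** Hypothetical helper that renders the values of a BinaryUpload as JSON. */
final class BinaryUploadJson {

    static String toJson(String uploadToken, long minPartSize, long maxPartSize,
                         Iterable<URI> uploadURIs) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\n");
        sb.append("    \"uploadToken\": \"").append(escape(uploadToken)).append("\",\n");
        sb.append("    \"minPartSize\": ").append(minPartSize).append(",\n");
        sb.append("    \"maxPartSize\": ").append(maxPartSize).append(",\n");
        sb.append("    \"uploadURIs\": [");
        String separator = "\n";
        for (URI uri : uploadURIs) {
            // The URIs are written in iteration order, which the client must preserve.
            sb.append(separator).append("        \"").append(escape(uri.toString())).append('"');
            separator = ",\n";
        }
        sb.append("\n    ]\n}");
        return sb.toString();
    }

    /** Minimal escaping (assumption: no control characters occur in the values). */
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```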
     
    • Method Detail

      • getUploadURIs

        @NotNull
        Iterable<URI> getUploadURIs()
        Returns a list of URIs that can be used for uploading binary data directly to a storage location in one or more parts.

        Remote clients must support multi-part uploading as per the upload algorithm described above. Clients are not necessarily required to use all of the URIs provided. A client may choose to use fewer, or even only one of the URIs. However, it must always ensure the part size is between getMinPartSize() and getMaxPartSize(). These can reflect strict limitations of the storage provider.

        Regardless of the number of URIs used, they must be consumed in sequence, without skipping any, and the order of parts the original binary is split into must correspond exactly with the order of URIs.

        For example, if a client wishes to upload a binary in three parts and there are five URIs returned, the client must use the first URI to upload the first part, the second URI to upload the second part, and the third URI to upload the third part. The client is not required to use the fourth and fifth URIs. However, using the second URI to upload the third part may result in either an upload failure or a corrupted upload; likewise, skipping the second URI to use subsequent URIs may result in either an upload failure or a corrupted upload.
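        The ordering rule can be illustrated with a short Java sketch that maps consecutive byte ranges of the binary onto the URIs in iteration order. PartAssignment and assign are hypothetical names used only for this illustration; the sketch assumes that fileSize and partSize have already been determined by the upload algorithm.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical value type: which byte range of the binary goes to which URI. */
final class PartAssignment {
    final URI uri;
    final long offset;
    final long length;

    PartAssignment(URI uri, long offset, long length) {
        this.uri = uri;
        this.offset = offset;
        this.length = length;
    }

    /**
     * Assigns consecutive parts to the URIs strictly in order, without skipping any.
     * URIs left over after the last part are simply not used.
     */
    static List<PartAssignment> assign(Iterable<URI> uploadURIs, long fileSize, long partSize) {
        if (partSize <= 0) {
            throw new IllegalArgumentException("partSize must be positive");
        }
        List<PartAssignment> parts = new ArrayList<>();
        Iterator<URI> uris = uploadURIs.iterator();
        long offset = 0;
        while (offset < fileSize) {
            if (!uris.hasNext()) {
                throw new IllegalStateException("not enough upload URIs for this part size");
            }
            // The last part takes whatever is left and may be shorter than partSize.
            long length = Math.min(partSize, fileSize - offset);
            parts.add(new PartAssignment(uris.next(), offset, length));
            offset += length;
        }
        return parts;
    }
}
```

        With five URIs, a binary of 250 bytes and a part size of 100 bytes, this yields three assignments that use the first, second and third URI, and the last part is 50 bytes long.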

        While the API supports multi-part uploading via multiple upload URIs, implementations are not required to support multi-part uploading. If the underlying implementation does not support multi-part uploading, a single URI will be returned regardless of the size of the data being uploaded.

        Security considerations:

        • The URIs must not be shared with other users. They must only be returned to authenticated requests corresponding to this session user or trusted system components.
        • The URIs must not be persisted for later use and will typically be time limited.
        • The URIs will only grant access to this particular binary.
        • The client cannot infer any semantics from the URI structure and path names. A URI would typically include a cryptographic signature. Any change to the URIs will likely result in a failing request.
        Returns:
        Iterable of URIs that can be used for uploading directly to a storage location.
      • getMinPartSize

        long getMinPartSize()
        Returns the smallest possible part size in bytes. If a consumer wants to choose a custom part size, it cannot be smaller than this value. This does not apply to the final part. This value will be greater than or equal to zero.

        Note that the API offers no guarantees that using this minimal part size is possible with the number of available getUploadURIs(). This might not be the case if the binary is too large. Please refer to the upload algorithm for the correct use of this value.
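        A client that wants to pick its own part size could check the three conditions from the upload algorithm with a helper along these lines. PartSizeCheck and isAcceptablePartSize are hypothetical names, and the third condition is evaluated with integer division, which is an interpretation of the text rather than something the API specifies. The special case of a binary smaller than getMinPartSize(), which is uploaded as a single part, is outside the scope of this check.

```java
/** Hypothetical helper that checks a custom part size against the documented conditions. */
final class PartSizeCheck {

    static boolean isAcceptablePartSize(long partSize, long fileSize, long minPartSize,
                                        long maxPartSize, int numUploadURIs) {
        if (partSize <= 0 || numUploadURIs <= 0) {
            return false;
        }
        // Condition 1: partSize >= minPartSize
        boolean atLeastMin = partSize >= minPartSize;
        // Condition 2: partSize <= maxPartSize, unless maxPartSize is -1 (unlimited)
        boolean atMostMax = maxPartSize == -1 || partSize <= maxPartSize;
        // Condition 3: partSize > fileSize / numUploadURIs (integer division),
        // which ensures the available URIs are enough for all parts.
        boolean fitsInUris = partSize > fileSize / numUploadURIs;
        return atLeastMin && atMostMax && fitsInUris;
    }
}
```

        For example, with a minimum of 10485760 bytes, a maximum of 104857600 bytes and four URIs, a binary of 209715200 bytes cannot be uploaded using the minimum part size, whereas a part size of 67108864 bytes satisfies all three conditions.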

        Returns:
        The smallest part size acceptable for multi-part uploads.
      • getMaxPartSize

        long getMaxPartSize()
        Returns the largest possible part size in bytes. If a consumer wants to choose a custom part size, it cannot be larger than this value. If this returns -1, the maximum is unlimited.

        The API guarantees that a client can split the binary of the requested size using this maximum part size and there will be sufficient URIs available in getUploadURIs(). Please refer to the upload algorithm for the correct use of this value.
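        The default choice of part size described in the upload algorithm might be sketched as follows. PartSizePlanner, choosePartSize and partCount are hypothetical names. Two points are assumptions of this sketch and not statements of the API: the feasibility check in step 1 is done with ceiling division, and when the maximum is unlimited (-1) the binary is spread evenly over all URIs while respecting the minimum part size.

```java
/** Hypothetical helper that picks a part size following the documented upload algorithm. */
final class PartSizePlanner {

    /** Returns the part size to use, or -1 if a new set of URIs must be requested. */
    static long choosePartSize(long fileSize, long minPartSize, long maxPartSize, int numUploadURIs) {
        if (fileSize <= 0 || numUploadURIs <= 0 || (maxPartSize <= 0 && maxPartSize != -1)) {
            throw new IllegalArgumentException("invalid upload parameters");
        }
        boolean unlimited = maxPartSize == -1;
        // Step 1: the binary must fit into numUploadURIs parts of at most maxPartSize bytes.
        if (!unlimited && ceilDiv(fileSize, maxPartSize) > numUploadURIs) {
            return -1;
        }
        // Exception 1: a binary smaller than minPartSize is uploaded as a single part.
        if (fileSize < minPartSize) {
            return fileSize;
        }
        // Default: maxPartSize always works when the binary is within the requested size.
        if (!unlimited) {
            return maxPartSize;
        }
        // Assumption for the unlimited case: even split over all URIs, never below minPartSize.
        return Math.max(minPartSize, ceilDiv(fileSize, numUploadURIs));
    }

    /** Number of parts (and therefore URIs) needed for the given part size. */
    static long partCount(long fileSize, long partSize) {
        return ceilDiv(fileSize, partSize);
    }

    private static long ceilDiv(long a, long b) {
        return (a + b - 1) / b;
    }
}
```

        For a binary of 250 bytes with a minimum of 10, a maximum of 100 and four URIs, this selects a part size of 100 and needs three of the four URIs; a binary of 500 bytes under the same limits cannot be uploaded and yields -1.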

        Returns:
        The maximum part size acceptable for multi-part uploads or -1 if there is no limit.
      • getUploadToken

        @NotNull
        String getUploadToken()
        Returns a token identifying this upload. This is required to finalize the upload at the end by calling JackrabbitValueFactory.completeBinaryUpload(String).

        The format of this string is implementation-dependent. Implementations must ensure that clients cannot guess tokens for existing binaries.

        Returns:
        A unique token identifying this upload.