Vadim Frolov
Vadim Frolov's Blog

Download a Google Drive file from Matlab

Vadim Frolov · Aug 9, 2018

Here is a story for today. We have some public files stored on Google Drive that we would like to download automatically. The files I was interested in are relatively big multi-frame TIFF files (231 MB each).

I assume that you have the file IDs. If you do not know what a file ID is, plenty of online pages explain it, for example on Lifehacker.
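If you only have sharing links at hand, the ID can usually be cut out of them. Below is a minimal sketch, assuming the link looks like either `.../file/d/<ID>/view` or `...?id=<ID>`; both patterns are my assumption about common Google Drive link shapes, and `extractDriveFileId` is a hypothetical helper name, not something Matlab or Google provides.

```matlab
function fileId = extractDriveFileId(shareUrl)
% Hypothetical helper: pull the file ID out of a Google Drive sharing link.
% Assumes the ID follows '/d/' or an 'id=' query parameter and consists of
% letters, digits, underscores and hyphens. Returns '' if nothing matches.
tokens = regexp(shareUrl, '(?:/d/|[?&]id=)([A-Za-z0-9_-]+)', 'tokens', 'once');
if isempty(tokens)
    fileId = '';
else
    fileId = tokens{1};
end
end
```

For example, `extractDriveFileId('https://drive.google.com/open?id=SOME_ID')` would hand back the part after `id=`. If your links have a different shape, the pattern needs adjusting.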

Given an ID, one can easily construct a download link. The problem comes with big files. If you try to download a big file, Google redirects you to a special page saying that it cannot scan the file for viruses. That page contains a download link. It wouldn't be Google if you could just parse the response, grab the new link and use it directly. No! Everything is very dynamic: the new link contains a confirmation code, and it only works together with the proper cookie. If you are on a Linux machine, here is a script for you from StackOverflow. We, however, are on whatever OS, using Matlab.

Alright, Matlab has several ways to get web content. webread is a very convenient top-level function, but it won't work on its own here because you need to preserve cookies between requests. Matlab provides an example function that sends arbitrary HTTP requests with cookie support. Here is the corrected version of that function for reference:

function [response, retInfos, history] = sendRequest(uri, request)

% uri: matlab.net.URI
% request: matlab.net.http.RequestMessage
% response: matlab.net.http.ResponseMessage

% matlab.net.http.HTTPOptions persists across requests to reuse previous
% Credentials in it for subsequent authentications
persistent options

% infos is a containers.Map object where: 
%    key is uri.Host; 
%    value is "info" struct containing:
%        cookies: vector of matlab.net.http.Cookie or empty
%        uri: target matlab.net.URI if redirect, or empty
persistent infos

if isempty(options)
    options = matlab.net.http.HTTPOptions('ConnectTimeout',20);
end

if isempty(infos)
    infos = containers.Map;
end
host = string(uri.Host); % get Host from URI
try
    % get info struct for host in map
    info = infos(char(host));
    if ~isempty(info.uri)
        % If it has a uri field, it means a redirect previously
        % took place, so replace requested URI with redirect URI.
        uri = info.uri;
    end
    if ~isempty(info.cookies)
        % If it has cookies, it means we previously received cookies from this host.
        % Add Cookie header field containing all of them.
        request = request.addFields(matlab.net.http.field.CookieField(info.cookies));
    end
catch
    % no previous redirect or cookies for this host
    info = [];
end

% Send request and get response and history of transaction.
[response, ~, history] = request.send(uri, options);
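if response.StatusCode ~= matlab.net.http.StatusCode.OK
    % Assign the second output before returning, so that callers
    % requesting retInfos do not get an "output not assigned" error.
    retInfos = infos;
    return
end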

% Get the Set-Cookie header fields from response message in
% each history record and save them in the map.
arrayfun(@addCookies, history)

% If the last URI in the history is different from the URI sent in the original 
% request, then this was a redirect. Save the new target URI in the host info struct.
targetURI = history(end).URI;
if ~isequal(targetURI, uri)
    if isempty(info)
        % no previous info for this host in map, create new one
        infos(char(host)) = struct('cookies',[],'uri',targetURI);
    else
        % change URI in info for this host and put it back in map
        info.uri = targetURI;
        infos(char(host)) = info;
    end
end
retInfos = infos;

    function addCookies(record)
        % Add cookies in Response message in history record
        % to the map entry for the host to which the request was directed.
        %
        ahost = record.URI.Host; % the host the request was sent to
        cookieFields = record.Response.getFields('Set-Cookie');
        if isempty(cookieFields)
            return
        end
        cookieData = cookieFields.convert(); % get array of Set-Cookie structs
        cookies = [cookieData.Cookie]; % get array of Cookies from all structs
        try
            % If info for this host was already in the map, add its cookies to it.
            ainfo = infos(char(ahost)); % use a char key, as everywhere else
            ainfo.cookies = [ainfo.cookies cookies];
            infos(char(ahost)) = ainfo;
        catch
            % Not yet in map, so add new info struct.
            infos(char(ahost)) = struct('cookies',cookies,'uri',[]);
        end
    end
end

Note that I also return some additional variables from the function:

  • retInfos is used to get confirmation code from a cookie.
  • history is used to obtain a direct link to a file.
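To illustrate the first point, here is a small sketch of how a value can be looked up in the cookie array stored in retInfos. `findCookieValue` is a hypothetical helper of mine; it only assumes that each cookie exposes `Name` and `Value`, which holds for `matlab.net.http.Cookie` objects and also for plain structs.

```matlab
function value = findCookieValue(cookies, namePart)
% Hypothetical helper: return the Value of the first cookie whose Name
% contains namePart, or '' if there is no such cookie.
value = '';
for k = 1:numel(cookies)
    if contains(char(cookies(k).Name), namePart)
        value = char(cookies(k).Value);
        return
    end
end
end
```

With synthetic data (the names and values below are made up):

```matlab
cookies = struct('Name', {'NID', 'download_token'}, 'Value', {'abc', 'XyZ9'});
code = findCookieValue(cookies, 'download');
```

An empty result means no matching cookie was received, which is worth checking before building the second URL.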

Why do we need an additional direct link after we got the confirmation code? Because we want to download multi-frame TIFF files, and Matlab tries to be smart: by default it hands back only a single frame.
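The single-frame default is easy to reproduce locally with imread, which reads only the first image of a multi-image TIFF unless you ask for another index. I assume the web functions decode image responses in a similar way, but I have not verified that. The sketch below uses a tiny synthetic 3-frame file; all variable names are mine.

```matlab
% Build a small 3-frame TIFF in a temporary location (synthetic data).
tmpFile = [tempname '.tif'];
frames = uint8(randi(255, 8, 8, 3));
imwrite(frames(:, :, 1), tmpFile);
for k = 2:size(frames, 3)
    imwrite(frames(:, :, k), tmpFile, 'WriteMode', 'append');
end

% A plain imread returns the first frame only.
firstOnly = imread(tmpFile);

% imfinfo returns one info struct per frame, so this counts the frames.
nFrames = numel(imfinfo(tmpFile));

% Reading every frame requires passing the frame index explicitly.
allFrames = zeros(size(firstOnly, 1), size(firstOnly, 2), nFrames, 'uint8');
for k = 1:nFrames
    allFrames(:, :, k) = imread(tmpFile, k);
end
delete(tmpFile);
```

The same `numel(imfinfo(fileName))` call is a quick way to check that a downloaded file really contains all of its frames.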

Here is the code that downloads the file and saves it to disk:

fileName = 'file_00002_00002.tif';
fileId = '0B649boZqpYG1OEZnV21ncDVNcVk';
fileUrl = sprintf('https://drive.google.com/uc?export=download&id=%s', fileId);
request = matlab.net.http.RequestMessage();

% The first request will be redirected to an information page about virus scanning.
% We can get the confirmation code from an associated cookie.
[~, infos] = sendRequest(matlab.net.URI(fileUrl), request);
confirmCode = '';
for j = 1:length(infos('drive.google.com').cookies)
    if ~isempty(strfind(infos('drive.google.com').cookies(j).Name, 'download'))
        confirmCode = infos('drive.google.com').cookies(j).Value;
        break;
    end
end
newUrl = strcat(fileUrl, sprintf('&confirm=%s', confirmCode));

% We now need to send another request to get the file.
% However, Matlab doesn't download the whole Tiff file, but only one frame.
[~, ~, history] = sendRequest(matlab.net.URI(newUrl), request);

% Thus we must use the history records to find
% the direct link and download it as a raw file
ind = arrayfun(@(x) ~isempty(strfind(x.URI.Host, 'googleusercontent')), history);
ind = find(ind, 1);

% we need the raw type in order to download the whole file and not just a single frame
options = weboptions('ContentType', 'raw');
imgData = webread(history(ind).URI.EncodedURI, options);
fid = fopen(fileName, 'wb');
fwrite(fid, imgData);
fclose(fid);

Finally, the whole file is saved at the location given by fileName. Note that there are no error checks in the code!
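As one example of what such a check could look like, here is a hedged sketch for the last step, writing the raw bytes to disk. `writeRawFile` is a hypothetical helper name of mine; it fails loudly if the file cannot be opened or if fewer bytes are written than expected, and it closes the file even when an error occurs.

```matlab
function writeRawFile(fileName, data)
% Hypothetical helper: write raw bytes to fileName with basic error checks.
[fid, msg] = fopen(fileName, 'w');
if fid == -1
    error('writeRawFile:openFailed', 'Cannot open %s: %s', fileName, msg);
end
% Close the file when this function exits, even if fwrite fails.
cleaner = onCleanup(@() fclose(fid));
count = fwrite(fid, data, 'uint8');
if count ~= numel(data)
    error('writeRawFile:shortWrite', ...
        'Wrote %d of %d bytes to %s.', count, numel(data), fileName);
end
end
```

In the script above, `writeRawFile(fileName, imgData)` would stand in for the fopen, fwrite and fclose lines. In the same spirit, it makes sense to stop early if confirmCode or ind turns out to be empty.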

Here is a Gist with the same code.
