
Transferring large files to/from a Windows host with Ansible

Introduction

As Ansible users may already know, the connection between Ansible and a Windows host is most of the time done via WinRM. With this type of connection, it is already possible to transfer files that are not too large via the Ansible win_copy module (~30MB seems to be ok). However, this method does not work if you want to transfer an archive of, say, 3GB to the remote (for Ansible <= 2.8 at least).

In this post, I will present a method I developed recently for transferring large files to a Windows host from Ansible.

General idea

The general idea for performing the transfer is to dynamically create a Samba share end-point on the remote host, and to mount this end-point from the Ansible control machine. Once this connection is established, transferring the file is just a matter of copying locally from one folder to another.

There are a few small technicalities here and there that we will explain.

Implementation details

We suppose that the variable destination_folder contains the folder on the remote to which the "big" file is being copied. We also suppose that the inventory is complete, and that the Ansible command is using the --become switch.

We start by creating the Samba end-point, like this:

- name: Add share on the remote
  win_share:
    name: ansible-work-dir
    description: for pushing ansible stuff
    path: "{{ destination_folder }}"
    list: yes
    full: "{{ ansible_user_id }}"

Once the network folder has been successfully created, we mount it on the local machine. What follows is specific to Linux/Ubuntu and the gio program in particular, but can easily be adapted to e.g. macOS:

  1. we create a temporary file that will contain the connection secrets. This temporary file is required to be able to call the mount command without user interaction.
  2. we populate this temporary file with the connection credentials: the username, an empty line for the domain, and the password (hence the double newline in the copy task below)
  3. we call the gio program locally and pipe the credential file to it. This command does not require elevated privileges on the local machine.

- name: Create temporary file
  tempfile:
    state: file
    suffix: temp
  register: thefile
  delegate_to: localhost
  become: False

- name: Creating secrets file for the share
  copy:
    content: "{{ ansible_user_id }}\n\n{{ ansible_password }}"
    dest: '{{ thefile.path }}'
  delegate_to: localhost
  become: False

- name: Mount local folder
  shell: gio mount smb://{{ inventory_hostname }}/ansible-work-dir < {{ thefile.path }}
  delegate_to: localhost
  become: False
  register: command_result
  failed_when:
    - command_result.rc != 0
    - not ('Location is already mounted' in command_result.stderr)

# removing as soon as possible
- name: removing the credential file
  file:
    path: "{{ thefile.path }}"
    state: absent
  delegate_to: localhost
  become: False

We then need to locate the mount point locally: gio creates a mount point that is visible only to the user who created the mount, and its name contains the user ID.

- name: Get current user ID
  command: id -u
  delegate_to: localhost
  become: False
  register: id_cmd_output

# synchronize does not work: we want individual files
# copy does not work: it caches the file in memory, file too big
- name: copying the files to the network share
  command: >
    cp
    {{ item }}
    /run/user/{{ id_cmd_output.stdout_lines[0] }}/gvfs/'smb-share:server={{ inventory_hostname }},share=ansible-work-dir'/{{ item | basename }}
  loop:
    - local_file1
    - local_file2
  loop_control:
    label: "Copying file '{{ item }}'"
  delegate_to: localhost
  become: False

In the task above, the long and complicated copy line can be explained as follows:

  • the gio program automatically creates a mount that is visible to the current user on the operating system. This mount is in /run/user/<userID>/gvfs/<name-of-the-mount>, so we need to know both the userID and the name-of-the-mount

  • id_cmd_output.stdout_lines[0] contains the user ID from the previous command id -u

  • the name of the mount is composed of the type of the share, the server, and the name of the remote share. We previously created an ansible-work-dir network share on the remote Windows host, and this is the name that appears in the gio command

    gio mount smb://{{ inventory_hostname }}/ansible-work-dir

  • The host is also the same in the gio command and in the name of the mount. All in all, the name of the mount is

    'smb-share:server={{ inventory_hostname }},share=ansible-work-dir'
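Putting these pieces together, the construction of the local mount path can be sketched in Python; the user ID and host name below are illustrative assumptions, not values from the playbook:

```python
# Sketch: reconstructing the local gvfs path where gio exposes an
# SMB mount. The uid and server name are illustrative assumptions.

def gvfs_smb_path(uid: int, server: str, share: str) -> str:
    """Return the path under which gio/gvfs exposes an SMB mount
    for the user with the given uid."""
    return f"/run/user/{uid}/gvfs/smb-share:server={server},share={share}"

# e.g. for uid 1000, a host named "winhost" and the share created above:
print(gvfs_smb_path(1000, "winhost", "ansible-work-dir"))
# → /run/user/1000/gvfs/smb-share:server=winhost,share=ansible-work-dir
```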

Improvements

We presented the general idea; let us now improve it and make it more user-friendly.

Cleanups

The first improvement over the method exposed above is to wrap all the tasks inside a block, such that cleanups are /always/ performed afterwards. The cleanups should do the following:

  • removing the secret file
  • unmounting locally the Samba connection
  • removing the network folder from the remote

Here is a sketch of the implementation:

- name: transfer large files to Windows host
  block:

    # do all the previous

  always:
    - name: removing the credential file
      file:
        path: "{{ thefile.path }}"
        state: absent
      delegate_to: localhost
      become: False
      when: thefile is defined

    - name: Removing the local mount point
      shell: gio mount -f -u smb://{{ inventory_hostname }}/ansible-work-dir
      args:
        removes: >-
          /run/user/{{ id_cmd_output.stdout_lines[0] }}/gvfs/smb-share:server={{ inventory_hostname }},share=ansible-work-dir/
      when: id_cmd_output is defined
      delegate_to: localhost
      become: False

    - name: Removing the exposed windows share point
      win_share:
        name: ansible-work-dir
        state: absent

To be more robust to errors, we also make the task Get current user ID the first task of the block, so that the unmounting command in the always section can always be executed.

Optimizing the copies

The second improvement is about optimizing the copies, which is particularly relevant since we are copying big files and copying takes time.

In order to avoid an unnecessary copy, what needs to be done is the following:

  1. check if the file already exists on the remote.
  2. if not, we need to copy it
  3. if yes:
    1. check whether the file on the remote is the same as the one we need to copy. This check can be computed in several ways, but ultimately we apply a function to both the remote file and the local file, and compare the outputs of that function. We say the files are the same if the outputs are the same.
    2. if the files are the same, there is no need to perform the copy
    3. otherwise, copy
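The decision procedure above can be modelled by a small Python function. This is only a sketch of the logic, using existence and size as the comparison function (the same criteria as used later with win_stat):

```python
import os
from typing import Optional

def needs_copy(local_path: str, remote_exists: bool,
               remote_size: Optional[int]) -> bool:
    """Decide whether a file must be transferred: copy when it is
    missing on the remote (step 2), or when the comparison function
    (here: the file size) differs between local and remote (step 3)."""
    if not remote_exists:
        return True  # file missing on the remote: must copy
    # file present: copy only if the comparison function disagrees
    return os.path.getsize(local_path) != remote_size
```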

This is for individual files. In addition, since establishing the network setup (connecting to the remote, creating the network share, mounting, etc.) takes some time, we might also be interested in processing the transfer in batches of files: the cost of the network setup then gets amortized.

To continue with the previous copy checks: the network setup only needs to be created when at least one file has to be copied. It can be skipped entirely if no file needs to be copied.

Doing that for a list of files is a bit of a pain in YAML and Ansible, but this is still doable.

First, we apply the function used for comparison to the remote files. In our case, the function is based on the following:

  • existence of the file on the remote (we assume the file exists on the local machine)
  • size of the file

We deliberately do not compute hashes/checksums, as this may take some time on big files, but doing so is definitely a good idea for a more precise and safer copy. The above two criteria can be checked in a single call to the win_stat module:

- name: Computing the comparison function for the remote files
  win_stat:
    path: "{{ destination_folder }}\\{{ item | basename }}"
    get_checksum: no # very basic, we just check if the file exists
  register: function_file_remote
  loop: "{{ list_of_big_files }}"
  loop_control:
    label: "Computing comparison for remote file '{{ item | basename }}'"

The variable function_file_remote receives the results of the comparison function applied to the list of files.

We do the same for the local files that need to be transferred:

- name: Computing the comparison function for the local files
  stat:
    path: "{{ item }}"
    get_checksum: no
  register: function_file_local
  loop: "{{ list_of_big_files }}"
  loop_control:
    label: "Computing comparison for local file '{{ item | basename }}'"
  delegate_to: localhost
  become: False

Since the network setup between the remote and the local machine should only be created when needed, we build a single condition that aggregates the individual outputs of the comparison function.

- name: Creating condition for transfer
  set_fact:
    should_transfer: >
      {{ ( should_transfer | default([]) ) +
         [ (not item.1.stat.exists)
           or (item.1.stat.size != item.2.stat.size) ]
         }}
  loop: "{{ list_of_big_files | zip(function_file_remote.results, function_file_local.results) | list }}"
  loop_control:
    label: "Creating conditions for mounting from '{{ item.0 | basename }}'"

Here the variable should_transfer is a list of each individual comparison test. More complicated comparisons can be performed, such as

  • comparing hashes computed from the full content of the files
  • comparing hashes computed from say a small chunk at the beginning and end of the files
  • or more exotic functions
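As an illustration of the second bullet, here is a hedged Python sketch of such a partial-hash function; the chunk size is an arbitrary choice for illustration, not a value from the playbook:

```python
import hashlib
import os

def partial_hash(path: str, chunk: int = 65536) -> str:
    """Hash the file size plus the first and last `chunk` bytes.
    Much cheaper than a full checksum on multi-GB files, at the cost
    of missing changes confined to the middle of the file."""
    size = os.path.getsize(path)
    h = hashlib.sha256(str(size).encode())  # include the size itself
    with open(path, "rb") as f:
        h.update(f.read(chunk))             # beginning of the file
        f.seek(max(0, size - chunk))
        h.update(f.read(chunk))             # end of the file (may overlap)
    return h.hexdigest()
```

Applied to both the local and the remote file, equal outputs would be taken to mean "same file", exactly as in the size-based comparison above.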

Once the list should_transfer has been constructed, the previous block can be modified as follows:

- name: transfer large files to Windows host
  block:
    # create network setup
    # copy each file

  when: should_transfer | max

  always:
    # perform the cleanups

where should_transfer | max evaluates to True if any element is True, and False otherwise. The result is equivalent to True in should_transfer.
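The same aggregation can be checked in plain Python (a sketch of the Jinja behaviour, not Ansible code):

```python
# max() on a list of booleans returns True as soon as one element is
# True (since True > False), mirroring `should_transfer | max` in Jinja.
should_transfer = [False, False, True]
assert max(should_transfer) == any(should_transfer)
print(max(should_transfer))   # True: at least one file must be copied
print(max([False, False]))    # False: nothing to copy, the block is skipped
```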

Finally the copy command will be executed only when needed, using the previous comparison results:

- name: copying the files to the network share
  command: >
    cp
    {{ item.0 }}
    /run/user/{{ id_cmd_output.stdout_lines[0] }}/gvfs/'smb-share:server={{ inventory_hostname }},share=ansible-work-dir'/{{ item.0 | basename }}
  loop: "{{ list_of_big_files | zip(should_transfer) | list }}"
  when: item.1
  loop_control:
    label: "Copying file '{{ item.0 }}'"
  delegate_to: localhost
  become: False

Conclusion

We presented a method for transferring large files to a Windows target machine. The method is simple and does not require any external share or drive. It uses a tool available on Ubuntu, but can easily be extended to other operating systems.

Some assumptions were made:

  • Ubuntu as mentioned several times
  • The files are all transferred to the same folder on the remote. This is a strong limitation for certain use cases. Worst case, it would be possible to mount a remote partition instead, like \\targetmachine\C$, but I personally dislike this.

Revisions

  • 2019-06-06: initial version
  • 2019-06-10: minor edits on the Ansible commands, additional links.
  • 2020-03-01: minor edits (ansible_host -> inventory_hostname, learning >- instead of >) for newer versions of Ansible, and fixing the removal of the local mount point.