Transferring large files to/from a Windows host with Ansible
Thu 06 June 2019
Introduction
As Ansible users may already know, the connection between Ansible and a Windows host is usually made via WinRM. With this type of connection, it is already possible to transfer files that are not too large with the Ansible win_copy module (~30MB seems to be OK). However, this method does not work if you want to transfer, e.g., a 3GB archive to the remote (at least for Ansible <= 2.8).
In this post, I present a method I recently developed for transferring large files from Ansible to a Windows host.
General idea
The general idea is to dynamically create a Samba share on the remote host, and to mount this share from the Ansible machine. Once this connection is established, transferring the file is just a matter of copying it locally from one folder to another.
There are a few technicalities here and there, which we explain below.
Implementation details
We assume that the variable destination_folder contains the folder on the remote to which the "big" file is copied. We also assume that the inventory is complete, and that the Ansible command is run with the --become switch.
We start by creating the Samba share on the remote, like this:
- name: Add share on the remote
  win_share:
    name: ansible-work-dir
    description: for pushing ansible stuff
    path: "{{ destination_folder }}"
    list: yes
    full: "{{ ansible_user_id }}"
Once the network folder has been created, we mount it on the local machine. What follows is specific to Linux/Ubuntu, and to the gio program in particular, but can easily be adapted to, e.g., OSX:
- we create a temporary file that will contain the connection secrets. This temporary file is required to call the mount command without user interaction.
- we populate this temporary file with the connection credentials
- we call the gio program locally and pipe the credential file to it. This command does not require elevated privileges on the local machine.
- name: Create temporary file
  tempfile:
    state: file
    suffix: temp
  register: thefile
  delegate_to: localhost
  become: False

- name: Creating secrets file for the share
  copy:
    content: "{{ ansible_user_id }}\n\n{{ ansible_password }}"
    dest: '{{ thefile.path }}'
  delegate_to: localhost
  become: False

- name: Mount local folder
  shell: gio mount smb://{{ inventory_hostname }}/ansible-work-dir < {{ thefile.path }}
  delegate_to: localhost
  become: False
  register: command_result
  failed_when:
    - command_result.rc != 0
    - not ('Location is already mounted' in command_result.stderr)

# removing as soon as possible
- name: removing the credential file
  file:
    path: "{{ thefile.path }}"
    state: absent
  delegate_to: localhost
  become: False
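The three tasks above can also be checked outside Ansible. Below is a minimal Python sketch of the same sequence: write a secrets file, feed it to gio mount, and remove it immediately. It assumes, as the Ansible copy task implies, that gio reads the user name, an empty line (presumably answering the domain prompt) and the password from standard input. mount_share and its runner parameter are hypothetical names introduced only for illustration.

```python
import os
import subprocess
import tempfile


def mount_share(host, share, user, password, runner=subprocess.run):
    """Mount smb://<host>/<share> with gio, feeding the credentials on stdin.

    Hypothetical helper mirroring the Ansible tasks: write a temporary
    secrets file, pipe it to `gio mount`, then remove the file.
    """
    url = f"smb://{host}/{share}"
    # Same content as the Ansible copy task: user, empty line, password
    secrets = f"{user}\n\n{password}"
    # mkstemp creates the file readable and writable by the current user only
    fd, path = tempfile.mkstemp(suffix="temp")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(secrets)
        with open(path) as stdin:
            result = runner(["gio", "mount", url], stdin=stdin,
                            capture_output=True, text=True)
    finally:
        # Remove the secrets as soon as possible, even if gio failed
        os.remove(path)
    # Same rule as failed_when: a non-zero exit code is only accepted
    # when the location was already mounted
    if result.returncode != 0 and "Location is already mounted" not in result.stderr:
        raise RuntimeError(f"gio mount failed: {result.stderr.strip()}")
    return url
```

The default runner is subprocess.run, so the function can be used as-is. A test can inject a fake runner instead of actually calling gio.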
We then need to locate the mount point locally. gio creates a mount point that is visible only to the user who created the mount, and its path contains that user's ID.
- name: Get current user ID
  command: id -u
  delegate_to: localhost
  become: False
  register: id_cmd_output

# synchronize does not work: we want individual files
# copy does not work: it caches the file in memory, file too big
- name: copying the files to the network share
  command: >
    cp
    {{ item }}
    /run/user/{{ id_cmd_output.stdout_lines[0] }}/gvfs/'smb-share:server={{ inventory_hostname }},share=ansible-work-dir'/{{ item | basename }}
  loop:
    - local_file1
    - local_file2
  loop_control:
    label: "Copying file '{{ item }}'"
  delegate_to: localhost
  become: False
The long and complicated copy line in the command above works as follows:
- the gio program automatically creates a mount that is visible to the current user of the operating system. This mount is located in /run/user/<userID>/gvfs/<name-of-the-mount>, so we need to know the userID and the name-of-the-mount.
- id_cmd_output.stdout_lines[0] contains the user ID returned by the previous command, id -u.
- the name of the mount is composed of the type of the share, the server, and the name of the remote share. We previously created an ansible-work-dir network share on the remote Windows host, and this is the name that appears in the gio command gio mount smb://{{ inventory_hostname }}/ansible-work-dir. The host is also the same in the gio command and in the name of the mount. All in all, the name of the mount is 'smb-share:server={{ inventory_hostname }},share=ansible-work-dir'.
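The path logic above can be sketched in Python as follows. gvfs_mount_dir and copy_to_share are hypothetical helpers. The sketch assumes the gvfs layout /run/user/<userID>/gvfs/smb-share:server=<host>,share=<share> described in this post. The key property is that shutil.copyfile streams the data instead of loading the whole file in memory, which is exactly why the copy module was ruled out here.

```python
import os
import shutil


def gvfs_mount_dir(host, share, uid=None):
    """Directory where gvfs exposes smb://<host>/<share> for user `uid`.

    Assumes the layout described in the post:
    /run/user/<userID>/gvfs/smb-share:server=<host>,share=<share>
    """
    if uid is None:
        uid = os.getuid()  # same value as `id -u`
    return f"/run/user/{uid}/gvfs/smb-share:server={host},share={share}"


def copy_to_share(files, mount_dir):
    """Copy each local file into `mount_dir`, keeping its base name."""
    copied = []
    for src in files:
        dest = os.path.join(mount_dir, os.path.basename(src))
        # copyfile streams the data (in chunks or via the kernel),
        # so the whole file is never loaded in memory
        shutil.copyfile(src, dest)
        copied.append(dest)
    return copied
```

For example, copy_to_share(["local_file1", "local_file2"], gvfs_mount_dir("winhost", "ansible-work-dir")) would copy both files into the mounted share once gio has mounted it. No quoting is needed in Python, because the path is passed as a single argument rather than parsed from a command line.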
Improvements
Having presented the general idea, let's now improve it and make it more user-friendly.
Cleanups
The first improvement over the method described above is to wrap all the commands inside a block. The cleanups then go in the always section of the block, so that they are always performed afterwards. The cleanups should do the following:
- remove the secrets file
- unmount the Samba connection locally
- remove the network folder from the remote
Here is an implementation:
- name: transfer large files to Windows host
  block:
    # do all the previous
  always:
    - name: removing the credential file
      file:
        path: "{{ thefile.path }}"
        state: absent
      delegate_to: localhost
      become: False
      when: thefile is defined

    - name: Removing the local mount point
      shell: gio mount -f -u smb://{{ inventory_hostname }}/ansible-work-dir
      args:
        removes: >-
          /run/user/{{ id_cmd_output.stdout_lines[0] }}/gvfs/smb-share:server={{ inventory_hostname }},share=ansible-work-dir/
      when: id_cmd_output is defined
      delegate_to: localhost
      become: False

    - name: Removing the exposed windows share point
      win_share:
        name: ansible-work-dir
        state: absent
To be more robust to errors, we also make the task "Get current user ID" the first task of the block. This way, id_cmd_output, which the unmounting command needs, is already defined when the always section runs.
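The block/always construct behaves much like try/finally. The following minimal Python sketch uses run_with_cleanup, a hypothetical helper that is not part of Ansible. It also shows why computing the user ID first helps. A cleanup guarded on a registered value, like when: id_cmd_output is defined, only runs if that value was produced before the failure.

```python
def run_with_cleanup(steps, cleanups):
    """Run `steps` in order and always run `cleanups` afterwards.

    steps: list of (name, func) pairs; func(state) returns a value that is
    stored in state[name], similar to Ansible's `register`.
    cleanups: list of (required, func) pairs; func(state) only runs if
    `required` is None or was registered, similar to `when: x is defined`.
    """
    state = {}
    try:
        for name, step in steps:
            state[name] = step(state)
    finally:
        # Like the `always` section: executed on success and on failure
        for required, cleanup in cleanups:
            if required is None or required in state:
                cleanup(state)
    return state
```

Suppose a step fails before the user ID has been registered. The unmount cleanup is then skipped, while the other cleanups still run. Putting "Get current user ID" at the top of the block avoids that case.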
Optimizing the copies
The second improvement is about optimizing the copies. This is particularly relevant here: the files are big, so each copy takes time.
To avoid unnecessary copies, we proceed as follows:
- check whether the file already exists on the remote
- if not, copy it
- if yes:
  - check whether the file on the remote is the same as the one to be copied. This check can be done in several ways, but ultimately we apply a function to both the remote and the local file, and compare the outputs. The files are considered the same if the outputs are equal.
  - if the files are the same, there is no need to copy
  - otherwise, copy
This covers individual files. Establishing the network setup (connecting to the remote, creating the network share, mounting, etc.) also takes some time. We may therefore want to transfer files in batches, so that the cost of the network setup is amortized.
Combined with the previous checks, this means the network setup is only needed when at least one file has to be copied. It can be skipped entirely otherwise.
Doing that for a list of files is a bit of a pain in YAML and Ansible, but this is still doable.
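Outside of YAML, the decision procedure above, including the batch-level decision, fits in a few lines of Python. The comparison function is passed in as a parameter. The sketch assumes that the remote side returns None when the file does not exist, so None is reserved for "missing". needs_copy and plan_transfer are hypothetical names.

```python
def needs_copy(local_fp, remote_fp):
    """Per-file decision: copy if the remote file is missing (None)
    or if the comparison function gives different outputs."""
    return remote_fp is None or remote_fp != local_fp


def plan_transfer(files, local_fp, remote_fp):
    """Return (setup_needed, files_to_copy) for a batch of files.

    local_fp / remote_fp: comparison function applied to the local and
    remote copy of a file; remote_fp returns None if the file is missing.
    """
    to_copy = [f for f in files if needs_copy(local_fp(f), remote_fp(f))]
    # The network setup (share, mount) is only created if at least
    # one file has to be copied, so its cost is paid once per batch
    return bool(to_copy), to_copy
```

With the file size as the comparison function, plan_transfer(files, local_sizes.get, remote_sizes.get) returns whether to create the share and mount, and which files to copy.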
First, we apply the function used for comparison to the remote files. In our case, the function is based on the following:
- existence of the file on the remote (we assume the file exists on the local machine)
- size of the file
We deliberately do not compute hashes/checksums, as this may take a long time on big files. Computing them would, however, make the comparison much more precise and the copy safer. Both criteria above can be evaluated with a single call to the win_stat module:
- name: Computing the comparison function for the remote files
  win_stat:
    path: "{{ destination_folder }}\\{{ item | basename }}"
    get_checksum: no # very basic, we only use existence and size
  register: function_file_remote
  loop: "{{ list_of_big_files }}"
  loop_control:
    label: "Computing comparison for remote file '{{ item | basename }}'"
The variable function_file_remote will receive the results of the comparison function applied to the list of files.
We do the same for the local copies of the files to be transferred:
- name: Computing the comparison function for the local files
  stat:
    path: "{{ item }}"
    get_checksum: no
  register: function_file_local
  loop: "{{ list_of_big_files }}"
  loop_control:
    label: "Computing comparison for local file '{{ item | basename }}'"
  delegate_to: localhost
  become: False
The network setup between the remote and the local machine is created once for the whole batch. We therefore collect the comparison result for every file, so that all of them can be aggregated into a single condition:
- name: Creating condition for transfer
  set_fact:
    should_transfer: >
      {{ ( should_transfer | default([]) ) +
         [ (not item.1.stat.exists)
           or (item.1.stat.size != item.2.stat.size) ]
      }}
  loop: "{{ list_of_big_files | zip(function_file_remote.results, function_file_local.results) | list }}"
  loop_control:
    label: "Creating conditions for mounting from '{{ item.0 | basename }}'"
Here the variable should_transfer is a list holding the result of each individual comparison test. More elaborate comparisons can be performed, such as:
- comparing hashes computed from the full content of the files
- comparing hashes computed from, say, a small chunk at the beginning and at the end of the files
- or more exotic functions
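The second option can be illustrated with a Python sketch of a sampled fingerprint, which hashes the file size together with the file's first and last chunk. The function name, the use of SHA-256 and the 1 MiB default chunk size are arbitrary choices for this example. Using it for the comparison would also require an equivalent implementation on the Windows side, which is not shown here.

```python
import hashlib
import os


def sampled_fingerprint(path, chunk_size=1 << 20):
    """Hash the file size plus its first and last `chunk_size` bytes.

    Much cheaper than hashing a multi-GB file, but a change located only
    in the middle of the file is not detected.
    """
    size = os.path.getsize(path)
    digest = hashlib.sha256(str(size).encode())
    with open(path, "rb") as handle:
        digest.update(handle.read(chunk_size))
        if size > chunk_size:
            # Last chunk; it overlaps the first one if size < 2 * chunk_size
            handle.seek(size - chunk_size)
            digest.update(handle.read(chunk_size))
    return digest.hexdigest()
```

The trade-off sits between the size-only check and a full hash. A modification at the start or end of the file is detected, while one confined to the middle is not.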
Once the list should_transfer has been constructed, the previous block can be modified as follows:
- name: transfer large files to Windows host
  block:
    # create network setup
    # copy each file
  when: should_transfer | max
  always:
    # perform the cleanups
where should_transfer | max evaluates to True if any element is True, and to False otherwise. The result is equivalent to True in should_transfer.
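The set_fact loop and the max filter can be mirrored in Python to make their logic concrete. The sketch uses dictionaries shaped like the stat/win_stat results and only reads the exists and size keys. The helper names are hypothetical. It uses any(), which gives the same answer as max or True in on a non-empty list of booleans. Unlike Python's built-in max(), which raises ValueError on an empty list, any() also returns False when the list is empty.

```python
def build_should_transfer(remote_results, local_results):
    """Mirror of the set_fact loop: one boolean per file.

    Results are shaped like stat/win_stat output, e.g.
    {"stat": {"exists": True, "size": 42}}. If the remote file is
    missing, `or` short-circuits and its size is never looked up.
    """
    should_transfer = []
    for remote, local in zip(remote_results, local_results):
        should_transfer.append(
            (not remote["stat"]["exists"])
            or remote["stat"]["size"] != local["stat"]["size"]
        )
    return should_transfer


def transfer_needed(should_transfer):
    """Same answer as `should_transfer | max` and `True in should_transfer`
    on a non-empty list of booleans; also returns False on an empty list."""
    return any(should_transfer)
```

The short-circuit matters: for a missing remote file, the result has no size to compare. The Jinja expression in the set_fact task relies on the same behavior.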
Finally, the copy command is executed only when needed, using the previous comparison results:
- name: copying the files to the network share
  command: >
    cp
    {{ item.0 }}
    /run/user/{{ id_cmd_output.stdout_lines[0] }}/gvfs/'smb-share:server={{ inventory_hostname }},share=ansible-work-dir'/{{ item.0 | basename }}
  loop: "{{ list_of_big_files | zip(should_transfer) | list }}"
  when: item.1
  loop_control:
    label: "Copying file '{{ item.0 }}'"
  delegate_to: localhost
  become: False
Conclusion
We presented a method for transferring large files to a Windows target machine. The method is simple and does not require any external share or drive. It relies on a tool available on Ubuntu (gio), but can easily be adapted to other operating systems.
Some assumptions were made:
- Ubuntu, as mentioned several times
- The files are all transferred to the same folder on the remote. This is a strong limitation for certain use cases. In the worst case, it would be possible to mount a whole remote partition instead, like \\targetmachine\C$, but I personally dislike this.
Revisions
- 2019-06-06: initial version
- 2019-06-10: minor edits on the Ansible commands, additional links.
- 2020-03-01: minor edits (ansible_host -> inventory_hostname, using >- instead of >) for newer versions of Ansible, and fixing the removal of the local mount point.