datalad_next.shell
A persistent shell connection
This module provides a context manager that establishes a connection to a shell
and can be used to execute multiple commands in that shell. Shells are usually
remote shells, e.g. connected via an ssh-client, but local shells like zsh,
bash or PowerShell can also be used.
The context manager returns an instance of ShellCommandExecutor that can be
used to execute commands in the shell via the method
ShellCommandExecutor.__call__(). The method will return an instance of a
subclass of ShellCommandResponseGenerator that can be used to retrieve the
output of the command, the result code of the command, and the stderr-output
of the command.
Every response generator expects the shell output to follow a certain
structure, and it is responsible for ensuring that this structure is
generated. To this end every response generator provides a method
ShellCommandResponseGenerator.get_command_list(). The method
ShellCommandExecutor.__call__() will pass the user-provided command to
ShellCommandResponseGenerator.get_command_list() and receive a list of final
commands that should be executed in the connected shell and that will generate
the expected output structure. Instances of ShellCommandResponseGenerator
therefore have four tasks:
1. Create a final command list that is used to execute the user-provided command. This could, for example, execute the command, print an end marker, and print the return code of the command.
2. Parse the output of the command and yield it to the user.
3. Read the return code and provide it to the user.
4. Provide stderr-output to the user.
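Task 1 can be illustrated for a POSIX shell with a minimal sketch. It is not the datalad_next implementation: the helper names make_end_marker() and posix_command_list() and the exact command layout are assumptions. The user command is followed by commands that save its return code, print a random end marker, and print the saved return code.

```python
import secrets
from typing import List


def make_end_marker() -> bytes:
    # A random token makes it very unlikely, though not impossible, that
    # the command's own output contains the marker.
    return b'END-' + secrets.token_hex(16).encode()


def posix_command_list(command: bytes, end_marker: bytes) -> List[bytes]:
    # Hypothetical final command list for a POSIX shell. The resulting
    # shell output is: <command output><end marker>\n<return code>\n
    return [
        command + b'\n',
        # Save the return code before another command overwrites $?.
        b'x=$?\n',
        b'echo ' + end_marker + b'\n',
        b'echo $x\n',
    ]


marker = make_end_marker()
final_commands = posix_command_list(b'false', marker)
```

Under these assumptions, writing b''.join(final_commands) to the stdin of sh would print the marker followed by a line containing 1, the return code of false.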
A very versatile example of a response generator is the class
VariableLengthResponseGenerator. It can be used to execute a command that will
result in an output of unknown length, e.g. ls, and will yield the output of
the command to the user. It does that by using a random end marker to detect
the end of the output and read the trailing return code. This is suitable for
almost all commands.
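The end-marker search behind tasks 2 and 3 can be sketched as a small incremental parser. This is an illustration under assumptions (hypothetical class EndMarkerParser, assumed output layout: command output, end marker, newline, return code, newline), not the code of VariableLengthResponseGenerator. The detail worth noting is that the marker may be split across two read chunks, so the last len(marker) - 1 bytes are held back until it is clear that they do not start the marker.

```python
from typing import Iterable, Iterator, Optional


class EndMarkerParser:
    # Hypothetical incremental parser for end-marker framed output.
    # Assumed stream layout: <output><end marker>\n<return code>\n

    def __init__(self, end_marker: bytes) -> None:
        self.end_marker = end_marker
        self.returncode: Optional[int] = None
        self._buffer = b''
        self._seen_marker = False

    def feed(self, chunks: Iterable[bytes]) -> Iterator[bytes]:
        for chunk in chunks:
            self._buffer += chunk
            if not self._seen_marker:
                index = self._buffer.find(self.end_marker)
                if index == -1:
                    # The tail might be the beginning of a marker that is
                    # split across chunks: hold back len(marker) - 1 bytes.
                    safe = len(self._buffer) - (len(self.end_marker) - 1)
                    if safe > 0:
                        yield self._buffer[:safe]
                        self._buffer = self._buffer[safe:]
                    continue
                if index > 0:
                    yield self._buffer[:index]
                self._buffer = self._buffer[index + len(self.end_marker):]
                self._seen_marker = True
            # Behind the marker: "\n<return code>\n", possibly incomplete.
            tail = self._buffer.lstrip(b'\n')
            if b'\n' in tail:
                self.returncode = int(tail.split(b'\n', 1)[0])
                return


parser = EndMarkerParser(b'END-1234')
chunks = [b'line one\nline t', b'wo\nEND-12', b'34\n', b'2\n']
print(b''.join(parser.feed(chunks)))  # b'line one\nline two\n'
print(parser.returncode)              # 2
```

The sketch leaves out cases a real implementation must handle, e.g. a stream that ends before the marker arrives, in which case the held-back bytes are never emitted.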
If VariableLengthResponseGenerator is so versatile, why not just implement its
functionality in ShellCommandExecutor? There are two major reasons for that:
1. Although the VariableLengthResponseGenerator is very versatile, it is not the most efficient implementation for commands that produce large amounts of output. In addition, there is a small risk that the end marker is part of the output of the command, which would trip up the response generator. Putting response generation into a separate class makes it possible to implement specific operations more efficiently and more safely. For example, DownloadResponseGenerator implements the download of files. It takes a remote file name as user "command" and creates a final command list that emits the length of the file, a newline, the file content, a return code, and a newline. This allows DownloadResponseGenerator to parse the output without relying on an end marker, thus increasing efficiency and safety.
2. Factoring out the response generation creates an interface that can be used to support the syntax of different shells and the differences in command names and options between operating systems. For example, the response generator class VariableLengthResponseGeneratorPowerShell supports the invocation of commands with variable-length output in PowerShell.
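The marker-free framing described in the first item (length, newline, file content, return code, newline) can be parsed by counting bytes instead of searching for a marker. The following is a minimal sketch under that assumed layout; DownloadStreamParser is a hypothetical name and the code is not the implementation of DownloadResponseGenerator.

```python
from typing import Iterable, Iterator, Optional


class DownloadStreamParser:
    # Hypothetical parser for the assumed stream layout
    #   <length>\n<exactly length bytes of content><return code>\n
    # Content is counted, never searched, so it may contain any bytes.

    def __init__(self) -> None:
        self.length: Optional[int] = None
        self.returncode: Optional[int] = None
        self._buffer = b''
        self._remaining = 0

    def feed(self, chunks: Iterable[bytes]) -> Iterator[bytes]:
        for chunk in chunks:
            self._buffer += chunk
            if self.length is None:
                if b'\n' not in self._buffer:
                    continue  # length line is not complete yet
                header, self._buffer = self._buffer.split(b'\n', 1)
                self.length = self._remaining = int(header)
            if self._remaining:
                data = self._buffer[:self._remaining]
                self._buffer = self._buffer[len(data):]
                self._remaining -= len(data)
                if data:
                    yield data
            # All content consumed: the next complete line is the return code.
            if not self._remaining and b'\n' in self._buffer:
                self.returncode = int(self._buffer.split(b'\n', 1)[0])
                return


parser = DownloadStreamParser()
content = b''.join(parser.feed([b'1', b'1\nhello', b' world', b'0\n']))
print(content, parser.length, parser.returncode)  # b'hello world' 11 0
```

Because the length is known before the content arrives, the parser never inspects the content itself, which is where the efficiency and safety gain over the end-marker approach comes from.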
In short, response generator classes encapsulate details of shell syntax and
operation implementation. That allows support of different shell syntaxes,
and the efficient implementation of specific higher-level operations, e.g.
download. It also allows users to extend the functionality of
ShellCommandExecutor by providing their own response generator classes.
The module datalad_next.shell.response_generators provides two generally
applicable abstract response generator classes:
VariableLengthResponseGenerator
FixedLengthResponseGenerator
The functionality of the former is described above. The latter can be used to
execute a command that will result in output of known length, e.g.
echo -n 012345. It reads the specified number of bytes and a trailing return
code. This is more performant than the variable length response generator
(because it does not have to search for the end marker). In addition, it does
not rely on the uniqueness of the end marker. It is most useful for operations
like download, where the length of the output can be known in advance.
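As mentioned above, the classes VariableLengthResponseGenerator and
FixedLengthResponseGenerator are abstract. The module
datalad_next.shell.response_generators provides concrete implementations for
them, including:
VariableLengthResponseGeneratorPosix
VariableLengthResponseGeneratorPowerShell
FixedLengthResponseGeneratorPosix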
When datalad_next.shell.shell() is executed, it uses a
VariableLengthResponseGenerator to skip the login message of the shell. This
is done by executing a zero command (a command that may generate some output
and returns successfully) in the shell. The zero command is provided by the
concrete implementation of class VariableLengthResponseGenerator. For example,
the zero command for POSIX shells is test 0 -eq 0, for PowerShell it is
Write-Host hello.
Because there is no way for shell() to determine the kind of shell it connects
to, the user can provide an alternative response generator class in the
zero_command_rg_class-parameter. An instance of that class will then be used
to execute the zero command. Currently, the following two response generator
classes are available:
VariableLengthResponseGeneratorPosix: works with POSIX-compliant shells, e.g. sh or bash. This is the default.
VariableLengthResponseGeneratorPowerShell: works with PowerShell.
Whenever a command is executed via ShellCommandExecutor.__call__(), the class
identified by zero_command_rg_class will be used by default to create the
final command list and to parse the result. Users can override this on a
per-call basis by providing a different response generator in the
response_generator-parameter of ShellCommandExecutor.__call__().
Examples
See the documentation of datalad_next.shell.shell() for examples of how to use
the shell-function and different response generator classes.
API overview
Execute a command in a shell and return a generator that yields output
An abstract class that specifies the minimal functionality of a response generator
Response generator that handles outputs of unknown length
A variable length response generator for POSIX shells
A variable length response generator for PowerShell shells
Response generator for efficient handling of outputs of known length
Response generator interface for efficient download
A response generator for efficient download commands from Linux systems
Upload a local file to a named file in the connected shell
Download a file from the connected shell
Delete files on the connected shell
datalad_next.shell.shell(shell_cmd: list[str], *, credential: str | None = None, chunk_size: int = 65536, zero_command_rg_class: type[VariableLengthResponseGenerator] = VariableLengthResponseGeneratorPosix) -> Generator[ShellCommandExecutor, None, None]
Context manager that provides an interactive connection to a shell.
This context manager uses the provided argument shell_cmd to start a shell-subprocess. Usually the commands provided in shell_cmd will start a client for a remote shell, e.g. ssh. shell() returns an instance of ShellCommandExecutor in the as-variable. This instance can be used to interact with the shell. That means it can be used to execute commands in the shell, receive the data that the commands write to their stdout and stderr, and retrieve the return code of the executed commands. All commands that are executed via the returned instance of ShellCommandExecutor are executed in the same shell instance.
Parameters:
shell_cmd (list[str]) -- The command that starts the shell. It should be a list of strings that is given to iter_subproc() as args-parameter. For example: ['ssh', '-p', '2222', 'localhost'].
chunk_size (int, optional) -- The size of the chunks that are read from the shell's stdout and stderr. This also defines the size of stored stderr-content.
zero_command_rg_class (type[VariableLengthResponseGenerator], optional, default: VariableLengthResponseGeneratorPosix) -- shell() uses an instance of the specified response generator class to execute the zero command ("zero command" is the command used to skip the login messages of the shell). This class will also be used as the default response generator for all further commands executed in the ShellCommandExecutor-instance that is returned by shell(). Currently, the following concrete subclasses of VariableLengthResponseGenerator exist:
VariableLengthResponseGeneratorPosix: compatible with POSIX-compliant shells, e.g. sh or bash.
VariableLengthResponseGeneratorPowerShell: compatible with PowerShell.
Yields:
ShellCommandExecutor -- the instance that executes commands in the connected shell.
Examples
Example 1: a simple example that invokes a single command, prints its output and its return code:
>>> from datalad_next.shell import shell
>>> with shell(['ssh', 'localhost']) as ssh:
...     result = ssh(b'ls -l /etc/passwd')
...     print(result.stdout)
...     print(result.returncode)
...
b'-rw-r--r-- 1 root root 2773 Nov 14 10:05 /etc/passwd\n'
0
Example 2: this example invokes two commands, the second of which exits with a non-zero return code. The error output is retrieved from result.stderr, which contains all stderr data that was written since the last command was executed:
>>> from datalad_next.shell import shell
>>> with shell(['ssh', 'localhost']) as ssh:
...     print(ssh(b'head -1 /etc/passwd').stdout)
...     result = ssh(b'ls /no-such-file')
...     print(result.stdout)
...     print(result.returncode)
...     print(result.stderr)
...
b'root:x:0:0:root:/root:/bin/bash\n'
b''
2
b"Pseudo-terminal will not be allocated because stdin is not a terminal.\r\nls: cannot access '/no-such-file': No such file or directory\n"
Example 3: demonstrates how to use the check-parameter to raise a CommandError-exception if the return code of the command is not zero. This delegates error handling to the calling code and helps to keep the code clean:
>>> from datalad_next.shell import shell
>>> with shell(['ssh', 'localhost']) as ssh:
...     print(ssh(b'ls /no-such-file', check=True).stdout)
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/home/cristian/Develop/datalad-next/datalad_next/shell/shell.py", line 279, in __call__
    return create_result(
  File "/home/cristian/Develop/datalad-next/datalad_next/shell/shell.py", line 349, in create_result
    result.to_exception(command, error_message)
  File "/home/cristian/Develop/datalad-next/datalad_next/shell/shell.py", line 52, in to_exception
    raise CommandError(
datalad.runner.exception.CommandError: CommandError: 'ls /no-such-file' failed with exitcode 2 [err: 'cannot access '/no-such-file': No such file or directory']
Example 4: an example for manual checking of the return code:
>>> from datalad_next.shell import shell
>>> def file_exists(file_name):
...     with shell(['ssh', 'localhost']) as ssh:
...         result = ssh(f'ls {file_name}')
...         return result.returncode == 0
...
>>> print(file_exists('/etc/passwd'))
True
>>> print(file_exists('/no-such-file'))
False
Example 5: an example for result content checking:
>>> from datalad_next.shell import shell
>>> with shell(['ssh', 'localhost']) as ssh:
...     result = ssh('grep root /etc/passwd', check=True).stdout
...     if len(result.splitlines()) != 1:
...         raise ValueError('Expected exactly one line')
...
Example 6: how to work with generator-based results. For long-running commands, generator-based result fetching can be used. To use generator-based output, the command has to be executed with the method ShellCommandExecutor.start(). This method returns a generator that provides command output as soon as it is available:
>>> import time
>>> from datalad_next.shell import shell
>>> with shell(['ssh', 'localhost']) as ssh:
...     result_generator = ssh.start(b'c=0; while [ $c -lt 6 ]; do head -2 /etc/passwd; sleep 2; c=$(( $c + 1 )); done')
...     for result in result_generator:
...         print(time.time(), result)
...     assert result_generator.returncode == 0
...
1713358098.82588 b'root:x:0:0:root:/root:/bin/bash\nsystemd-timesync:x:497:497:systemd Time Synchronization:/:/usr/sbin/nologin\n'
1713358100.8315682 b'root:x:0:0:root:/root:/bin/bash\nsystemd-timesync:x:497:497:systemd Time Synchronization:/:/usr/sbin/nologin\n'
1713358102.8402972 b'root:x:0:0:root:/root:/bin/bash\nsystemd-timesync:x:497:497:systemd Time Synchronization:/:/usr/sbin/nologin\n'
1713358104.8490314 b'root:x:0:0:root:/root:/bin/bash\nsystemd-timesync:x:497:497:systemd Time Synchronization:/:/usr/sbin/nologin\n'
1713358106.8577306 b'root:x:0:0:root:/root:/bin/bash\nsystemd-timesync:x:497:497:systemd Time Synchronization:/:/usr/sbin/nologin\n'
1713358108.866439 b'root:x:0:0:root:/root:/bin/bash\nsystemd-timesync:x:497:497:systemd Time Synchronization:/:/usr/sbin/nologin\n'
(The exact output of the above example might differ, depending on the length of the first two entries in the /etc/passwd-file.)
Example 7: how to use the stdin-parameter to feed data to a command that is executed in the persistent shell. The methods ShellCommandExecutor.__call__() and ShellCommandExecutor.start() allow passing an iterable in the stdin-argument. The content of this iterable will be sent to stdin of the executed command:
>>> from datalad_next.shell import shell
>>> with shell(['ssh', 'localhost']) as ssh:
...     result = ssh(b'head -c 4', stdin=(b'ab', b'c', b'd'))
...     print(result.stdout)
...
b'abcd'
Example 8: how to work with commands that consume stdin completely. In the previous example, the command head -c 4 was used to consume data from stdin. This command terminates after reading exactly 4 bytes from stdin. If cat was used instead of head -c 4, the command would have continued to run until its stdin was closed. The stdin of the command that is executed in the persistent shell can be closed by calling ssh.close(). But, in order to be able to call ssh.close(), any process that consumes stdin completely should be executed by calling the ssh.start()-method. The reason for this is that ssh.start() returns immediately, which makes it possible to call the ssh.close()-method, as shown in the following code (ssh.__call__() would have waited for cat to terminate, but because ssh.close() could then never be called, cat would never terminate):
>>> from datalad_next.shell import shell
>>> with shell(['ssh', 'localhost']) as ssh:
...     result_generator = ssh.start(b'cat', stdin=(b'12', b'34', b'56'))
...     ssh.close()
...     print(tuple(result_generator))
...
(b'123456',)
Note that the ssh-object cannot be used for further command execution after ssh.close() was called. Further command execution requires spinning up a new persistent shell-object. To prevent this overhead, it is advised to limit the number of bytes that a shell-command consumes, either by specifying the number explicitly, e.g. by using head -c, or by some other means, e.g. by interpreting the content or using a command like timeout.
Example 9: upload a file to the persistent shell. The command head -c can be used to implement uploading a file to a remote shell. The basic idea is to determine the number of bytes that will be uploaded and create a command in the remote shell that will consume exactly this number of bytes. The following code implements this idea (without file-name escaping and error handling):
>>> import os
>>> from datalad_next.shell import shell
>>> def upload(ssh, file_name, remote_file_name):
...     size = os.stat(file_name).st_size
...     f = open(file_name, 'rb')
...     return ssh(f'head -c {size} > {remote_file_name}', stdin=iter(f.read, b''))
...
>>> with shell(['ssh', 'localhost']) as ssh:
...     upload(ssh, '/etc/passwd', '/tmp/uploaded-1')
...
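Example 9 deliberately omits file-name escaping and leaves f open. A minimal sketch of both refinements, assuming a POSIX remote shell: shlex.quote() escapes the remote file name, and a generator that owns the file closes it once it is exhausted, i.e. by the thread that iterates over it. The helper names upload_command() and file_chunks() are hypothetical, and this is not the implementation of datalad_next.shell.posix.upload().

```python
import shlex
from typing import Iterator


def upload_command(size: int, remote_file_name: str) -> str:
    # shlex.quote() applies POSIX shell quoting, so spaces or shell
    # metacharacters in the remote file name are not interpreted.
    return f'head -c {size} > {shlex.quote(remote_file_name)}'


def file_chunks(file_name: str, size: int, chunk_size: int = 65536) -> Iterator[bytes]:
    # The generator owns the file: it is closed when iteration finishes,
    # so only the iterating (stdin-feeding) thread ever touches it.
    with open(file_name, 'rb') as f:
        remaining = size
        while remaining > 0:
            # Never send more than size bytes: surplus input would not be
            # consumed by head -c and would reach the shell itself.
            chunk = f.read(min(chunk_size, remaining))
            if not chunk:
                # File shrank after its size was determined; the remote
                # head -c would keep waiting for the missing bytes.
                break
            remaining -= len(chunk)
            yield chunk


# Hypothetical usage with the shell object from Example 9:
# size = os.stat('/etc/passwd').st_size
# ssh(upload_command(size, '/tmp/uploaded 1'), stdin=file_chunks('/etc/passwd', size))
```

Capping the generator at size bytes keeps the data stream consistent with the head -c command even if the local file grows between the size check and the transfer.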
Note: in Example 9, f is not explicitly closed; it is only closed when the program exits. The reason for this is that the shell uses threads internally for stdin-feeding, and there is no simple way to determine whether the thread that reads f has already read an EOF and exited. If f is closed before the thread exits, and the thread tries to read from f, a ValueError will be raised. (The function datalad_next.shell.posix.upload() contains a solution for this problem that requires slightly more code. For the sake of simplicity, this solution was not implemented in the example above.)
Example 10: download a file. This example uses a fixed-length response generator to download a file from a remote shell. The basic idea is to determine the number of bytes that will be downloaded and create a fixed-length response generator that reads exactly this number of bytes. The fixed-length response generator is then passed to ssh.start() in the keyword-argument response_generator. This instructs ssh.start() to use the response generator to interpret the output of this command invocation (the example code has no file-name escaping or error handling):
>>> from datalad_next.shell import shell
>>> from datalad_next.shell.response_generators import FixedLengthResponseGeneratorPosix
>>> def download(ssh, remote_file_name, local_file_name):
...     size = ssh(f'stat -c %s {remote_file_name}').stdout
...     with open(local_file_name, 'wb') as f:
...         response_generator = FixedLengthResponseGeneratorPosix(ssh.stdout, int(size))
...         results = ssh.start(f'cat {remote_file_name}', response_generator=response_generator)
...         for chunk in results:
...             f.write(chunk)
...
>>> with shell(['ssh', 'localhost']) as ssh:
...     download(ssh, '/etc/passwd', '/tmp/downloaded-1')
...
Note that ssh.start() is used to start the download. This makes it possible to process downloaded data as soon as it is available.
Example 11: This example implements interaction with a Python interpreter (which can be local or remote). Interaction in the context of this example means executing a line of Python code, returning the result, i.e. the output on stdout, and detecting whether an exception was raised or not. To this end a Python-specific variable-length response generator is created by subclassing the generic class VariableLengthResponseGenerator. The new response generator implements the method get_final_command(), which takes a Python statement and returns a try-except-block that executes the statement, prints the end marker and a return code (which is 0 if the statement was executed successfully, and 1 if an exception was raised). It also provides the zero_command property, which returns the statement used to skip the interpreter's banner. The class is passed to shell() in the zero_command_rg_class-parameter:
>>> from datalad_next.shell import shell
>>> from datalad_next.shell.response_generators import VariableLengthResponseGenerator
>>> class PythonResponseGenerator(VariableLengthResponseGenerator):
...     def get_final_command(self, command: bytes) -> bytes:
...         return f'''try:
...     {command.decode()}
...     print('{self.end_marker.decode()}')
...     print(0)
... except:
...     print('{self.end_marker.decode()}')
...     print(1)
... '''.encode()
...     @property
...     def zero_command(self) -> bytes:
...         return b'True'
...
>>> with shell(['python', '-u', '-i'], zero_command_rg_class=PythonResponseGenerator) as py:
...     print(py('1 + 1'))
...     print(py('1 / 0'))
...
ExecutionResult(stdout=b'2\n', stderr=b'>>> ... ... ... ... ... ... ... ... ', returncode=0)
ExecutionResult(stdout=b'', stderr=b'... ... ... ... ... ... ... ... Traceback (most recent call last):\n  File "<stdin>", line 2, in <module>\nZeroDivisionError: division by zero', returncode=1)
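Exception details could be carried back to the caller by serializing the caught exception in the interpreter and deserializing it locally, as the following paragraph outlines. A minimal sketch of that serialization step, with one assumption beyond the text: the pickled bytes are base64-encoded, because raw pickle data may contain newlines or even the end-marker bytes and would confuse a line-based parser. The function names encode_exception() and decode_exception() are hypothetical.

```python
import base64
import pickle


def encode_exception(exc: BaseException) -> bytes:
    # Raw pickle data may contain newlines or even the end-marker bytes;
    # base64 turns it into one line of ASCII that a line-based parser
    # can read safely. The traceback is not part of the pickle.
    return base64.b64encode(pickle.dumps(exc))


def decode_exception(line: bytes) -> BaseException:
    # Unpickling can execute arbitrary code: only decode data that comes
    # from an interpreter you trust.
    return pickle.loads(base64.b64decode(line.strip()))


line = encode_exception(ZeroDivisionError('division by zero')) + b'\n'
exc = decode_exception(line)
print(type(exc).__name__, exc.args)  # ZeroDivisionError ('division by zero',)
```

On the interpreter side, the except branch of get_final_command() would then need to bind the exception (except Exception as e:) and print the same encoding, which in turn requires base64 and pickle to be imported in that interpreter; this variant is an assumption and not part of Example 11.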
The Python response generator could be extended to deliver exception information in an extended ExecutionResult. This can be achieved by pickling (see the pickle-module) a caught exception to a byte-string, printing this byte-string after the return-code line, and printing another end marker. The send()-method of the response generator must then be overridden to unpickle the exception information and store it in an extended ExecutionResult (or raise it in the shell-context, if that is preferred).
Example 12: this example shows how to use the shell context manager in situations where a with-statement is not suitable, e.g. if a shell object should be used in multiple, independently called functions. In this case the context manager can be manually entered and exited. The following code generates a global ShellCommandExecutor-instance in the ssh-variable:
>>> from datalad_next.shell import shell
>>> context_manager = shell(['ssh', 'localhost'])
>>> ssh = context_manager.__enter__()
>>> print(ssh(b'ls /etc/passwd').stdout)
b'/etc/passwd\n'
>>> context_manager.__exit__(None, None, None)
False
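Instead of calling __enter__() and __exit__() by hand, the standard library's contextlib.ExitStack can hold the open context and close it later; registering its close() method with atexit additionally ensures the shell is shut down when the program ends. The sketch uses a hypothetical stand-in, fake_shell(), so that it runs without an SSH server; with datalad_next installed, shell(['ssh', 'localhost']) would take its place.

```python
import atexit
from contextlib import ExitStack, contextmanager
from typing import Callable, Iterator, List, Tuple

# Records what the stand-in context manager did, for illustration only.
events: List[Tuple[str, List[str]]] = []


@contextmanager
def fake_shell(shell_cmd: List[str]) -> Iterator[Callable[[bytes], bytes]]:
    # Hypothetical stand-in for datalad_next.shell.shell(): it yields a
    # plain callable instead of a ShellCommandExecutor.
    events.append(('enter', shell_cmd))
    try:
        yield lambda command: b'ran: ' + command
    finally:
        events.append(('exit', shell_cmd))


# Module-level state that independently called functions can share.
shell_stack = ExitStack()
ssh = shell_stack.enter_context(fake_shell(['ssh', 'localhost']))
# Close the shell at interpreter exit if nobody did so earlier;
# calling ExitStack.close() a second time does nothing.
atexit.register(shell_stack.close)

print(ssh(b'ls /etc/passwd'))  # b'ran: ls /etc/passwd'
shell_stack.close()            # runs the exit code of fake_shell()
```

Compared with a bare __exit__() call, ExitStack also runs the exit code of several stacked shells in reverse order with a single close() call.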