This is an attempt to make it easier to share, publish and download Python packages at GSI.
The lack of infrastructure means that all options are going to involve larger or smaller amounts of hackery, but we'll try to make it as simple as possible.
See PythonBridge for examples of how to use the pyda and pjlsa packages.
See GeOFF for help on how to use the cernml collection of packages.
Quickstart
Start here if you're new. This is the brief form of the instructions on how to set up a working Python environment inside the ACC network.
The following summary is intended as a reminder in case you set up your environment again. If you're doing this for the first time, only read it briefly to get an overview, then work through the numbered sections that follow it.
The section "Detailed information" contains more exhaustive information and serves as a reference.
Summary
This mostly serves as a reminder for second-timers, but it can also be useful to get an overview.
- Acquire an ACO account and enable LSA access.
- Create a virtual environment and enter it.
mkdir ~/venvs
python -m venv --system-site-packages --upgrade-deps --prompt=venv ~/venvs/default
source ~/venvs/default/bin/activate
(Consider putting the last line into your ~/.bashrc file or using a convenience function.)
- Create a personal access token, create a file only usable by you:
touch ~/.netrc && chmod "u=rw,go=" ~/.netrc
and add an arbitrary username and your token:
machine git.gsi.de
login gitlab
password <your token>
- Configure Pip to search the Gitlab package registry for packages:
pip config --user set global.extra-index-url "https://git.gsi.de/api/v4/groups/scripting-tools/-/packages/pypi/simple"
- Specify a CMW directory server for FESA access:
export CMW_DIRECTORY_CLIENT_SERVERLIST="cmwpro00a.acc.gsi.de:5021"
(Consider putting this line into your ~/.profile or ~/.bash_profile file – depending on which one already exists – or a convenience function.)
Remember that adding a line to your ~/.profile or ~/.bash_profile file doesn't do anything until you source these files or log in again; you'll still have to execute this line manually to continue.
- If necessary, run the tests in the following sections to ensure everything has worked.
1. Access the ACC Network
Access to FESA devices is only available within the ACC network. If you need to access FESA devices or the LSA database, you will need an ACO account. Follow the instructions on the linked page.
If you need LSA database access, Jutta Fitzek will need to send an e-mail on your behalf as well.
The machines available for SSH access are asl751 ... asl756.acc.gsi.de.
In addition, the BEPHY department also has asl154.acc.gsi.de for long-term data taking. However, this is not a development machine and many common software packages are missing. Furthermore, it has no Internet access, so to transfer scripts onto it, you will have to either proxy out or remote-copy files onto it with scp, rsync or sftp.
2. Create and Enter a Virtual Environment
We will install all the Python packages not into your global Python environment, but in an isolated virtual environment to keep dependency issues under control. A virtual environment adjusts the paths where Python looks for your packages and nothing else. In particular, all config files that are visible outside of a venv are also visible within.
Log into a terminal on the machine where you want to install GSI Python packages and run the following command:
python --version
Ensure that your Python version is 3.9 or 3.11. Two important packages are only available for these versions.
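If you prefer to check the version from within Python (for example at the top of a script), a minimal sketch could look like the following. The function name is_supported is made up for this illustration; the two version pairs are the ones named on this page.

```python
import sys

# Versions named on this page as supported by the GSI-specific packages.
SUPPORTED = {(3, 9), (3, 11)}


def is_supported(version=None):
    """Return True if the (major, minor) pair is one of the supported versions."""
    if version is None:
        version = sys.version_info[:2]
    return tuple(version[:2]) in SUPPORTED


major, minor = sys.version_info[:2]
verdict = "supported" if is_supported() else "NOT supported"
print(f"Python {major}.{minor} is {verdict}")
```

Only the major and minor number matter here, so the patch level of your interpreter is ignored.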
Then run the following commands in your terminal:
mkdir ~/venvs
# (Feel free to modify the following parameters if you know what you're doing.)
python -m venv --system-site-packages --upgrade-deps --prompt=venv ~/venvs/default
# Enter the venv by adjusting your shell's environment variables.
source ~/venvs/default/bin/activate
Important! The last line makes you enter the venv. Type deactivate to leave it again.
Every time you log out of the machine, you also leave the venv. Consider putting the activate line into your ~/.bashrc file or a convenience function.
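If you are unsure whether your current shell is inside a venv, Python itself can tell you. The following is a small sketch using only the standard library; the function name in_venv is made up for this illustration.

```python
import sys


def in_venv():
    """Return True if the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the venv directory while
    # sys.base_prefix still points at the interpreter it was created from.
    return sys.prefix != sys.base_prefix


print("inside a venv:", in_venv())
print("packages are installed under:", sys.prefix)
```

After activating ~/venvs/default, the second line should show that directory.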
3. Create an Authentication Token for Gitlab
A token is a random password generated on the server side. Unlike real passwords, it only has a limited lifetime and a restricted list of permissions. All Gitlab personal access tokens begin with the string glpat.
Log into git.gsi.de and navigate to the Personal Access Tokens page of your profile. Make sure the box for read_api is checked (the other boxes don't matter), create a token and copy it. Keep the window open.
Create a file called .netrc in your home directory and ensure that you and only you can access it:
touch ~/.netrc
chmod "u=rw,go=" ~/.netrc # Read/write access for User, no access for Group members and Others
Then edit the file so that it contains the following lines:
machine git.gsi.de
login gitlab
password <your token>
where <your token> is replaced with the token that you copied in the previous step. If you copied the previous commands and lost your token, go back to the browser window and copy it again.
Make sure that the token string starts with the characters "glpat". If it doesn't, you probably copied the token of your RSS feed (bottom of the page) instead of the token you created (top of the page).
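The checks described above can also be scripted. The following is a sketch using Python's standard netrc module; the function name check_netrc is made up for this illustration, and the test for the "glpat" prefix simply mirrors the rule of thumb given above.

```python
import netrc
import os
import stat


def check_netrc(path, host="git.gsi.de"):
    """Return a list of problems found in the netrc file (empty if none)."""
    problems = []
    # Any permission bit for group or others means the file is too open.
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append(f"{path} is accessible by others (mode {mode:o})")
    entry = netrc.netrc(path).authenticators(host)
    if entry is None:
        problems.append(f"no entry for machine {host}")
        return problems
    _login, _account, password = entry
    if not (password or "").startswith("glpat"):
        problems.append("password does not look like a personal access token")
    return problems


netrc_path = os.path.expanduser("~/.netrc")
if os.path.exists(netrc_path):
    for line in check_netrc(netrc_path) or ["no problems found"]:
        print(line)
else:
    print(f"{netrc_path} does not exist")
```

The script never prints the token itself, so it is safe to run on a shared machine.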
4. Configure Pip to Search the Gitlab Package Registry
Run the following command in your terminal:
pip config --user set global.extra-index-url "https://git.gsi.de/api/v4/groups/scripting-tools/-/packages/pypi/simple"
This configures Pip to look for packages both on the default package index and on our Gitlab package registry.
If your machine does not have general Internet access, replace extra-index-url with just index-url. In this case, Pip will only search Gitlab for packages.
With the previous steps done, the following line should download the correct packages:
pip install "pjlsa-gsipro" "pyda-rda3 ~= 0.2.0" "gymnasium < 1"
Pip produces a lot of output, but the last few lines should confirm that the following packages (plus their dependencies) have been installed:
cmmnbuild-dep-manager-2.14.0
gymnasium-0.29.1
pjlsa-0.2.18.post1
pjlsa-gsipro-1.1.2
pyda-0.2.0.post1
pyda-rda3-0.2.3
pyrda3-0.2.4
You can test if everything worked by running the following short Python script:
from pjlsa_gsipro import LSAClientGSI
lsa = LSAClientGSI()
# The java_api() context enables imports from Java packages.
with lsa.java_api():
    from cern.lsa.client import ContextService, ServiceLocator
    cs = ServiceLocator.getService(ContextService)
    patterns = list(cs.findResidentPatterns())
    for pattern in patterns:
        print(pattern)
Besides a ton of pointless noise, this should also ultimately print a list of all patterns currently loaded into the accelerators at GSI.
5. Specify the CMW Directory Server
In order to access FESA, you need to specify a CMW directory server to connect to. There are three servers at GSI: one for production, one for development and one for integration testing.
To always connect to the production server, put this line in your profile file (~/.profile or ~/.bash_profile):
export CMW_DIRECTORY_CLIENT_SERVERLIST="cmwpro00a.acc.gsi.de:5021"
If you also need the development and integration servers occasionally, have a look at this snippet. It defines a function that lets you set the desired server without having to remember the full address:
setup-cmw pro
setup-cmw dev
setup-cmw int
You can verify that this works by receiving data from a device. (Note that you might have to change the device name to that of one that's currently running.)
import pyda, pyda_rda3
# Create the RDA3 provider once and hand it to the client.
rda3 = pyda_rda3.RdaProvider()
client = pyda.SimpleClient(
    provider=rda3,
)
s = client.subscribe("YRT1DC3/Acquisition", context="FAIR.SELECTOR.ALL")
for a in s:
    print(a.value.header.selector)
You can break out of the loop with Ctrl-C.
If you forget to specify the CMW directory server (like I often do), you'll usually see this error message:
pyrda3._rda3_bindings.NameServiceException: CMW Directory Service getDeviceInfo() request failed for: device 'GS09DT_ML' and domain 'RDA3' --> CMW get-device-info command failed: Cannot find the host 'cmw-dir-pro1.cern.ch'; Cannot find the host 'cmw-dir-pro3.cern.ch'; Cannot find the host 'cmw-dir-pro2.cern.ch';
This happens because pyda, in the absence of an override, falls back on its default directory servers. These are the ones from CERN, which are not accessible from GSI (for good reason).
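To guard a script against this mistake, you can check the environment variable at the very top, before importing pyda. The following is a sketch under assumptions: the function name ensure_cmw_directory is made up, and it assumes that the bindings read the variable from the process environment when they are first used, which has not been verified here. Only the production address is filled in because it is the only one given on this page.

```python
import os

VARIABLE = "CMW_DIRECTORY_CLIENT_SERVERLIST"
# Production server address as given on this page.
PRODUCTION = "cmwpro00a.acc.gsi.de:5021"


def ensure_cmw_directory(environ=os.environ):
    """Make sure a directory server is configured; return the value in effect."""
    # Keep an existing choice; only fill in the production
    # server if the variable is missing or empty.
    if not environ.get(VARIABLE):
        environ[VARIABLE] = PRODUCTION
    return environ[VARIABLE]


# Call this before importing pyda or pyda_rda3 (assumption, see above).
print("CMW directory server:", ensure_cmw_directory())
```

Changes to os.environ only affect the running Python process and its children; your shell stays untouched.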
Detailed information
This section contains more in-depth information in case problems come up or you want to know more.
List of relevant packages
This is a brief list of all the packages that you will need for one reason or another:
- pyda: A façade package that provides a uniform interface to talk to FESA devices; always needs a provider package to do the actual work.
- pyda_rda3: The provider package for pyda that adds support for the pyrda3 library.
- pyrda3: Python bindings to the RDA3 C++ library.
- pyrbac: Package that implements Role-Based Access Control. Required by pyda but currently not used at GSI.
- pyccda: Package that implements the Controls Configuration Data API of CERN. Required by pyda but currently not used at GSI.
- pjlsa_gsipro/int/dev: Python bindings to the Java libraries that provide access to the LSA database at GSI. They connect to the production database, integration-testing database and the development database respectively. These have to be separate packages because each database requires different versions of its Java dependencies.
- pjlsa: Core Python package that implements the logic and API of the above three.
- cmmnbuild_dep_manager: Downloads and manages Java libraries required by pjlsa.
- jpype: Package that connects a Python interpreter to a Java VM so that Java libraries can be called from Python.
Warning! The packages pyrda3 and pyrbac only exist for Python 3.9 and 3.11. Make sure that your Python version matches one of these, or the package index will simply (and confusingly) report that no versions could be found.
Virtual Environments
Virtual environments make it easy to install Python packages for different projects without a risk of version clashes between their dependencies. Each venv links to a Python interpreter (e.g. the system one), but provides a completely isolated directory for package installation. This is similar to Conda, which in addition manages not only Python but also C++ dependencies.
You can create a fresh venv by running this command:
python -m venv <path/to/the/venv>
As path, I recommend either a home directory folder like ~/venvs/ or a directory ./.venv within your Python project, depending on your needs.
The venv module accepts a number of arguments; here are the important ones:
- --system-site-packages - In addition to the packages inside the venv, all packages in the global environment are accessible as well. Without this option, you start with a completely clean venv in which nothing is installed.
- --upgrade-deps - Automatically update Pip and Setuptools when creating the venv. The default is to install whatever version ships with your Python installation.
- --prompt= - Customize the text that is shown next to your command prompt while the venv is active. The default uses the last element of the venv's path.
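The same options are available from Python through the standard venv module, which can be handy in setup scripts. The following sketch maps each command-line flag to the corresponding EnvBuilder argument; the function name create_venv is made up for this illustration. The demonstration at the end skips Pip entirely so that it also runs without network access.

```python
import tempfile
import venv
from pathlib import Path


def create_venv(path, upgrade_deps=True, with_pip=True):
    """Create a venv roughly like the command line used on this page."""
    builder = venv.EnvBuilder(
        system_site_packages=True,  # --system-site-packages
        upgrade_deps=upgrade_deps,  # --upgrade-deps (needs network access)
        prompt="venv",              # --prompt=venv
        with_pip=with_pip,          # the command line installs Pip by default
    )
    builder.create(path)
    return Path(path)


# Demonstration in a throwaway directory; Pip is skipped to keep it offline.
target = create_venv(
    Path(tempfile.mkdtemp()) / "default", upgrade_deps=False, with_pip=False
)
print((target / "pyvenv.cfg").read_text())
```

The printed pyvenv.cfg file is where the venv records its settings, including whether system site packages are visible.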
You can enter a venv by running the following line:
source <path/to/the/venv>/bin/activate
While inside a venv, you can exit it with the special command deactivate, or by switching to another venv. See also vactivate.sh for a nicer way to switch venvs.
While inside a venv, you can pip install whatever Python packages you want. They are accessible while inside the venv, and inaccessible outside of it.
Local installs
After you have cloned a properly packaged Python project from Gitlab to your local disk, you can always install it to your venv by running the following line:
pip install .
(where "." as usual refers to the current directory). Once installed, the Python package is available via import no matter your current working directory.
The way this works is that all your Python files are copied from your project directory to the venv's library directory. This means that if you make any changes to your project files, they will not be reflected by the installed package. You will have to install the package again to make the changes accessible.
While developing a package, you will instead want to install it in development mode. You can do this with this line:
pip install --editable . # or `pip install -e .` for short
This links your project directory into the venv instead of copying it. This means that any changes that you make are reflected immediately. This obviously doesn't make sense outside of active development.
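To see which of the two install modes is in effect, you can ask Python where it would import a module from. This is a small sketch using the standard library; the function name module_origin is made up, and "json" only serves as a stand-in for the import name of your own package.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# "json" is only a stand-in; substitute the import name of your own package.
print(module_origin("json"))
```

For a regular install, the path should point into the venv's library directory; for an editable install, it should point into your project directory.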
Git installs
If you don't plan to make any changes to a Python project, you don't need to clone it just for installation. Pip supports installing a package straight from a remote Git server. The exact command line depends on whether you want to authenticate yourself to Gitlab via HTTPS (username+password) or SSH (private key file):
pip install git+https://git.gsi.de/<group>/<project> # HTTPS (username+password)
pip install git+ssh://git@git.gsi.de/<group>/<project> # SSH (~/.ssh/id_<algo>)
Note that for public projects (i.e. visible without Gitlab login), you don't need to authenticate at all.
This works well for quick one-off installations. It does not work well for packages that other packages depend on. There is no version management, so it'll be very easy to break other people's code by accident.
Gitlab package index
Gitlab also makes it possible to run your own package index from which Pip can install packages. I've set one up here. The way it works is that every Gitlab project has its own index (e.g. this one) and the group-level index aggregates all packages of the projects in the group.
Authentication
Unfortunately, it's still a bit of a chore to set this up. Because the package index doesn't support SSH or password authentication, you will need an authentication token to connect to it. That's basically a server-side-generated random password with limited permissions and lifetime attached.
Go to the Personal Access Tokens page of your profile. Make sure that the "read_api" box is checked and create a token. Copy the token string.
Once this is done, you have two ways to pass this token to Pip:
- The recommended way: Create a file ~/.netrc and write in it the following contents:
machine git.gsi.de
login gitlab
password <token>
where <token> is replaced with the token you copied earlier. You should ensure that no one but you can access this file:
chmod "u=rw,go=" ~/.netrc
Now, every time that Pip connects to the server git.gsi.de, it will automatically log in with these credentials.
- When specifying the index URL (see below), explicitly include the log-in credentials. This could look like this:
pip install --index-url="https://gitlab:$GITLAB_TOKEN@git.gsi.de/api/v4/groups/scripting-tools/-/packages/pypi/simple" <package>
where $GITLAB_TOKEN is either an environment variable in which you store the token, or you just replace it directly with the token. Obviously this is unsafe because it exposes your token (which is a kind of password) to the entire GSI. (Remember that every command you type is saved in ~/.bash_history!)
Connection
There are two command-line parameters that influence where Pip looks for packages:
- --index-url= - This replaces the default package index with the one given by the URL. This means that common packages like e.g. Numpy can no longer be found unless they're also available on the new index.
- --extra-index-url= - Can be passed multiple times. This searches for packages on both the default index and all additional indexes. Pip assumes that all indexes are coherent, i.e. the same package name refers to the same package on all indexes.
Unfortunately, this is not the case. The pyda package developed at CERN (currently at v0.2) has the same name as an independent package on PyPI (at v1.0). To avoid installing the wrong package, you will have to restrict the version such that Pip will not consider the wrong package:
pip install "pyda ~= 0.2.0"
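The "~=" operator means "compatible release": "~= 0.2.0" accepts any 0.2.x version from 0.2.0 upwards, but neither 0.3 nor 1.0. The following is a deliberately simplified sketch of that rule for plain numeric versions. It is not Pip's actual implementation, it does not handle suffixes such as ".post1", and the function names are made up for this illustration.

```python
def parse(version):
    """Turn a plain dotted version such as '0.2.3' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def compatible(candidate, base):
    """Simplified check for 'candidate ~= base' on plain numeric versions."""
    cand, ref = parse(candidate), parse(base)
    # Pad the candidate with zeros so that '1.0' compares like '1.0.0'.
    cand = cand + (0,) * (len(ref) - len(cand))
    # All but the last component of the base must match exactly ...
    prefix = ref[:-1]
    # ... and the candidate must not be older than the base.
    return cand[:len(prefix)] == prefix and cand >= ref


for version in ["0.2.0", "0.2.3", "0.3.0", "1.0"]:
    print(version, compatible(version, "0.2.0"))
```

Of the four versions tried here, only 0.2.0 and 0.2.3 are accepted, which is why the unrelated v1.0 package on PyPI is never considered.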
Apart from the issue of name conflicts, --extra-index-url is usually the better choice. You can configure Pip to always look for packages on Gitlab and PyPI with this command:
pip config --user set global.extra-index-url "https://git.gsi.de/api/v4/groups/scripting-tools/-/packages/pypi/simple"
or by editing ~/.config/pip/pip.conf directly.
Another option is a two-step process: You can first download all the GSI-specific packages from Gitlab, ignoring the default index. Then install these local packages, picking up anything that's still missing from the default index:
pip download --no-deps --index-url="https://git.gsi.de/api/v4/groups/scripting-tools/-/packages/pypi/simple" pyda pyda-rda3 pyrda3 pjlsa pjlsa_gsipro cmmnbuild_dep_manager pyrbac
pip install *.whl
Other Package Indexes
If you want to use the Gitlab package index of a specific project instead of the collective one of the scripting-tools group, the URL is constructed as follows:
https://git.gsi.de/api/v4/projects/<group>%2F<project>/packages/pypi/simple
Note how group and project are separated by %2F and not by a regular slash!
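The %2F is simply the percent-encoded form of a slash, so you can let Python build the URL for you. The sketch below assumes the api/v4 URL layout that also appears in the ~/.pypirc example on this page, with "/simple" appended; the function name project_index_url is made up for this illustration.

```python
from urllib.parse import quote

# Base of the project-level API, following the ~/.pypirc example on this page.
API_ROOT = "https://git.gsi.de/api/v4/projects"


def project_index_url(group, project):
    """Build the project-level index URL with the slash percent-encoded."""
    # safe="" makes quote() encode "/" as %2F as well.
    project_id = quote(f"{group}/{project}", safe="")
    return f"{API_ROOT}/{project_id}/packages/pypi/simple"


print(project_index_url("scripting-tools", "rda-data-harvest"))
```

The same encoding applies to nested subgroups: every slash in the project path becomes %2F.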
Packaging your Python code
The pyproject.toml Manifest File
To package your code for others to install, you need to specify some information about it in a manifest file. The most common manifest file these days is pyproject.toml. This file typically contains the following information:
- how to build your package (if there is a building step)
- metadata like author, version, description, etc.
- which files are part of the distributed code (excluding files like README, helper scripts, etc.)
- configuration for any number of development tools
While a lot of information can be added, only the following is required. If you use a different build system than Setuptools, you'll have to change the [build-system] section according to that system's documentation.
[build-system]
requires = ["setuptools >= 61.0"]
build-backend = "setuptools.build_meta"
[project]
name = "project-name"
version = "1.0.0"
requires-python = ">= 3.9"
dependencies = [
"cernml-coi >= 0.9",
"matplotlib >= 3.8",
# ...
]
The following information is not required, but strongly encouraged:
[project]
# ... as above
description = "Lovely Spam! Wonderful Spam!"
readme = "README.md"
authors = [
{name = "Your Name", email = "y.name@gsi.de"},
# ...
]
maintainers = [
{name = "Your Name", email = "y.name@gsi.de"},
# ...
]
[project.urls]
Gitlab = "https://git.gsi.de/y.ourname/project-name"
# ...
Bumping Your Version
In most projects, bumping the package version and publishing a new release requires a number of detailed steps and it's easy to forget one. Examples are:
- Adding release notes to your change log
- Changing the version number in pyproject.toml
- Changing the package.__version__ attribute (which is actually non-standard, not required, and not universally used)
- Tagging the associated Git commit
- Making a Gitlab release
- Publishing the package to a package index
To automate and simplify this process, many people have come up with many different tools. You'll have to experiment and find one that you like. There are two I'd like to point out:
- Commitizen
- This tool ensures commit messages follow a consistent style. It bumps your version number for you, creates a tag, and creates a changelog based on those uniform commit messages. This only leaves the actual publishing step to you.
- setuptools-scm
- This tool follows the opposite approach from Commitizen (though you can combine both). Instead of writing your version number in a file and creating a tag based on it, this removes the version number from your pyproject.toml file. Instead it is automatically deduced from your Git tag when the project is built. (As of time of writing, this is the one I have experience with.)
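Whichever tool you pick, code that needs to know a package's version at runtime does not have to rely on a __version__ attribute. The standard library can read the version from the installed package metadata. In this sketch the function name installed_version is made up, and "pyda" is just an example name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# "pyda" is just an example name; any installed distribution works.
print(installed_version("pyda"))
```

Note that this expects the distribution name (as used with pip install), which may differ from the import name.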
Uploading a Package
This procedure is similar to downloading packages, but instead of Pip you use Twine. The steps are:
- Configure Twine to be aware of our custom package index.
- Create a personal access token with the api (not read_api!) scope and let Twine know about it.
- Create a wheel file from your package.
- Upload the package.
Unlike Pip, Twine is configured through the file ~/.pypirc. If you don't have one, create it and fill it roughly like this:
[distutils]
index-servers =
    data-harvest
[data-harvest]
repository = https://git.gsi.de/api/v4/projects/scripting-tools%2Frda-data-harvest/packages/pypi
username = y.ourname
password = <token>
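The ~/.pypirc file uses INI syntax, so you can sanity-check it with Python's configparser module. The following sketch lists each declared index server with its repository URL; the function name index_servers is made up for this illustration. Note that the continuation line under index-servers must be indented for the file to parse.

```python
import configparser

EXAMPLE = """
[distutils]
index-servers =
    data-harvest

[data-harvest]
repository = https://git.gsi.de/api/v4/projects/scripting-tools%2Frda-data-harvest/packages/pypi
username = y.ourname
password = <token>
"""


def index_servers(text):
    """Map each declared index server to its repository URL."""
    # interpolation=None keeps "%2F" from being treated as a placeholder.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    names = parser.get("distutils", "index-servers").split()
    return {name: parser.get(name, "repository") for name in names}


print(index_servers(EXAMPLE))
```

To check your real file, read it with pathlib.Path("~/.pypirc").expanduser().read_text() and pass the result to the function.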
Once this is done, you can enter your project directory and create a wheel file as follows:
pip wheel --no-deps .
Wheels are renamed ZIP files that contain all your Python code as well as some metadata about the package. You can upload them like this:
twine upload -r data-harvest *.whl
You have to either pass your package index of choice via -r/--repository, or set the environment variable TWINE_REPOSITORY to that index's name. Otherwise Twine will pick PyPI by default.
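Since wheels are just ZIP files, you can inspect one with Python's zipfile module before uploading it. The sketch below builds a tiny stand-in archive so that it runs anywhere; the archive contents and the function name wheel_summary are made up for this illustration. With a real wheel, you would pass its path instead.

```python
import os
import tempfile
import zipfile


def wheel_summary(path):
    """Return (file names, METADATA text) for a wheel file."""
    with zipfile.ZipFile(path) as wheel:
        names = wheel.namelist()
        # Wheels keep their metadata in <name>-<version>.dist-info/METADATA.
        metadata_name = next(
            name for name in names if name.endswith(".dist-info/METADATA")
        )
        return names, wheel.read(metadata_name).decode("utf-8")


# Build a minimal stand-in archive; a real wheel comes from `pip wheel`.
demo = os.path.join(tempfile.mkdtemp(), "demo-1.0.0-py3-none-any.whl")
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("demo/__init__.py", "")
    archive.writestr(
        "demo-1.0.0.dist-info/METADATA",
        "Metadata-Version: 2.1\nName: demo\nVersion: 1.0.0\n",
    )

names, metadata = wheel_summary(demo)
print(names)
print(metadata)
```

Checking the Name and Version fields this way is a cheap safeguard against uploading a wheel built from the wrong commit.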