PythonPackaging (2025-05-13, PennyMadysa)

This is an attempt to make it easier to share, publish and download Python packages at GSI.

The lack of infrastructure means that all options are going to involve larger or smaller amounts of hackery, but we'll try to make it as simple as possible.

See PythonBridge for examples of how to use the pyda and pjlsa packages.

See GeOFF for help on how to use the cernml collection of packages.

Quickstart

Start here if you're new. This is the brief form of the instructions on how to set up a working Python environment inside the ACC network.

The following summary is intended as a reminder for when you set up your environment again. If you're doing this for the first time, only skim it to get an overview, then work through the numbered sections that follow it.

The section "Detailed information" contains more exhaustive information and serves as a reference.

Summary

This mostly serves as a reminder for second-timers, but it can also be useful to get an overview.

Acquire an ACO account and enable LSA access.

Create a virtual environment and enter it.

mkdir ~/venvs
python -m venv --system-site-packages --upgrade-deps --prompt=venv ~/venvs/default
source ~/venvs/default/bin/activate

(Consider putting the last line into your ~/.bashrc file or using a convenience function.)

Create a personal access token, then create a file that only you can access:
```
touch ~/.netrc && chmod "u=rw,go=" ~/.netrc
```
and add an arbitrary username and your token:
```
machine git.gsi.de
    login gitlab
    password <your token>
```

Configure Pip to search the Gitlab package registry for packages:

pip config --user set global.extra-index-url "https://git.gsi.de/api/v4/groups/scripting-tools/-/packages/pypi/simple"

Specify a CMW directory server for FESA access:
```
export CMW_DIRECTORY_CLIENT_SERVERLIST="cmwpro00a.acc.gsi.de:5021"
```
(Consider putting this line into your ~/.profile or ~/.bash_profile file – depending on which one already exists – or a convenience function.)
Remember that adding a line to your ~/.profile or ~/.bash_profile file doesn't do anything until you source these files or log in again; you'll still have to execute this line manually to continue.
If necessary, run the tests in the following sections to ensure everything has worked.

1. Access the ACC Network

Access to FESA devices is only available within the ACC network. If you need to access FESA devices or the LSA database, you will need an ACO account. Follow the instructions on the linked page.

If you need LSA database access, Jutta Fitzek will need to send an e-mail on your behalf as well.

The machines available for SSH access are asl751 ... asl756.acc.gsi.de.

In addition, the BEPHY department also has asl154.acc.gsi.de for long-term data taking. However, this is not a development machine and many common software packages are missing. Furthermore, it has no Internet access, so to transfer scripts onto it, you will have to either proxy out or remote-copy files onto it with scp, rsync or sftp.

2. Create and Enter a Virtual Environment

We will install all the Python packages not into your global Python environment, but in an isolated virtual environment to keep dependency issues under control. A virtual environment adjusts the paths where Python looks for your packages and nothing else. In particular, all config files that are visible outside of a venv are also visible within.

Log into a terminal on the machine where you want to install GSI Python packages and run the following command:

python --version

Ensure that your Python version is 3.9 or 3.11. Two important packages are only available for these versions.
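
The same check can be done from within Python, which is handy at the top of a script that depends on these packages. This is only a minimal sketch: the helper name is_supported_python is made up for this page, and the set of supported versions is simply copied from the sentence above.

```python
import sys

# Versions named in the text above as the only ones with all required packages.
SUPPORTED_VERSIONS = {(3, 9), (3, 11)}


def is_supported_python(version_info=sys.version_info):
    """Return True if the major/minor version is one of the supported ones."""
    return tuple(version_info[:2]) in SUPPORTED_VERSIONS


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    status = "supported" if is_supported_python() else "NOT supported"
    print(f"Python {major}.{minor} is {status}")
```

Only the major and minor number are compared; the patch level (e.g. the 18 in 3.9.18) is ignored.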

Then run the following commands in your terminal:

mkdir ~/venvs
# (Feel free to modify the following parameters if you know what you're doing.)
python -m venv --system-site-packages --upgrade-deps --prompt=venv ~/venvs/default
# Enter the venv by adjusting your shell's environment variables.
source ~/venvs/default/bin/activate

Important! The last line makes you enter the venv. Type deactivate to leave it again. Every time you log out of the machine, you also leave the venv. Consider putting the activate line into your ~/.bashrc file or a convenience function.

3. Create an Authentication Token for Gitlab

A token is a random password generated on the server side. Unlike a real password, it has a limited lifetime and a restricted list of permissions. All Gitlab personal access tokens begin with the string glpat.

Log into git.gsi.de and navigate to the Personal Access Tokens page of your profile. Make sure the box for read_api is checked (the other boxes don't matter), create a token and copy it. Keep the window open.

Create a file called .netrc in your home directory and ensure that you and only you can access it:

touch ~/.netrc
chmod "u=rw,go=" ~/.netrc  # Read/write access for User, no access for Group members and Others

Then edit the file so that it contains the following lines:

machine git.gsi.de
    login gitlab
    password <your token>

where <your token> is replaced with the token that you copied in the previous step. If you copied the previous commands and lost your token, go back to the browser window and copy it again.

Make sure that the token string starts with the characters "glpat". If it doesn't, you probably copied the token of your RSS feed (bottom of the page) instead of the token you created (top of the page).
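
If you want to double-check the result, the points above (file permissions, an entry for git.gsi.de, a token starting with glpat) can be verified with Python's standard netrc module. The following is a rough sketch under those assumptions; the function name check_netrc is hypothetical and not part of any GSI package.

```python
import netrc
import os
import stat


def check_netrc(path, host="git.gsi.de"):
    """Return a list of problems found in the netrc file (empty if none)."""
    problems = []
    # The file should be accessible by its owner only (no group/other bits).
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & 0o077:
        problems.append(f"file mode is {mode:o}, expected 600")
    # The standard-library parser understands the machine/login/password layout.
    entry = netrc.netrc(path).authenticators(host)
    if entry is None:
        problems.append(f"no entry for machine {host}")
    else:
        _login, _account, password = entry
        if not (password or "").startswith("glpat"):
            problems.append("password does not start with 'glpat'")
    return problems


if __name__ == "__main__":
    netrc_path = os.path.expanduser("~/.netrc")
    if os.path.exists(netrc_path):
        for problem in check_netrc(netrc_path):
            print("Problem:", problem)
```

An empty result means that none of the three checks failed; it does not prove that the token is still valid on the server.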

4. Configure Pip to Look up Packages on Gitlab

Run the following command in your terminal:

pip config --user set global.extra-index-url "https://git.gsi.de/api/v4/groups/scripting-tools/-/packages/pypi/simple"

This configures Pip to look for packages both on the default package index and on our Gitlab package registry. If your machine does not have general Internet access, replace extra-index-url with just index-url. In this case, Pip will only search Gitlab for packages.

With the previous steps done, the following line should download the correct packages:

pip install "pjlsa-gsipro" "pyda-rda3 ~= 0.2.0" "gymnasium < 1"

Pip produces a lot of output, but the last few lines should confirm that the following packages (plus their dependencies) have been installed:

cmmnbuild-dep-manager-2.14.0
gymnasium-0.29.1
pjlsa-0.2.18.post1
pjlsa-gsipro-1.1.2
pyda-0.2.0.post1
pyda-rda3-0.2.3
pyrda3-0.2.4
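
If the Pip output has already scrolled away, the installed versions can also be queried afterwards. The sketch below uses the standard importlib.metadata module; the helper name installed_versions is an illustration only, and the list of names is copied from the Pip output above.

```python
from importlib import metadata

# Distribution names as printed by Pip in the list above.
EXPECTED = ["pjlsa", "pjlsa-gsipro", "pyda", "pyda-rda3", "pyrda3"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(EXPECTED).items():
        print(f"{name}: {version or 'not installed'}")
```

Run this inside the venv; outside of it, the packages will be reported as not installed.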

You can test if everything worked by running the following short Python script:

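from pjlsa_gsipro import LSAClientGSI

# pjlsa_gsipro connects to the production LSA database.
lsa = LSAClientGSI()

# Java packages are imported inside this context manager.
with lsa.java_api():
    from cern.lsa.client import ContextService, ServiceLocator

# Ask the context service for all currently resident patterns.
cs = ServiceLocator.getService(ContextService)
patterns = list(cs.findResidentPatterns())

for pattern in patterns:
    print(pattern)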

Besides a ton of pointless noise, this should also ultimately print a list of all patterns currently loaded into the accelerators at GSI.

5. Specify the CMW Directory Server

In order to access FESA, you need to specify a CMW directory server to connect to. There are three servers at GSI: one for production, one for development and one for integration testing.

To always connect to the production server, put this line in your profile file (~/.profile or ~/.bash_profile):

export CMW_DIRECTORY_CLIENT_SERVERLIST="cmwpro00a.acc.gsi.de:5021"

If you also need the development and integration servers occasionally, have a look at this snippet. It defines a function that lets you set the desired server without having to remember the full address:

setup-cmw pro
setup-cmw dev
setup-cmw int

You can verify that this works by receiving data from a device. (Note that you might have to change the device name to that of a device that is currently running.)

import pyda, pyda_rda3

# Create the RDA3 provider once and hand it to the client.
rda3 = pyda_rda3.RdaProvider()
client = pyda.SimpleClient(
    provider=rda3,
)

s = client.subscribe("YRT1DC3/Acquisition", context="FAIR.SELECTOR.ALL")
for a in s:
    print(a.value.header.selector)

You can break out of the loop with Ctrl-C.

If you forget to specify the CMW directory server (like I often do), you'll usually see this error message:

pyrda3._rda3_bindings.NameServiceException: CMW Directory Service getDeviceInfo() request failed for: device 'GS09DT_ML' and domain 'RDA3' --> CMW get-device-info command failed: Cannot find the host 'cmw-dir-pro1.cern.ch'; Cannot find the host 'cmw-dir-pro3.cern.ch'; Cannot find the host 'cmw-dir-pro2.cern.ch';

This happens because pyda, in the absence of an override, falls back on its default directory servers. These are the ones from CERN, which are not accessible from GSI (for good reason).
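
Because the error message does not mention the environment variable at all, it can help to fail early with a clearer message. The following is a small sketch of such a guard; require_cmw_directory is a hypothetical helper, and it only checks that the variable is set, not that the server is reachable.

```python
import os

ENV_VAR = "CMW_DIRECTORY_CLIENT_SERVERLIST"


def require_cmw_directory(environ=os.environ):
    """Return the configured directory server list or raise a readable error."""
    value = environ.get(ENV_VAR, "").strip()
    if not value:
        raise RuntimeError(
            f"{ENV_VAR} is not set; export it before talking to FESA devices"
        )
    return value


if __name__ == "__main__":
    try:
        print("Using CMW directory server(s):", require_cmw_directory())
    except RuntimeError as exc:
        print("Error:", exc)
```

Calling require_cmw_directory() at the top of a script, before creating the pyda client, turns the confusing CERN host name error into a one-line hint.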

Detailed Information

This section contains all the more in-depth information in case problems come up or you want to know more.

List of relevant packages

This is a brief list of all the packages that you will need for one reason or another:

- pyda: A façade package that provides a uniform interface to talk to FESA devices; always needs a provider package to do the actual work.
  - pyda_rda3: The provider package for pyda that adds support for the pyrda3 library.
  - pyrda3: Python bindings to the RDA3 C++ library.
  - pyrbac: Package that implements Role-Based Access Control. Required by pyda but currently not used at GSI.
  - pyccda: Package that implements the Controls Configuration Data API of CERN. Required by pyda but currently not used at GSI.
- pjlsa_gsipro/int/dev: Python bindings to the Java libraries that provide access to the LSA database at GSI. They connect to the production database, the integration-testing database and the development database respectively. These have to be separate packages because each database requires different versions of its Java dependencies.
  - pjlsa: Core Python package that implements the logic and API of the above three.
  - cmmnbuild_dep_manager: Downloads and manages Java libraries required by pjlsa.
  - jpype: Package that connects a Python interpreter to a Java VM so that Java libraries can be called from Python.

Warning! The packages pyrda3 and pyrbac only exist for Python 3.9 and 3.11. Make sure that your Python version matches one of these, or the package index will simply (and confusingly) report that no versions could be found.

Virtual Environments

Virtual environments make it easy to install Python packages for different projects without a risk of version clashes between their dependencies. Each venv links to a Python interpreter (e.g. the system one), but provides a completely isolated directory for package installation. This is similar to Conda, which manages not only Python but also C++ dependencies.

You can create a fresh venv by running this command:

python -m venv <path/to/the/venv>

As path, I recommend either a folder in your home directory like ~/venvs/ or a directory ./.venv within your Python project, depending on your needs.

The venv module accepts a number of arguments; here are the important ones:

--system-site-packages: In addition to the packages inside the venv, all packages in the global environment are accessible as well. Without this option, you start with a completely clean venv in which nothing is installed.
--upgrade-deps: Automatically update Pip and Setuptools to their latest versions when creating the venv. The default is to install the versions that are bundled with your Python installation.
--prompt=<text>: Customize the text that is shown next to your command prompt while the venv is active. The default uses the last element of the venv's path.

You can enter a venv by running the following line:

source <path/to/the/venv>/bin/activate

While inside a venv, you can exit it with the special command deactivate, or by switching to another venv. See also vactivate.sh for a nicer way to switch venvs.

While inside a venv, you can pip install whatever Python packages you want. They are accessible while inside the venv, and inaccessible outside of it.
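
When in doubt whether a script or shell is actually running inside a venv, the interpreter itself can tell you. A minimal sketch using only the standard sys module (the function name is made up for this page):

```python
import sys


def in_virtual_environment():
    """Return True if the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the venv directory while
    # sys.base_prefix still points at the interpreter it was created from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("venv active:", in_virtual_environment())
    print("packages are installed under:", sys.prefix)
```

This works for environments created with python -m venv; other tools may behave differently.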

Local installs

After you have cloned a properly packaged Python project from Gitlab to your local disk, you can always install it to your venv by running the following line:

pip install .

(where "." as usual refers to the current directory). Once installed, the Python package is available via import no matter your current working directory.

The way this works is that all your Python files are copied from your project directory to the venv's library directory. This means that if you make any changes to your project files, they will not be reflected by the installed package. You will have to install the package again to make the changes accessible.

While developing a package, you will usually want to install it in development mode instead. You can do this with this line:

pip install --editable .  # or `pip install -e .` for short

This makes the venv refer to your project directory instead of copying your files, so any changes that you make are reflected immediately. This obviously doesn't make sense outside of active development.
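
To see which of the two install modes is currently in effect, you can ask Python where it would import a package from. The sketch below uses the standard importlib.util module; module_location is a hypothetical helper, and "json" merely stands in for your own package name.

```python
import importlib.util


def module_location(name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # "json" is only a stand-in; replace it with your own package name.
    print(module_location("json"))
```

After a regular install, the printed path should lie inside the venv's library directory; after a development-mode install, it should point into your project directory.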

Git installs

If you don't plan to make any changes to a Python project, you don't need to clone it just for installation. Pip supports installing a package straight from a remote Git server. The exact command line depends on whether you want to authenticate yourself to Gitlab via HTTPS (username+password) or SSH (private key file):

pip install git+https://git.gsi.de/<group>/<project>   # HTTPS (username+password)
pip install git+ssh://git@git.gsi.de/<group>/<project> # SSH (~/.ssh/id_<algo>)

Note that for public projects (i.e. visible without Gitlab login), you don't need to authenticate at all.

This works well for quick one-off installations. It does not work well for packages that other packages depend on. There is no version management, so it'll be very easy to break other people's code by accident.

Gitlab package index

Gitlab also makes it possible to run your own package index from which Pip can install packages. I've set one up here. The way it works is that every Gitlab project has its own index (e.g. this one) and the group-level index aggregates all packages of the projects in the group.

Authentication

Unfortunately, it's still a bit of a chore to set this up. Because Gitlab's package index doesn't support SSH or password authentication, you will need an authentication token to connect to it. That's basically a random password generated on the server side, with limited permissions and lifetime attached.

Go to the Personal Access Tokens page of your profile. Make sure that the "read_api" box is checked and create a token. Copy the token string.

Once this is done, you have two ways to pass this token to Pip:

The recommended way: Create a file ~/.netrc with the following contents:
```
machine git.gsi.de
    login gitlab
    password <token>
```
where <token> is replaced with the token you copied earlier. You should ensure that no one but you can access this file:
```
chmod "u=rw,go=" ~/.netrc
```
Now, every time that Pip connects to the server git.gsi.de, it will automatically log in with these credentials.

The alternative way: When specifying the index URL (see below), explicitly include the log-in credentials. This could look like this:
```
pip install --index-url="https://gitlab:$GITLAB_TOKEN@git.gsi.de/api/v4/groups/scripting-tools/-/packages/pypi/simple" <package>
```
where $GITLAB_TOKEN is either an environment variable in which you store the token, or you just replace it directly with the token. Obviously this is unsafe because it exposes your token (which is a kind of password) to the entire GSI. (Remember that every command you type is saved in ~/.bash_history!)

Connection

There are two command-line parameters that influence where Pip looks for packages:

--index-url=<url>: This replaces the default package index with the one given by the URL. This means that common packages like e.g. Numpy can no longer be found unless they're also available on the new index.
--extra-index-url=<url>: Can be passed multiple times. This searches for packages on both the default index and all additional indexes. Pip assumes that all indexes are coherent, i.e. the same package name refers to the same package on all indexes.

Unfortunately, this is not the case. The pyda package developed at CERN (currently at v0.2) has the same name as an independent package on PyPI (at v1.0). To avoid installing the wrong package, you will have to restrict the version such that Pip will not consider the wrong package:

pip install "pyda ~= 0.2.0"
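
The ~= operator ("compatible release") in this command means: at least the given version, but with everything except the last component held fixed. The following sketch illustrates that rule in plain Python. It is a deliberately simplified model that only looks at the leading numeric components and ignores the finer points of the real version specification; the function names are made up for this page.

```python
def release_tuple(version):
    """Parse the leading numeric components, e.g. "0.2.0.post1" -> (0, 2, 0)."""
    parts = []
    for part in version.strip().split("."):
        if not part.isdigit():
            break
        parts.append(int(part))
    return tuple(parts)


def is_compatible(version, spec):
    """Simplified check of `version ~= spec` (numeric releases only)."""
    wanted = release_tuple(spec)
    found = release_tuple(version)
    prefix = wanted[:-1]  # "~= 0.2.0" holds everything but the last component
    return found[: len(prefix)] == prefix and found >= wanted


if __name__ == "__main__":
    for candidate in ["0.2.0", "0.2.3", "0.3.0", "1.0"]:
        print(candidate, is_compatible(candidate, "0.2.0"))
```

With the specifier "~= 0.2.0", this model accepts 0.2.0 and 0.2.3 but rejects 0.3.0 and 1.0, which is why the unrelated v1.0 package on PyPI is never considered.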

Apart from the issue of name conflicts, --extra-index-url is usually the better choice. You can configure Pip to always look for packages on Gitlab and PyPI with this command:

pip config --user set global.extra-index-url "https://git.gsi.de/api/v4/groups/scripting-tools/-/packages/pypi/simple"

or by editing ~/.config/pip/pip.conf directly.

Another option is a two-step process: You can first download all the GSI-specific packages from Gitlab, ignoring the default index. Then install these local packages, picking up anything that's still missing from the default index:

pip download --no-deps --index-url="https://git.gsi.de/api/v4/groups/scripting-tools/-/packages/pypi/simple" pyda pyda-rda3 pyrda3 pjlsa pjlsa_gsipro cmmnbuild_dep_manager pyrbac
pip install *.whl

Other Package Indexes

If you want to use the Gitlab package index of a specific project instead of the collective one of the scripting-tools group, the URL is constructed as follows:

https://git.gsi.de/api/v4/projects/<group>%2F<project>/packages/pypi/simple

Note how group and project are separated by %2F and not by a regular slash!
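
If you build such URLs in a script, the %2F encoding does not have to be typed by hand. The sketch below uses urllib.parse.quote from the standard library; encode_project_path is a hypothetical helper name, and the example names are the group and project used elsewhere on this page.

```python
from urllib.parse import quote


def encode_project_path(*parts):
    """Join group/subgroup/project names and percent-encode the slashes."""
    # safe="" makes quote() encode "/" as well, which it leaves alone by default.
    return quote("/".join(parts), safe="")


if __name__ == "__main__":
    print(encode_project_path("scripting-tools", "rda-data-harvest"))
```

The result can then be inserted wherever the URL expects the <group>%2F<project> part.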

Packaging your Python code

The pyproject.toml Manifest File

To package your code for others to install, you need to specify some information about it in a manifest file. The most common manifest file these days is pyproject.toml. This file typically contains the following information:

how to build your package (if there is a build step)
metadata like author, version, description, etc.
which files are part of the distributed code (excluding files like README, helper scripts, etc.)
configuration for any number of development tools

While a lot of information can be added, only the following is required. If you use a different build system than Setuptools, you'll have to change the [build-system] section according to that system's documentation.

[build-system]
requires = ["setuptools >= 61.0"]
build-backend = "setuptools.build_meta"

[project]
name = "project-name"
version = "1.0.0"
requires-python = ">= 3.9"
dependencies = [
  "cernml-coi >= 0.9",
  "matplotlib >= 3.8",
  # ...
]
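
A quick way to catch typos in such a manifest is to parse it before building. The following sketch uses the standard tomllib module, which only exists in Python 3.11 and newer; on Python 3.9, the third-party tomli package offers the same interface and has to be installed separately. Which keys count as the minimum (REQUIRED) is an assumption made for this example, and missing_keys is a made-up helper name.

```python
try:
    import tomllib  # standard library from Python 3.11 on
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport with the same API

EXAMPLE = """
[build-system]
requires = ["setuptools >= 61.0"]
build-backend = "setuptools.build_meta"

[project]
name = "project-name"
version = "1.0.0"
"""

# Assumption for this sketch: these (table, key) pairs are the bare minimum.
REQUIRED = [
    ("build-system", "requires"),
    ("build-system", "build-backend"),
    ("project", "name"),
    ("project", "version"),
]


def missing_keys(toml_text):
    """Return the required (table, key) pairs that are absent from the manifest."""
    data = tomllib.loads(toml_text)
    return [(table, key) for table, key in REQUIRED
            if key not in data.get(table, {})]


if __name__ == "__main__":
    print("missing:", missing_keys(EXAMPLE))
```

A syntax error in the file surfaces as an exception from tomllib.loads, so both malformed and incomplete manifests are noticed before the build starts.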

The following information is not required, but strongly encouraged:

[project]
# ... as above
description = "Lovely Spam! Wonderful Spam!"
readme = "README.md"
authors = [
  {name = "Your Name", email = "y.name@gsi.de"},
  # ...
]
maintainers = [
  {name = "Your Name", email = "y.name@gsi.de"},
  # ...
]

[project.urls]
Gitlab = "https://git.gsi.de/y.ourname/project-name"
# ...

Bumping Your Version

In most projects, bumping the package version and publishing a new release requires a number of detailed steps and it's easy to forget one. Examples are:

Adding release notes to your change log
Changing the version number in pyproject.toml
Changing the package.__version__ attribute (which is actually non-standard, not required, and not universally used)
Tagging the associated Git commit
Making a Gitlab release
Publishing the package to a package index

To automate and simplify this process, many people have come up with many different tools. You'll have to experiment and find one that you like. There are two I'd like to point out:

Commitizen: This tool ensures that commit messages follow a consistent style, bumps your version number for you, creates a tag, and creates a changelog based on those uniform commit messages. This only leaves the actual publishing step to you.
setuptools-scm: This tool follows the opposite approach from Commitizen (though you can combine both). Instead of writing your version number in a file and creating a tag based on it, it removes the version number from your pyproject.toml file and deduces it from your Git tag when the project is built. (At the time of writing, this is the one I have experience with.)

Uploading a Package

This procedure is similar to downloading packages, but instead of Pip you use Twine. The steps are:

Configure Twine to be aware of our custom package index.
Create a personal access token with the api (not read_api!) scope and let Twine know about it.
Create a wheel file from your package.
Upload the package.

Unlike Pip, Twine is configured through the file ~/.pypirc. If you don't have one, create it and fill it roughly like this:

[distutils]
index-servers =
    data-harvest

[data-harvest]
repository = https://git.gsi.de/api/v4/projects/scripting-tools%2Frda-data-harvest/packages/pypi
username = y.ourname
password = <token>

Because this file, like ~/.netrc, contains your access token, you should make it inaccessible to anyone but you:
```
chmod "u=rw,go=" ~/.pypirc
```
Remember that the token needs to have the scope api. Otherwise, Gitlab will reject the upload.
Because the scripting-tools package index (which aggregates the indexes of all projects in its group) is read-only, we need to specify a project package index. Here, we use rda-data-harvest.
You can define as many package indexes as you want; they will not conflict in any way. Just remember to list them all under the index-servers key.
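
Forgetting to list a new section under index-servers is an easy mistake to make. The sketch below checks for it with the standard configparser module; unlisted_servers is a hypothetical helper, and the example text mirrors the file shown above with a dummy token.

```python
import configparser

EXAMPLE = """
[distutils]
index-servers =
    data-harvest

[data-harvest]
repository = https://git.gsi.de/api/v4/projects/scripting-tools%2Frda-data-harvest/packages/pypi
username = y.ourname
password = dummy-token
"""


def unlisted_servers(pypirc_text):
    """Return sections that are defined but missing from index-servers."""
    # RawConfigParser leaves "%" alone, which matters for "%2F" in the URL.
    parser = configparser.RawConfigParser()
    parser.read_string(pypirc_text)
    listed = parser.get("distutils", "index-servers", fallback="").split()
    return [name for name in parser.sections()
            if name != "distutils" and name not in listed]


if __name__ == "__main__":
    print("unlisted sections:", unlisted_servers(EXAMPLE))
```

To check your real configuration, pass the contents of ~/.pypirc instead of EXAMPLE.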

Once this is done, you can enter your project directory and create a wheel file as follows:

pip wheel --no-deps .

Wheels are renamed ZIP files that contain all your Python code as well as some metadata about the package. You can upload them like this:

twine upload -r data-harvest *.whl

You have to either pass your package index of choice via -r/--repository, or set the environment variable TWINE_REPOSITORY to that index's name. Otherwise Twine will pick PyPI by default.
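
Since wheels are just ZIP files, you can inspect what you are about to upload with Python's standard zipfile module. This is a minimal sketch; wheel_contents is a made-up helper name, and the demo simply looks at all wheel files in the current directory.

```python
import glob
import zipfile


def wheel_contents(path):
    """Return the names of all files stored in a wheel (or any ZIP file)."""
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()


if __name__ == "__main__":
    for wheel in glob.glob("*.whl"):
        print(wheel)
        for name in wheel_contents(wheel):
            print("   ", name)
```

This is a convenient way to confirm that helper scripts or other unwanted files did not end up in the package.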

Topic revision: r21 - 2025-05-13, PennyMadysa
