Releases: fgci-org/fgci-ansible

v1.2.0 - Jelly Jamaican

25 Apr 07:32

General

  • #115 - began cleaning up and simplifying package installation

Role Updates

  • fgci-install : sync host_vars to install node too if you have them
    • #114 ansible-pull success/failures are now sent to cassini - it's possible to make a dashboard showing the status of ansible-pull
    • it's possible to disable the ansible-pull cronjob (and the above annotations)
  • cuda role is run before slurm
  • slurm: some TRES improvements from @jabl
  • yum: new variable global_packages, another way to specify packages to install
  • fail2ban: --check should work now
  • users: a fairly large update from @khappone that makes the role figure out which user to use. In the future we could remove some tasks in the playbook and just put the users role first.
  • yum-cron: it's now possible to configure yum-cron settings - for example to exclude java
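
As a rough illustration of the new yum variable, a group_vars entry might look like the fragment below. The list-of-names shape, the file placement and the package names are assumptions made for this sketch; check the yum role's defaults for the actual format.

```yaml
# group_vars/all/all.yml (hypothetical placement)
# Assumption: global_packages takes a plain list of package names.
global_packages:
  - tmux
  - htop
```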

v1.1.0 - Industrious Indigo

30 Mar 05:16

General updates

In the fgci-ansible git repo, the stable branch is now the master branch. If you want bleeding edge, use the devel branch. In either case, use the same branch for your ansible-push as for ansible-pull.

  • minor change of order of some roles in playbooks, this is due to dependencies
  • smart fact gathering in ansible - should speed up ansible runs
  • allow setting resolv.conf search domain and other options
  • arc-frontend updated to contain the NorduGrid 2015 chain
  • many changes to the slurm role, mostly related to GPU and GRES

New Roles

  • idmapd
  • open-vm-tools (for VMware guests)

v1.0.0 - Humble Habu

07 Mar 13:39

General Updates

Role versions are now frozen in requirements.yml
This means that an updated role is not picked up immediately by the clusters. Someone has to manually update requirements.yml to point to a more recent commit or tag. The reason is to make things more stable and prepare for production.
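
A pinned entry in requirements.yml could look roughly like the sketch below. The src, name and version keys are the standard ansible-galaxy requirements format; the repository URL and the version value are illustrative assumptions, not copied from the actual file.

```yaml
# requirements.yml (sketch; URL and version are assumptions)
- src: https://github.com/CSC-IT-Center-for-Science/ansible-role-slurm
  name: ansible-role-slurm
  version: v1.0.0   # hypothetical value; a tag or a full commit hash works here
```

To move a cluster to a newer role, one would bump the version value and re-fetch the roles, for example with ansible-galaxy install -r requirements.yml --force.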

The order in which some roles are executed has changed.

ansible-pull on compute nodes now pulls the files from the install node - being a bit nicer to github

NAT rules are now in the ferm firewall role as well as in the ip_forwarder role.

New tools:

  • upgradeAndReboot.yml
    • Might not work for every cluster but
  • gitmirror_conf_generate.py
    • Used by git-mirror to have a mirror of every role on the install node

New Variables

  • slurm_nodelist and slurm_partitionlist
  • ib_ip_addr: a variable in the hosts file that defines a node's IB IP address

New Roles

Most of these are not enabled by default, but can be enabled by setting some variables. See each role for more details.

Role Updates

  • Fix serial console for compute nodes
  • Improve partitioning on login/compute nodes
  • only generate ed25519 ssh host keys

v0.9.4 - Gracious Green Snake

15 Feb 14:15

Production branch merged with master.

General Updates

fgci-ansible

  • new variable to configure IB interfaces: network_ib_interfaces
  • moved common network variables to group_vars/all/all.yml
    • renamed variable ip_address to int_ip_addr in tasks and templates. This is also reflected in the inventory file "hosts".
  • internal network IPs are now in hosts file
  • Removing unused playbooks (after_provisioning.yml and all_nodes.yml)
  • site.yml now has the reinstall and reboot playbooks commented out, allowing a harmless run
  • nfs_exports: duplicated shares to cover all internal subnets (Ethernet and IB)
  • fgci-meta-* metapackages are installed
  • sync compute.yml and local.yml playbooks
  • install and run smartd on compute nodes too
  • Configure ib from dual to single port (ansible-role-rdma)
  • admin node now installs a metapackage from FGCI7 repo
  • new role: ansible-role-pdsh-machines for issue
  • tools/reboot_computes.yml - Reboot without errors
  • added ansible-role-cvmfs in nfs.yml - This will allow running performance tools available in CVMFS from NFS node
  • if installed, NetworkManager gets removed
  • FGCI7 repo added to login and nfs nodes

New Roles

ansible-role-serial-console

mhakala/ansible-role-pam

cmprescott/ansible-role-autofs

jtyr/ansible-nsswitch

mhakala/ansible-role-lustre_client

Role Updates

ansible-role-slurm

  • Only create the slurm user/group on the NIS server
  • cgroups: /sys/fs/cgroup is now mounted by systemd and not slurm

ansible-role-rdma

  • add options to disable managing rdma and opensm services
  • set the "lspci" shell commands to changed_when: False, so they never show up as "changed" in ansible.
  • Since memlock unlimited is set by ansible-system-limits, the copy rdma.conf is disabled by default.
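
The changed_when change to ansible-role-rdma follows a common pattern for read-only probe commands. A minimal sketch is shown below; the task name, grep pattern and register variable are hypothetical and not copied from the role.

```yaml
# Sketch of a read-only probe task (names and pattern are assumptions).
- name: check for Mellanox hardware with lspci
  shell: lspci | grep -i mellanox
  register: lspci_mellanox
  changed_when: False   # a query never modifies the host, so never report "changed"
  failed_when: False    # grep exits 1 when nothing matches; not treated as an error here
```

Without changed_when, shell tasks report "changed" on every run, which makes it hard to tell from the play recap whether anything was actually modified.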

ansible-role-nis

  • make nscd management optional and add state/enabled variables
  • same variable for NetworkManager
  • make the NISDOMAIN task a bit more readable
  • skip stop and disable NetworkManager when NetworkManager isn't installed

ansible-role-nfs

  • make nfs service optional

ansible-role-pxe_bootstrap

  • Allow serial port to be chosen, default to ttyS0 on boot.py PXE script

ansible-role-sshd-host-keys

  • renamed variable ip_address to int_ip_addr in tasks and templates. This is also reflected in the inventory file "hosts".

ansible-role-pxe_config

  • renamed variable ip_address to int_ip_addr in tasks and templates. This is also reflected in the inventory file "hosts".

ansible-role-yum

  • the proxy address is now given as a URL

ansible-role-fgci-bash

  • log module usage to syslog

ansible-role-squid

  • squid service is now started and enabled in the task

ansible-role-users

  • new feature: add / remove pubkeys to the root user

ansible-role-fgci-install

  • permissions updated to 0755 for ansible-pull-script.sh
  • Added some missing default variables
  • Adding random sleep interval into ansible-pull-script.sh
  • Replacing hard-coded delays by random time configurable by ansible_pull_sleep var
  • new variable: fgci_node_type. If it is:
    • service: setup dhcpd, copies in ansible var files, ansible-pull-script to the service/install node
    • compute: pushes the ansible-pull-script.sh to the compute node
  • add variable ansible_pull_branch (defaults to production)
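
Taken together, the fgci-install variables named above might be set in group_vars along these lines. This is a sketch under assumptions: the values are illustrative, and the notes do not state the unit or exact meaning of ansible_pull_sleep.

```yaml
# group_vars fragment for compute nodes (sketch; values are illustrative)
fgci_node_type: compute          # "service" would select the service/install node behaviour
ansible_pull_branch: production  # the default named in these notes
ansible_pull_sleep: 300          # assumption: bound for the random delay; unit not stated in the notes
```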

v0.9.3 - Forest Flame Snake

05 Feb 10:20

Minor update.
Production branch merged with master.

General Updates

New Roles

Role Updates

  • network_interfaces - possibility to configure IB interfaces (specifically settings for partitions and connected_mode)
  • smartd now runs on compute nodes too
  • rdma - configure interfaces as Ethernet
  • slurm:
    • configuration of cgroups
    • CI syntax testing
    • slurm repo version is a variable
  • nhc: drain/reboot node if nvidia-smi exits with return code 255 or 17
  • fgci-bash: log module usage to syslog of local node
  • rsyslog: ship logs from every node with "lmod" tag to central_log_host
  • arc-frontend: new variable "init_griduser_accts" which defaults to False. This needs to be set to True to create the grid user accounts.
  • nis: new variable: nis_manage_nsswitch . If this is set to False then we do not manage nsswitch file. For the adauth role.
  • users: it's now possible to add ssh pubkey to root user too (not doing that for FGCI)
  • flowdock: fixed tags.
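
The two new switches in this release could be set as in the fragment below. The variable names and the init_griduser_accts default come from the notes above; where the variables are placed is an assumption.

```yaml
# group_vars fragment (sketch; placement is an assumption)
init_griduser_accts: True    # arc-frontend: create the grid user accounts (defaults to False)
nis_manage_nsswitch: False   # nis: leave the nsswitch file unmanaged, e.g. for the adauth role
```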

v.0.9.2 - Elegant Egg-eater

25 Jan 12:13

Minor update.
production branch has been synced up with test and master.

General Updates

  • ansible-pull updates
    • the address to the install node is now templated in both local.yml and ansible-pull-script.sh
  • travis/testing works again
  • ansible performance tuning - thanks to @jabl for the contribution
    • more forks and "pipelining = True"
  • prep_local.yml helper script is removed

Updates to roles

  • ansible-role-slurm
    • address to accountinghost is now a variable - defaults to service node
  • ansible-role-rsyslog on admin-node does not write remote logs into /var/log/messages anymore
  • ansible-role-fgci-bash
    • a way to install only some specific *sh login scripts
  • ansible-role-users
    • key options for authorized keys were added
  • ansible-role-rdma
    • an option to configure some ports as Ethernet
  • ansible-role-flowdock
    • also send site name in the message
  • ansible-role-pxe_config
  • ansible-role-yum
    • always install libselinux-python - needed for machines with selinux enabled
  • ansible-role-squid
  • ansible-role-postfix
    • install postfix first

New Roles

  • fgci-repo
    • installs the fgci repo
  • yum-cron-2
    • installs a working yum-cron configuration
    • named yum-cron-2 to not conflict with the old one

v0.9.1 - Delirious Dwarf Boa

21 Dec 13:30

This is a minor release, no new roles.

  • collectd: remove some tags
  • install some more programs: rsync, bash-completion, wget, nfs-utils, pdsh
  • some guidelines for contributions
  • ansible-pull is now set by default to pull the "production" branch. This is controlled with the ansible_pull_branch variable, which can also be a tag.
  • fixes to ssh host key generation and ordering
  • dns role runs after dnsmasq role in install node

v0.9.0 - Credulous Cobra

09 Dec 12:19

158 commits to the fgci-ansible repository since v0.3.0, 29 days ago

Updates

  • Slurm 15.08.4
  • All group_vars are now moved into examples/group_vars. These need to be copied to fgci-ansible/group_vars/
  • Travis-ci is now running on most roles
  • NIS user management
    • We've reinstalled the test cluster many times during this period to iron out the problems
    • Nodes are now configuring themselves with the help of ansible-pull.
  • To make ansible-pull work there are a few new caveats
  • If there is a change to a group_vars file, the ansible hosts file, ssh keys or the ansible-pull-script.sh, these need to be updated on the install node.
  • site.yml has been updated; running it alone is not enough to reinstall the cluster. For complete instructions for using this and all the roles - see https://confluence.csc.fi/display/FGCI/FGCI+Cluster+deployment+and+installation+guide
  • Passwords are now disabled on admin accounts.
  • SSH access is restricted.
  • Login node is now the NAT gateway
  • PDSH is now installed
  • Hosts file is now updated to contain many more entries (also .local and -ib style of the compute nodes)
  • Lmod is now installed
  • Rsyslog improvements - a variable now decides which programs log to the admin node.
  • Firewall rules have been cleaned up.
  • Host_vars have been removed
  • virt-manager is now usable on the admin node
  • Many variables have been consolidated.
  • resolv.conf should now point to the right addresses (it's not the same everywhere)
  • Various ordering issues spotted during reinstall exercises - tasks trying to write to NFS before NFS is installed for example. These should mostly be OK now.
  • sysctl/ulimits now also set on compute nodes.
  • rdma/IB configured on nfs node.

New Roles

  • ansible-role-nhc
    • node health checker for slurm
  • ansible-role-dell
    • installs dell management tools as well as Dell System Updater - for firmware updates
    • this includes a racadm.sh script which can help somewhat in managing the iDRACs
  • ansible-role-postfix
    • configures a relayhost and sets IPv4 only
  • ansible-role-sshd-host-keys
    • SSH keys are now managed. There are also roles dealing with the ssh keys for the install node and in nfs.yml

Tools

  • tools/pullReqs.sh - galaxy installs all the ansible repos
  • tools/diff_group_vars.sh - to help with checking for new updates in the examples/
  • the old FGI reinstall script is now also available on the install node

v.0.3.0 - Benign Boa

10 Nov 12:19
Pre-release

Updates

209 commits to fgci-ansible repository since v0.2.0 (does not include updates in roles in other repositories)

More yum repos have been added
A few more settings have been added to the examples/ directory
The hosts file has been updated - the compute nodes now need some variables set, such as the MAC address.
Slurm is 15.08.03
Travis-ci for fgci-ansible repo now spawns a CentOS7 docker container and runs various ansible tasks.

site.yml is now updated to have all except the compute nodes.
There's also an after_provision.yml - this is for roles on the admin node that need to run after the install node is up

New roles

chronological order from commits:

  • ansible-role-system-limits
    • adds some settings into /etc/sysctl.d and /etc/security/limits.d
  • ansible-role-arc-client
  • ansible-role-yum
    • installs IB things and on compute nodes templates/overwrites /etc/yum.conf and adds proxy
  • ansible-role-fail2ban
    • installed only on login-node
  • ansible-role-flowdock
    • sends notification to the FGCI channel in flowdock
  • ansible-role-collectd
    • sends metrics - also ipmi metrics if it's a physical machine
  • ansible-role-nis
    • install node runs a NIS server. Grid, compute and login nodes are clients
  • ansible-role-dhcp_server
  • ansible-role-dnsmasq
  • ansible-role-pxe_config
  • ansible-role-pxe_bootstrap
    • the dhcp_server, dnsmasq and pxe_* roles reinstall a node over PXE/DHCP
  • ansible-role-nfs
    • setup exports
  • ansible-role-nfs_mount
    • setup mounts
  • ansible-role-arc-frontend
    • installs and configures an ARC CE
  • ansible-role-cuda
    • installs CUDA yum repo and installs cuda

Tools

  • reboot_computes.yml
  • reinstall_computes.yml

v.0.2.0 - Altruistic Anaconda

29 Sep 13:00
Pre-release

This is an example of how we could make releases / release notes.

General updates

  • "ansible-playbook -u myfgciusername -i hosts site.yml" now mostly works,
    • usernames aren't distributed to login-node yet, so some manual hacks are still needed
  • Added travis-ci for some FGCI roles :) https://travis-ci.org/CSC-IT-Center-for-Science/
    • This passes but it doesn't actually test anything at all (not even syntax checking)
  • Refactoring of group_vars/
  • fgci-examples/ directory has some example files - this is where new cluster admins should copy files from to start creating the config files that ansible uses.

New roles

  • admin-role-aliases
    • Configures basic /etc/aliases and runs newaliases
  • ansible-role-cvmfs
    • Need to fix pointer to proxy
  • ansible-role-fgci-login
    • Only installs EPEL for now
  • ansible-role-rdma
    • Basically only installs it and some udev rules for /dev/infiniband/
  • ansible-role-ferm-firewall
    • Not 100% sure that all these hosts should have NAT, nor that the NAT rules are good.
    • SSH blocks all out so far at least
  • ansible-role-ntp
  • ansible-role-rsyslog-client
    • Ships logs to a remote rsyslog server
  • ansible-role-slurm
    • Defaults should not be trusted - we should set them to something better. At least think about:
    • FirstJobId
    • All the spool/state dirs - /tmp is probably really bad
    • Population of nodes / partitions is not working
  • ansible-role-sshd
    • A remote role - restricts ssh access to root/admins where appropriate
  • ansible-role-users
    • Adds some CSC admin-users (can be disabled)
  • ansible-role-yum-cron
    • Enables auto-download cron

Updates to existing roles:

  • provision_vm
    • Provisions VMs on the install node.