Hybrid Burn-in (HBI) setup at Birmingham
Instructions (jpt) 15Aug24
Disclaimer: Use common sense. No warranty.
Version v1b - Thu Aug 15 04:01:31 PM BST 2024

Note: Some of the steps may seem odd, but they are there for a reason. The Genesys-2 firmware, and especially its network connection, seems to have issues which these steps avoid.

1) DAQ PC:
The PC is epldt123, the one with the keyboard and the smaller screen 'squeezed' in next to the single-hybrid-panel test setup. Log in with the shared local account ('itkuser2') and the known password. The PC runs AlmaLinux 9. Log in sitting in front of this PC, especially as you will want to watch the LV PSU current draw.

1b) Check the PC's memory usage:
Do this:
[itkuser2@epldt123 ~]$ grep Swap /proc/meminfo
SwapCached: 87692 kB
SwapTotal: 16777212 kB
SwapFree: 14480124 kB
This output is 'bad': about 2.2 GB of swap is in use (SwapTotal minus SwapFree). The simple solution is to reboot the PC. Afterwards 'SwapTotal' and 'SwapFree' should show the same number, i.e.:
[itkuser2@epldt123 ~]$ grep Swap /proc/meminfo
SwapCached: 0 kB
SwapTotal: 16777212 kB
SwapFree: 16777212 kB
(Background: There appears to be a memory leak during the HBI run, which can have negative effects during the 100h test.)

Once rebooted, log in again. After a reboot, please restart this script (it allows external monitoring, e.g. by Juergen):
[itkuser2@epldt123 ~]$ cd Documents
$ nohup ./copy_memusage_withGUI.sh &
$ cd ~
For info, the website this feeds is here, but don't worry if it doesn't update; things are probably fine:
http://epweb2.ph.bham.ac.uk/user/thomas/tracker/trh/burnin.txt

2) Extracting and inserting the hybrid panel
The HBI crate is '#1', the left-hand one. The newer second crate on the right-hand side is not usually connected (loose cables).
!!! The crate needs to be powered off (!) when extracting or inserting a
!!! hybrid panel. Otherwise this is a 'hot swap', which can
!!! damage hybrids and crate!
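The memory check above (comparing 'SwapTotal' and 'SwapFree' in /proc/meminfo) can also be done with a small function. This is a minimal sketch, not part of the HBI installation: the names swap_used_kb and check_swap are hypothetical, and the default limit of 0 kB simply follows the rule that both numbers should be equal after a reboot.

```shell
#!/bin/sh
# Hypothetical helper (not part of the HBI installation): print the swap in
# use, in kB, as SwapTotal minus SwapFree, from /proc/meminfo-style input.
swap_used_kb() {
  awk '/^SwapTotal:/ { t = $2 } /^SwapFree:/ { f = $2 } END { print t - f }'
}

# Compare against a limit (default 0 kB, since after a reboot SwapTotal and
# SwapFree should be equal). Returns 1 if more swap than that is in use.
check_swap() {
  local file="${1:-/proc/meminfo}" max_kb="${2:-0}" used
  used=$(swap_used_kb < "$file")
  if [ "$used" -gt "$max_kb" ]; then
    echo "swap in use: ${used} kB -> reboot before starting the HBI run"
    return 1
  fi
  echo "swap in use: ${used} kB -> OK"
}
```

Paste it into the terminal on epldt123 and run check_swap without arguments; for the 'bad' example above it reports 2297088 kB.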
Powering the crate down is done most safely by switching off the TTI CPX400DP LV PSU that feeds it, using its main switch. This PSU sits on the left-hand side of the HBI crate (along with a red multimeter). It has big red LED number displays and the sticker 'ATLAS ITK BILPA CPX400 #2' (unlike the other PSUs, which have black-on-white LCDs).
Note that inserting and extracting the panel requires a bit of force, but be careful not to apply too much. For extracting, use the little Allen key next to the crate in the screw hole on the right-hand side of the panel. This is the side where the PCIe connector is located at the back, so this helps straight away.
If there are still other hybrid panels in the crate, it's best to take them out and store them in the hybrid storage rack of the ISO7 dry cabinet, on the bottom shelf. You can take the HBiPCs off those panels to attach them to your new panel.
The blue plug-in powerboards (HBiPCs) need to be attached; ask a technician to do that for you, as it requires the orange handling tool. Each HBiPC powers the two hybrids on either side of its position. It's not a problem if there are HBiPCs on the panel which power nothing. The HBiPCs need to be fixed with screws (more screws can be found in the little white box on the desk by the hatch, next to the Windows PC). HBiPCs are stored in their transport boxes in the ISO5 dry cabinet, on the left-hand side of the 3rd(?) shelf. Only use blue HBiPCs: not red, and especially not green. Note down the shieldbox numbers of the HBiPCs you're using.
The HBiPCs on a panel need to have different 'bond ID' numbers (0, 1 or 4, as shown on the stickers). Each number may appear only once, i.e. two 0s, two 1s or two 4s don't work. Check this before you start the test, as the error messages in ITSDAQ may not make the cause clear.

2b) Power-up the setup
Switch on the TTI LV PSU on its main power switch.
Switch on the right-hand channel first (12V): this powers the Genesys-2 FPGA board, the crate fans and the crate buffer. Wait until the Genesys-2 is configured (pattern of flashing green LEDs). Then switch on the left-hand channel of the PSU, which is the hybrid panel power (11V). The current draw should be 55mA per HBiPC (powerboard), but the slots may not yet be 'enabled' by the buffer inside the crate backplane, so don't panic if there is very little current.

3) Set up the hybrid configuration:
The (latest) installation is here; it contains everything you need (ITSDAQ, the Burnin-GUI, and p_d_s):
/home/itkuser2/burn-in-gui_v14Aug24/hybrid-burnin-gui/itsdaq-sw
With a new hybrid panel, it is most useful to first run ITSDAQ 'stand-alone' and run AutoConfig there, so that the chip tunings are read correctly. This also avoids confusion later on. Go here:
[itkuser2@epldt123 ~]$ cd /home/itkuser2/burn-in-gui_v14Aug24/hybrid-burnin-gui/itsdaq-sw/sctvar/config
Let's assume the hybrid panel is inserted into slot 2 (the slots start at '0' just below the Arduino panel; 5 is the lowest one in the crate), and that it is a fully occupied panel (6 hybrids).
Edit the system config:
[itkuser2@epldt123 ~]$ gedit st_system_config.dat &
Note this part:
#### for slot 2 in HBI:
## [For other slots: Replace '2' in '102', and 2's in '0x2x' accordingly with slot number]
#Module 0 1 1 0 0 -1 0 0 0 1 102 0 0 0 0x20 0x21 50 50 JaneDoe0 Barrel_xHCC
#Module 1 1 1 0 0 -1 0 0 0 1 102 0 0 0 0x22 0x23 50 50 JaneDoe1 Barrel_xHCC
#Module 2 1 1 0 0 -1 0 0 0 1 102 0 0 0 0x24 0x25 50 50 JaneDoe2 Barrel_xHCC
#Module 3 1 1 0 0 -1 0 0 0 1 102 0 0 0 0x26 0x27 50 50 JaneDoe3 Barrel_xHCC
#Module 4 1 1 0 0 -1 0 0 0 1 102 0 0 0 0x28 0x29 50 50 JaneDoe4 Barrel_xHCC
#Module 5 1 1 0 0 -1 0 0 0 1 102 0 0 0 0x2a 0x2b 50 50 JaneDoe5 Barrel_xHCC
This is for a full panel in slot 2. Uncomment the lines for the hybrids which are present. They are in order of hybrid panel position, i.e. JaneDoe0 is hybrid position 0.
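The slot/position pattern in the template above can be generated for any slot. The sketch below is a hypothetical helper (hbi_module_lines is not part of ITSDAQ). It assumes the pattern generalises exactly as the '## [For other slots: ...]' comment describes: '10<slot>' plus the hex address pair 0x<slot><2*position>, 0x<slot><2*position+1>. Compare its output with the existing template before pasting anything into st_system_config.dat.

```shell
#!/bin/sh
# Hypothetical helper: print st_system_config.dat 'Module' lines for one HBI
# slot (0..5) and the occupied hybrid positions (0..5), following the slot-2
# template: '10<slot>' and addresses 0x<slot><2*pos>, 0x<slot><2*pos+1> (hex).
hbi_module_lines() {
  local slot="$1" pos
  shift
  if [ "$slot" -lt 0 ] || [ "$slot" -gt 5 ]; then
    echo "slot must be 0..5" >&2
    return 1
  fi
  for pos in "$@"; do
    if [ "$pos" -lt 0 ] || [ "$pos" -gt 5 ]; then
      echo "hybrid position must be 0..5" >&2
      return 1
    fi
    # %x turns 10 and 11 (position 5) into 'a' and 'b', as in 0x2a 0x2b
    printf 'Module %d 1 1 0 0 -1 0 0 0 1 10%d 0 0 0 0x%d%x 0x%d%x 50 50 JaneDoe%d Barrel_xHCC\n' \
      "$pos" "$slot" "$slot" $((2 * pos)) "$slot" $((2 * pos + 1)) "$pos"
  done
}
```

For example, hbi_module_lines 2 0 3 prints the lines for hybrids at positions 0 and 3 in slot 2. Whether ITSDAQ needs contiguous Module numbers when positions are skipped is not covered here.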
For example, if there is one hybrid at position 0, this line should be uncommented:
Module 0 1 1 0 0 -1 0 0 0 1 102 0 0 0 0x20 0x21 50 50 JaneDoe0 Barrel_xHCC
(Note: There may now be an easier way to do this with fewer parameters, but this way works.)
If you see entries like the following, those are test panels (Juergen); make sure they are commented out:
#Jul'24: AutoConfig: default HCC phase is: 'a'. Canary-P17 in slot 2, canary-P19 in slot4
### 20USBTS0000017 in slot 2, Jul24, HBI2; 29Jul24 in HBI1 (3 HBiPCs, LV PSU test: AMAC com loss)
### 20USBTS0000019 in slot 4
#Module 0 1 1 0 0 -1 0 0 0 1 102 0 0 0 0x26 0x27 50 50 JaneDoe0 Barrel
#Module 1 1 1 0 0 -1 0 0 0 1 104 0 0 0 0x44 0x45 50 50 JaneDoe1 Barrel
This is a safety step (explained below):
$ cp st_system_config.dat st_system_config_myHBI_v0.dat

4) Test the panel 'stand-alone' (without the GUI and the AMAC looper). This is done the usual way:
[itkuser2@epldt123 ~]$ cd /home/itkuser2/burn-in-gui_v14Aug24/hybrid-burnin-gui
$ source setup.sh
$ cd itsdaq-sw
$ ./RUNITSDAQ.sh
Check whether the hybrids are responding:
$ AutoConfig(false,false,false)
[asks for your DB passwords]
Note: This step power-cycles the hybrids in the crate (AMAC off/on sequence), reads the chip IDs, stores the chip tuning values, and should also create the 'assembly' JSON which contains the chip IDs for hybrid assembly in the DB.
Run 'PedestalTrimScan' and 'StrobeDelay' and check that you get the expected pattern from all hybrids. If not all hybrids appear, it may make sense to run AutoConfig again: part of it is enabling the slots and powering up the HBiPCs, which _may_ not always work on the first try.
If this doesn't work (no or odd patterns, no response from some or all hybrids, or errors like chip IDs not being read, i.e. ABC missing), the config may not match the hardware setup (e.g. 3 hybrids on the panel, but only 1 hybrid in the config). It is also worth restarting ITSDAQ, which may help, especially as the 'enable slots' step is done within ITSDAQ.
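A quick way to catch a config/hardware mismatch before (re)starting ITSDAQ is to count the active, i.e. uncommented, 'Module' lines in the system config and compare with the number of hybrids on the panel. A minimal sketch: check_module_count is a hypothetical name, and it assumes active lines start with 'Module ' at the very beginning of the line, as in the template.

```shell
#!/bin/sh
# Hypothetical helper: compare the number of active (uncommented) 'Module'
# lines in the system config with the number of hybrids on the panel.
# Assumes active lines start with 'Module ' at the beginning of the line.
check_module_count() {
  local cfg="$1" expected="$2" n
  if [ ! -r "$cfg" ]; then
    echo "cannot read $cfg"
    return 2
  fi
  n=$(grep -c '^Module ' "$cfg")
  if [ "$n" -ne "$expected" ]; then
    echo "mismatch: $n active Module line(s), $expected hybrid(s) on the panel"
    return 1
  fi
  echo "OK: $n active Module line(s)"
}
```

For a panel with 3 hybrids, run check_module_count st_system_config.dat 3 in the sctvar/config directory.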
If the stand-alone test works, the setup is ready to start the Burnin-GUI. Leave ITSDAQ with '.q', but keep the terminal window open for inspecting the data later, namely here:
$ cd /home/itkuser2/burn-in-gui_v14Aug24/hybrid-burnin-gui/itsdaq-sw/sctvar/ps
$ ls -ltr
to see the most recently written files.

!!! If ITSDAQ is very slow to start and shows lots of red 'read data error' warnings:
!!!   receive_ack_packet: Didn't receive ack for opcode 0x007a (expecting seq 66) Sequence load
!!! and
!!!   Timeout receiving opcode 142: User packet
!!! and eventually this:
!!!   !!!! Warning !!!
!!!   Your network interface appears to be running slowly!
!!!   ping time for empty packet 1e+06us (expect <120)
!!!   ping time for full packet 1e+06us (expect <300)
!!!   Things might be slow, check network settings
!!!
!!! There is no use continuing. This requires action:
!!! This is a (semi-)known issue with the Genesys-2 firmware and possibly the network
!!! adapter on the PC. If this happens, wait until ITSDAQ shows the BurstData/ScanData
!!! windows (don't kill the process manually), but then leave ITSDAQ straight away
!!! with '.q'. Switch off the LV PSU (either with the mains switch, or with the
!!! 'VIEW(LOCAL)' button and then both channels).
!!! Check the network switch sitting on top of the HBI crate: If there is a direct
!!! connection, i.e. the yellow cable goes into the Genesys-2, plug it into the switch
!!! instead and use the thin black network cable to connect the Genesys-2. If it's
!!! already connected like that, plug the yellow cable directly into the Genesys-2.
!!! This looks weird, but it helps.
!!! Then switch the LV PSU on again, and both channels, and start ITSDAQ again;
!!! the output will then hopefully be 'smooth' again, without the aforementioned errors.
!!! Run AutoConfig(false,false,false) again [this may not be needed].
!!! If the problem persists, consider rebooting the PC again.
!!! Note: We have no admin rights on this PC, so we can't change network settings
!!! or restart NetworkManager, which may well fix this too.
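The slow-network warning quotes both the measured ping times and their expected limits (<120us for an empty packet, <300us for a full packet). If you keep a copy of ITSDAQ's start-up messages (e.g. pasted into a text file; how to capture them is not covered here), that comparison can be done mechanically. A sketch assuming the message format shown above; check_itsdaq_ping is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: scan saved ITSDAQ start-up messages (stdin) for the
# 'ping time for ... packet <t>us (expect <limit>)' lines and report any
# measured time above its quoted limit. Exit status 1 means 'slow network'.
check_itsdaq_ping() {
  awk '
    /ping time for .* packet/ {
      t = $(NF - 2); sub(/us$/, "", t)     # "1e+06us" -> "1e+06"
      lim = $NF; gsub(/[^0-9.]/, "", lim)   # "<120)"   -> "120"
      if (t + 0 > lim + 0) { print "SLOW: " $0; slow = 1 }
    }
    END { if (!slow) print "no slow ping times found"; exit slow }
  '
}
```

Usage: check_itsdaq_ping < itsdaq_startup.txt (the file name is just an example).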
5) Start the GUI:
Open a new terminal (one which hasn't had ITSDAQ running in it before), and do:
$ cd /home/itkuser2/burn-in-gui_v14Aug24/hybrid-burnin-gui
$ bash run.sh
The GUI will then pop up in a Chrome browser window. Enter the serial number of the panel at the appropriate place (here: Slot 2), tick the occupied hybrid positions ('Available panel'), choose 'Barrel' from the 'Hybrid Flavour' drop-down menu, enter the serial numbers, and make sure the boxes for the populated hybrid positions are ticked. (Note: The way we're using the Burnin-GUI at the moment, very few of these settings are actually used. We fill them in mostly to stop the GUI from complaining.)
Click on the 'Production DB Login' bar, enter your DB login, and click on 'Get DB token' to get the token.
!!! Untick the 'Generate ITSDAQ Config (for AutoConfig)'
!!! button in the GUI! If this is left enabled, the GUI will
!!! (likely) overwrite your system config file, and you get _really_ odd errors
!!! like 'no chipset'. This is recoverable by copying the backup file (see above)
!!! back as the system config.
Then you can hit 'Start', and a terminal window (black-on-white XTerminal) should pop up with the ITSDAQ-LTT_AMAC process (which reads the AMAC temperatures and also does the power up/down steps), and a little later a second XTerminal window with the ITSDAQ-LTT process, which runs the actual tests.
Check in the LTT_AMAC window that all the AMACs respond and that the power down/up sequence looks fine. The AMAC temperatures are then listed, along with the fan level. Note that this installation has a minimum fan level of 50 (out of 255) as a safety precaution, i.e. the level will not drop below 50 and the fans will not stop.
Example output:
Setting fans to level 57
AMAC temp. 0(H0, H1, PB): -235.417, -235.417, 34.5277
AMAC temp. 1(H0, H1, PB): -1000, -1000, -1000
AMAC temp. 2(H0, H1, PB): 35.4211, -235.417, 43.4846
AMAC temp. 3(H0, H1, PB): 37.7727, 40.5224, 46.44
[Note: One AMAC is not responding there: '-1000'.
This _may_ not be a problem: they usually keep powering the hybrids even if they don't respond. They also sometimes come back with the automated restart every 9 hours.]
This infrastructure is very similar to the MTC, but of course without all the auxiliary devices, e.g. HV. The Arduino of the fan system is controlled from the Genesys-2 firmware via an I2C cable (PMOD connection).

6) Monitor the HBI run:
The values can be monitored on the Grafana page here, which can be opened from any PP Linux PC:
http://eprex5:3000/d/PiQGdfb4k/burnin-crate?orgId=1&var-datasource=lvGtk-OVk&var-bucket=secondburnincrate&from=1723476283000&to=now&refresh=30s
(Don't be confused by the label 'secondburnincrate'; this is indeed still HBI crate #1!)
Change the refresh rate (circle arrows) to 30 sec. Change the time period to 5 minutes (or as appropriate once the HBI has been running for a while).
Note: An automatic restart of the hybrids every 9h is implemented in the GUI configuration. This was meant to help with the memory leak, but clearly hasn't fully fixed it. This is where the drops/spikes in the values every 9h come from, so they are expected.

7) Monitor the test results:
Log in to epldt123 (via eprexa if you're not sitting in front of the PC), and go to:
$ cd /home/itkuser2/burn-in-gui_v14Aug24/hybrid-burnin-gui/itsdaq-sw/sctvar/ps
$ ls -ltr
Check that new files are being written, and have a look whether they make sense. The newer of the two 'RCPlot' files is the 'Full response curve (10 charges)', which is probably the 'golden result', especially the bottom-left plot, 'Input Noise'. This should be nice and flat, at about 400 'ENC'. Don't forget to also check p.3, which is the 'other half' of that hybrid.
The files look something like this, with the hybrid serial number:
SN20USBHX2001089_RCPlot_20240815_115639.pdf
SN20USBHY0000365_RCPlot_20240815_115639.pdf
or 'JaneDoe*RCPlot*' if the hybrid isn't yet assembled in the DB.
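To pick out the newest RCPlot per hybrid without scanning the whole 'ls -ltr' output, the date/time stamp in the file name (YYYYMMDD_HHMMSS) can be used, since it sorts correctly as plain text. A sketch assuming all file names follow the <serial>_RCPlot_<date>_<time>.pdf pattern shown above; latest_rcplots is a hypothetical name. It returns the newest file per hybrid, so still check that this is the 'Full response curve' one.

```shell
#!/bin/sh
# Hypothetical helper: list the newest *_RCPlot_YYYYMMDD_HHMMSS.pdf per hybrid
# in a directory (default: current directory, e.g. sctvar/ps). The date/time
# stamp in the name sorts correctly as plain text, so no file times are used.
latest_rcplots() {
  local dir="${1:-.}" f
  for f in "$dir"/*_RCPlot_*.pdf; do
    [ -e "$f" ] && printf '%s\n' "${f##*/}"
  done | sort | awk '
    { key = $0; sub(/_RCPlot_.*/, "", key); last[key] = $0 }   # key = serial
    END { for (k in last) print last[k] }
  ' | sort
}
```

Run latest_rcplots in the sctvar/ps directory (or pass the directory as the first argument).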
8) Automatic end of HBI
The HBI will stop after 100h, and the GUI then switches off power to the crate from the LV PSU. So the setup can be left alone, even if nobody is there to stop it after 100h. The fans then run at full speed (fan level 255).
Press the 'Stop' button to stop the GUI. Note: It is not recommended to simply close the GUI (Chrome) windows; this may result in 'zombie' ITSDAQ processes. Again, rebooting the PC helps there. Be patient after pressing the 'Stop' button: it takes a while for the GUI to end the ITSDAQ processes.
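The 100h run length and the 9h hybrid restarts translate into a simple timeline, which helps when deciding when to come back or how to read the Grafana spikes. A sketch under two assumptions: GNU date (as on AlmaLinux) for the '-d @SECONDS' form, and restarts counted from the start of the run, which the text does not state explicitly. hbi_timeline is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: print the expected hybrid restart times (every 9 h)
# and the automatic stop (after 100 h) for a run started at a given Unix time.
# Assumes GNU date (-d @SECONDS) and that restarts count from the run start.
hbi_timeline() {
  local start="$1" total=100 step=9 h
  h=$step
  while [ "$h" -lt "$total" ]; do
    echo "restart at +${h}h: $(date -u -d "@$((start + h * 3600))" '+%Y-%m-%d %H:%M UTC')"
    h=$((h + step))
  done
  echo "end at +${total}h: $(date -u -d "@$((start + total * 3600))" '+%Y-%m-%d %H:%M UTC')"
}
```

Run hbi_timeline "$(date +%s)" right after pressing 'Start'; with a 9h interval there are 11 restarts (at +9h ... +99h) before the stop at +100h. Drop the -u flags to print local time instead of UTC.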