CENPA HISTOGRAM VIEWER
PIONEER: Subatomic Particle Physics Experiment

Details —
Type: Internship
Organization: UW CENPA
Role: Software Engineer Intern, Researcher
Timeline: Oct 2022 – June 2023
Experiment: PIONEER

Stack: C++ · Python · Streamlit · Plotly · ROOT

Team: Andrew Zhou · David Hertzog · Omar Beesley · Patrick Schwendimann · Yuchen Xin

Overview
PIONEER is a subatomic particle physics experiment led by David Hertzog at the University of Washington.
Role
I was one of two interns at the lab. Work was split so that we each owned separate tasks, but we frequently collaborated and discussed work across teams.

My work revolved around creating data analytics and build tools for our simulation software, based on CERN's Geant4.
Impact
I mainly worked independently on my tasks, with occasional support from my mentor. The list below describes the major contributions I made:
  1. Geometry Build Tool
  2. Optimizing Hackman Openings
  3. Data Analysis Dashboard

# Case Study 1
Geometry Build Tool
Overview —

Physical (real-world) testing of the experimental device was unpredictable and expensive, so Geant4 was used to simulate the effectiveness of different device setups.


An existing build tool took various configuration files (either geometry or material configuration files) and output a valid geometry file that Geant4 could use to run simulations.

Problem —

The problem with the existing build tool was that it wasn't portable.


A set of example configuration files was provided in the repository — which was fine, except that the build tool could only read from that example directory.


This meant that in order to experiment with new geometries (a very common occurrence), researchers had to insert new configuration files directly into the main Git repo, polluting the main branch and interfering with the work of others.


This caused an awful headache for the many researchers working on the experiment at the same time, since the build tool had to be modified manually to load any new config files, creating version control issues as a side effect. This necessitated an overhaul of the build tool.

Process —
1. Laying Out Requirements

Before jumping straight into programming the new build tool, I used the issues with the old system to map out a set of basic requirements for the new one:

  • New configuration files must be placed in the examples folder -> Configuration files should be readable from any user-defined path or directory (see the sketch after this list).
  • User-defined configuration files (i.e. not example files) are included in version control -> Only example configuration files should be allowed in the main branch.
  • Geometry configuration files and material configuration files couldn't be differentiated -> Differentiate the two file types so that multiple variants of each can be swapped for one another.
  • Config files were hard to modify -> There should be an interactive CLI to make the build tool easier to use.
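For the first requirement, the fix is to stop hard-coding the examples directory and accept arbitrary paths. A minimal sketch, with an illustrative function name and error handling rather than the tool's actual code:

```python
from pathlib import Path

def resolve_config(path_str: str) -> Path:
    """Resolve a user-supplied configuration path from anywhere on disk,
    rather than assuming the repository's bundled examples directory."""
    path = Path(path_str).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(f"no configuration file at {path}")
    return path
```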

Together, these changes would help ensure that the new build tool is portable and works on any machine.


2. Picking the Right Technologies

The next step was to determine the language(s) and/or libraries needed to create the new build tool. I made a couple of observations:

  1. The language must be able to run Python scripts — the existing build interface interacts with various custom Python APIs.
  2. New technologies should not require major refactoring or new installations across the organization.
  3. The new build tool must be written in a language that many researchers are familiar with.

Given these observations, Python was the clear winner as the base language for the new build tool. Most researchers already had Python 2 and 3 installed for pre-existing scientific libraries, and were familiar with the language as well.

Solution —

I implemented an interactive CLI leveraging InquirerPy. Its feature set covers the basic requirements outlined during my development process, as well as other quality-of-life additions (a minimal sketch follows the list):

  • Ability to run the build tool from any working directory
  • Customizable output file names
  • Persistent storage for configuration files
  • Documentation for the new API
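As an illustration of the interactive flow, here is a minimal sketch using InquirerPy's prompt API; the prompt wording and choice names are hypothetical stand-ins, not the tool's actual interface:

```python
from pathlib import Path
from InquirerPy import inquirer

def interactive_build():
    # Distinguish the two configuration file types up front.
    config_type = inquirer.select(
        message="Which type of configuration file are you loading?",
        choices=["geometry", "material"],
    ).execute()
    # Accept a path anywhere on disk, validating that it exists.
    config_path = inquirer.filepath(
        message=f"Path to the {config_type} configuration file:",
        validate=lambda p: Path(p).is_file(),
    ).execute()
    # Customizable output file name.
    output_name = inquirer.text(
        message="Name for the generated geometry file:",
    ).execute()
    return config_type, config_path, output_name  # handed off to the build step
```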

Challenges

Many foreseen and unforeseen problems arose during development.

  • Python version matching — mismatches between ROOT builds and Geant4 builds forced the build tool to target Python >= 3.9. While the build tool still functioned correctly, checking that libraries were compatible with newer and older Python versions became a routine step.
  • Batch processing — many calls to the build tool run on a compute server with no human interaction. As such, I implemented a "batch" mode in which the interactive prompts are disabled and the build tool runs directly from CLI arguments (sketched below).
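A rough sketch of how such a dual-mode entry point can look; the flag names and helper functions are illustrative assumptions, not the tool's real interface:

```python
import argparse

def run_build(config: str, output: str) -> None:
    ...  # hypothetical stand-in for the actual build step

def interactive_build() -> None:
    ...  # the InquirerPy flow sketched earlier

def main() -> None:
    parser = argparse.ArgumentParser(description="geometry build tool")
    parser.add_argument("--batch", action="store_true",
                        help="run non-interactively, e.g. on a compute server")
    parser.add_argument("--config", help="path to a configuration file")
    parser.add_argument("--output", help="name of the generated geometry file")
    args = parser.parse_args()

    if args.batch:
        run_build(args.config, args.output)  # no prompts: safe for job queues
    else:
        interactive_build()

if __name__ == "__main__":
    main()
```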
Conclusion —

My build tool was integrated into the main project, and is now used by researchers across the world working on PIONEER. To this day, my API is extensible enough to accommodate rapidly evolving configuration files, add-ons, and more, while remaining simple and easy to use.


Lessons Learned
  • A deeper understanding of robust and extensible API design
  • Python, its advantages, and its limitations
  • Quick check-in sessions with my mentor can prevent unnecessary development and boost productivity

# Case Study 2
Optimizing Hackman Openings
Overview —

I created a tool to find the optimal opening angle of a potential calorimeter setup by comparing multiple samples' accuracy against their cost.

Problem —
(Figure: calorimeter diagram)

Calorimeters are a crucial part of the PIONEER experiment. They record the energy deposited by scattered particles, and the results are analyzed for how well they adhere to currently established physics. As such, the accuracy of the calorimeters is imperative to the success of the experiment.


Yet the calorimeter is extremely expensive, meaning there is a tradeoff to be made: accuracy for cost. I wrote a Jupyter notebook graphing the accuracy of a calorimeter against its opening angle, using data simulated with Geant4, to help researchers determine which angle provides the best "bang for the buck."

Process —
1. Laying Out Requirements

Since this project was more ambiguous and open-ended than Case Study 1, I took some time to clarify which features had to be included.


Goal: Be able to compare multiple angle openings against their accuracies, to determine which angle is "accurate enough."

Note

Wait — What does the "accuracy" of an angle opening even mean?


In this case, accuracy informally refers to the visual spread of a 2D histogram, as will be shown later.

From this, I constructed a basic set of requirements for the program:

  • Speed — while the program doesn't need to run in real time, it shouldn't take hours either.
  • Reasonable accuracy — since the program is meant to test general accuracy, scientists will judge results by eye rather than with formal statistical methods. The program should therefore be accurate, but not to a degree that sacrifices speed.
  • Modularity — the program should be easy to modify, since many factors will change: the data representation method, plot axes, etc.

2. Picking the Right Technologies

There were a few options for how I could go about programming this tool:

  • Write a C++ or Python program; both languages have libraries that support reading ROOT files (i.e. the simulation data).
  • Create a web app leveraging JSROOT and data analysis libraries such as D3, MathJS, etc.
  • Build a Jupyter notebook.

For many reasons, building a Jupyter notebook was the clear choice. It's modular while retaining the performance of plain Python, and Python's mature, extensive collection of data analysis tools and libraries would make the tool faster and easier to develop.

Solution —
Working Smarter, Not Harder

Before creating the data analysis tool, I needed to gather simulation data.


I had access to a few sample ROOT files, but not enough to do meaningful analysis with. Furthermore, simulation settings would constantly change — to increase the number of events (i.e. increase precision), change opening angles, or change the energy of the initial particle launch.


As such, in addition to the Jupyter notebook, I developed an internal management tool for customizing and gathering simulation data. Its features include:

  • Multiprocessing, speeding up batch simulations by minutes
  • JSON-based configuration files, allowing me to quickly change simulation settings without modifying my main Python scripts
  • Automated naming, storing, and caching of datasets to be used by the Jupyter notebook

This internal tool allowed me to automatically generate hundreds of datasets with different angles and configurations in an organized fashion for later analysis, saving an immense amount of time. A sketch of its core loop follows.
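The sketch below shows the general shape of that loop, assuming a JSON file of run settings and a hypothetical simulate command standing in for the real Geant4 entry point; the config keys and naming scheme are illustrative:

```python
import json
import subprocess
from multiprocessing import Pool
from pathlib import Path

def run_simulation(settings: dict) -> Path:
    """Run one simulation and return the path of its auto-named dataset."""
    out = Path("datasets") / f"angle{settings['angle']}_n{settings['events']}.root"
    if not out.exists():  # simple caching: skip runs we already have
        subprocess.run(
            ["simulate",
             "--angle", str(settings["angle"]),
             "--events", str(settings["events"]),
             "--out", str(out)],
            check=True,
        )
    return out

if __name__ == "__main__":
    runs = json.loads(Path("runs.json").read_text())  # list of settings dicts
    with Pool() as pool:
        datasets = pool.map(run_simulation, runs)  # batch runs in parallel
```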


The Jupyter Notebook

The final Jupyter notebook built on the internal tool to display the collected data. Its cells bin the data into multiple 2D histograms (energy vs. theta), then save the resulting plots as images.

(Figure: 2D histogram of energy vs. theta)

The projections of those histograms are then saved as images as well.

(Figure: projection of a 2D histogram)
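As a rough sketch of those two steps, assuming uproot for reading the files and hypothetical tree/branch names ("sim", "energy", "theta"):

```python
import numpy as np
import matplotlib.pyplot as plt
import uproot  # reads ROOT files from Python

# Tree and branch names here are assumptions for illustration.
arrays = uproot.open("datasets/angle30.root")["sim"].arrays(
    ["energy", "theta"], library="np")

# Bin into a 2D histogram (energy vs. theta) and save the plot as an image.
counts, e_edges, t_edges = np.histogram2d(
    arrays["energy"], arrays["theta"], bins=100)
plt.imshow(counts.T, origin="lower", aspect="auto",
           extent=(e_edges[0], e_edges[-1], t_edges[0], t_edges[-1]))
plt.xlabel("energy"); plt.ylabel("theta")
plt.savefig("plots/angle30_hist2d.png")

# Project onto the energy axis by summing over the theta bins.
plt.figure()
plt.stairs(counts.sum(axis=1), e_edges)
plt.xlabel("energy"); plt.ylabel("counts")
plt.savefig("plots/angle30_projection.png")
```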

Lastly, several projection plots representing a wide range of angles are overlaid and saved as well, to illustrate general trends in energy as the angle changes.

(Figure: overlaid projections across a range of opening angles)

It's worth noting that multiprocessing was heavily used to process hundreds of images at once, which was feasible because reading the data files is I/O-bound.
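The fan-out itself is simple; a sketch, where plot_dataset() stands in for the per-file read-bin-save routine above:

```python
from multiprocessing import Pool
from pathlib import Path

def plot_dataset(path: Path) -> Path:
    """Read, bin, and save the plots for one dataset (as sketched above)."""
    out = Path("plots") / f"{path.stem}.png"
    # ... read the ROOT file, bin it, and save the figures ...
    return out

if __name__ == "__main__":
    files = sorted(Path("datasets").glob("*.root"))
    with Pool() as pool:
        plots = pool.map(plot_dataset, files)  # hundreds of files at once
```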


Challenges
During development, unforeseen problems arose for both the internal tool and the Jupyter notebook alike.
  • IPython limitations — the multiprocessing module of Python's standard library isn't compatible with Jupyter notebooks, so I had to find a workaround: installing a third-party library that serializes with pickle rather than JSON.
  • Slow reads — datasets took a long time to read and plot. Solved by splitting the code into separate cells so that loaded state is preserved between runs.
Conclusion —

# Case Study 3
Data Analysis Dashboard
Overview —
Problem —
Process —
Solution —
Conclusion —