Architecture
The proposed scripting system, SpinSPJ, offers a flexible scripting environment in which NMR users can implement custom functionality. SpinSPJ runs as a plug-in in SpinStudioJ, a plug-in-based NMR software package, and interacts with other plug-ins through the mechanisms defined by the OSGi (Open Services Gateway initiative) framework [30]. The relationship between SpinSPJ and SpinStudioJ is illustrated in Fig. 1.
The conventional instrument control and data processing capabilities are implemented in Java, whereas CPython has the advantage of advanced libraries for numerical computation and artificial intelligence (e.g., NumPy [31], TensorFlow [32]). The central design problem of the proposed scripting system is therefore how to build a bridge between Java and CPython so that both Java-based NMR methods and third-party CPython libraries are supported. CPython, which is implemented in C, provides an extension mechanism for wrapping C libraries as custom modules. The Java virtual machine provides the Java Native Interface (JNI) [33] to support interaction with native code written in C. Through the JNI, Java can call functions implemented in C, and C can access resources (e.g., classes, methods, objects) in the Java environment. The C programming language can therefore serve as a bridge between Java and CPython.
The overall architecture of SpinSPJ is illustrated in Fig. 2. Organized by implementation language, the scripting system consists of three components: Java, C, and CPython.
The Java component is responsible for the graphical user interface and provides the interfaces exposed to CPython, including basic configuration, the scripting editor, instrument control, and data processing. The basic configuration sets the location of the CPython libraries. The scripting editor offers a script editing window, an execution output view, menus, and a toolbar. The interfaces for instrument control and data processing are implemented through OSGi, which separates the abstract interface from the concrete business logic. Instrument control includes sample control, temperature control, tuning, locking, shimming, data acquisition, etc. Data processing includes Fourier transform, phase correction, baseline correction, peak picking, and integration. NMR data are stored in a custom HDF5-based format [29, 34] that organizes the parameters, pulse sequence, free induction decay (FID), spectrum, and peak list hierarchically in a single file. The file can be easily read and converted to other data formats by third-party analysis tools (e.g., Matlab, CPython). After CPython scripts have performed the processing operations selected by the NMR user, the processed data can be written back and saved to disk. Scripts can run in either blocking or non-blocking mode, so that statements are executed in the expected sequence.
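The difference between blocking and non-blocking execution can be pictured with a minimal Python sketch. It is not the SpinSPJ API: the names run_command, acquire, and process are hypothetical, and a standard-library thread pool stands in for the instrument back end.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for instrument commands; not part of SpinSPJ.
log = []

def acquire(scans):
    time.sleep(0.05)                      # pretend the acquisition takes time
    log.append(f"acquired {scans} scans")
    return scans

def process():
    log.append("processed")

_executor = ThreadPoolExecutor(max_workers=1)

def run_command(func, *args, blocking=True):
    """Run a command; return its result (blocking) or a Future (non-blocking)."""
    future = _executor.submit(func, *args)
    return future.result() if blocking else future

# Blocking: the next statement starts only after the acquisition has finished.
run_command(acquire, 4)
run_command(process)

# Non-blocking: the script continues immediately and synchronizes later.
pending = run_command(acquire, 8, blocking=False)
log.append("script continues")
pending.result()                          # explicit synchronization point
```

In blocking mode the order of statements in the script is also the order of execution; in non-blocking mode the script has to wait on the returned handle wherever a later step depends on the result.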
The C component is the bridge between the Java and CPython components. Through the JNI, Java can call functions in C-based libraries and, if necessary, the C component can create Java objects and call Java methods. Using the C extension mechanism of CPython, the C component can define custom modules for the CPython component, and these modules can be compiled and linked against the basic CPython libraries. Two important calls in this process are SetPythonHome (which sets the location of the CPython libraries) and PyRun_String (which executes script code). A Java object can be wrapped as a PyObject, the base structure of every object in CPython. Because Java-based resources can thus be accessed from CPython, the C component connects the Java and CPython environments. In SpinSPJ, the C component creates a C extension module for CPython and builds the Java-CPython bridge with the help of the open source library Jep [35].
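The role of the bridge can be approximated in pure Python: a host places an object into a namespace and executes script text against it, which is roughly what PyRun_String does with a globals dictionary on the C side. The sketch below is an illustration under that assumption only. Spectrometer and run_script are hypothetical names, the built-in exec stands in for PyRun_String, and a plain Python object stands in for the wrapped Java object.

```python
class Spectrometer:
    """Hypothetical host-side object standing in for a wrapped Java object."""

    def __init__(self):
        self.temperature = 298.0
        self.commands = []

    def go(self):
        self.commands.append("go")
        return "acquisition started"


def run_script(source, host_objects):
    """Execute script text in a fresh namespace seeded with host objects."""
    namespace = {"__name__": "__script__", **host_objects}
    exec(compile(source, "<script>", "exec"), namespace)
    return namespace


script = """
spec.temperature = 300.0
status = spec.go()
"""

spec = Spectrometer()
result = run_script(script, {"spec": spec})
```

The script sees only the names the host chose to expose, and any state it changes on the exposed object remains visible to the host after the script has finished.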
The CPython component defines custom initialization and import methods for packages and modules and makes various native libraries available (e.g., NumPy, SciPy, TensorFlow). Initialization and import are the key steps that enable interaction between CPython and Java. Among the native libraries, NumPy and SciPy are commonly used for fast numerical operations and scientific computation, and TensorFlow is widely used for deep learning. Non-uniform sampling and chemical shift prediction methods based on deep learning can therefore be readily integrated into the scripting system.
Workflow
The workflow of SpinSPJ describes, in chronological order, how the internal components operate. It covers the sequential actions and interactions of the Java, C, and CPython components during the different stages. The workflow consists of three main stages: initialization, execution, and exiting.
The initialization stage prepares the scripting environment. First, the scripting system configures the location of the CPython libraries. Second, CPython installs an importer hook and inserts it into sys.meta_path. The importer hook defines the methods find_module and load_module, which tell CPython how to find and load Java packages. At the end of this stage, the resources of Java and CPython are connected and the scripting environment is established.
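An importer hook of this kind can be sketched in a few lines of Python. This is an illustration only, not the SpinSPJ or Jep implementation. It uses the find_spec, create_module, and exec_module protocol instead of the find_module and load_module methods named above, because recent CPython releases no longer consult the legacy methods. The package names and the FAKE_JAVA_PACKAGES registry are hypothetical stand-ins for packages resolved through the bridge.

```python
import importlib.abc
import importlib.util
import sys
import types

# Hypothetical registry standing in for the Java packages visible to the bridge.
FAKE_JAVA_PACKAGES = {
    "demo_java": {},
    "demo_java.util": {"ArrayList": list},
}


class JavaPackageImporter(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Finder and loader that serve modules from FAKE_JAVA_PACKAGES."""

    def find_spec(self, fullname, path=None, target=None):
        if fullname not in FAKE_JAVA_PACKAGES:
            return None  # let the regular importers handle it
        return importlib.util.spec_from_loader(fullname, self, is_package=True)

    def create_module(self, spec):
        return types.ModuleType(spec.name)

    def exec_module(self, module):
        # Populate the module with the "classes" of the package.
        module.__dict__.update(FAKE_JAVA_PACKAGES[module.__name__])


# Install the hook ahead of the default finders.
sys.meta_path.insert(0, JavaPackageImporter())

from demo_java.util import ArrayList
```

Once the hook is installed, an ordinary import statement resolves names that do not exist on disk, and the loaded modules are cached in sys.modules like any other module.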
In the execution stage, the scripting system coordinates the three components to support language features and access to resources. The workflow of this stage is illustrated in Fig. 3. Two issues are central to this stage: import and interpretation. Unlike in conventional CPython environments, import statements in SpinSPJ can also import Java packages. When an import statement is executed, the scripting system first looks for the requested package in sys.modules. If the package is found, it has already been loaded by CPython; otherwise, the CPython environment consults the importer hook, which invokes the find_module and load_module methods to load the spinspj module. The spinspj module is used to interact with Java resources; it has a __getattr__ method that defines its submodules for the packages and classes in the Java environment. The methods of Java objects (wrapped as PyJMethod) and their fields (wrapped as PyJField) are exposed as attributes of a CPython object. CPython allows PyJMethod to implement custom invocation by defining the tp_call slot of its PyTypeObject, and allows PyJField to implement custom getting and setting behavior by defining the tp_getattro and tp_setattro slots of its PyTypeObject. An example is illustrated in Fig. 3: when the NMR command go is executed in a script, PyJMethod invokes the corresponding Java method through the JNI. The CPython interpreter can therefore treat Java objects as native CPython objects and call Java methods freely.
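The wrapping of methods and fields can be mimicked at the Python level, where __call__ corresponds to the tp_call slot and __getattr__ and __setattr__ approximate tp_getattro and tp_setattro. The sketch below is a pure-Python analogy under that assumption and not the C implementation: JMethod, JObject, and the dictionary used as a "Java object" are hypothetical, and the place where a real bridge would cross the JNI is marked in a comment.

```python
class JMethod:
    """Callable wrapper; __call__ plays the role that tp_call plays in C."""

    def __init__(self, target, name):
        self._target, self._name = target, name

    def __call__(self, *args):
        # A real bridge would dispatch to the Java method through the JNI here.
        return self._target["methods"][self._name](*args)


class JObject:
    """Proxy whose attribute access mirrors tp_getattro and tp_setattro."""

    def __init__(self, target):
        object.__setattr__(self, "_target", target)

    def __getattr__(self, name):
        target = object.__getattribute__(self, "_target")
        if name in target["methods"]:
            return JMethod(target, name)
        if name in target["fields"]:
            return target["fields"][name]
        raise AttributeError(name)

    def __setattr__(self, name, value):
        target = object.__getattribute__(self, "_target")
        if name not in target["fields"]:
            raise AttributeError(f"no field named {name!r}")
        target["fields"][name] = value


# Hypothetical "Java object": a dictionary of fields and methods.
state = {"fields": {"scans": 1}, "methods": {}}
state["methods"]["go"] = lambda: f"acquiring {state['fields']['scans']} scans"

acq = JObject(state)
acq.scans = 16          # routed through __setattr__ to the underlying field
message = acq.go()      # attribute lookup returns a JMethod, which is then called
```

From the script's point of view, acq behaves like an ordinary Python object, although every field access and method call is forwarded to the underlying target.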
In the exiting stage, exception handling and memory management are the key issues in keeping the scripting system stable and robust. The JNI allows the C component to throw exceptions raised in native code to the Java component, which catches the exceptions and their back traces. For memory management, the Java component relies on the garbage collector of the Java virtual machine to reclaim memory at runtime, so memory does not need to be released manually. The C component, however, must release memory explicitly. Both the JNI and the C extension mechanism provide corresponding functions for releasing memory in order to avoid memory leaks.