The software was written in C# as a Windows desktop application. The code is open source under the GNU General Public License v3 [14] and freely available on GitHub [15]. Users can either download and compile the source code or download and run a precompiled executable.
AtlasGrabber is intended to facilitate the analysis of protein expression in the HPA for a given set of genes. It displays HPA images in an organized, systematic way, following a gene list, and allows genes of interest to be saved into new subsets. A predefined gene set can be analyzed simultaneously in up to four different normal or cancer tissue types. The gene set may contain thousands of genes, and stainings obtained with the same antibody can be compared across tissues. An additional feature is an XML parser that extracts all gene names, antibodies, and images for a particular tissue from the XML file provided on the HPA website.
To start using AtlasGrabber, the investigator needs a plain-text file (.txt) containing the Ensembl IDs of the genes to be analyzed. Such lists can be generated from the files available for download on the HPA website (http://www.proteinatlas.org/about/download) or by running a keyword search in the HPA search field and exporting the results. Detailed step-by-step video instructions can be found in the Readme file on GitHub [15].
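Beyond "a list of Ensembl IDs", the exact input format is not specified here. As an illustration only (AtlasGrabber itself is written in C#), the Python sketch below reads such a list under the assumption that each line holds at most one human Ensembl gene ID ("ENSG" followed by 11 digits), either on its own or inside a tab-separated export with a header row. The function names `load_gene_list` and `load_gene_list_file` are hypothetical; the sketch keeps the first occurrence of each ID and ignores version suffixes.

```python
import re
from pathlib import Path

# A human Ensembl gene ID is "ENSG" followed by 11 digits. A version suffix
# such as ".12" is not part of the match, so "ENSG00000000005.12" yields
# "ENSG00000000005".
ENSEMBL_GENE_RE = re.compile(r"\bENSG\d{11}\b")


def load_gene_list(text: str) -> list[str]:
    """Return the Ensembl gene IDs in `text`, in file order, without duplicates.

    Hypothetical helper: takes the first ID found on each line and skips
    lines without one (for example the header row of an exported table).
    """
    seen: set[str] = set()
    ids: list[str] = []
    for line in text.splitlines():
        match = ENSEMBL_GENE_RE.search(line)
        if match and match.group() not in seen:
            seen.add(match.group())
            ids.append(match.group())
    return ids


def load_gene_list_file(path: str) -> list[str]:
    """Read a gene list file (assumed to be UTF-8 text) and parse it."""
    return load_gene_list(Path(path).read_text(encoding="utf-8"))
```

Matching the ID pattern rather than splitting on fixed columns lets the same sketch accept both a bare one-ID-per-line file and a tabular export; whether the real tool is equally tolerant is not stated.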
The executable can be downloaded directly [15] or compiled from the source code; no additional setup or installation is required. The software has been tested on Windows 8, 10, and 11. For the best experience, we recommend a high-definition monitor larger than 20 inches, because the software makes maximal use of the available screen area by recalculating the size of each window according to the screen dimensions.
The program has three windows: Settings, Browsers, and Analysis (Fig. 2A). On startup, the program opens the “Settings” window (Fig. 2), where the gene list can be loaded from the text file (Fig. 2B). Additional options allow the user to analyze all antibodies together or commercial and in-house antibodies separately, to view all images for each antibody, only one, or a randomly chosen one, and to filter out additional images of the same patient sample for a given antibody (there are typically two images per patient sample) (Fig. 2C). In this window, the user can also name the lists in which selected genes are stored (Fig. 2D). Each list is assigned one of the keys 0–9. While browsing the atlas in the Analysis window, pressing a list's key copies the current gene ID to that list. Saved lists are written to the folder in which the program is located. If a list file already exists from a previous analysis, the new genes are added to those already in the file, so that an analysis can be carried out over multiple runs.
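The image-selection options (all images, one image, or a random image, optionally keeping only one image per patient sample) can be illustrated with a short Python sketch. The `StainingImage` record, its fields, the mode names, and the rule that the first image of each patient is the one kept are all assumptions made for illustration; they are not taken from the AtlasGrabber source.

```python
import random
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class StainingImage:
    """Hypothetical record for one HPA image taken with a given antibody."""
    antibody: str
    patient_id: str
    url: str


def select_images(images: Sequence[StainingImage], mode: str = "all",
                  one_per_patient: bool = False,
                  rng: Optional[random.Random] = None) -> list[StainingImage]:
    """Choose which images of one antibody to display.

    mode: "all", "first" or "random" (illustrative names, not the tool's).
    one_per_patient: drop further images of a patient sample already seen.
    """
    if mode not in ("all", "first", "random"):
        raise ValueError(f"unknown mode: {mode!r}")
    if one_per_patient:
        seen: set[str] = set()
        kept = []
        for img in images:
            if img.patient_id not in seen:
                seen.add(img.patient_id)
                kept.append(img)
    else:
        kept = list(images)
    if mode == "all" or not kept:
        return kept
    if mode == "first":
        return kept[:1]
    # Random mode: use the supplied generator for reproducibility, if any.
    return [(rng or random).choice(kept)]
```

Applying the per-patient filter before choosing a single or random image means the random pick is drawn from distinct patient samples rather than being biased toward patients with more images.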
The tissues to be analyzed are selected at the top of the screen. Any normal or cancer tissue can be chosen from the dropdown menu of each of the four tissue windows. When a tissue is assigned to a window, that window is added to the Analysis view (Fig. 3).
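How the screen is divided among the tissue windows is not described in detail beyond the fact that window sizes are recalculated from the screen area. The Python sketch below shows one plausible rule, purely as an assumption: one tissue fills the available area, two sit side by side, and three or four share a 2 × 2 grid below an optional top margin (for example, for the tissue selectors and progress bar). `Rect`, `tile_panels`, and `top_margin` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Screen rectangle in pixels (origin at the top-left corner)."""
    x: int
    y: int
    width: int
    height: int


def tile_panels(screen_w: int, screen_h: int, n: int,
                top_margin: int = 0) -> list[Rect]:
    """Split the screen below `top_margin` into n (1-4) tissue panels.

    Hypothetical layout rule: one panel fills the area, two sit side by
    side, three or four share a 2 x 2 grid. Integer division may leave a
    few unused pixels at the right or bottom edge.
    """
    if not 1 <= n <= 4:
        raise ValueError("between one and four tissue panels are supported")
    cols = 1 if n == 1 else 2
    rows = 1 if n <= 2 else 2
    w = screen_w // cols
    h = (screen_h - top_margin) // rows
    # Fill the grid row by row, left to right.
    return [Rect((i % cols) * w, top_margin + (i // cols) * h, w, h)
            for i in range(n)]
```

For example, `tile_panels(1920, 1080, 3, top_margin=80)` gives three 960 × 500 panels, two on the top row and one at the bottom left, which illustrates why a larger, higher-resolution screen leaves more room per tissue image.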
To start loading and viewing images, the user selects the “Analysis” window (Fig. 3). Images are displayed for each antibody against the genes in the list. The mouse can be used to pan across the image. We recommend using the assigned keyboard keys to move through the images, antibodies, and proteins (Fig. 1B); the scroll wheel can also be used to move through the images. Pressing any of the keys 0–9 adds the current gene ID to the corresponding list. Back in the “Settings” window, the left panel shows which gene (ID) is currently being analyzed and which gene IDs have been assigned to each list.
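The behaviour of the ten keyed lists (a digit key adds the current gene ID to its list, and saving merges with any file left by an earlier run) can be sketched as follows. This is a hypothetical Python model, not the C# implementation: the class name `GeneLists`, the `<name>.txt` file naming, one ID per line, and the choice to skip duplicate IDs are all assumptions.

```python
from pathlib import Path


class GeneLists:
    """Ten named gene lists bound to the keys 0-9 (hypothetical model)."""

    def __init__(self, names: list[str]):
        if len(names) != 10:
            raise ValueError("expected exactly ten list names")
        self.names = list(names)
        self.genes: list[list[str]] = [[] for _ in range(10)]

    def on_key(self, key: str, gene_id: str) -> bool:
        """Add gene_id to the list bound to a digit key; ignore other keys."""
        if len(key) != 1 or key not in "0123456789":
            return False
        target = self.genes[int(key)]
        if gene_id not in target:  # assumption: no duplicates within a list
            target.append(gene_id)
        return True

    def save(self, folder: Path) -> None:
        """Write each non-empty list to <folder>/<name>.txt.

        IDs already present in an existing file are kept and new IDs are
        appended after them, so an analysis can span several runs.
        """
        for name, genes in zip(self.names, self.genes):
            if not genes:
                continue
            path = folder / f"{name}.txt"
            old = path.read_text(encoding="utf-8").split() if path.exists() else []
            new = [g for g in genes if g not in old]
            path.write_text("\n".join(old + new) + "\n", encoding="utf-8")
```

Reading the existing file before writing, rather than opening it in append mode, is what allows IDs saved in an earlier run to be skipped instead of duplicated.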
The “Browsers” window displays the HPA web page of the current antibody and can be used to read a quick summary of the gene or the antibody. For example, if the user identifies an interesting antibody candidate during the analysis, they can quickly access HPA information on the antibody (e.g., antibody provider and antibody validation) and the protein summary (e.g., name, alternative names, description, and intracellular location). This window can also be used for debugging. The progress bar at the top of the screen shows the progress of the analysis (Fig. 3B), and the position of the current gene in the list is displayed in the application’s title bar (Fig. 3A). The Help button links to the Readme file on the GitHub page, where more detailed instructions are available, including short video tutorials.
The XML parser is available in the Settings window and parses the XML database file downloadable from the HPA website (Fig. 2E). Once unzipped, the file can be loaded, and subsets of the data for a given normal or cancer tissue can be extracted into an easily readable .csv file containing all available gene IDs, gene names, antibodies, and online image locations. The .csv file is saved automatically to the folder containing the application.
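The extraction step can be illustrated with a streaming parser, which matters because the HPA XML file is large. The Python sketch below uses the standard library's `xml.etree.ElementTree.iterparse` and `csv` modules, but the element and attribute names (`entry`, `name`, `identifier id=...`, `antibody id=...`, `data`, `tissue`, `imageUrl`) describe a simplified, assumed schema; they should be checked against the actual HPA file (which may also use XML namespaces) before use. The function name `extract_tissue_images` is hypothetical.

```python
import csv
import xml.etree.ElementTree as ET


def extract_tissue_images(xml_source, tissue: str, out) -> int:
    """Stream an HPA-style XML file and write one CSV row per image.

    ASSUMED, simplified schema (verify against the real HPA file):
      <entry><name/><identifier id="ENSG..."/>
        <antibody id="..."><tissueExpression><data>
          <tissue>prostate</tissue> ... <imageUrl>...</imageUrl>
        </data></tissueExpression></antibody></entry>
    Returns the number of image rows written (header excluded).
    """
    writer = csv.writer(out)
    writer.writerow(["gene_id", "gene_name", "antibody", "image_url"])
    rows = 0
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != "entry":
            continue  # wait until a whole <entry> has been parsed
        gene_name = elem.findtext("name", default="")
        ident = elem.find("identifier")
        gene_id = ident.get("id", "") if ident is not None else ""
        for antibody in elem.iter("antibody"):
            for data in antibody.iter("data"):
                # Case-insensitive match on the tissue name.
                if (data.findtext("tissue") or "").strip().lower() != tissue.lower():
                    continue
                for url in data.iter("imageUrl"):
                    writer.writerow([gene_id, gene_name, antibody.get("id", ""),
                                     (url.text or "").strip()])
                    rows += 1
        elem.clear()  # drop the finished entry's children to limit memory use
    return rows
```

A typical call would open the unzipped XML file in binary mode and the output file with `open(..., "w", newline="", encoding="utf-8")`, as the `csv` module recommends; the file names are up to the user.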
To demonstrate the application, we set out to identify new, additional immunohistochemical biomarkers for the basal cells of prostate glands. This cell layer surrounds the glands in normal tissue but is typically lost in prostate cancer. In histopathology, three markers are routinely used to identify prostate basal cells: CK14, CK5, and p63. Pathologists use these markers to help diagnose prostate cancer, as the absence of basal cell staining indicates invasiveness [16,17,18,19].
Using the XML parser, we extracted the list of Ensembl IDs for normal prostate tissue. Because the aim was to demonstrate the usefulness of the software, we analyzed only a subset of this gene list. We loaded the subset into AtlasGrabber, selected normal prostate as the tissue to analyze, and went through the list, saving the genes whose staining pattern was consistent with basal cells. We then used AtlasGrabber again to compare the expression of these genes in prostate cancer.