I have been developing an R package, {generatervis}, for synthesising and visualising clinical data. This is the first post in a series that shares the process, from ideation to implementation.
Data management is the process of collecting, storing, organising, and using data in a structured manner to ensure accuracy, completeness, and security. It covers every stage of the data lifecycle, from creation to deletion or archiving. Essentially, it’s about making sure data is accessible, reliable, and used effectively to support operations and decision-making. Data management becomes much harder when research data is spread across multiple organisations, is multi-omics in nature (genomics, transcriptomics, proteomics, and metabolomics), and is published through different journals. On top of this, data files exist in different versions or stages (raw, processed, summarised) throughout the data lifecycle, which makes ingesting the data into the essential workflows even more complex.
REsearch Data Management and ANalysis Environment (REDMANE) is a data management and analysis platform that was introduced to resolve these complex issues.
REDMANE resolves these issues by having the following features:
Whole Genome Sequencing (WGS) data refers to the complete DNA sequence of an organism, including both coding and non-coding regions. It’s a comprehensive analysis of an organism’s genome, providing a detailed view of its genetic makeup. WGS is a powerful tool for genomics research, allowing scientists to identify genetic variations, study disease outbreaks, and more.
Some of the use cases for WGS data are:
The WGS data in this R package is synthesised rather than taken from real-world sources. This protects patient privacy and addresses potential security concerns while keeping the data useful for research. For example, clinical data often includes sensitive fields such as Medicare number, date of birth, and place of residence. The solution is therefore to artificially, or ‘synthetically’, generate clinical data that mimics real-world datasets.
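To make the idea of synthetic clinical data concrete, here is a minimal sketch in base R. It is not the {generatervis} implementation: the function name `make_synthetic_patients`, the column names, and the value ranges are all assumptions chosen for illustration.

```r
# Hypothetical helper, NOT part of {generatervis}: builds a small table of
# synthetic patient records using only base R.
make_synthetic_patients <- function(n, seed = 42) {
  set.seed(seed)  # fixed seed so the same synthetic table is produced each run
  data.frame(
    # Sequential, made-up identifiers (assumed format)
    patient_id = sprintf("PAT%04d", seq_len(n)),
    # Random dates drawn from an arbitrary range; no real person is represented
    date_of_birth = as.Date("1940-01-01") + sample(0:25000, n, replace = TRUE),
    # Randomly assigned category (assumed two-level coding for the sketch)
    sex = sample(c("F", "M"), n, replace = TRUE),
    stringsAsFactors = FALSE
  )
}

patients <- make_synthetic_patients(5)
print(patients)
```

Because every value is drawn at random, no row corresponds to a real patient, yet the table has the shape a downstream workflow would expect.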
REDMANE contains different types of datasets for the sample patient ID and the sample ID. One of these dataset types is WGS data. In this blog series, we will explore the process of file creation, dataset creation, and workflow creation for the WGS dataset.
The above diagram shows a maturity model for creating WGS data files and uploading them to cBioPortal. cBioPortal is a widely used, free, open-source online platform for exploring, analysing, and visualising cancer genomics data.
Step 1: Create empty files with the correct file names for the workflow.
Step 2: Add minimal sample data to the empty files.
Step 3: Add more sample data to the minimal files.
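The three steps above can be sketched in base R as follows. This is only an illustration under stated assumptions: the file names, the tab-separated layout, and the column names (`chromosome`, `position`, `ref`, `alt`) are hypothetical and are not taken from {generatervis} or from the REDMANE workflow.

```r
# Work in a temporary directory so the sketch leaves no files behind.
out_dir <- file.path(tempdir(), "wgs_demo")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# Step 1: create empty files with the expected names (naming scheme is assumed).
sample_ids <- c("SAMPLE001", "SAMPLE002")
files <- file.path(out_dir, paste0(sample_ids, "_wgs.tsv"))
file.create(files)

# Step 2: add minimal sample data, i.e. a header and a single made-up row.
minimal <- data.frame(chromosome = "chr1", position = 12345L,
                      ref = "A", alt = "G")
for (f in files) {
  write.table(minimal, f, sep = "\t", quote = FALSE, row.names = FALSE)
}

# Step 3: append more synthetic rows to the minimal files.
set.seed(1)
bases <- c("A", "C", "G", "T")
ref <- sample(bases, 10, replace = TRUE)
more <- data.frame(
  chromosome = sample(paste0("chr", 1:22), 10, replace = TRUE),
  position   = sample.int(1e6, 10),
  ref        = ref,
  # Pick an alternate base that differs from the reference base
  alt        = vapply(ref, function(r) sample(setdiff(bases, r), 1),
                      character(1), USE.NAMES = FALSE)
)
for (f in files) {
  # col.names = FALSE so the header written in Step 2 is not repeated
  write.table(more, f, sep = "\t", quote = FALSE, row.names = FALSE,
              col.names = FALSE, append = TRUE)
}
```

Reading one of the files back with `read.delim(files[1])` gives a table with the header from Step 2 followed by the appended rows, which is a quick way to check each stage before moving to the next.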
Some of the potential next steps include
P.S.: The project is developed using CI/CD. To learn more, keep an eye on future blog posts in this series.
Email: bhogaljyoti1@gmail.com
LinkedIn: jyoti-bhogal
GitHub: jyoti-bhogal
Mastodon: jyoti_bhogal
Bluesky: jyoti-bhogal.bsky.social
Website: https://jyoti-bhogal.github.io/about-me/index.html
For attribution, please cite this work as
Bhogal (2025, April 17). Home: {generatervis}: An R package to synthesise and visualise clinical data - Part 1. Retrieved from https://jyoti-bhogal.github.io/about-me/
BibTeX citation
@misc{bhogal2025generatervis,
  author = {Bhogal, Jyoti},
  title = {Home: {generatervis}: An R package to synthesise and visualise clinical data - Part 1},
  url = {https://jyoti-bhogal.github.io/about-me/},
  year = {2025}
}