Webapp #78

Open
MikeLippincott wants to merge 9 commits into WayScience:main from MikeLippincott:webapp

@MikeLippincott

Member

This PR is for the webapp in development for Jacey's first-author manuscript.

@review-notebook-app

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

@MikeLippincott MikeLippincott requested review from gwaybio and jaceybronte and removed request for gwaybio June 9, 2026 02:52

@gwaybio gwaybio left a comment

Member

A few comments, mostly housekeeping. @jaceybronte should be the one to give final approval.

Is it possible to make the app run on the GitHub Pages site (github.io) for this repo rather than on streamlit.app?

Also, the app link you sent should be specified in the README (it is currently down).

# - `CRISPRGeneEffect.parquet`: The data in this document are the Gene Effect Scores obtained from CRISPR knockout screens conducted by the Broad Institute. Negative scores notate that cell growth inhibition and/or death occurred following a gene knockout. Information on how these Gene Effect Scores were determined can be found [here](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-021-02540-7)
# - `depmap_gene_meta.tsv`: Genes that passed QC and were included in the training model for Pan et al. 2022. We use this data to filter genes as input to our models. The genes were filtered based 1) variance, 2) perturbation confidence, and 3) high on target predictions based on high correlation across other guides.
#
# > Pan J, Kwon JJ, Talamas JA, Borah AA, Vazquez F, Boehm JS, Tsherniak A, Zitnik M, McFarland JM, Hahn WC. Sparse dictionary learning recovers pleiotropy from human cell fitness screens. Cell Syst. 2022 Apr 20;13(4):286-303.e10. doi: 10.1016/j.cels.2021.12.005. Epub 2022 Jan 31. PMID: 35085500; PMCID: PMC9035054.

Member

This is the Webster citation and should probably be kept (see line 11).


# Load depmap metadata
- gene_meta_df = pd.read_parquet(qc_gene_file, sep="\t")
+ gene_meta_df = pd.read_csv(qc_gene_file, sep="\t")

Member

Why would this change to `read_csv`?
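
One plausible reason, assuming `qc_gene_file` points at the tab-separated `depmap_gene_meta.tsv` described above: `pd.read_parquet` expects a Parquet file and has no `sep` parameter of its own, whereas `pd.read_csv(..., sep="\t")` reads TSV. A minimal sketch of choosing the reader from the file suffix follows; `READERS`, `pick_reader`, and `load_table` are hypothetical names, not code from this PR.

```python
from pathlib import Path

# Hypothetical mapping: file suffix -> (pandas reader name, keyword arguments).
READERS = {
    ".tsv": ("read_csv", {"sep": "\t"}),
    ".csv": ("read_csv", {}),
    ".parquet": ("read_parquet", {}),
}


def pick_reader(path):
    """Return the pandas reader name and kwargs that match the file suffix."""
    suffix = Path(path).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"No reader registered for suffix {suffix!r}")
    return READERS[suffix]


def load_table(path):
    """Load a table with the reader that matches its suffix."""
    # pandas is imported lazily so pick_reader can be used without it.
    import pandas as pd

    reader_name, kwargs = pick_reader(path)
    return getattr(pd, reader_name)(path, **kwargs)
```

With this sketch, `load_table("depmap_gene_meta.tsv")` would go through `pd.read_csv` with a tab separator, while `CRISPRGeneEffect.parquet` would go through `pd.read_parquet`.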

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1

Member

I'm concerned about this approach, mostly because I don't understand what happens when someone uses the webapp. Does using the webapp trigger a Git LFS pull? It must retrieve the data from somewhere. Rather than triggering a pull (which consumes Git LFS quota, $$), please use a different approach; there are a few options (e.g., figshare).
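
A minimal sketch of one such alternative, assuming the data files are hosted at a public URL (for example a figshare download link, which is not specified in this thread): download each file once into a local cache directory and reuse it afterwards. `fetch_cached`, its parameters, and the cache layout are hypothetical.

```python
import hashlib
import shutil
import urllib.request
from pathlib import Path


def fetch_cached(url, cache_dir, filename, sha256=None):
    """Download url into cache_dir/filename unless it is already cached.

    If sha256 is given, the cached file is verified against it and removed
    on mismatch. Returns the path of the cached file.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / filename

    if not target.exists():
        # Write to a temporary name first so a failed download
        # never leaves a partial file under the final name.
        partial = target.with_name(target.name + ".part")
        with urllib.request.urlopen(url) as response, open(partial, "wb") as out:
            shutil.copyfileobj(response, out)
        partial.replace(target)

    if sha256 is not None:
        digest = hashlib.sha256(target.read_bytes()).hexdigest()
        if digest != sha256:
            target.unlink()
            raise ValueError(f"Checksum mismatch for {filename}")

    return target
```

In a Streamlit app, a loader built on this could additionally be wrapped in `st.cache_data` so the parsed table is reused across reruns.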

Comment thread 9.webapp/requirements.txt
@@ -0,0 +1,88 @@
altair==6.1.0

Member

This dependency list is extremely strict and fragile; it will almost certainly break very soon. What is the convention for Streamlit? Is it possible to relax the restrictions? How is Streamlit tested? Consider digging into this a bit more, which will likely increase the longevity of the app.
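
One way to explore relaxing the pins, sketched under the assumption that each package stays compatible within a major version (this is not guaranteed and should be checked per package): rewrite exact `==` pins as a lower bound plus a next-major upper bound. `relax_pin` is a hypothetical helper, not part of Streamlit or pip.

```python
import re

# Matches simple pins such as "altair==6.1.0"; anything else is left alone.
PIN = re.compile(r"^\s*([A-Za-z0-9_.\-\[\]]+)==(\d+)((?:\.\d+)*)\s*$")


def relax_pin(line):
    """Turn 'name==X.Y.Z' into 'name>=X.Y.Z,<X+1'; return other lines unchanged."""
    match = PIN.match(line)
    if match is None:
        # Comments, blank lines, pre-releases, and non-pinned entries pass through.
        return line.rstrip("\n")
    name, major, rest = match.groups()
    return f"{name}>={major}{rest},<{int(major) + 1}"


def relax_requirements(text):
    """Apply relax_pin to every line of a requirements file's contents."""
    return "\n".join(relax_pin(line) for line in text.splitlines())
```

For example, `relax_pin("altair==6.1.0")` returns `"altair>=6.1.0,<7"`. Whether a major-version cap is the right bound for every one of the 88 entries is a judgment call; transitive dependencies may be better left out of `requirements.txt` entirely.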

Comment thread .pre-commit-config.yaml
name: isort (python)
args: ["--profile", "black", "--filter-files"]

#Code formatter for both python files and jupyter notebooks

Member

Consider running pre-commit on this file (and all other files in this repo) as well.

Comment thread README.md
| [0.data-download](0.data-download/) | Download required files | Download gene effect data and cell line information, and download gene QC and construct gene filtering dictionary |
| [1.data-exploration](1.data-exploration/) | Explore and visualize data | Create figures to visualize cell line information and split gene effect data into balanced test and train dataframes |
| [2.train-VAE](2.train-VAE/) | Train Beta VAE and Beta TC VAE models | Optimize hyperparameters and train Beta Variational Autoencoder/Beta Total Correlation Variational Autoencoder with optimal hyperparameters and previously created test and train dataframes |
| [3.analysis](3.analysis/) | Analyze Beta VAE and Beta TC VAE Outputs | Generate heatmaps to visualize death windows by cell line and by genes, run Gene Set Enrichment Analysis with BVAE and BTCVAE synthesized data, and analyze extracted BVAE/BTCVAE latent space data to compare similarity of cancer between different demographics |

Member

This comment is likely for @jaceybronte: please update the README to reflect what we have now done. It is out of date.

Member

Yep, will make a PR!

@jaceybronte jaceybronte left a comment

Member

Code looks good. I'm still trying to figure out how to make it work with all this data. Let me know if you need me to edit the Parquet files.

Comment thread README.md
| [0.data-download](0.data-download/) | Download required files | Download gene effect data and cell line information, and download gene QC and construct gene filtering dictionary |
| [1.data-exploration](1.data-exploration/) | Explore and visualize data | Create figures to visualize cell line information and split gene effect data into balanced test and train dataframes |
| [2.train-VAE](2.train-VAE/) | Train Beta VAE and Beta TC VAE models | Optimize hyperparameters and train Beta Variational Autoencoder/Beta Total Correlation Variational Autoencoder with optimal hyperparameters and previously created test and train dataframes |
| [3.analysis](3.analysis/) | Analyze Beta VAE and Beta TC VAE Outputs | Generate heatmaps to visualize death windows by cell line and by genes, run Gene Set Enrichment Analysis with BVAE and BTCVAE synthesized data, and analyze extracted BVAE/BTCVAE latent space data to compare similarity of cancer between different demographics |

Member

Yep, will make a PR!

Member

Are we going to run into conflicts when I merge my PR?

3 participants