fix: update file size calculation in dataset info #689

Merged
cristian-tamblay merged 3 commits into develop from fix/size-dataset on Jun 15, 2026

Conversation

@Felipedino

Collaborator

This pull request updates how dataset file size is calculated and displayed in the dataset visualization component. Instead of reporting the in-memory size, the system now reports the actual file size based on the Arrow table, ensuring more accurate information is shown to users.

Backend: File size calculation update

  • Changed the backend to report file_size_mb using the Arrow table's byte size instead of the DataFrame's memory usage, providing a more accurate file size metric. (DashAI/back/dataloaders/classes/dashai_dataset.py)
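As a rough illustration of the backend change described above, the sketch below derives a `file_size_mb` value from an object's `nbytes` attribute, which is what a `pyarrow.Table` exposes. The helper names (`bytes_to_mb`, `build_general_info`), the `FakeTable` stand-in, the rounding, and the 1024 * 1024 divisor are assumptions made for this example and are not taken from `dashai_dataset.py`.

```python
from dataclasses import dataclass

# Assumption: binary megabytes. The PR does not state which divisor is used.
BYTES_PER_MB = 1024 * 1024


def bytes_to_mb(nbytes: int, ndigits: int = 2) -> float:
    """Convert a byte count to megabytes, rounded for display."""
    return round(nbytes / BYTES_PER_MB, ndigits)


def build_general_info(arrow_table) -> dict:
    """Build a metadata dict exposing ``file_size_mb`` (hypothetical helper).

    ``arrow_table`` is any object with an integer ``nbytes`` attribute,
    such as a ``pyarrow.Table``.
    """
    return {"file_size_mb": bytes_to_mb(arrow_table.nbytes)}


@dataclass
class FakeTable:
    """Stand-in for a pyarrow.Table so the sketch runs without pyarrow."""

    nbytes: int


info = build_general_info(FakeTable(nbytes=5 * 1024 * 1024))
print(info)  # {'file_size_mb': 5.0}
```

With pyarrow installed, passing a real table in place of `FakeTable` should work the same way, since only the `nbytes` attribute is read.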

Frontend: File size display update

  • Updated the frontend to use the new file_size_mb field for displaying file size in the dataset visualization header, instead of the previous memory_usage_mb field. (DashAI/front/src/components/DatasetVisualization.jsx)

Copilot AI review requested due to automatic review settings June 7, 2026 17:19

Copilot AI left a comment

Contributor

Pull request overview

This PR updates the dataset metadata contract so the UI displays a dataset “size” value sourced from backend metadata computed from the underlying Arrow table rather than a Pandas DataFrame memory-usage estimate.

Changes:

  • Backend: replaces general_info.memory_usage_mb with general_info.file_size_mb computed from arrow_table.nbytes.
  • Frontend: switches the dataset visualization header to read general_info.file_size_mb for display.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| DashAI/front/src/components/DatasetVisualization.jsx | Switches the header’s displayed size field from memory_usage_mb to file_size_mb. |
| DashAI/back/dataloaders/classes/dashai_dataset.py | Changes the computed dataset size metadata to use Arrow table byte size and emits it as file_size_mb. |

Comment thread DashAI/front/src/components/DatasetVisualization.jsx
Comment thread DashAI/back/dataloaders/classes/dashai_dataset.py
@cristian-tamblay

Member

I find this change a bit weird. Any comments, @Irozuku or @Creylay?

@cristian-tamblay cristian-tamblay added the "question" label (Further information is requested) Jun 9, 2026
@Felipedino

Collaborator Author

Previously, it used the size of the pandas DataFrame. Now it retrieves the size from the Arrow file generated in .dashai.
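For comparison, the size of a file on disk can be read directly from the filesystem. This may not equal an Arrow table's `nbytes`, because a file can carry extra metadata, padding, or compression. The sketch below is only an illustration: the `file_size_mb` helper, the `data.arrow` file name, and the 1024 * 1024 divisor are assumptions, and it does not reflect how DashAI lays out files under .dashai.

```python
import tempfile
from pathlib import Path


def file_size_mb(path, ndigits: int = 2) -> float:
    """Return the on-disk size of ``path`` in megabytes (1 MB = 1024**2 bytes)."""
    return round(Path(path).stat().st_size / (1024 * 1024), ndigits)


# Demo with a temporary file standing in for an Arrow file on disk.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "data.arrow"  # hypothetical file name
    sample.write_bytes(b"\0" * (2 * 1024 * 1024))
    print(file_size_mb(sample))  # 2.0
```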

Comment thread DashAI/back/dataloaders/classes/dashai_dataset.py Outdated
@Felipedino Felipedino removed the "question" label (Further information is requested) Jun 12, 2026
@cristian-tamblay cristian-tamblay merged commit c84b4fd into develop Jun 15, 2026
19 checks passed
@cristian-tamblay cristian-tamblay deleted the fix/size-dataset branch June 15, 2026 11:53
4 participants