Hub v2.3.4 features
We added support for the most common image formats in hub.ingest and hub.ingest_kaggle (so you can directly ingest popular datasets from Kaggle). Also, we introduced ds.summary() so you can easily understand your dataset layout. See what’s included in the screenshot!
Now you can return data in PyTorch dataloaders as bytes instead of tensors, using ds.pytorch(..., tobytes=True). This enables you to use libraries of your choice to decompress and process your data. We also shipped less intrusive locking when performing operations on different version control branches.
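To illustrate the bytes-instead-of-tensors idea, here is a minimal sketch of decoding raw payloads with a decoder of your own choosing. It does not call Hub at all: the sample layout (a dict of tensor names to bytes), the decode_sample helper, and the use of zlib as a stand-in codec are all assumptions made for illustration, not Hub's documented output format.

```python
import zlib
from typing import Callable, Dict

# Hypothetical stand-in for one sample from a dataloader that yields raw bytes:
# tensor names mapped to still-compressed payloads. The real layout may differ.
Sample = Dict[str, bytes]
Decoder = Callable[[bytes], bytes]


def decode_sample(sample: Sample, decoders: Dict[str, Decoder]) -> Dict[str, bytes]:
    """Apply a user-chosen decoder per field; fields without one pass through unchanged."""
    return {
        name: decoders[name](payload) if name in decoders else payload
        for name, payload in sample.items()
    }


# zlib stands in for whichever image codec you prefer (an assumption for this sketch).
raw_pixels = bytes(range(16))
sample = {"images": zlib.compress(raw_pixels), "labels": b"\x03"}

decoded = decode_sample(sample, {"images": zlib.decompress})
```

In practice you would swap zlib.decompress for the decoding function of the image library you want to use, and apply it inside your training loop or collate step.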
Community contributions
New datasets were uploaded and documented by our community members Uday Uppal (KKanji) and Manas Gupta (EMNIST). We also changed the str return to include tensor-wise information, for an issue by Suhaas Neel.
Additionally, Bikram Maharjan added support for additional image formats in hub.auto. Sai Nikhilesh Reddy contributed to ds.summary, and the ReadMe in Chinese was merged by Jinyi Chen.
Events
We’re presenting at PyCon next week (Apr 27 - May 1)! Stop by our booth if you’re around, or register for Davit Buniatyan's workshop, "The Future of Handing off Data to Compute."
Community feature
Gradient Health, our partner company (Ouwen Huang), has published a great guide to Open-source Medical Imagery Datasets. Check it out and let us know what you think.