This repository was archived by the owner on Feb 2, 2024. It is now read-only.

@kozlov-alexey
Contributor

@kozlov-alexey kozlov-alexey commented Aug 30, 2020

Motivation: returning a tuple of columns (read from the CSV file with the
pyarrow CSV reader) from objmode and then calling the init_dataframe
constructor to create a native DF turned out to be inefficient in terms of
LLVM IR size and compilation time. With this PR we instead return a Python DF
from objmode and rely on DF unboxing.

Compile time of read_csv + df.count():

| Solution \ columns | 4 | 8 | 16 | 32 | 64 | 128 | 256 |
|---|---|---|---|---|---|---|---|
| Numba master + both SDC fixes (2b8b003) | 8.897234 | 9.306839 | 10.54691 | 12.52175 | 17.41399 | 30.47878 | 65.63396 |
| Numba master + SDC fix #1 (964e498) | 9.283413 | 9.83861 | 13.30219 | 21.7165 | 53.07618 | 187.4615 | 1026.31 |
| Numba 0.50.1 + SDC master | 9.212505 | 10.238 | 14.08183 | 25.16768 | 72.9872 | 290.3359 | 2141.832 |
| Ratio (Numba 0.50.1 + SDC master to both fixes) | 1.035435 | 1.100051 | 1.335162 | 2.009917 | 4.191296 | 9.525835 | 32.63299 |
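
As a sanity check on the last row, the ratios can be recomputed from the timings above. This is a minimal sketch, assuming each ratio is the "Numba 0.50.1 + SDC master" time divided by the "both SDC fixes" time for the same column count; the variable names are illustrative only and do not come from the PR.

```python
# Timings copied from the table above, one entry per column count.
columns = [4, 8, 16, 32, 64, 128, 256]
both_fixes = [8.897234, 9.306839, 10.54691, 12.52175, 17.41399, 30.47878, 65.63396]
sdc_master = [9.212505, 10.238, 14.08183, 25.16768, 72.9872, 290.3359, 2141.832]

# Assumption: ratio = baseline compile time / compile time with both fixes.
ratios = [base / fixed for base, fixed in zip(sdc_master, both_fixes)]

for n_cols, ratio in zip(columns, ratios):
    print(f"{n_cols:>3} columns: {ratio:.2f}x")
```

Under that assumption the computed values match the reported row, and the gap widens quickly as the number of columns grows.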

@pep8speaks

pep8speaks commented Aug 30, 2020

Hello @kozlov-alexey! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-09-07 22:18:36 UTC

@kozlov-alexey kozlov-alexey force-pushed the feature/reduce_read_csv_ir_size_1 branch from 87fdabe to 2b8b003 Compare September 7, 2020 00:14
tuple(col_typs),
types.none,
tuple(col_names),
column_loc=column_loc
Contributor Author

This implies has_parent=False, so changes made to the native DF won't be reflected back to the Python object at boxing. However, it turns out that we don't support reflection for DFs at all. @AlexanderKalistratov, do you think that's a problem in this context (i.e. should we support reflection first)?

@AlexanderKalistratov AlexanderKalistratov merged commit 811e9f0 into IntelPython:master Sep 9, 2020
Hardcode84 added a commit that referenced this pull request Sep 30, 2020
Hardcode84 pushed a commit that referenced this pull request Sep 30, 2020
kozlov-alexey added a commit to kozlov-alexey/sdc that referenced this pull request Nov 3, 2020
kozlov-alexey added a commit to kozlov-alexey/sdc that referenced this pull request Nov 6, 2020
kozlov-alexey added a commit to kozlov-alexey/sdc that referenced this pull request Nov 9, 2020
kozlov-alexey added a commit that referenced this pull request Nov 10, 2020
Development

Successfully merging this pull request may close these issues.

Logistic regression example not working

4 participants