Changing csv_reader_py impl to return df from objmode #918

kozlov-alexey · 2020-08-30T19:50:24Z

Motivation: returning a tuple of columns (read from the CSV file with the
pyarrow CSV reader) from objmode and then calling the init_dataframe
constructor to create a native DF turned out to be inefficient in terms of
LLVM IR size and compilation time. With this PR we instead rely on DF
unboxing and return a Python DF from objmode.

Compile time of read_csv + df.count():

| Solution \ Columns | 4 | 8 | 16 | 32 | 64 | 128 | 256 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Numba master + both SDC fixes (`2b8b003`) | 8.897234 | 9.306839 | 10.54691 | 12.52175 | 17.41399 | 30.47878 | 65.63396 |
| Numba master + SDC fix #1 (`964e498`) | 9.283413 | 9.83861 | 13.30219 | 21.7165 | 53.07618 | 187.4615 | 1026.31 |
| Numba 0.50.1 + SDC master | 9.212505 | 10.238 | 14.08183 | 25.16768 | 72.9872 | 290.3359 | 2141.832 |
| Ratio (Numba 0.50.1 + SDC master / both fixes) | 1.035435 | 1.100051 | 1.335162 | 2.009917 | 4.191296 | 9.525835 | 32.63299 |
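
For readers checking the numbers: the ratio row appears to be the "Numba 0.50.1 + SDC master" time divided by the "both SDC fixes" time for each column count. A minimal sketch that recomputes it from the table values (the variable names are illustrative and not part of the PR):

```python
# Compile times copied from the table above, one entry per column count.
columns = [4, 8, 16, 32, 64, 128, 256]
both_fixes = [8.897234, 9.306839, 10.54691, 12.52175, 17.41399, 30.47878, 65.63396]
sdc_master = [9.212505, 10.238, 14.08183, 25.16768, 72.9872, 290.3359, 2141.832]

# Assumption: ratio = baseline (SDC master) time / time with both fixes applied.
ratios = [base / fixed for base, fixed in zip(sdc_master, both_fixes)]

for n_cols, ratio in zip(columns, ratios):
    print(f"{n_cols:>3} columns: {ratio:.2f}x")
```

Read this way, the gain is negligible for narrow frames and grows quickly with the number of columns.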

pep8speaks · 2020-08-30T19:50:29Z

Hello @kozlov-alexey! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-09-07 22:18:36 UTC


sdc/tests/test_io.py

kozlov-alexey · 2020-09-07T15:44:17Z

sdc/io/csv_ext.py

+ tuple(col_typs),
+ types.none,
+ tuple(col_names),
+ column_loc=column_loc


This implies has_parent=False (so changes made to the native DF won't be reflected back to the Python object at boxing), but it turns out that we don't support reflection for DFs at all. @AlexanderKalistratov, do you think that's a problem in this context (i.e. should we support reflection first)?

sdc/datatypes/hpat_pandas_functions.py


kozlov-alexey force-pushed the feature/reduce_read_csv_ir_size_1 branch from a275fc6 to 964e498 Compare August 30, 2020 20:02

kozlov-alexey requested review from AlexanderKalistratov and Hardcode84 August 31, 2020 18:39

kozlov-alexey added the Ready for Review label Aug 31, 2020

Hardcode84 reviewed Aug 31, 2020

sdc/tests/test_io.py (resolved)

kozlov-alexey added 2 commits September 7, 2020 02:11

Merge branch 'master' into feature/reduce_read_csv_ir_size_1

430136d

Capture dtype dict instead of building in objmode

2b8b003

kozlov-alexey force-pushed the feature/reduce_read_csv_ir_size_1 branch from 87fdabe to 2b8b003 Compare September 7, 2020 00:14

kozlov-alexey commented Sep 7, 2020

sdc/datatypes/hpat_pandas_functions.py (resolved)

Applying comments #1

a8fe781

AlexanderKalistratov approved these changes Sep 9, 2020

AlexanderKalistratov merged commit 811e9f0 into IntelPython:master Sep 9, 2020

Hardcode84 added a commit that referenced this pull request Sep 30, 2020

Revert "Changing csv_reader_py impl to return df from objmode (#918)"

dc6b484

This reverts commit 811e9f0.

Hardcode84 mentioned this pull request Sep 30, 2020

Revert "Changing csv_reader_py impl to return df from objmode" #932

Merged

Hardcode84 pushed a commit that referenced this pull request Sep 30, 2020

Revert "Changing csv_reader_py impl to return df from objmode (#918)" (…

30122b2

…#932) This reverts commit 811e9f0.

kozlov-alexey mentioned this pull request Oct 19, 2020

Implements init_dataframe as multiple codegen functions #936

Merged

kozlov-alexey added a commit to kozlov-alexey/sdc that referenced this pull request Nov 3, 2020

Return back csv_reader_py changes from IntelPython#918

f8e775d

This reverts commit 30122b2.

kozlov-alexey added a commit to kozlov-alexey/sdc that referenced this pull request Nov 6, 2020

Return back csv_reader_py changes from IntelPython#918

943da80

This reverts commit 30122b2.

kozlov-alexey added a commit to kozlov-alexey/sdc that referenced this pull request Nov 9, 2020

Return back csv_reader_py changes from IntelPython#918

812e694

This reverts commit 30122b2.

kozlov-alexey added a commit that referenced this pull request Nov 10, 2020

Return back csv_reader_py changes from #918 (#943)

a39d73d

This reverts commit 30122b2.
