Pagination support for commits.#24

Closed
svetlyak40wt wants to merge 4 commits intoask:masterfrom
svetlyak40wt:master

Conversation

@svetlyak40wt

Hi, I added an optional 'page' argument to receive not only the last 35 commits, but all the others too. I think such an approach could be useful for other list operations as well.

@nvie
Contributor

nvie commented Dec 10, 2010

Great addition! Could you please implement the paging stuff for all API routes that support paging? Having paging for route A, but not for route B might be confusing and nobody exactly knows which API routes have been covered already.

Thanks in advance,
Vincent

@svetlyak40wt
Author

Sure, but I have another idea.

List methods could return not a list but an iterable object, just like Django ORM's QuerySet, which supports slicing. This iterable object would hide GitHub's internal paging from the end user of python-github2.

For example, they could iterate over all commits using:

for commit in gh.commits.list('django/django'):
    process(commit)

And github2 will make as many requests to GitHub as required.

Or a user could fetch only the first 100 commits:

for commit in gh.commits.list('django/django')[:100]:
    process(commit)

And github2 would make no more than 3 requests to GitHub behind the scenes.

I see only one argument against this approach: it is backward incompatible with the previous code. To convert old code that uses github2 to work with the new library, you have to use limits and perhaps explicit lists:

If previously you wrote commits = gh.commits.list('django/django') and received no more than 35 items in the list, now you will have to write commits = gh.commits.list('django/django')[:LIMIT] or, if you really need a list here, commits = list(gh.commits.list('django/django')[:LIMIT]).

I suggest this implementation because, without it, many developers who use github2 will have to implement the same logic in their own code. For example, here is my current implementation of the generator:

def get_commits(*params):
    """Generator that fetches commits from the
    given repository until LIMIT is reached.
    """
    page = 1
    limit = config.LIMIT

    commits = gh.commits.list(*params, page=page)
    while commits:
        for commit in commits:
            yield commit
            limit -= 1
            if limit == 0:
                return
        page += 1
        commits = gh.commits.list(*params, page=page)

By the way, such a generator could be generalized to work with any listable GitHub object, as long as its list method supports the page argument. That would let us keep backward compatibility. But this approach is a compromise :(
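The generalization described above might look roughly like this (a sketch; `paged` and `fetch_page` are hypothetical names, not part of github2's API, and `fetch_page` stands in for any list call that accepts a page argument):

```python
from itertools import count, islice

def paged(fetch_page):
    """Yield items from fetch_page(1), fetch_page(2), ...
    until a page comes back empty."""
    for page in count(1):
        items = fetch_page(page)
        if not items:
            return
        for item in items:
            yield item

# A caller could then cap the result lazily with islice, e.g.:
# first_100 = islice(
#     paged(lambda p: gh.commits.list('django/django', page=p)), 100)
```

Because `paged` is a generator, pages are only fetched as the consumer iterates, so capping the iteration also caps the number of HTTP requests.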

@nvie
Contributor

nvie commented Dec 13, 2010

I think the addition of an iterator hiding the details of paging is a great fit for this library. The backward incompatibility is a bit tricky, though, so I'm not too fond of changing the list method. Instead, we could solve this by providing an iter() method alongside list(). Then list() keeps its current semantics, and people can use an iterator if they prefer.

By the way: in your code sample you call gh.commits.list('django/django')[:LIMIT], but AFAIK iterators aren't subscriptable. (Subscript semantics on iterators are really tricky, since they involve state.) Using islice could do the trick, however:

islice(gh.commits.list('django/django'), None, LIMIT)

@nvie
Contributor

nvie commented Dec 13, 2010

We could even let the iter() method return the islice'd iterator:

def iter(self, start=None, stop=None):
    ...
    return islice(commits, start, stop)

That way, you can simply call:

# iter over all commits
for c in gh.commits.iter():
    ...

# iter over a subset
for c in gh.commits.iter(25, 100):
    ...
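Because islice consumes its source lazily, an iter(start, stop) built this way only fetches the pages it actually needs. Here is a runnable sketch with a fake endpoint (FakeCommits is an illustrative stand-in, not github2 API; the 35-item page size is the v2 default mentioned later in the thread):

```python
from itertools import count, islice

PAGE_SIZE = 35  # default v2 API page size, per the discussion later in this thread

class FakeCommits(object):
    """Hypothetical stand-in for a github2 list endpoint,
    counting how many 'requests' (page fetches) are made."""
    def __init__(self, items):
        self.items = items
        self.requests = 0

    def list(self, page=1):
        self.requests += 1
        start = (page - 1) * PAGE_SIZE
        return self.items[start:start + PAGE_SIZE]

    def _walk(self):
        # Generator over all items, fetching pages on demand.
        for page in count(1):
            chunk = self.list(page)
            if not chunk:
                return
            for item in chunk:
                yield item

    def iter(self, start=None, stop=None):
        return islice(self._walk(), start, stop)

fake = FakeCommits(list(range(500)))
subset = list(fake.iter(25, 100))  # items 25..99
# Only the first 3 pages (covering items 0..104) were requested;
# pages past the stop index are never fetched.
```

This is the behavior the thread is after: slicing semantics for the caller, while the paging stays invisible and lazy underneath.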

@svetlyak40wt
Author

Of course you can't apply [:LIMIT] to an iterator, but you can do it with a suitable iterable object. For the end user it would look like this:

for c in gh.commits.iter('django/django')[:50]:
    ...

I'll try to write the implementation within a few days and will send you another pull request.

@nvie
Contributor

nvie commented Dec 13, 2010

Yeah, what I meant is that (lazy) iterables are not subscriptable (at least not up until Python 2.6), so the above code raises a TypeError. See this:

>>> def count():
...     i = 0
...     while True:
...         yield i
...         i += 1
...
>>> for i in count()[:50]:
...     print i
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'generator' object is unsubscriptable
>>>

However, if you wrap it in an islice wrapper, you can subscript these kinds of infinite lists:

>>> from itertools import islice
>>> for i in islice(count(), None, 7):
...     print i
...
0
1
2
3
4
5
6

An alternative is to resolve the iterator's elements into a list, but that loses the laziness (and therefore doesn't work with infinite lists):

for c in list(gh.commits.iter())[:50]:
    ...

This would effectively fetch ALL commit objects from GitHub into one big list and then slice the first 50 out of that result.

@svetlyak40wt
Author

Generators like this one are not subscriptable, but that doesn't mean I can't return an intermediate object from the iter method that supports both slicing and iteration. Wait for the patch, ok? :)
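Such an intermediate object could be sketched like this (an illustration of the idea only, not the actual patch; LazyList and fetch_page are hypothetical names):

```python
from itertools import count, islice

class LazyList(object):
    """Iterable over paged results that also supports subscripting."""
    def __init__(self, fetch_page):
        self.fetch_page = fetch_page

    def __iter__(self):
        # Fetch pages lazily until one comes back empty.
        for page in count(1):
            chunk = self.fetch_page(page)
            if not chunk:
                return
            for item in chunk:
                yield item

    def __getitem__(self, key):
        if isinstance(key, slice):
            # Delegate slicing to islice, which keeps the laziness.
            return islice(iter(self), key.start, key.stop, key.step)
        # A single index consumes the iterator up to that position.
        return next(islice(iter(self), key, key + 1))
```

With this, `for commit in LazyList(fetch)[:50]` works even though the underlying pages are fetched lazily; each subscript starts a fresh iteration, so repeated slicing re-fetches pages (a trade-off a real implementation might cache away).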

@nvie
Copy link
Contributor

nvie commented Dec 21, 2010

Hey Alexander, how's your patch progressing?

@svetlyak40wt
Author

Hi. Sorry for the delay. It is hard to do this in a stable manner, because there is no documentation from GitHub and I found no way to tell GitHub the page size.

By default, the page size is 35 items. I need to know that this value will never change, or (better) be able to specify it myself. Without that, I can't implement reliable slicing.

If you want, I could try to implement it with the page size hardcoded to 35 items, but it could break someday.

@nvie
Contributor

nvie commented Dec 22, 2010

You can safely use the static page size, technoweenie himself promised me it won't change for the v2 API.

@svetlyak40wt
Author

Ok, I wrote this helper to solve the task: https://github.com/svetlyak40wt/pagerator
It is generalized and can be used anywhere else to iterate over any paged results.

For our case, the gh.commits.iter() method would be:

def iter(self, project, branch="master", file=None):
    return pagerator.IterableQuery(
        lambda page: self.get_values(
            "list", project, branch, file,
            filter="commits", datatype=Commit,
            post_data=dict(page=page + 1)
        ),
        35  # page size hardcoded in GitHub
    )

But I can't test it right now, because I get an "api route not recognized" error from GitHub, even for the list method, which worked previously. Do you know how to fix it?

@twidi
Contributor

twidi commented Oct 17, 2011

@ask any news on this? For the first time I now have pages on my project (not only on commits, but on every list) and with the current code I can't access them.

And I don't know if you want the iter system to be integrated the way described here.

@twidi
Contributor

twidi commented Oct 17, 2011

I made this: #66

@JNRowe
Collaborator

JNRowe commented Jun 18, 2012

Given the API v2 switch-off, these bugs aren't going to be worked on or fixed.

@JNRowe JNRowe closed this Jun 18, 2012
@twidi
Contributor

twidi commented Jun 18, 2012

Yes of course :)
