Pagination support for commits.#24

Closed
svetlyak40wt wants to merge 4 commits intoask:masterfrom
svetlyak40wt:master

Conversation

@svetlyak40wt

Hi, I added an optional 'page' argument to receive not only the last 35 commits, but all the others too. I think such an approach could be useful for other list operations as well.

@nvie
Contributor

nvie commented Dec 10, 2010

Great addition! Could you please implement the paging stuff for all API routes that support paging? Having paging for route A, but not for route B might be confusing and nobody exactly knows which API routes have been covered already.

Thanks in advance,
Vincent

@svetlyak40wt
Author

Sure, but I have another idea.

List methods could return not a list but an iterable object, just like Django ORM's QuerySet, which supports slicing. This iterable object would hide GitHub's internal paging from the end user of python-github2.

For example, they could iterate over all commits using:

for commit in gh.commits.list('django/django'):
    process(commit)

And github2 will make as many requests to GitHub as required.

Or a user could fetch only the first 100 commits:

for commit in gh.commits.list('django/django')[:100]:
    process(commit)

And github2 would make no more than 3 requests to GitHub behind the scenes.

I see only one argument against this approach: it is backward incompatible with the previous code. To convert old code that uses github2 to work with the new library, you have to use limits and perhaps explicit lists:

If previously you wrote commits = gh.commits.list('django/django') and received no more than 35 items in the list, now you will have to write commits = gh.commits.list('django/django')[:LIMIT] or, if you really need a list here, commits = list(gh.commits.list('django/django')[:LIMIT]).

I suggest this implementation because, without it, many developers who use github2 will have to implement the same logic in their own code. For example, here is my current implementation of the generator:

def get_commits(*params):
    """Generator that fetches commits from the
    given repository until LIMIT is reached.
    """
    page = 1
    limit = config.LIMIT

    commits = gh.commits.list(*params, page=page)
    while commits:
        for commit in commits:
            yield commit
            limit -= 1
            if limit == 0:
                return
        page += 1
        commits = gh.commits.list(*params, page=page)

By the way, such a generator could be generalized to work with any listable GitHub object, as long as its list method supports the page argument. That would let us keep backward compatibility. But this approach is a compromise :(
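The generalization described above might look roughly like this (a sketch; `paged` and `fetch_page` are hypothetical names, not part of github2's API, and `fetch_page` stands in for any list call that accepts a page argument):

```python
from itertools import count, islice

def paged(fetch_page):
    """Yield items from fetch_page(1), fetch_page(2), ...
    until a page comes back empty."""
    for page in count(1):
        items = fetch_page(page)
        if not items:
            return
        for item in items:
            yield item

# A caller could then cap the result lazily with islice, e.g.:
# first_100 = islice(
#     paged(lambda p: gh.commits.list('django/django', page=p)), 100)
```

Because `paged` is a generator, pages are only fetched as the consumer iterates, so capping the iteration also caps the number of HTTP requests.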

@nvie
Contributor

nvie commented Dec 13, 2010

I think the addition of an iterator hiding the details of paging is a great fit for this library. The backward incompatibility is a bit tricky, though, so I'm not too fond of changing the list method. Instead, we could solve this by providing an iter() method alongside list(). Then list() keeps its current semantics, and people can use an iterator if they prefer.

By the way: in your code sample you call gh.commits.list('django/django')[:LIMIT], but AFAIK iterators aren't subscriptable. (Subscript semantics on iterators are really tricky, since they involve state.) Using islice could do the trick, however:

islice(gh.commits.list('django/django'), None, LIMIT)

@nvie
Contributor

nvie commented Dec 13, 2010

We could even let the iter() method return the islice'd iterator:

def iter(self, start=None, stop=None):
    ...
    return islice(commits, start, stop)

That way, you can simply call:

# iter over all commits
for c in gh.commits.iter():
    ...

# iter over a subset
for c in gh.commits.iter(25, 100):
    ...
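Because islice consumes its source lazily, an iter(start, stop) built this way only fetches the pages it actually needs. Here is a runnable sketch with a fake endpoint (FakeCommits is an illustrative stand-in, not github2 API; the 35-item page size is the v2 default mentioned later in the thread):

```python
from itertools import count, islice

PAGE_SIZE = 35  # default v2 API page size, per the discussion later in this thread

class FakeCommits(object):
    """Hypothetical stand-in for a github2 list endpoint,
    counting how many 'requests' (page fetches) are made."""
    def __init__(self, items):
        self.items = items
        self.requests = 0

    def list(self, page=1):
        self.requests += 1
        start = (page - 1) * PAGE_SIZE
        return self.items[start:start + PAGE_SIZE]

    def _walk(self):
        # Generator over all items, fetching pages on demand.
        for page in count(1):
            chunk = self.list(page)
            if not chunk:
                return
            for item in chunk:
                yield item

    def iter(self, start=None, stop=None):
        return islice(self._walk(), start, stop)

fake = FakeCommits(list(range(500)))
subset = list(fake.iter(25, 100))  # items 25..99
# Only the first 3 pages (covering items 0..104) were requested;
# pages past the stop index are never fetched.
```

This is the behavior the thread is after: slicing semantics for the caller, while the paging stays invisible and lazy underneath.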

@svetlyak40wt
Author

Of course you can't apply [:LIMIT] to an iterator, but you can do it with a suitable iterable object. For the end user it would look like this:

for c in gh.commits.iter('django/django')[:50]:
    ...

I'll try to write the implementation within a few days and will send you another pull request.

@nvie
Contributor

nvie commented Dec 13, 2010

Yeah, what I meant is that (lazy) iterables are not subscriptable (at least not up until Python 2.6), so the above code raises a TypeError. See this:

>>> def count():
...     i = 0
...     while True:
...         yield i
...         i += 1
...
>>> for i in count()[:50]:
...     print i
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'generator' object is unsubscriptable
>>>

However, if you wrap it in an islice wrapper, you can subscript these kinds of infinite lists:

>>> from itertools import islice
>>> for i in islice(count(), None, 7):
...     print i
...
0
1
2
3
4
5
6

An alternative is to resolve the iterator's elements into a list, but that loses the laziness (and therefore doesn't work with infinite lists):

for c in list(gh.commits.iter())[:50]:
    ...

This would effectively fetch ALL commit objects from GitHub into one big list and then slice the first 50 out of that result.

@svetlyak40wt
Author

Generators like this one are not subscriptable, but that doesn't mean I can't return an intermediate object from the iter method that supports both slicing and iteration. Wait for the patch, ok? :)
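Such an intermediate object could be sketched like this (an illustration of the idea only, not the actual patch; LazyList and fetch_page are hypothetical names):

```python
from itertools import count, islice

class LazyList(object):
    """Iterable over paged results that also supports subscripting."""
    def __init__(self, fetch_page):
        self.fetch_page = fetch_page

    def __iter__(self):
        # Fetch pages lazily until one comes back empty.
        for page in count(1):
            chunk = self.fetch_page(page)
            if not chunk:
                return
            for item in chunk:
                yield item

    def __getitem__(self, key):
        if isinstance(key, slice):
            # Delegate slicing to islice, which keeps the laziness.
            return islice(iter(self), key.start, key.stop, key.step)
        # A single index consumes the iterator up to that position.
        return next(islice(iter(self), key, key + 1))
```

With this, `for commit in LazyList(fetch)[:50]` works even though the underlying pages are fetched lazily; each subscript starts a fresh iteration, so repeated slicing re-fetches pages (a trade-off a real implementation might cache away).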

@nvie
Copy link
Contributor

nvie commented Dec 21, 2010

Hey Alexander, how's your patch progressing?

@svetlyak40wt
Author

Hi. Sorry for the delay. It is hard to do this in a stable manner, because there is no documentation from GitHub and I found no way to tell GitHub the page size.

By default, the page size is 35 items. I need to know that this value will never change, or (better) be able to specify it myself. Without that, I can't implement reliable slicing.

If you want, I could try to implement it with the page size hardcoded to 35 items, but it could break someday.

@nvie
Contributor

nvie commented Dec 22, 2010

You can safely use the static page size, technoweenie himself promised me it won't change for the v2 API.

@svetlyak40wt
Author

Ok, I wrote this helper to solve the task: https://github.com/svetlyak40wt/pagerator
It is generalized and can be used anywhere else to iterate over any paged results.

For our case, the gh.commits.iter() method would be:

def iter(self, project, branch="master", file=None):
    return pagerator.IterableQuery(
        lambda page: self.get_values(
            "list", project, branch, file,
            filter="commits", datatype=Commit,
            post_data=dict(page=page + 1)
        ),
        35  # page size hardcoded in GitHub
    )

But I can't test it right now, because I get an "api route not recognized" error from GitHub, even for the list method, which worked previously. Do you know how to fix it?

@twidi
Contributor

twidi commented Oct 17, 2011

@ask any news on this? For the first time I now have pages on my project (not only on commits, but on every list) and with the current code I can't access them.

And I don't know if you want the iter system to be integrated the way described here.

@twidi
Contributor

twidi commented Oct 17, 2011

I made this: #66

@JNRowe
Collaborator

JNRowe commented Jun 18, 2012

Given the API v2 switch-off, these bugs aren't going to be worked on or fixed.

@JNRowe JNRowe closed this Jun 18, 2012
@twidi
Contributor

twidi commented Jun 18, 2012

Yes of course :)
