Discussion:
Pull requests
David Given
2014-09-02 21:49:12 UTC
Given the discussion in the other thread(s), I have been thinking about
pull requests. (I've also had a beer. You Have Been Warned.)

I rather like the pull request workflow from github and Bazaar, and it's
something that I rather miss from Fossil. However, given Fossil's
different philosophy, I think the workflow needs to be modified. So what
I'm proposing works like this:

We have a maintainer M and a contributor C. C does not have checkin
privileges to M's repository.

1) C clones M's repository.
2) C does some work in multiple checkins.
3) C points the Magic Pull Request tool at a commit. This spits out a
bundle containing everything that's needed to add that commit (and its
ancestors) to M's repository.
4) C sends the bundle to M.
5) M adds the bundle to their repository (or a clone of their
repository). All of C's changes end up in a private branch.
6) M examines the changes, and either rejects them or merges them into
trunk.

Key point: all of C's branch and tag information is discarded --- M
wants all of C's changes to end up in a *single* branch, so they're all
safely isolated. This means that M doesn't have to worry about
conflicting tag names. If C does branching and merging before the bundle
is created, those branches will all show up as anonymous ad-hoc branches
in M's import branch.

None of this actually looks very hard. The trickiest part is step 3.
Exporting the bundle is easy --- the git exporter is only 400 loc.
Calculating what goes into the bundle is the only bit of interest, and
even that's pretty straightforward:

a) construct set of all ancestor commits from the nominated one
b) subtract all commits in M's repository
c) export commits remaining

Pulling the commit graph from a local .fossil file looks pretty
straightforward --- experimentation gives me this:

SELECT parent.uuid, child.uuid
FROM plink, blob AS parent, blob AS child
WHERE child.rid = plink.cid AND parent.rid = plink.pid;

(That seems way too easy, so I'm sure there's something I'm missing.)
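Putting that query and steps (a)-(c) together, here's a toy sketch (a stand-in blob/plink schema and made-up hashes, not a real repository):

```python
import sqlite3

def ancestors(db, tip):
    """Collect `tip` and all of its ancestors by walking plink child -> parent."""
    seen, stack = set(), [tip]
    while stack:
        uuid = stack.pop()
        if uuid in seen:
            continue
        seen.add(uuid)
        rows = db.execute(
            "SELECT parent.uuid FROM plink, blob AS parent, blob AS child "
            "WHERE child.rid = plink.cid AND parent.rid = plink.pid "
            "AND child.uuid = ?", (uuid,))
        stack.extend(r[0] for r in rows)
    return seen

def bundle_set(c_db, m_has, tip):
    # a) ancestors of the nominated commit, b) minus what M already has;
    # c) whatever remains is what goes into the bundle.
    return ancestors(c_db, tip) - m_has

# Toy stand-in for C's repository: three commits a -> b -> c.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE blob(rid INTEGER PRIMARY KEY, uuid TEXT);
    CREATE TABLE plink(pid INTEGER, cid INTEGER);
    INSERT INTO blob VALUES (1,'a'), (2,'b'), (3,'c');
    INSERT INTO plink VALUES (1,2), (2,3);
""")
print(sorted(bundle_set(db, {"a"}, "c")))  # → ['b', 'c']
```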

However, M's repository is *remote*, so I don't have direct access to
its database. What's the easiest way to get the same information from a
remote repository? It doesn't matter if it's hacky; I'd like to put
together a proof-of-concept...
--
┌───  ───── http://www.cowlark.com ─────
│ "Blue is beautiful... blue is best...
│ I'm blue! I'm beautiful! I'm best!"
│ --- _Dougal and the Blue Cat_
Andreas Kupries
2014-09-02 22:23:33 UTC
Let me see
Post by David Given
1) C clones M's repository.
2) C does some work in multiple checkins.
3) C points the Magic Pull Request tool at a commit. This spits out a
bundle containing everything that's needed to add that commit (and its
ancestors) to M's repository.
4) C sends the bundle to M.
5) M adds the bundle to their repository (or a clone of their
repository). All of C's changes end up in a private branch.
Alternatively, M always operates on a clone, obviating the need for
the private branch. It would just be a branch. ... Ok, it is less
secure given that M may forget to make the clone.

Maybe M should have a "Magic Pull Request Receiver Tool" as well, to
feed the bundle into; and it does all that (clone, import, ...)
Post by David Given
6) M examines the changes, and either rejects them or merges them into
trunk.
None of this actually looks very hard. The trickiest part is step 3.
Exporting the bundle is easy --- the git exporter is only 400 loc.
Calculating what goes into the bundle is the only bit of interest, and
a) construct set of all ancestor commits from the nominated one
b) subtract all commits in M's repository
c) export commits remaining
Pulling the commit graph from a local .fossil file looks pretty
SELECT parent.uuid, child.uuid
FROM plink, blob AS parent, blob AS child
WHERE child.rid = plink.cid AND parent.rid = plink.pid;
(That seems way too easy, so I'm sure there's something I'm missing.)
However, M's repository is *remote*, so I don't have direct access to
its database.
What's the easiest way to get the same information from a
remote repository? It doesn't matter if it's hacky; I'd like to put
together a proof-of-concept...
What you are proposing is actually a "sync over mail", with the
complicated part being the detection of what M already has and
therefore does not need to be sent.

That information is part of a regular pull operation, so if we can
invoke only the steps to get that, without actually sending any
content back, then your new tool knows what the other side has.

More info on this in
http://fossil-scm.org/index.html/doc/tip/www/sync.wiki

Note that this requires digging into the fossil sources and extending
them with a command which runs the limited pull, delivering the
necessary igot cards from M to C, and printing the data for your tool.
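If that limited pull can be run, its reply could then be scraped for igot cards (the card sync.wiki describes a side using to announce artifacts it holds). A minimal parser sketch, with a fabricated reply fragment for illustration:

```python
def igot_hashes(reply_text):
    """Collect artifact hashes from the 'igot' cards of a sync reply."""
    have = set()
    for line in reply_text.splitlines():
        parts = line.split()
        # An igot card has the form: igot HASH
        if len(parts) >= 2 and parts[0] == "igot":
            have.add(parts[1])
    return have

# Fabricated reply; a real reply carries other cards too.
reply = "igot a1b2c3\nigot d4e5f6\n"
print(sorted(igot_hashes(reply)))  # → ['a1b2c3', 'd4e5f6']
```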
--
Andreas Kupries
Senior Tcl Developer
Code to Cloud: Smarter, Safer, Faster™
F: 778.786.1133
***@activestate.com, http://www.activestate.com
Learn about Stackato for Private PaaS: http://www.activestate.com/stackato

21st Tcl/Tk Conference: Nov 10-14, Portland, OR, USA --
http://www.tcl.tk/community/tcl2014/
Send mail to ***@googlegroups.com, by Sep 8
Registration is open.
Andy Bradford
2014-09-03 14:39:32 UTC
That information is part of a regular pull operation, so if we can
invoke only the steps to get that, without actually sending any
content back, then your new tool knows what the other side has.
Is it as simple as taking the contents referenced in the unsent table
and putting them into a mini Fossil that has just those artifacts (and
perhaps any requisite predecessors)?
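A toy sketch of that idea, using a stand-in blob/unsent schema rather than Fossil's real one:

```python
import sqlite3

def extract_unsent(db, dest_path):
    """Copy artifacts listed in unsent into a fresh 'mini' database
    (toy schema: Fossil's real blob table has more columns)."""
    db.execute("ATTACH ? AS mini", (dest_path,))
    db.execute("CREATE TABLE mini.blob"
               "(rid INTEGER PRIMARY KEY, uuid TEXT, content BLOB)")
    db.execute("INSERT INTO mini.blob "
               "SELECT rid, uuid, content FROM blob "
               "WHERE rid IN (SELECT rid FROM unsent)")
    return db

# Toy repo: two artifacts, only one of them still unsent.
src = sqlite3.connect(":memory:")
src.executescript("""
    CREATE TABLE blob(rid INTEGER PRIMARY KEY, uuid TEXT, content BLOB);
    CREATE TABLE unsent(rid INTEGER);
    INSERT INTO blob VALUES (1,'a',x'00'), (2,'b',x'01');
    INSERT INTO unsent VALUES (2);
""")
extract_unsent(src, ":memory:")
print([r[0] for r in src.execute("SELECT uuid FROM mini.blob")])  # → ['b']
```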

Andy
--
TAI64 timestamp: 4000000054072847
Richard Hipp
2014-09-03 14:51:10 UTC
Post by Andy Bradford
That information is part of a regular pull operation, so if we can
invoke only the steps to get that, without actually sending any
content back, then your new tool knows what the other side has.
Is it as simple as taking the contents referenced in the unsent table
and putting them into a mini Fossil that has just those artifacts (and
perhaps any requisite predecessors)?
That is a heuristic that might work in many cases. But there are failure
modes.

For example, suppose the person wanting to generate the patch had actually
cloned their clone of the repo, and done pushing and pulling between his
two clones. Then the UNSENT table would have been emptied on both clones
because all artifacts had been transmitted to another repo at some point -
just not to the master repo.
--
D. Richard Hipp
***@sqlite.org
Andy Bradford
2014-09-03 15:28:16 UTC
For example, suppose the person wanting to generate the patch had
actually cloned their clone of the repo, and done pushing and pulling
between his two clones. Then the UNSENT table would have been emptied
on both clones because all artifacts had been transmitted to another
repo at some point - just not to the master repo.
Yes, I had forgotten that unsent gets wiped no matter where the content
is synced to --- that definitely won't work reliably. So it really would
be more like calculating all the igots that need to be included in the
bundle, then writing those to a Fossil.

In addition, I think we would need a new page to view in the UI (and
perhaps an analogous command-line command) that would display *all*
artifacts included in the mini Fossil, similar to the ambiguous-artifact
page:

http://www.fossil-scm.org/index.html/info/4946

Though instead of showing ambiguous artifacts, it would just provide a
link to each one found in the bundle to assist the developer in tracking
down just which artifacts are included.

This provides a much more generic interface and would allow
contributions of any content type that Fossil supports (files, wiki,
tickets, etc.).

Andy
--
TAI64 timestamp: 40000000540733b2
Ron W
2014-09-02 22:36:37 UTC
Post by David Given
I rather like the pull request workflow from github and Bazaar, and it's
something that I rather miss from Fossil.
Last time I actually used github (as opposed to simply getting the latest
sources for one thing or another), a pull was an actual pull from an
actual git repository. A contributor would clone the project repo to a new
repo, located on github but owned by the cloner (github calls this
forking). Then (s)he would clone the clone to the local PC, work and commit
locally, then push the changes to their own github repo. Then (s)he would
go to their repo's github page and click on "Pull Request". Later, the
responsible project dev could pull from the contributor's github clone of
the project.

The same could be done with Fossil. The big difference is that while git
will pull the remote changes into a staging branch, thereby automatically
isolating the changes, Fossil does no such implicit branching. As such, the
pulling dev would necessarily have to make a fresh clone into which to pull
the contributor's changes. (IMHO, this is a good practice anyway. And,
hopefully, the contributor made a feature branch to contain their changes.)

Your proposal for automatically calculating a list of commits to treat as
already exported is a very good idea. Would definitely make the incremental
export much easier to use.

Your proposal to automatically strip (or rename) branch/tag info is likely
to break something. If nothing else, I would expect the commit IDs to
change. In theory, commit ID changes could be mitigated by creating a
special tag to hold the original commit ID, but that would invite new
issues with resolving ID ambiguities. I'm not sure what other problems
might be created.

But the automatic export calculation is definitely worth pursuing.
Andreas Kupries
2014-09-02 22:50:42 UTC
Post by Ron W
Your proposal for automatically calculating a list of commits to treat as
already exported is a very good idea. Would definitely make the incremental
export much easier to use.
Your proposal to automatically strip (or rename) branch/tag info is likely
to break something.
According to
http://www.fossil-scm.org/index.html/doc/tip/www/fileformat.wiki#manifest
a manifest (= committed revision) can indeed contain T-card (tag) information.

I am not sure under which circumstances such information is actually added
(vs just possible).

... Ah, it seems to happen when you "commit --branch" (I looked at some
of my own repos where I usually do that instead of 'branch new').

So, yes, stripping the information entirely is not possible; only the
creation of additional control artifacts which would cancel such tags.
Richard Hipp
2014-09-02 23:47:14 UTC
Proposed plan of action:

(1) Modify "private" branch processing to avoid the "private" tag and
instead simply rely on the private artifacts residing in the PRIVATE table,
which should survive a "fossil rebuild". (A "fossil deconstruct; fossil
reconstruct" will make all private branches public since there is no record
of which are private in the deconstruct, but who ever does that?)

(2) Create a new "fossil bundle export" command that generates a "bundle"
from a designated branch, or all check-ins following a particular check-in,
or just a single check-in. The bundle format is an SQLite database file
(essentially the BLOB and DELTA tables, but with a few minor differences).
The bundle assumes that all artifacts that existed prior to the first
check-in of the bundle also exist on the destination and avoids resending
those artifacts, and can use those artifacts as the source for
delta-compression. The bundle includes information (in the CONFIG table
perhaps?) about the project-id and the parent check-in(s) of the first
check-in of the bundle, etc.
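As a sketch only: one plausible shape for such a bundle database, with table and column names guessed from the description above rather than taken from any actual implementation:

```python
import sqlite3

# Guessed bundle layout: artifact payloads (optionally delta-compressed
# against an artifact assumed to exist on the destination) plus a small
# key/value table for project-id and parent check-in metadata.
BUNDLE_SCHEMA = """
CREATE TABLE bblob(
  blobid INTEGER PRIMARY KEY,   -- bundle-local id
  uuid   TEXT UNIQUE NOT NULL,  -- artifact hash
  delta  TEXT,                  -- hash of the delta source, or NULL if whole
  data   BLOB                   -- (compressed) content
);
CREATE TABLE bconfig(
  bcname  TEXT PRIMARY KEY,     -- e.g. 'project-id', 'parent'
  bcvalue TEXT
);
"""

def new_bundle(path=":memory:"):
    db = sqlite3.connect(path)
    db.executescript(BUNDLE_SCHEMA)
    return db

db = new_bundle()
db.execute("INSERT INTO bconfig VALUES ('project-id', 'deadbeef')")
db.execute("INSERT INTO bblob(uuid, delta, data) VALUES ('abc123', NULL, x'00')")
```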

Aside: An SQLite database of compressed BLOBs is as small, and sometimes
smaller, than the equivalent ZIP archive. See
http://www.sqlite.org/sqlar/doc/trunk/README.md for additional
information. That webpage says that SQLAR files are about 2% larger than
ZIP. But when I SQLAR an open-office "odp" presentation (which is really a
ZIP archive) the SQLAR equivalent comes out about 0.5% smaller!

(3) Create a new "fossil bundle import" command that imports a bundle as a
*private* branch.

Do not stress over branch-name collisions. Fossil is perfectly happy
having two or more concurrent branches with the same name. The human user
might get a little confused with that, but not Fossil. To help avoid
confusion among the humans, perhaps all "bundle import" check-ins and file
artifacts should have a distinct background and/or foreground color to make
their special role clear to the viewer.

(4) Create a new command and new web pages that will delete a private
branch or a portion of a private branch. I don't yet know what this
command is called. (Suggestions?) Maybe this same command will also
delete a public check-in that has not yet been synced - I recall that there
have been requests for that feature. The UNSENT table can be used to
discern whether or not a check-in has been synced.

As this operation is not undoable, there will need to be dire warnings with
an "are you sure?" prompt.

(5) Create a new command and perhaps a new web page that will publish (make
public) a private branch or check-in. I don't yet know what this command
is called. ("publish"? Other suggestions?)

This operation is not easily undoable, so there will need to be an "are you
sure?" prompt, though with a less dire warning, since the worst that can
happen is some ill-advised changes get pushed to the main repo and then
have to be shunted off into a "mistake" branch or somesuch.
Gour
2014-09-03 08:49:25 UTC
On Tue, 2 Sep 2014 19:47:14 -0400
Post by Richard Hipp
(2) Create a new "fossil bundle export" command that generates a
"bundle" from a designated branch, or all check-ins following a
particular check-in, or just a single check-in. The bundle format is
an SQLite database file (essentially the BLOB and DELTA tables, but
with a few minor differences).
Wonderful!!
Post by Richard Hipp
To help avoid confusion among the humans, perhaps all "bundle import"
check-ins and file artifacts should have a distinct background and/or
foreground color to make their special role clear to the viewer.
That would be nice indeed.
Post by Richard Hipp
(4) Create a new command and new web pages that will delete a private
branch or a portion of a private branch. I don't yet know what this
command is called. (Suggestions?)
Darcs has an 'obliterate' command
(http://darcs.net/manual/bigpage.html#SECTION00694000000000000000)
with the following description: "Obliterate completely removes
recorded patches from your local repository. The changes will be undone
in your working copy and the patches will not be shown in your changes
list anymore. Beware that you can lose precious code by obliterating!",
but not being a native speaker I'm not sure whether it's relevant here.
Post by Richard Hipp
(5) Create a new command and perhaps a new web page that will publish
(make public) a private branch or check-in. I don't yet know what
this command is called. ("publish"? Other suggestions?)
'publish' sounds good to me.


Sincerely,
Gour
--
Before giving up this present body, if one is able to tolerate
the urges of the material senses and check the force of desire and
anger, he is well situated and is happy in this world.
Marc Simpson
2014-09-03 11:16:51 UTC
Post by Gour
Post by Richard Hipp
(5) Create a new command and perhaps a new web page that will publish
(make public) a private branch or check-in. I don't yet know what
this command is called. ("publish"? Other suggestions?)
'publish' sounds good to me.
Other candidates: expose, reveal.
Nico Williams
2014-09-03 17:05:13 UTC
Post by Richard Hipp
(3) Create a new "fossil bundle import" command that imports a bundle as a
*private* branch.
Require a branch name as an argument and there will be no need to
think about branch name collisions.

It doesn't matter that the branch namespace is not unique for Fossil;
it has to be for users.

Nico
--
Andy Bradford
2014-09-03 14:41:25 UTC
Is it as simple as taking the contents referenced in the unsent table
and putting them into a mini Fossil that has just those artifacts (and
perhaps any requisite predecessors)?
Excluding any artifacts referenced in the private and shun tables, of
course.

Andy
--
TAI64 timestamp: 40000000540728b8
David Given
2014-10-18 22:28:38 UTC
Post by David Given
Given the discussion in the other thread(s), I have been thinking about
pull requests. (I've also had a beer. You Have Been Warned.)
So the paperwork's finally come through and I'm able to work on this.
Hurray! The same disclaimer as above applies.

I've put together a proof-of-concept prototype; it's in the dtrg-bundles
branch. It's a tiny tool mainly written in SQL which, when given two
repositories old.fossil and new.fossil, and an artifact in new.fossil,
will spit out a git fast-export file which will reproduce that artifact
in old.fossil. It works by calculating an rid exclusion list which is
then thrown at fossil export --import-marks.
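The exclusion-list idea can be sketched like so (toy blob schema and made-up hashes; the real tool works on actual repositories):

```python
import sqlite3

def exclusion_rids(old_db, new_db):
    """rids in new.fossil whose artifacts already exist in old.fossil,
    matched by artifact hash (toy blob schema: rid, uuid)."""
    old_uuids = {r[0] for r in old_db.execute("SELECT uuid FROM blob")}
    return {rid for rid, uuid in new_db.execute("SELECT rid, uuid FROM blob")
            if uuid in old_uuids}

# old.fossil has artifacts a,b; new.fossil adds c on top of them.
old = sqlite3.connect(":memory:")
new = sqlite3.connect(":memory:")
for db in (old, new):
    db.execute("CREATE TABLE blob(rid INTEGER PRIMARY KEY, uuid TEXT)")
old.executemany("INSERT INTO blob VALUES (?,?)", [(1, "a"), (2, "b")])
new.executemany("INSERT INTO blob VALUES (?,?)",
                [(1, "a"), (2, "b"), (3, "c")])
print(sorted(exclusion_rids(old, new)))  # → [1, 2]
```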

Use it like this:

$ test/make-bundle-data old.fossil new.fossil
(this will create some potted test data)

$ tools/exportbundle.sh old.fossil new.fossil ab45b4cd > bundle
(this will generate the bundle; you'll need to look at new.fossil and
pick an artifact ID. Tag names are not supported)

$ tools/importbundle old.fossil test.fossil bundle
(this will clone old.fossil and apply the bundle to it)

Now, this doesn't quite work yet. If you try this and look at
test.fossil, the new grafted tree doesn't link up with the old tree.
(And boy, does the timeline graph drawer get confused by this
sometimes...) Looking at the git export files, it seems there is no
information there to allow the importer to figure out where to graft the
tree on. For example, here's the bundle produced for 'New file' in the
test data:

---snip---
blob
mark :14
data 36
empty
data
more data
even more data

blob
mark :20
data 9
new file

commit refs/heads/trunk
mark :17
committer dg <dg> 1413670593 +0000
data 26
Clone, add: even more data
from :13
M 100644 :14 1

commit refs/heads/trunk
mark :23
committer dg <dg> 1413670594 +0000
data 8
New file
from :17
M 100644 :20 2

tag branchpoint
from :17
tagger <tagger> 1413670593 +0000
data 0
---snip---

(Also that last tag artifact shouldn't be there; pretend you haven't
seen it.)

You can see that the commit at :17 is supposed to inherit from :13, but
of course :13 isn't in there. I've tried tweaking the search algorithm
to emit :13 as well, in the hope that the artifact UUIDs will match up,
but it doesn't work --- obviously the generated C card for :13 doesn't
precisely match the existing card.

This would seem to be an issue with incremental imports as a whole; what
am I missing here?

Please note that this is a *proof-of-concept*, and is never intended to
be actually merged. The SQL needs simplifying and it's probably full of
bugs. (I need to double-check that I'm not mixing rids from the old and
new repositories, for a start.)
--
┌───  ───── http://www.cowlark.com ─────
│
│ "Home is where, when you have to go there, they have to take you in."
│ --- Cordelia Naismith (via Robert Frost)