CVS mixed-tagging for massive Open Source Project Management
Posted 21 Feb 2001 at 16:35 UTC by lkcl 
i emailed [worldforge] a while back about using cvs in a
dual-checkout mode.
i needed to do a small development branch on a project i am doing, so i
tried it out. it works! here's the [accurate] description of the
necessary cvs commands.
this article is of interest to people who use cvs, are working on a
massive open source project - thousands or millions of lines, with
several inter-dependent components, several developers and tens of
man-years of development effort.
you think this isn't a problem? that's because your project is too
small. wait a few years, you'll find out what i'm talking about.
goals and issues
the aim is to allow simultaneous development of two sets of code
modifications, possibly over the long term (i.e. more than a few days:
anything longer than that in a large project is insane and asking for
trouble), whilst both developments benefit from less merging-hassle,
there is no interference from the branching, and the branch still
benefits from the non-branched code-mods.
why use this method?
well, let's say you're developing a new version of a library. or you
want to change an API.
you change it in the main cvs, it's going to break everything. if you
don't do it - NOW - more people are going to use the out-of-date API,
whilst you're developing the new one. the new one is going to take 3
months to develop. the projects using your API CANNOT wait 3 months
just for your new version, ESPECIALLY as there are 10 other APIs, all
with the same problem!
so you cvs tag all files that use the API. you cvs tag your entire
library. you modify your library. you modify all files that use the
library. you finish the job, you merge.
and while you were doing all that, did anyone notice? did they xxxx :)
did anyone complain that the cvs main was broken? did they xxxx :) oh,
you got one person who was away on holiday, he used your old API, you
didn't do a new tag for him because he hadn't WRITTEN the code yet when
you did the mixed-tag, so you bitched at him and you got it all sorted
out between you, like sensible developers should.
now take that to the next level (e.g. worldforge). you have, what, ten
libraries? ten sets of developers? has anyone actually _ever_
successfully got the entire worldforge codebase to compile??? :) :) let
alone run!!!
why use this method?
samba TNG architecture is a multi-developer, multi-program development
project. the dividing lines are the SMB IPC$ share and the DCE/RPC
pipes / services. each part has a client-side API, server-side API,
client-side _usage_ of the client-side API, server-side _usage_ of the
server-side API!
that's a lot of projects, and a lot of hell for simultaneous
development.
the cvs mixed-tag method described here is a _perfect_ solution for
doing development of an entire new method, whilst allowing that
development group to "benefit" from the "latest" cvs main developments,
whilst also being able to "offer" stability in their "cover" version in
cvs main.
the requirements are:
- cvs (*duur*!)
- all developers must do checkouts of both sets of code. once a commit
is performed (after testing!), it is the responsibility of all
developers to checkout the *other* set of code and do testing on that.
why? because otherwise, the other set of developers are going to get
really pissed off with you for breaking things in the area they are
depending on you *not* to change, because they have responsibility for a
few files - just a few (or maybe more!) - and you broke it by making
changes in *their* area of responsibility.
if you want to do that [break their compile], you must do one of: a) ask
them if it's ok. b) don't do it, get them to do it: they're the ones
dealing with those files, not you! c) do yet another small-file-branch,
which will not impact on them but in which you can do "your thing" on
that series of files. d) BEFORE the mixed-tagging, split the shared
files up into as small modules as possible so that the impact on each
other's areas of responsibility is as small as possible. this is good
multi-developer API-based programming practice _anyway_.
instructions:
- 1)
cvs tag tagname file1.c file2.h file3.py dir1/file4.ext1
etc.
- 2)
create new directory somewhere. cd to it.
- 3)
cvs co -r tagname toplevelcvsmodulename
this will result *only* in the following files being checked out:
file1.c file2.h file3.py dir1/file4.ext1
*DON'T WORRY ABOUT THIS YET!!! :) :) *
- 4)
cd toplevelcvsmodulename
- 5)
cvs update -d -P -r ''
OR - and this is an important or - if you are *already* working in a cvs
tag (i.e. if at stage 1, above, you were working in *another* cvs
branch, where you originally performed cvs co -r origtagname
toplevelcvsmodulename) you MUST do:
cvs update -d -P -r 'origtagname'
voila!!! that's it!!!!
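the five stages above can be collected into a small python sketch. it
only *builds* the command lines, it does not run cvs. the function name
mixed_tag_commands, its parameters, the mkdir -p and the directory name
"mixedwork" are all made up for illustration. the -b on cvs tag is taken
from the correction posted further down this thread; everything else is
copied from the stages above. stage 1 has to be run inside an existing
working copy, which the sketch does not check.

```python
import shlex


def mixed_tag_commands(tagname, files, module, workdir, origtagname=""):
    """Return the command lines for stages 1-5 as a list of strings.

    Hypothetical helper: nothing is executed, the commands are only built.
    origtagname="" means the remaining files come from the main trunk.
    """
    steps = [
        # stage 1: branch-tag only the files that will diverge
        # (-b comes from the correction later in the thread)
        ["cvs", "tag", "-b", tagname, *files],
        # stage 2: a fresh directory for the mixed checkout (mkdir is assumed)
        ["mkdir", "-p", workdir],
        ["cd", workdir],
        # stage 3: this checks out *only* the tagged files
        ["cvs", "co", "-r", tagname, module],
        # stage 4
        ["cd", module],
        # stage 5: fetch everything else from origtagname ('' for main)
        ["cvs", "update", "-d", "-P", "-r", origtagname],
    ]
    return [shlex.join(step) for step in steps]


if __name__ == "__main__":
    for line in mixed_tag_commands(
        "tagname",
        ["file1.c", "file2.h", "file3.py", "dir1/file4.ext1"],
        "toplevelcvsmodulename",
        "mixedwork",
    ):
        print(line)
```

printing the commands instead of running them is deliberate: you get to
read them before anything touches the repository.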
if you are brave, examine the CVS/Entries file. you will notice that it
contains mixed entries: of two different tag names on individual
files.
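if you would rather not read CVS/Entries by eye, here is a rough python
sketch that groups the files in one Entries file by sticky tag. it
relies on the usual Entries layout, /name/revision/timestamp/options/tagdate,
where a sticky tag is stored as "T" followed by the tag name. the
function name sticky_tags and the sample revisions and timestamps are
invented for the example.

```python
from collections import defaultdict


def sticky_tags(entries_text):
    """Group file names from a CVS/Entries file by sticky tag.

    Files without a sticky tag are collected under the key None.
    Directory lines ("D/name////") are skipped.
    """
    groups = defaultdict(list)
    for line in entries_text.splitlines():
        if not line.startswith("/"):
            continue  # directory entry or blank line
        fields = line.split("/")
        if len(fields) < 6:
            continue  # not a well-formed file entry
        name, tagdate = fields[1], fields[5]
        tag = tagdate[1:] if tagdate.startswith("T") else None
        groups[tag].append(name)
    return dict(groups)


# made-up sample: two files on the branch, one on the main trunk
SAMPLE_ENTRIES = (
    "/file1.c/1.4.2.1/Wed Feb 21 16:35:00 2001//Ttagname\n"
    "/file2.h/1.2.2.1/Wed Feb 21 16:35:00 2001//Ttagname\n"
    "/main.c/1.9/Wed Feb 21 16:35:00 2001//\n"
    "D/dir1////\n"
)

if __name__ == "__main__":
    for tag, names in sticky_tags(SAMPLE_ENTRIES).items():
        print(tag or "(no sticky tag)", names)
```

in a real working area you would pass in the contents of CVS/Entries
from each directory. more than one key in the result means the directory
is "mixed".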
cvs commits in this "mixed" directory will result in the files tagged
with 'tagname' going into the tagname branch and those with origtagname
(or no tag) going into origtagname (or the main trunk). and the files
with the same filename in origtagname will NOT be touched.
hence the really important advice about both (or more!) sets of
developers doing checkouts of all sets and compilation / testing of all
sets. or good communication / good development practices.
cvs update works exactly as expected in the "mixed" directory. you do
NOT need to specify the '-r' option again. just cvs update.
NOTE: BIG NOTE!!!! DON'T USE cvs update -A! the -r option on cvs update
created a "sticky" tag on the missing files! cvs update -A RESETS
"sticky" tags, so every file goes back to the main trunk. you will
SCREW UP your carefully-prepared mixed-tag environment with update -A!
i'm sure there is a use for -A - e.g. when you want to "merge" code from
the mixed-tagged branch and "shut down" the temporary tag.
CAVEATS:
stage 5 froze in the directory in which the "mixed" tag files were
found. it said, "cvs update: duplicate key found for 'y'".
uhh??? what the hell is 'y'? i never touched the 'y' key!!!!
i examined the CVS/Entries file, it contained all the entries. it was
just that the files were not there.
so i performed a cvs update (no -r option the second time)
and it worked. all the files were there, all checked out. i am very
impressed and pleased
so, other than this slightly weird behaviour and the necessary
work-around that required a stage 6 - a second cvs update - that's it.
conclusion:
this mixed-tag method is a really useful approach to the problems
associated with doing thousand- and million-line project development
with cvs, that would typically require a high (and restrictive!) degree
of synchronisation and communication with *internal and informal*
release cycles to solve!
and we know that high communication and synchronisation never happens:
we get the normal amount of comms and sync, which results in said
massive projects becoming completely unmanageable and impossible to work
in, very quickly: no one can ever compile anything completely.
use cvs in mixed-tag mode, problem is solved.
fun and games with cvs.
who wants to do the bug-report about the duplicate key found thing?
:)
----- Luke Kenneth Casson Leighton -----
"i want a world of dreams, run by near-sighted visionaries"
"good. that's them sorted out. now, on _this_ world..."
How about patch?, posted 21 Feb 2001 at 17:37 UTC by sej (Master)
Your article seems an admirable job of wrestling with the branch/merge
and tagging mechanisms of cvs. Good luck getting a larger group of
programmers to understand and execute this stratagem. I'm sure a lot
of respondents will say "wait for subversion, it will solve all this."
But I would like to point out that you can accomplish the same thing by
setting up separate repositories (cvs, rcs, or other) and using patch
to keep them selectively in synch. One repository for the stable
source tree, one for the experimental development. Commits on the
experimental repository are sorted into those to be transferred now,
and those to be transferred later.
Admittedly this is not easy with cvs as it now stands, because
generating patches is complicated by the fact revision tags are per
file rather than per commit. You need to establish some extra
mechanism (ivmkcm is an example) or switch to subversion. But
once you can generate and keep track of patch files you can carry on
distributed multi-track development without reliance on a single
repository, or fear you'll never be able to synch up when the time is
right.
hiya,
i did an article on dirdiff which i found an extremely useful tool to do
merging. regardless of the merge tool, if you have multiple
repositories (effectively the same as multiple complete branch tags) it
is GUARANTEED that they will get out-of-sync over long periods of time
(more than a few days), in areas that take too much time for each
development group to bother with doing things like patch and merge.
cvs branching is like, absolutely fine for things like, oh, i just want
to try out these 300 lines of code or do a global/search/replace on the
entire code, without breaking anything for anyone. a very specific,
MAXIMUM of TWO WEEKS, development task.
for a major, major impact task such as, "today i am going to start
version 1.9 of the libmelon library" or "today i am going to add DCE/RPC
over TCP/IP to the TNG DCE/RPC libraries", or, "today i am going to
start to do a Primary Domain Controller", a cvs branch is just not
xxxxing funny any more.
i sincerely hope that subversion can help with this type of task. i
will love it very much if it does!
much appreciate your reminder that other good alternatives are out
there.
luke
sej, i thought about what you suggest a little more. the primary
difference is that mixed-tagging is effectively an automated version of
doing manual patch, except that the mixed-tagged files are just
completely skipped out from the patch.
which requires, for shared files, some level of manual patching,
developer cooperation, blah blah.
i realise there are lots of steps involved, which is why i wrote this
article, outlining them. surely there must be more massive open source
projects suffering from this kind of problem, neh? the linux kernel,
for a start! what about gimp? etc.
my hope is that people will go, hm, sounds difficult, but we know
branches getting out-of-date is even _more_ difficult, let's give it a
shot, or go help with subversion or _something_, 'cos what's happening
at the moment is intolerable.
open source rulz :)
perforce, posted 21 Feb 2001 at 18:31 UTC by ask (Master)
perforce is most excellent at this branching thing. :-)
You are making a good point, that there is much more required of
configuration management than a utility optimized for a single-threaded
revision history can easily handle.
If I was in the role of mentoring someone new to programming, I would
tell them to get comfortable with the idea of raw diffs before learning
any other configuration management tool. Because that is the knowledge
you need to handle all the corner cases that inevitably come up. Use
nothing but patch and diff for a year, then move on to tools that
automate the process. Not doing this is like programming in C without
understanding the Von Neumann architecture. But that is another topic
for another day :-).
re: perforce, posted 21 Feb 2001 at 19:00 UTC by thom (Master)
ASK said:
perforce is excellent at this branching stuff :-)
It isn't OSS, is it? Subversion looks excellent, but as someone else
pointed out to me, they're still using CVS for their version control
system. Now, forgive me if there's a good reason, but that doesn't
sound promising.
correction, posted 21 Feb 2001 at 19:19 UTC by lkcl (Master)
the cvs tag of the individual files must have the -b option:
cvs tag -b newtagname file1 file2 dir1/file3
if you wish to be able to commit. but i'm sure you knew that already.
and if i had kept my mouth shut, you would have assumed that i knew it
too :)
but for those people who _don't_ know, if you don't do the -b option,
then when you come to do a cvs commit, it will bitch at you saying
"muur, there is a sticky bit set on this file: bugger off".
[i always wondered what sticky meant... :)]
Subversion, posted 21 Feb 2001 at 22:58 UTC by AlanShutko (Journeyer)
Yes, they're still using CVS. There _is_ a good reason for it...
subversion isn't done yet. It's not even remotely usable yet. There's
a brief status on their website, but basically they're working on the
repository these days.
It's going to rock when it comes out, but it's not ready to host a
project on.
thom wrote:
Subversion looks excellent, but as someone else pointed out to me,
they're still using CVS for their version control system. Now, forgive
me if there's a good reason, but that doesn't sound promising.
I think it is pretty narrow-minded to even think in these terms. You
seen any Subversion release-archives? You seen any big and flashy
announcements about the greatness of the current Subversion?
No, you haven't. Subversion is still very much under development, and
believe me, as soon as it is ready for it, a flood of people are gonna
trade in their CVS repositories for Subversion ones. There's just no
point in doing that before the basic functionality is there and works.
received this email:
...
...
BTW, the info-cvs@gnu.org mailing list is a good place for CVS
questions, and by lurking on it you can learn a lot about CVS.
The "duplicate key y" thing you saw is due to a messed up "val-tags"
file in the CVS repository.
It's a simple text file which contains a list of tags that CVS thinks
exist in the repository. (I'm not exactly sure what CVS uses val-tags
for, but I do know it's not critical; not all tags necessarily show up
there.)
You can look through the file, there will probably be a bunch of tag
names each like
tag1 y
tag2 y
tag3 y
Then, probably one line like
y y
or
y
Delete the bogus looking line
and that "duplicate key y" message should
go away.
It's coming from CVS's reimplementation of the ndbm library
stuff.
Also, I saw a reply which recommended using
"cvs tag -b ..." to create a branch tag.
You can do that, but, if you want to ever refer to
the origin of the branch you'd better do
cvs rtag branch_origin modulename
cvs rtag -b -r branch_origin branch_tag modulename
Or else use my patch to CVS
located here:
http://www.geocities.com/dotslashstar/branch_patch.html
(I'm trying to get that patch into the real CVS, no luck so far...
good or bad..)
Way more info than you wanted, I'm sure...
thanks, steve. this is lots and lots of info that _i_ will find useful,
and i think other people will too. so dumped it here.
luke
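steve's val-tags description above is easy to turn into a quick check.
the python sketch below lists the lines of a val-tags file that do not
look like "tagname y". it is only a heuristic built on the layout in his
email: the function name suspicious_val_tags is invented, and a tag that
really is called y would get flagged too. it reports lines, it does not
delete anything, so look at the result before editing the file.

```python
def suspicious_val_tags(text):
    """Return (line_number, line) pairs that do not look like '<tag> y'.

    Heuristic only: flags lines with the wrong number of fields, a second
    field other than 'y', or a tag name that is itself 'y'.
    """
    flagged = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split()
        if len(fields) != 2 or fields[1] != "y" or fields[0] == "y":
            flagged.append((number, line))
    return flagged


# made-up sample shaped like the lines quoted in the email
SAMPLE_VAL_TAGS = "tag1 y\ntag2 y\ny y\ntag3 y\ny\n"

if __name__ == "__main__":
    for number, line in suspicious_val_tags(SAMPLE_VAL_TAGS):
        print(f"line {number}: {line!r}")
```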
Subversion, posted 22 Feb 2001 at 16:05 UTC by jimb (Master)
It's probably worth saying that the initial releases of Subversion
will pretty much provide the same behaviors CVS does now. I don't
think 1.0 will make the technique described in the article much easier.
But I do think Subversion will have two major advantages:
- We have a much better repository structure. It shares
information between related revisions in the right way, and is
extensible, so anyone can add new kinds of metadata as they invent
them.
- We have a simple interface to that repository structure. My
personal dream is to see Perl / Python / Guile bindings for the
interface in svn_fs.h, so people can write CGI scripts that do
interesting things. Or, they can write scripts that talk DAV to the
server and do stuff that way.
Taken together, that means that Subversion should be easy to adapt
to whatever techniques you come up with. Subversion should be a better
base for experimentation than CVS. I think that's an important
quality in Free software projects.
subversion seems to have great goals. I'm just wondering why they
didn't try to help PRCS become what they are looking for.
PRCS already has a lot of the things that subversion seems to want:
per-commit revision numbers, storing of symlinks, directories, etc.,
renames. There's one major feature missing, and Josh is working on that
for PRCS2: networked repos access.
Having per-commit revision numbers would really improve merging: you
know you last merged in stuff from version main.544 and main is
currently at main.555, so you do prcs diff -rmain.544 -rmain.555 >
patch, analyse patch and apply it.
At some point, there were plans that Bitkeeper was going to become the
version control system used for the Linux codebase, but I heard for some
reason Torvalds didn't want that, so it won't be. I'd guess it has
something to do with the license. Does anyone have any more definitive
information?
I've been using aegis (GPL) at home for a while now, and am very pleased
with it. it's more of a "strong configuration management" tool rather
than mere version control, which imho makes it ideal for delicate work.
it actively manages changesets through creation, development, building,
testing, review and integrating, and rejects anything which would cause
your repo to break, either at build time or during regression tests
(which it supervises). this shortens bug-discovery time dramatically:
when I make a change, I type aeb && aet, wait a minute, and get direct
feedback about whether the change breaks things or is safe. the first
couple weeks I used it, I noticed that my morale was way up: I no longer
had that feeling of dread which accompanies a cvs commit, since every
time I managed to get a change in, I knew that the change didn't harm
the existing system.
downsides: the "network interface" takes some getting used to. rather
than using a command protocol against a remote server, you work on your
local machine and package your changesets into base64'ed bundles which
others can pick up and integrate. this is really non-ideal; he claims
"it's more flexible" (which I suppose it is) but you really also want to
have a mode wherein you can talk to a remote server live, or at least
auto-sync your disconnected version with it every time you make a
change.
furthermore this message indicates that aegis will not scale to truly
gargantuan projects, but I've had no troubles with it on the most recent
couple-hundred-file program I've used it on.
Update, posted 27 Feb 2001 at 09:04 UTC by lkcl (Master)
well. it couldn't be easier.
the "freezing" i found earlier when doing a [mixed-tag] cvs checkout and
update was due to a) a bug in cvs, as steve pointed out b) i had not
specified -b (concurrent branch option) on the original cvs tag of the
[few] files.
doing a merge? utterly simple. cvs update -j tagname [in the main cvs
working area, not the tagname'd working area]. you can then see what it
did for you by doing a cvs diff. decide if you like it, cvs commit it.
i spent longer looking through cvs --help-commands for the update -j
option than actually doing the merge and writing this.
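as a rough python sketch of that merge: the helper names merge_steps and
run_merge are made up, and the only cvs commands used are the two named
above (update -j and diff). the commit is left to a human on purpose, so
you can read the diff first.

```python
import shlex
import subprocess


def merge_steps(tagname):
    """Command lines for merging branch `tagname` into the current area."""
    return [
        ["cvs", "update", "-j", tagname],  # pull the branch changes in
        ["cvs", "diff"],                   # review what the merge did
    ]


def run_merge(tagname, workdir):
    """Run the merge in `workdir` (the main working area) and return the diff.

    Hypothetical helper: needs cvs installed and a real checkout.
    """
    update, diff = merge_steps(tagname)
    # stop right away if the merge itself fails
    subprocess.run(update, cwd=workdir, check=True)
    # cvs diff exits non-zero when it finds differences, so no check=True
    result = subprocess.run(diff, cwd=workdir, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    # dry run: only show what would be executed
    for step in merge_steps("tagname"):
        print(shlex.join(step))
```

run_merge has to point at the main cvs working area, not the tagname'd
one. if the diff looks right, cvs commit by hand.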
PRCS, posted 27 Feb 2001 at 21:25 UTC by nmw (Journeyer)
As elrond mentioned, PRCS is really cool. It does all this commit-level
versioning, and much more, including file renaming, deletion,
undeletion (no Attic!) and keeps track not just of the version history
of a project, but the merge history as well.
I use PRCS every day and I find it an order of magnitude easier to use
than CVS; there's none of this tag madness to deal with, nor can you
make a mistake and commit without a tag, and merging is as easy as lkcl
finds it is when being very careful with tags in CVS.
But PRCS does lack some commands, forcing one to edit the project
control file for some tasks (such as renaming files).
PRCS also lacks a client/server mode, but there is a script included
with PRCS called prcs-synch that can synchronize PRCS repositories
across filesystems (e.g., NFS) and/or SSH (you choose). I wonder how
well (or not) prcs-synch would work with a large distributed
development project...
PRCS version 2 will use XDelta/XDFS instead of RCS and will have a
client/server mode. XDelta is a really neat binary (as opposed to
text-based) diff/patch utility that is based on the rsync algorithm.
PRCS v2 will have applications beyond source version control; it will be
useable for laptop/server synchronization, software distribution, etc...
I wonder how far along PRCS v2 is and what can be done to help it
along...
nmw