CVS mixed-tagging for massive Open Source Project Management

Posted 21 Feb 2001 at 16:35 UTC by lkcl

i emailed [worldforge] a while back about using cvs in a dual-checkout mode.

i needed to do a small development branch on a project i am doing, so i tried it out. it works! here's the [accurate] description of the necessary cvs commands.

this article is of interest to people who use cvs and are working on a massive open source project: thousands or millions of lines, with several inter-dependent components, several developers and tens of man-years of development effort.

you think this isn't a problem? that's because your project is too small. wait a few years, you'll find out what i'm talking about.

goals and issues

the aim is to allow simultaneous development of two sets of code modifications, possibly over the long term (i.e. more than a few days: anything longer than that in a large project is normally insane and asking for trouble), whilst both developments get less merging-hassle and no interference from the branching, and the branch still benefits from the non-branched code-mods.

why use this method?

well, let's say you're developing a new version of a library. or you want to change an API.

you change it in the main cvs, it's going to break everything. if you don't do it - NOW - more people are going to use the out-of-date API, whilst you're developing the new one. the new one is going to take 3 months to develop. the projects using your API CANNOT wait 3 months just for your new version, ESPECIALLY as there are 10 other APIs, all with the same problem!

so you cvs tag all files that use the API. you cvs tag your entire library. you modify your library. you modify all files that use the library. you finish the job, you merge.

and while you were doing all that, did anyone notice? did they xxxx :) did anyone complain that the cvs main was broken? did they xxxx :) oh, you got one person who was away on holiday, he used your old API, you didn't do a new tag for him because he hadn't WRITTEN the code yet when you did the mixed-tag, so you bitched at him and you got it all sorted out between you, like sensible developers should.

now take that to the next level (e.g. worldforge). you have, what, ten libraries? ten sets of developers? has anyone actually _ever_ successfully got the entire worldforge codebase to compile??? :) :) let alone run!!!

why use this method?

samba TNG architecture is a multi-developer, multi-program development project. the dividing lines are the SMB IPC$ share and the DCE/RPC pipes / services. each part has a client-side API, server-side API, client-side _usage_ of the client-side API, server-side _usage_ of the server-side API!

that's a lot of projects, and a lot of hell for simultaneous development.

the cvs mixed-tag method described here is a _perfect_ solution for doing development of an entire new method, whilst allowing that development group to "benefit" from the "latest" cvs main developments, whilst also being able to "offer" stability in their "cover" version in cvs main.

the requirements are:

  • cvs (*duur*!)

  • all developers must do checkouts of both sets of code. once a commit is performed (after testing!), it is the responsibility of all developers to checkout the *other* set of code and do testing on that.

why? because otherwise, the other set of developers are going to get really pissed off with you for breaking things in the area they are depending on you *not* to change, because they have responsibility for a few files - just a few (or maybe more!) and you broke it by making changes in *their* area of responsibility.

if you want to do that [break their compile], you must a) ask them if it's ok b) don't do it, get them to do it: they're the ones dealing with those files, not you! c) do yet another small-file-branch, which will not impact on them but you can do "your thing" in that series of files. d) BEFORE the mixed-tagging, you split the shared files up into as small modules as possible so that the impact on each other's areas of responsibility is as small as possible. this is good multi-developer API-based programming practice _anyway_.

instructions:

  • 1)

    cvs tag tagname file1.c file2.h file3.py dir1/file4.ext1 etc.

  • 2)

    create new directory somewhere. cd to it.

  • 3)

    cvs co -r tagname toplevelcvsmodulename

    this will result *only* in the following files being checked out:

    file1.c file2.h file3.py dir1/file4.ext1

    *DON'T WORRY ABOUT THIS YET!!! :) :) *

  • 4)

    cd toplevelcvsmodulename

  • 5)

    cvs update -d -P -r ''

    OR - and this is an important or - if you are *already* working in a cvs tag (i.e. if at stage 1, above, you were working in *another* cvs branch, where you performed, originally, cvs co -r origtagname toplevelcvsmodulename) you MUST do:

    cvs update -d -P -r 'origtagname'

voila!!! that's it!!!!
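for reference, here is a minimal sketch of steps 1 to 5 as one shell function. it is not a tested tool: the function name mixed_tag_setup, the run helper and the DRY_RUN switch are made up for illustration, and the tag, module and directory names are placeholders. it uses cvs tag -b (see the correction further down this page: without -b you cannot commit to the tag), and it finishes with a second plain cvs update, which is the work-around described in the caveats below.

```shell
#!/bin/sh
# sketch only: all names here are hypothetical placeholders.
# set DRY_RUN to a non-empty value to print commands instead of running them.

run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# mixed_tag_setup TAG MODULE WORKDIR BASETAG FILE...
#   BASETAG is '' if you started from cvs main, or the tag of the branch
#   you were already working in (origtagname in the article).
#   call it from inside your existing checkout of MODULE.
# the body runs in a subshell, so the caller's directory is not changed.
mixed_tag_setup() (
    tag=$1 module=$2 workdir=$3 basetag=$4
    shift 4

    # step 1: branch-tag only the files that will diverge
    run cvs tag -b "$tag" "$@" &&
    # step 2: new directory somewhere, cd to it
    run mkdir -p "$workdir" &&
    run cd "$workdir" &&
    # step 3: check out only the tagged files
    run cvs co -r "$tag" "$module" &&
    # step 4
    run cd "$module" &&
    # step 5: fill in everything else from main (or the original branch)
    run cvs update -d -P -r "$basetag" &&
    # "stage 6": second plain update, the work-around from the caveats
    run cvs update
)
```

with DRY_RUN set you can read the command list before letting it loose on a real repository. pass '' as the fourth argument if you started from cvs main, or origtagname if you started from another branch.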

if you are brave, examine the CVS/Entries file. you will notice that it contains mixed entries: of two different tag names on individual files.
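if you are less brave, a small sketch like the following prints each file next to its sticky tag, so the mix is easy to see. it assumes the usual CVS/Entries layout, where a file line looks like /name/revision/timestamp/options/Ttagname and directory lines start with D. the function name list_sticky_tags is made up for this example.

```shell
#!/bin/sh
# list_sticky_tags [ENTRIES_FILE]
# prints "filename<TAB>tag" for every file entry; "(none)" means no sticky tag.
# defaults to CVS/Entries in the current directory.
list_sticky_tags() {
    awk -F/ '
        # file entries start with "/", so field 1 is empty;
        # directory entries start with "D" and are skipped
        $1 == "" && NF >= 6 {
            tag = ($6 ~ /^T/) ? substr($6, 2) : "(none)"
            print $2 "\t" tag
        }
    ' "${1:-CVS/Entries}"
}
```

run it in any directory of the mixed checkout: the files you tagged in step 1 should show tagname, and everything else should show origtagname or (none).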

cvs commits in this "mixed" directory will result in the files tagged with 'tagname' going into the tagname branch and those with origtagname (or no tag) going into origtagname. and the files with the same filename in origtagname will NOT be touched.

hence the really important advice about both (or more!) sets of developers doing checkouts of all sets and compilation / testing of all sets. or good communication / good development practices.

cvs update works exactly as expected in the "mixed" directory. you do NOT need to specify the '-r' option again. just cvs update.

NOTE: BIG NOTE!!!! DON'T USE cvs update -A! the -r option on cvs update created a "sticky" tag on the missing files! you do cvs update -A and it RESETS all "sticky" tags. you will SCREW UP your carefully-prepared mixed-tag environment with update -A!

i'm sure there is a use for -A - e.g. when you want to "merge" code from the mixed-tagged branch and "shut down" the temporary tag.

CAVEATS:

stage 5 froze in the directory in which the "mixed" tag files were found. it said, "cvs update: duplicate key found for 'y'".

uhh??? what the hell is 'y'? i never touched the 'y' key!!!!

i examined the CVS/Entries file, it contained all the entries. it was just that the files were not there.

so i performed a cvs update (no -r option the second time)

and it worked. all the files were there, all checked out. i am very impressed and pleased

so, other than this slightly weird behaviour and the necessary work-around that required a stage 6 - a second cvs update - that's it.

conclusion:

this mixed-tag method is a really useful approach to the problems associated with doing thousand and million line project development with cvs, that would typically require a high (and restrictive!) degree of synchronisation and communication with *internal and informal* release cycles to solve!

and we know what "high communication and synchronisation" means in practice: no, we get the normal amount of comms and sync, which results in said massive projects becoming completely unmanageable and impossible to work in, very quickly: no one can ever compile anything completely.

use cvs in mixed-tag mode, problem is solved.

fun and games with cvs.

who wants to do the bug-report about the duplicate key found thing? :)

----- Luke Kenneth Casson Leighton -----

"i want a world of dreams, run by near-sighted visionaries"
"good.  that's them sorted out.  now, on _this_ world..."

How about patch?, posted 21 Feb 2001 at 17:37 UTC by sej » (Master)

Your article seems an admirable job of wrestling with the branch/merge and tagging mechanisms of cvs. Good luck getting a larger group of programmers to understand and execute this stratagem. I'm sure a lot of respondents will say "wait for subversion, it will solve all this."

But I would like to point out that you can accomplish the same thing by setting up separate repositories (cvs, rcs, or other) and using patch to keep them selectively in synch. One repository for the stable source tree, one for the experimental development. Commits on the experimental repository are sorted into those to be transferred now, and those to be transferred later.

Admittedly this is not easy with cvs as it now stands, because generating patches is complicated by the fact revision tags are per file rather than per commit. You need to establish some extra mechanism (ivmkcm is an example) or switch to subversion. But once you can generate and keep track of patch files you can carry on distributed multi-track development without reliance on a single repository, or fear you'll never be able to synch up when the time is right.

patch. dirdiff. emerge, posted 21 Feb 2001 at 18:15 UTC by lkcl » (Master)

hiya,

i did an article on dirdiff, which i found an extremely useful tool for merging. regardless of the merge tool, if you have multiple repositories (effectively the same as multiple complete branch tags) it is GUARANTEED that they will get out-of-sync over long periods of time (more than a few days), in areas where it takes too much time for each development group to bother with doing things like patch, and merge.

cvs branching is like, absolutely fine for things like, oh, i just want to try out these 300 lines of code or do a global/search/replace on the entire code, without breaking anything for anyone. a very specific, MAXIMUM of TWO WEEKS, development task.

for a major, major impact task such as, "today i am going to start version 1.9 of the libmelon library" or "today i am going to add DCE/RPC over TCP/IP to the TNG DCE/RPC libraries", or, "today i am going to start to do a Primary Domain Controller", a cvs branch is just not xxxxing funny any more.

i sincerely hope that subversion can help with this type of task. i will love it very much if it does!

much appreciate your reminder that other good alternatives are out there.

luke

mixed-tagging vs patch, posted 21 Feb 2001 at 18:26 UTC by lkcl » (Master)

sej, i thought about what you suggest a little more. the primary difference is that mixed-tagging is effectively an automated version of doing manual patch, except that the mixed-tagged files are just completely skipped out from the patch.

which requires, for shared files, some level of manual patching, developer cooperation, blah blah.

i realise there are lots of steps involved, which is why i wrote this article, outlining them. surely there must be more massive open source projects suffering from this kind of problem, neh? the linux kernel, for a start! what about gimp? etc.

my hope is that people will go, hm, sounds difficult, but we know branches getting out-of-date is even _more_ difficult, let's give it a shot, or go help with subversion or _something_, 'cos what's happening at the moment is intolerable.

open source rulz :)

perforce, posted 21 Feb 2001 at 18:31 UTC by ask » (Master)

perforce is most excellent at this branching thing. :-)

you are making a good point, posted 21 Feb 2001 at 18:39 UTC by sej » (Master)

You are making a good point, that there is much more required of configuration management than a utility optimized for a single-threaded revision history can easily handle.

If I was in the role of mentoring someone new to programming, I would tell them to get comfortable with the idea of raw diffs before learning any other configuration management tool. Because that is the knowledge you need to handle all the corner cases that inevitably come up. Use nothing but patch and diff for a year, then move on to tools that automate the process. Not doing this is like programming in C without understanding the Von Neumann architecture. But that is another topic for another day :-).

re: perforce, posted 21 Feb 2001 at 19:00 UTC by thom » (Master)

ASK said: perforce is excellent at this branching stuff :-)

It isn't OSS, is it? Subversion looks excellent, but as someone else pointed out to me, they're still using CVS for their version control system. Now, forgive me if there's a good reason, but that doesn't sound promising.

correction, posted 21 Feb 2001 at 19:19 UTC by lkcl » (Master)

the cvs tag of the individual files must have the -b option:

cvs tag -b newtagname file1 file2 dir1/file3

if you wish to be able to commit. but i'm sure you knew that already. and if i had kept my mouth shut, you would have assumed that i knew it too :)

but for those people who _don't_ know, if you don't do the -b option, then when you come to do a cvs commit, it will bitch at you saying "muur, there is a sticky bit set on this file: bugger off".

[i always wondered what sticky meant... :)]

Subversion, posted 21 Feb 2001 at 22:58 UTC by AlanShutko » (Journeyer)

Yes, they're still using CVS. There _is_ a good reason for it... subversion isn't done yet. It's not even remotely usable yet. There's a brief status on their website, but basically they're working on the repository these days.

It's going to rock when it comes out, but it's not ready to host a project on.

Subversion is using CVS (still), posted 22 Feb 2001 at 08:32 UTC by bagder » (Master)

thom wrote:
    Subversion looks excellent, but as someone else pointed out to me, they're still using CVS for their version control system. Now, forgive me if there's a good reason, but that doesn't sound promising.

I think it is pretty narrow-minded to even think in these terms. You seen any Subversion release-archives? You seen any big and flashy announcements about the greatness of the current Subversion?

No, you haven't. Subversion is still very much under development, and believe me, as soon as it is ready for it, a flood of people are gonna trade in their CVS repositories for Subversion ones. There's just no point in doing that before the basic functionality is there and works.

more advice on using cvs well, posted 22 Feb 2001 at 08:49 UTC by lkcl » (Master)

received this email:

...
...

BTW, the info-cvs@gnu.org mailing list is a good place for CVS questions, and by lurking on it you can learn a lot about CVS.

The "duplicate key y" thing you saw is due to a messed up "val-tags" file in the CVS repository.

It's a simple text file which contains a list of tags that CVS thinks are existing in the repository. (I'm not exactly sure what CVS uses val-tags for, but I do know it's not critical, all tags do not show up there, necessarily.)

You can look through the file, there will probably be a bunch of tag names each like

tag1 y
tag2 y
tag3 y

Then, probably one line like

y y

or

y

Delete the bogus looking line and that "duplicate key y" message should go away.

It's coming from CVS's reimplementation of the ndbm library stuff.
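As a rough sketch of that clean-up (an editor's illustration, not part of the email): the function below prints a val-tags file with the bogus lines dropped. It assumes the bad lines are exactly "y" or "y y" as described above, so a real tag that happens to be called y would be dropped too. The function name clean_val_tags is made up for this example.

```shell
#!/bin/sh
# clean_val_tags FILE
# prints FILE without lines that consist only of "y" or "y y".
# it does not modify FILE; redirect the output yourself.
clean_val_tags() {
    grep -v -x -E 'y( y)?' "$1"
}
```

A cautious way to use it would be to copy the file first, for example cp val-tags val-tags.bak followed by clean_val_tags val-tags.bak > val-tags, run in the repository's CVSROOT directory, and to compare the two files before anyone else uses the repository.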

Also, I saw a reply which recommended using "cvs tag -b ..." to create a branch tag.

You can do that, but, if you want to ever refer to the origin of the branch you'd better do

cvs rtag branch_origin modulename
cvs rtag -b -r branch_origin branch_tag modulename

Or else use my patch to CVS located here:

http://www.geocities.com/dotslashstar/branch_patch.html

(I'm trying to get that patch into the real CVS, no luck so far... good or bad..)

Way more info than you wanted, I'm sure...

thanks, steve. this is lots and lots of info that _i_ will find useful, and i think other people will too. so dumped it here.

luke

Subversion, posted 22 Feb 2001 at 16:05 UTC by jimb » (Master)

It's probably worth saying that the initial releases of Subversion will pretty much provide the same behaviors CVS does now. I don't think 1.0 will make the technique described in the article much easier.

But I do think Subversion will have two major advantages:

  • We have a much better repository structure. It shares information between related revisions in the right way, and is extensible, so anyone can add new kinds of metadata as they invent them.

  • We have a simple interface to that repository structure. My personal dream is to see Perl / Python / Guile bindings for the interface in svn_fs.h, so people can write CGI scripts that do interesting things. Or, they can write scripts that talk DAV to the server and do stuff that way.

Taken together, that means that Subversion should be easy to adapt to whatever techniques you come up with. Subversion should be a better base for experimentation than CVS. I think that's an important quality in Free software projects.

subversion and other revision tools, posted 22 Feb 2001 at 17:40 UTC by elrond » (Master)

subversion seems to have great goals, I'm just wondering, why they didn't try to help PRCS to become, what they are looking for.

PRCS already has a lot of the things, that subversion seems to want: per-commit revision numbers, storing of symlinks,directories,etc., renames. There's one major feature missing, and Josh is working on that for PRCS2: networked repos access.

Having per-commit revision numbers would really improve merging: You know, you last merged in stuff from version main.544 and main is currently at main.555, so you do prcs diff -rmain.544 -rmain.555 > patch, analyse patch and apply it.

Whatever happened to Bitkeeper?, posted 22 Feb 2001 at 21:35 UTC by Bram » (Master)

At some point, there were plans that Bitkeeper was going to become the version control system used for the Linux codebase, but I heard for some reason Torvalds didn't want that, so it won't be. I'd guess it has something to do with the license. Does anyone have any more definitive information?

aegis is nice too, posted 24 Feb 2001 at 04:18 UTC by graydon » (Master)

I've been using aegis (GPL) at home for a while now, and am very pleased with it. it's more of a "strong configuration management" tool rather than mere version control, which imho makes it ideal for delicate work. it actively manages changesets through creation, development, building, testing, review and integrating, and rejects anything which would cause your repo to break, both at build time or during regression tests (which it supervises). this shortens bug-discovery time dramatically: when I make a change, I type aeb && aet, wait a minute, and get direct feedback about whether the change breaks things or is safe. the first couple weeks I used it, I noticed that my morale was way up: I no longer had that feeling of dread which accompanies a cvs commit, since every time I managed to get a change in, I knew that the change didn't harm the existing system.

downsides: the "network interface" takes some getting used to. rather than using a command protocol against a remote server, you work on your local machine and package your changesets into base64'ed bundles which others can pick up and integrate. this is really non-ideal; he claims "it's more flexible" (which I suppose it is) but you really also want to have a mode wherein you can talk to a remote server live, or at least auto-sync your disconnected version with it every time you make a change.

furthermore this message indicates that aegis will not scale to truly gargantuan projects, but I've had no troubles with it on the most recent couple-hundred-file program I've used it on.

Update, posted 27 Feb 2001 at 09:04 UTC by lkcl » (Master)

well. it couldn't be easier.

the "freezing" i found earlier when doing a [mixed-tag] cvs checkout and update was due to a) a bug in cvs, as steve pointed out b) i had not specified -b (concurrent branch option) on the original cvs tag of the [few] files.

doing a merge? utterly simple. cvs update -j tagname [in the main cvs working area, not the tagname'd working area]. you can then see what it did for you by doing a cvs diff. decide if you like it, cvs commit it.

i spent longer looking through cvs --help-commands for the update -j option than actually doing the merge and writing this.
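as a sketch, the merge steps above fit in a few lines of shell. the names merge_branch, run and DRY_RUN are made up for illustration. the commit is left out on purpose, since the whole point is to look at the diff first.

```shell
#!/bin/sh
# sketch only: run this from the main cvs working area,
# not the tagname'd working area.
# set DRY_RUN to a non-empty value to print commands instead of running them.

run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# merge_branch TAG
merge_branch() {
    # pull the changes made on TAG into the working files
    run cvs update -j "$1" || return 1
    # show what the merge did; cvs diff may exit non-zero when it finds
    # differences, which is the expected case here, so ignore its status
    run cvs diff || true
}
```

after merge_branch tagname, read the diff, and if you like it, cvs commit by hand.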

PRCS, posted 27 Feb 2001 at 21:25 UTC by nmw » (Journeyer)

As elrond mentioned, PRCS is really cool. It does all this commit-level versioning, and much more, including file renaming, deletion, undeletion (no Attic!) and keeps track not just of the version history of a project, but the merge history as well.

I use PRCS every day and I find it an order of magnitude easier to use than CVS; there's none of this tag madness to deal with nor can you make a mistake and commit without a tag and merging is as easy as lkcl finds it is when being very careful with tags in CVS. But PRCS does lack some commands forcing one to edit the project control file for some tasks (such as renaming files).

PRCS also lacks a client/server mode, but there is a script included with PRCS called prcs-synch that can synchronize PRCS repositories across filesystems (e.g., NFS) and/or SSH (you choose). I wonder how well (or not) prcs-synch would work with a large distributed development project...

PRCS version 2 will use XDelta/XDFS instead of RCS and will have a client/server mode. XDelta is a really neat binary (as opposed to text-based) diff/patch utility that is based on the rsync algorithm. PRCS v2 will have applications beyond source version control; it will be useable for laptop/server synchronization, software distribution, etc...

I wonder how far along PRCS v2 is and what can be done to help it along...

nmw
