Prerequisite packages to install
================================

- dev-vcs/cvs
- dev-vcs/cvs-fast-export
- dev-vcs/git
- dev-libs/libxslt (for userinfo.xml conversion)


Create the author map
=====================

Extract userinfo.xml from LDAP on dev.gentoo.org::

  $ perl_ldap -U

Create authormap.txt from userinfo.xml::

  $ ./make-authormap.sh >authormap.txt
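
The resulting file can be sanity-checked before the long export run.
The following is only a sketch: the function name ``check_authormap``
is made up, and it assumes one entry per line in the form described in
cvs-fast-export(1), i.e. ``user = Full Name <email>`` with an optional
trailing timezone. It does not verify that every CVS user is covered::

  # Hypothetical helper: print every line of the file given as $1 that
  # does not look like "user = Full Name <email> [timezone]".
  # Succeeds only if no such line was found.
  check_authormap() {
    status=0
    LC_ALL=C grep -Ev \
      '^[^[:space:]=]+[[:space:]]*=[[:space:]]*[^<>]+ <[^<>[:space:]]+@[^<>[:space:]]+>([[:space:]]+[^[:space:]]+)?$' \
      "$1" || status=$?
    # grep -v exits with 1 when it selected no line, i.e. all lines match.
    [ "${status}" -eq 1 ]
  }

  # Usage: check_authormap authormap.txt && echo "author map looks sane"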


Fetch and unpack the CVS repository
===================================

Fetch a copy of the archived gentoo-x86 CVS repository from
https://projects.gentoo.org/vcs-history/gentoo-x86.tar.gz and unpack
it.
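
The individual commands for this step are not spelled out here. A
minimal sketch, assuming ``wget`` and ``tar`` are available and that
the archive unpacks to ``var/cvsroot/gentoo-x86`` relative to the
current directory (inferred from the ``cd`` command in the next
section); the function names are made up for illustration::

  # Archive URL as given above; the file name is derived from it.
  url=https://projects.gentoo.org/vcs-history/gentoo-x86.tar.gz
  tarball=${url##*/}

  # Download the archive unless it is already present.
  fetch_tarball() {
    [ -f "${tarball}" ] || wget "${url}"
  }

  # Unpack the archive given as $1 into the current directory and
  # check that the expected path exists afterwards (an assumption).
  unpack_tarball() {
    tar -xzf "$1" && [ -d var/cvsroot/gentoo-x86 ]
  }

  # Usage: fetch_tarball && unpack_tarball "${tarball}"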


Run cvs-fast-export
===================
::

  $ cd var/cvsroot/gentoo-x86
  $ find . | cvs-fast-export -A /path/to/authormap.txt -l /path/to/gentoo-x86-export.log -p >/path/to/gentoo-x86-export.out

This will run for some time (8 hours on an i7-8700), mostly
single-threaded, and produce a 21 GiB output file.

The CVS repository contains a package app-backup/Attic, which confuses
cvs-fast-export: "Files in CVS Attic and RCS directories are treated
as though the 'Attic/' or 'RCS/' portion of the path were absent."
This can be seen in the output file (note that the ``Attic`` path
component is missing)::

  ----------------------------------------------------------------------
  commit refs/heads/master
  mark :5149424
  committer Hanno Böck <hanno@gentoo.org> 1431281161 +0000
  data 118
  Initial commit of Attic

  (Portage version: 2.2.18/cvs/Linux x86_64, signed Manifest commit with key A5880072BBB51E42)

  from :5149420
  M 100644 :5149421 app-backup/Attic-0.15.ebuild
  M 100644 :5149422 app-backup/ChangeLog
  M 100644 :5149423 app-backup/metadata.xml
  ----------------------------------------------------------------------

  ----------------------------------------------------------------------
  commit refs/heads/master
  mark :5149426
  committer Hanno Böck <hanno@gentoo.org> 1431281167 +0000
  data 118
  Initial commit of Attic

  (Portage version: 2.2.18/cvs/Linux x86_64, signed Manifest commit with key A5880072BBB51E42)

  from :5149424
  M 100644 :5149425 app-backup/Manifest
  ----------------------------------------------------------------------

This is fixed by an additional sed filter in the following step.


Import into Git
===============
::

  $ mkdir gentoo-x86-git
  $ cd gentoo-x86-git
  $ git init
  $ LC_ALL=C sed '/^Initial commit of Attic$/,/^M [0-7]\{6\} .* app-backup\/Manifest/{s:^\(M [0-7]\{6\} .* app-backup/\)\(.*\):\1Attic/\2:}' \
  /path/to/gentoo-x86-export.out | git fast-import
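
The effect of the sed filter can be tried on a small sample before
running the full import. This is a sketch only: the function names and
the abbreviated sample stream are made up for illustration (the two
``app-backup/example`` lines are hypothetical), and only the sed
expression is taken from the command above::

  # Hypothetical wrapper around the sed expression used above.
  fix_attic_paths() {
    LC_ALL=C sed '/^Initial commit of Attic$/,/^M [0-7]\{6\} .* app-backup\/Manifest/{s:^\(M [0-7]\{6\} .* app-backup/\)\(.*\):\1Attic/\2:}'
  }

  # Abbreviated sample (not a complete fast-import stream). The first
  # and last lines lie outside the sed address range.
  sample_stream() {
    printf '%s\n' \
      'M 100644 :1 app-backup/example/ChangeLog' \
      'Initial commit of Attic' \
      'M 100644 :5149421 app-backup/Attic-0.15.ebuild' \
      'M 100644 :5149422 app-backup/ChangeLog' \
      'M 100644 :5149423 app-backup/metadata.xml' \
      'Initial commit of Attic' \
      'M 100644 :5149425 app-backup/Manifest' \
      'M 100644 :2 app-backup/example/Manifest'
  }

  # Only the four paths inside the range gain the Attic/ component.
  sample_stream | fix_attic_paths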


Differences from the old conversion
===================================

- cvs-fast-export(1) says:

    "A set of file operations is coalesced into a changeset if either
    (a) they all share the same commitid, or (b) all have no commitid
    but identical change comments, authors, and modification dates
    within the window defined by the time-fuzz parameter."

  For our case this means that for commits after 2006-03-04T10:23:03Z
  (commit 531f1a00a131) the commitid has been used to group them
  together, while earlier ones have been grouped by authors and commit
  messages, within a 5-minute time window (which is the default
  for the fuzz parameter).

  This results in a total of 1688447 commits in the master branch,
  while the old conversion has only 788893 commits. Most of the
  difference can be explained by the fact that ``repoman commit``
  actually did two CVS commits, the second one for the Manifest to
  catch up with the updated $Header$ keywords. Since this reflects
  the actual workflow, no attempts have been made to squash these
  pairs of commits.

- The new conversion has a complete author map; previously, users
  cbrannon, jerrya, luke-jr, and uid2214 (darkside) were missing.

- Commit messages have been left alone. For example, no conversion
  to Git footer lines has taken place. Conversion of character sets
  wasn't attempted either. (There are 310 commit messages with
  non-UTF-8 characters. About 80% of them appear to be latin-1;
  the rest are in some other encoding or just contain some garbage
  characters.)

- Category app-backup is now present; it was missing from the old
  conversion.

- File sci-libs/qfits/Manifest in HEAD differs. The new conversion
  agrees with the last CVS checkout.

- The new conversion has a .gitignore file in its top-level directory.
  Also metadata/.cvsignore was renamed to metadata/.gitignore
  (cvs-fast-export does this automatically).

- Output of ``diff -qr --exclude=.git`` between tips of old and new
  repo::

    Only in gentoo-x86-git: .gitignore
    Only in gentoo-x86-git: app-backup
    Files historical/header.txt and gentoo-x86-git/header.txt differ
    Only in historical/metadata: .cvsignore
    Only in gentoo-x86-git/metadata: .gitignore
    Files historical/sci-libs/qfits/Manifest and gentoo-x86-git/sci-libs/qfits/Manifest differ
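
The commit totals quoted above can be checked in the converted
repositories. A minimal sketch; the function name ``count_commits`` is
made up for illustration, and the directory names in the usage lines
are the ones appearing in the diff output::

  # Hypothetical helper: count the commits reachable from a ref
  # (default: master) in the repository at path $1.
  count_commits() {
    git -C "$1" rev-list --count "${2:-master}"
  }

  # Usage: count_commits gentoo-x86-git
  #        count_commits historical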


Notes
=====

Keyword expansion
-----------------

Although the man page of cvs-fast-export (version 1.57) says that the
program "does the equivalent of cvs -kb when checking out masters, not
performing any $-keyword expansion at all", it actually does expand
$-keywords.

For the tip of the trunk, expanded keywords appear to be correct,
as can be verified with Manifest checksums. This is not always true
earlier in history. For example, the CVS repository was located in
/home/cvsroot and moved to /var/cvsroot later (``$Header$`` lines
suggest that this move happened in early 2004). Also it is known that
some files were moved in the raw repository. Expanded keywords from
before such a move won't match.
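
When comparing file contents across such a move, one option is to
collapse expanded keywords first. The following is a sketch and not
part of the conversion itself; the function name
``collapse_keywords`` and the sample line are made up, and only
``$Header$`` and ``$Id$`` are handled::

  # Hypothetical helper: turn "$Header: ... $" and "$Id: ... $" back
  # into the unexpanded forms "$Header$" and "$Id$" (reads stdin).
  collapse_keywords() {
    LC_ALL=C sed -E 's/\$(Header|Id)(:[^$]*)?\$/$\1$/g'
  }

  # Made-up sample line for illustration:
  echo '# $Header: /home/cvsroot/gentoo-x86/skel.ebuild,v 1.1 2003/01/01 00:00:00 nobody Exp $' \
    | collapse_keywords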


Branch points
-------------

cvs-fast-export-1.57 gets confused about branch points if a file
doesn't have any commits on the trunk that are newer than those on the
branch.

This triggers some warnings during conversion::

  cvs-fast-export: warning - non-vendor ./app-admin/analog/files/analog.cfg,v branch RELEASE-1_4 has no parent
  [and many more of the same type]

  cvs-fast-export: warning - branch point import-1.1.1 -> master later than branch
  cvs-fast-export:        trunk(85563):  2005-11-30T09:36:17Z  en.txt 1.1
  cvs-fast-export:        branch(85563): 2005-11-30T09:38:30Z  app-accessibility/SphinxTrain/files/digest-SphinxTrain-0.9.1-r1 1.1

It also results in commits from the branch showing up in the converted
Git master branch. The problem has been `reported upstream`__.

For the time being, this is worked around by adding an extra commit to
the trunk (and removing it from the converted repository later)::

  $ export CVSROOT=/var/cvsroot
  $ cvs checkout gentoo-x86
  $ cd gentoo-x86
  $ for file in $(find . -type d -name CVS -prune -o -type f -print); do echo >>${file}; done
  $ cvs commit -m "extra commit in trunk"

__ https://gitlab.com/esr/cvs-fast-export/-/issues/57


Missing app-games category
--------------------------

It is known that some files and directories have been moved, copied or
even deleted in the (server-side) RCS directory. This was advocated__
as late as 2005. For example, the whole ``app-games`` category was
deleted__ server-side at some time in late 2003 or early 2004, after
its packages had been moved to ``games-*`` categories.

Obviously, the history of these files is lost and there is no way for
the conversion to recover it.

__ https://archives.gentoo.org/gentoo-dev/message/029e91bdc515ddc5ae205b4694e00e91
__ https://archives.gentoo.org/gentoo-dev/message/ad7fa1ecae70e59d43ac70548076afcd


.. This work is licensed under the Creative Commons
   Attribution-ShareAlike 4.0 International License.
   https://creativecommons.org/licenses/by-sa/4.0/

.. Local Variables:
.. mode: rst
.. indent-tabs-mode: nil
.. End: