FSFS architecture

  • FSFS architecture

    Hi,

    I apologize if a similar topic has already been posted; I searched the web for a good explanation of FSFS and the revision file formats without much success.

    The scenario is that I am maintaining a mirror of the actual repository on another, distant server. I tried svnsync, which failed because of some very large commits and a slow, patchy network connection that often dropped. So I used rsync to copy the very large revisions from /repo/db/revs/ to the mirrored location, then resumed svnsync after setting the last-synced-revision property and current files, and set up a post-commit hook so that future commits are synced to the mirror repository automatically. The problem is that this sometimes fails with the following error in the httpd error log:

    Can't read length line in file '/path/to/mirror-repo/db/revs/shard/rev-file' [500, #200002]

    When I look up the same revision in the source repo, it has a different size and checksum. Some of the revision files in the mirror repo are identical to the source in both size and checksum, some have an identical size but a different checksum, and some differ in both size and checksum. Since svnsync simply re-creates the diff and applies it, the resulting mirrored revision should also be identical, but I find that sometimes it is and sometimes it isn't. Some of the times when it isn't identical, it results in the error above, which I have so far fixed by manually copying the same revision file from the source and overwriting the one in the mirror repository.
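The three cases described above (identical, same size but different checksum, different size) can be told apart mechanically. The following is only a sketch: the function names are made up for illustration, it assumes both db/revs trees are reachable as local paths, and it assumes neither repository is being written to while it runs.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-1 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha1()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_revs(source_revs: Path, mirror_revs: Path) -> dict:
    """Classify every file under source_revs against the same path in the mirror.

    Keys are paths relative to the revs directory; values are one of
    "identical", "same-size-different-checksum", "different-size", "missing".
    """
    result = {}
    for src in sorted(p for p in source_revs.rglob("*") if p.is_file()):
        rel = src.relative_to(source_revs)
        dst = mirror_revs / rel
        if not dst.is_file():
            status = "missing"
        elif src.stat().st_size != dst.stat().st_size:
            status = "different-size"
        elif file_digest(src) != file_digest(dst):
            status = "same-size-different-checksum"
        else:
            status = "identical"
        result[rel.as_posix()] = status
    return result
```

A differing file is not necessarily a broken one, so the report is a starting point for inspection rather than a list of corrupt revisions.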

    The versions on the two servers differ (1.6.16 and 1.6.13), but I expect that in itself is not the issue. Both are set to the same repository format.
    I have also set txn-current to match the source (initially I found they were different); the txn id is used in the revision files, which I found sometimes causes the checksum differences. Is "rep-cache.db" the reason why subsequent commits in the mirror are not identical to the source?

    Is there anywhere I can find the format used to create the revision files themselves, which I find are not fully consistent - some start with something like :

    id : 9-12345.2-14567.r15987/234
    type: file
    pred: 9-12345.2-14567.r14921/123 ....
    .

    but some other revision numbers start with other content like :

    DELTA 10234 097 1589
    SVN ^A^ <99>% ..................
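The two different openings quoted above are consistent with a revision file being a sequence of items: representations (which begin with a DELTA or PLAIN line) and node-revisions (which begin with an id: line), with a final line holding two byte offsets. The sketch below rests on that reading of the structure document for unpacked revision files of this era; the function names are illustrative, not part of Subversion.

```python
import re
from typing import Tuple


def classify_first_line(data: bytes) -> str:
    """Say what kind of item a revision file starts with."""
    first = data.split(b"\n", 1)[0]
    if first.startswith((b"DELTA", b"PLAIN")):
        return "representation"
    # Tolerate optional whitespace before the colon, as in the quoted sample.
    if re.match(rb"id\s*:", first):
        return "node-revision"
    return "unknown"


def read_trailer(data: bytes) -> Tuple[int, int]:
    """Return (root node-rev offset, changed-path offset) from the last line.

    Assumes the final line is "<root-offset> <changed-path-offset>".
    """
    last = data.rstrip(b"\n").rsplit(b"\n", 1)[-1]
    root_offset, changes_offset = (int(x) for x in last.split())
    return root_offset, changes_offset
```

Which item comes first depends on what the commit wrote first, so a file opening with DELTA and one opening with id: can both be well formed.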


    Obviously I haven't looked at all the revisions, but I could decipher no clear format, nor any cause for why the mirror repository's revision files are not all identical.

    TIA,
    Rahul

  • #2
    The structure of the FSFS files is described in:
    http://svn.apache.org/repos/asf/subv...s_fs/structure

    Files will differ between the two repositories for several reasons: the most common is that every commit attempt causes the repository sequence number to be incremented. Failed commits (conflicts, out-of-date, etc) don't result in a change to the HEAD revision but do increase the sequence number. The sequence number is part of the node-revision-ids that appear in the rev files. Failed commits generally only affect one of the repositories so different sequence numbers lead to different rev files.
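To see where two rev files disagree, it can help to split the node-revision ids into their parts. The sketch below parses the form quoted in the first post (node id, copy id, then r followed by revision/offset). The names are illustrative, and it makes no claim about which field carries the counter in a given filesystem format; it only reports which fields differ.

```python
import re
from typing import List, NamedTuple


class NodeRevId(NamedTuple):
    node_id: str
    copy_id: str
    revision: int
    offset: int


_ID_RE = re.compile(
    r"^(?P<node>[^.]+)\.(?P<copy>[^.]+)\.r(?P<rev>\d+)/(?P<off>\d+)$"
)


def parse_node_rev_id(text: str) -> NodeRevId:
    """Parse a committed node-revision id such as '9-12345.2-14567.r15987/234'."""
    m = _ID_RE.match(text.strip())
    if m is None:
        raise ValueError("not a committed node-revision id: %r" % text)
    return NodeRevId(m["node"], m["copy"], int(m["rev"]), int(m["off"]))


def differing_fields(a: str, b: str) -> List[str]:
    """Name the fields in which two node-revision ids differ."""
    pa, pb = parse_node_rev_id(a), parse_node_rev_id(b)
    return [name for name in NodeRevId._fields if getattr(pa, name) != getattr(pb, name)]
```

Comparing the id: lines of the same revision in source and mirror this way shows whether only the offsets moved or the ids themselves changed.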

    Differences can also occur if rep-sharing is enabled in one repository and not the other.

    • #3
      It's failing because you used rsync. AFAIK it's not possible to use rsync to synchronize a repository properly. If svnsync doesn't work for you (it should be able to pick up where it left off, allowing you to resume your broken sync), there are commercial products which will probably work better and can properly handle the data.

      • #4
        It's simple to rsync an offline repository, it's just a tree of files. There are several pitfalls that make rsyncing a live repository hard: the rep-cache.db file, the locks directory, etc.

        • #5
          Assuming all your library versions match on both systems, perhaps.

          • #6
            @Philip: thanks for the URL, I'll go through it to see if it has the answers I seek.
            The two repositories are separate, with nothing shared with any other repository. I had realized the failed-commit part, which is why I reset the transaction ID in txn-current to match the source. Even so, there have been a few revisions with differing sizes although no commit failed between the last synced commit that gave an identical checksum and the non-identical one.

            Do the UUID and rep-cache.db play a role in generating the txn id or the revision hash (based on UUID or date/timestamp)? I figure not, but I have also set the same UUID on the mirror repo. If it were the date/timestamp, all the revisions would always differ in checksum/hash, but the pattern I see is not consistent. I'll read up on the link you posted above.

            @Andyl: no, I'm not using rsync per se (it would work, depending on which directories are synced, as Philip mentioned, but svnsync seems more bandwidth-efficient). I used rsync to copy the large revisions that failed to go across due to network breaks. After copying the rsynced revision, I reset the properties to resume via svnsync, and it works for the most part.

            • #7
              OK, so I read the linked page once in its entirety, and realize I need to read it several times to understand it properly.
              In the meantime I also did this:
              I took an existing repository, say "repo", and copied it as is to, say, "clone_1". Then I edited the svn:syn* properties, made some commits on the original "repo", and ran svnsync; "clone_1" updated successfully, with identical revisions. I also created "clone_2" as an empty repository, changed its UUID to that of "repo", then svnsynced it. The sync worked, but the revisions were not identical for some reason. I deleted "clone_2", recreated it the same way, changed it to the same UUID and synced again. Again the revisions differed from "repo", but were identical to those of "clone_2" the first time (before the delete). I notice that "clone_1", after the follow-up sync, retains identical revisions and its rep-cache.db is also identical to that of "repo", but "clone_2" has different revisions and a different rep-cache.db. I now suspect that if rep-cache.db is identical, the revisions will also be identical.

              I don't know if anyone can answer these questions, which I'm also trying to answer by trial and error:
              1. Will rep-cache.db be regenerated if I delete it from a current repository? (I can try this on a test repo, perhaps soon.)
              2. Does rep-cache.db play a role in diff generation/revision file creation, as I suspect?
              3. Does the UUID play any role in diff generation/revision file creation? (I can try this too on a test repository.)
              4. Why does SVN recreate a diff (and hence a revision file) differently each time? Say revision 1 creates an empty directory trunk, revisions 2 and 3 each change only one file, but revisions 4 and 5 each add or modify multiple files. Then I observe that revisions 1, 2 and 3 are replayed identically each time, but (from the few trials I did) revisions 4 and 5 may differ depending on which of the changed nodes/files is diffed first, second and so on. If a single revision changes 100 files, there could be many possible orderings of the diffs, which could mean that several attempts to create synced repositories still result in differing revision files each time. Functionally, each file checked out could be identical, but the revision file itself is not, and that sometimes reports problems when I run verify.

              • #8
                Rep-caching can be switched on and off at any time. A missing rep-cache.db will get recreated but won't contain entries for existing revisions. So a new commit will not find matching representations in the cache and so will write all representations into the new revision rather than sharing any representations with old revisions. If rep-caching is enabled the new representations will be recorded in the cache and will be eligible for sharing by future revisions.
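The effect described above can be illustrated with a toy model. This is not how FSFS stores data: the "cache" here is just a dict keyed by SHA-1, and a "commit" is a list of byte strings. It only shows why the same commit can produce a larger revision when the cache starts out empty.

```python
import hashlib
from typing import Dict, Iterable, Tuple


def commit(contents: Iterable[bytes],
           rep_cache: Dict[str, int],
           sharing_enabled: bool = True) -> Tuple[int, int]:
    """Toy commit: return (bytes written to the new revision, representations shared)."""
    written = 0
    shared = 0
    for data in contents:
        key = hashlib.sha1(data).hexdigest()
        if sharing_enabled and key in rep_cache:
            # A matching representation exists, so reference it instead of rewriting it.
            shared += 1
            continue
        written += len(data)
        if sharing_enabled:
            # Record the new representation so future commits can share it.
            rep_cache[key] = len(data)
    return written, shared


# A repository whose cache already knows b"aaa" writes less for the same commit
# than one whose rep-cache.db was deleted and recreated empty.
warm_cache: Dict[str, int] = {}
commit([b"aaa", b"bbbb"], warm_cache)
warm = commit([b"aaa", b"cc"], warm_cache)
cold = commit([b"aaa", b"cc"], {})
```

In this model `warm` writes only the new content while `cold` writes everything, which matches the observation that mirrors with different cache contents end up with revision files of different sizes.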

                UUID does not affect representations.

                • #9
                  @Philip: thanks a lot. How is caching switched on or off? svnadmin create does not offer an option for this among its available commands.
                  According to the page linked above, the cache is only referred to during write operations, so I understand that it potentially alters the way subsequent revisions are written. The page doesn't say much more, such as what exactly its effect is; from observation, it can result in revisions having different sizes. I guess I must read that page several times over!

                  • #10
                    enable-rep-sharing in db/fsfs.conf
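To check the setting on both servers without opening the file by hand, something like the following could be used. It is a sketch under two assumptions: that the option lives in a [rep-sharing] section of db/fsfs.conf, and that a missing or commented-out option means sharing is enabled (the default argument encodes that assumption and should be changed if your version behaves differently).

```python
import configparser


def rep_sharing_enabled(fsfs_conf_text: str, default: bool = True) -> bool:
    """Read enable-rep-sharing from the text of an fsfs.conf file.

    `default` is what to report when the option is absent or commented out;
    True is an assumption, not something read from the file.
    """
    parser = configparser.ConfigParser()
    parser.read_string(fsfs_conf_text)
    return parser.getboolean("rep-sharing", "enable-rep-sharing", fallback=default)


# Hypothetical file contents for illustration.
commented = "[rep-sharing]\n# enable-rep-sharing = false\n"
disabled = "[rep-sharing]\nenable-rep-sharing = false\n"
print(rep_sharing_enabled(commented), rep_sharing_enabled(disabled))
```

Running it against the text of db/fsfs.conf from the source and from the mirror shows whether the two repositories agree on the setting.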

                    • #11
                      The line "enable-rep-sharing = false" is commented out by default, but rep-cache.db is created for all repositories, so rep-sharing must be enabled by default, correct?
                      I will try another clone repository with this disabled.

                      I am trying to match the revision file contents against the description on the FSFS structure page. However, the details aren't entirely clear to me: the revision and text diffs don't match.
                      More exploration due :-/

                      • #12
                        It's been a while since I checked this thread. So far svnsync has been working properly since I manually fixed the last revision files with mismatched hashes, and I have not had to intervene since late October. As it stands, this is now on the back burner, unfortunately, as I have other assignments to focus on; but I will try to come back to fully understanding the SVN revision file format, and how svnmerge works.
