walkthrough

A walkthrough of the basic features of git-annex.

creating a repository

This is very straightforward.

# mkdir ~/annex
# cd ~/annex
# git init
# git annex init

adding a remote

Like any other git repository, git-annex repositories have remotes. Let's start by adding a USB drive as a remote.

# sudo mount /media/usb
# cd /media/usb
# git clone ~/annex
# cd annex
# git annex init "portable USB drive"
# git remote add laptop ~/annex
# cd ~/annex
# git remote add usbdrive /media/usb/annex

This is all standard ad-hoc distributed git repository setup.

The only git-annex specific part is telling it a description of the new repository created on the USB drive. This is optional, but giving the repository a description helps when git-annex talks about it later.

Notice that both repos are set up as remotes of one another. This lets either get annexed files from the other. You'll want to do that even if you are using git in a more centralized fashion.

adding files

# cd ~/annex
# cp /tmp/big_file .
# cp /tmp/debian.iso .
# git annex add .
add big_file (checksum...) ok
add debian.iso (checksum...) ok
# git commit -a -m added

When you add a file to the annex and commit it, only a symlink to the content is committed to git. The content itself is stored in git-annex's backend, .git/annex/ (or in direct mode the file is left as-is).
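The symlink arrangement described above can be imitated with plain shell tools. This is only a sketch, assuming a Linux system with sha256sum; the store/ directory and the checksum-as-name scheme are hypothetical stand-ins, not git-annex's actual layout under .git/annex/.

```shell
# Illustration only: mimic "content in a store, symlink in the work tree".
# The store/ layout and key format are made up for this demo.
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir store
printf 'pretend this is huge\n' > big_file

# Name the content after its checksum and move it into the store.
key=$(sha256sum big_file | cut -d' ' -f1)
mv big_file "store/$key"

# Leave a symlink behind; only this small link would be tracked by git.
ln -s "store/$key" big_file

link_target=$(readlink big_file)
content=$(cat big_file)
echo "$link_target"
echo "$content"
```

Reading big_file still works because the link resolves to the stored content, while the link itself is only a few bytes long.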

renaming files

# cd ~/annex
# git mv big_file my_cool_big_file
# mkdir iso
# git mv debian.iso iso/
# git commit -m moved

You can use any normal git operations to move files around, or even make copies or delete them.

Notice that, since annexed files are represented by symlinks, the symlink will break when the file is moved into a subdirectory. But, git-annex will fix this up for you when you commit -- it has a pre-commit hook that watches for and corrects broken symlinks.
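Why the link breaks can be seen without git-annex at all. The sketch below uses a hypothetical store/ directory in place of the real annex, and the manual re-link at the end only imitates the kind of correction the pre-commit hook is described as making.

```shell
# Illustration only: a relative symlink breaks when moved one level deeper.
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir store iso
echo data > store/content
ln -s store/content debian.iso

mv debian.iso iso/
# The link still says "store/content", now resolved relative to iso/.
if [ -e iso/debian.iso ]; then before=ok; else before=broken; fi

# Repoint the link with one extra "../" so it resolves again.
rm iso/debian.iso
ln -s ../store/content iso/debian.iso
if [ -e iso/debian.iso ]; then after=ok; else after=broken; fi

echo "before fix: $before, after fix: $after"
```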

(Note that if a repository is in direct mode, you can't run normal git commands in it. Instead, just move the files using non-git commands, and then run git annex add and git annex sync.)

getting file content

A repository does not always have all annexed file contents available. When you need the content of a file, you can use "git annex get" to make it available.

We can use this to copy everything in the laptop's annex to the USB drive.

# cd /media/usb/annex
# git annex sync laptop
# git annex get .
get my_cool_big_file (from laptop...) ok
get iso/debian.iso (from laptop...) ok

syncing

Notice that in the previous example, git annex sync was used. This lets git-annex know what has changed in the other repositories like the laptop, and so it knows about the files present there and can get them.

Let's look at what the sync command does in more detail:

# cd /media/usb/annex
# git annex sync
commit
nothing to commit (working directory clean)
ok
pull laptop
ok
push laptop
ok

After you run sync, the git repository will be updated with all changes made to its remotes, and any changes in the git repository will be pushed out to its remotes, where a sync will get them. This is especially useful when using git in a distributed fashion, without a central bare repository. See sync for details.

By default git annex sync only syncs the metadata about your files that is stored in git. It does not sync the contents of files, which are managed by git-annex. To do that, you can use git annex sync --content.

transferring files: When things go wrong

After a while, you'll have several annexes, with different file contents. You don't have to try to keep all that straight; git-annex does location tracking for you. If you ask it to get a file and the drive or file server is not accessible, it will let you know what it needs to get it:

# git annex get video/hackity_hack_and_kaxxt.mov
get video/hackity_hack_and_kaxxt.mov (not available)
  Unable to access these remotes: usbdrive, server
  Try making some of these repositories available:
    5863d8c0-d9a9-11df-adb2-af51e6559a49  -- my home file server
    58d84e8a-d9ae-11df-a1aa-ab9aa8c00826  -- portable USB drive
    ca20064c-dbb5-11df-b2fe-002170d25c55  -- backup SATA drive
failed
# sudo mount /media/usb
# git annex get video/hackity_hack_and_kaxxt.mov
get video/hackity_hack_and_kaxxt.mov (from usbdrive...) ok

removing files

When you're using git-annex you can git rm a file just like you usually would with git. Just like with git, this removes the file from your work tree, but it does not remove the file's content from the git repository. If you check the file back out, or revert the removal, you can get it back.

Git-annex adds the ability to remove the content of a file from your local repository to save space. This is called "dropping" the file.

You can always drop files safely. Git-annex checks that some other repository still has the file before removing it.

# git annex drop iso/debian.iso
drop iso/debian.iso ok
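The "check before removing" rule can be sketched in a few lines of shell. Everything here is hypothetical: plain directories stand in for repositories, and safe_drop is an illustrative helper, not a git-annex command. The real check also involves numcopies and location tracking.

```shell
# Illustration only: directories stand in for repositories.
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir laptop usbdrive
echo data > laptop/debian.iso
echo data > usbdrive/debian.iso
echo data > laptop/other.iso        # exists nowhere else

# safe_drop NAME HERE OTHER...: remove HERE/NAME only if some OTHER has it.
safe_drop() {
    name=$1
    here=$2
    shift 2
    for repo in "$@"; do
        if [ -e "$repo/$name" ]; then
            rm "$here/$name"
            echo "drop $name ok"
            return 0
        fi
    done
    echo "drop $name failed (no other copy verified)"
    return 1
}

first=$(safe_drop debian.iso laptop usbdrive)
second=$(safe_drop other.iso laptop usbdrive)
echo "$first"
echo "$second"
```

The first call succeeds because usbdrive holds a copy; the second refuses, leaving the only copy of other.iso in place.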

Once dropped, the file will still appear in your work tree as a broken symlink. You can use git annex get as usual to get this file back to your local repository.

removing files: When things go wrong

Before dropping a file, git-annex wants to be able to look at other remotes, and verify that they still have a file. After all, it could have been dropped from them too. If the remotes are not mounted/available, you'll see something like this.

# git annex drop important_file other.iso
drop important_file (unsafe)
  Could only verify the existence of 0 out of 1 necessary copies
  Unable to access these remotes: usbdrive
  Try making some of these repositories available:
    58d84e8a-d9ae-11df-a1aa-ab9aa8c00826  -- portable USB drive
    ca20064c-dbb5-11df-b2fe-002170d25c55  -- backup SATA drive
  (Use --force to override this check, or adjust numcopies.)
failed
drop other.iso (unsafe)
  Could only verify the existence of 0 out of 1 necessary copies
  No other repository is known to contain the file.
  (Use --force to override this check, or adjust numcopies.)
failed

Here you might --force it to drop important_file if you trust your backup. But other.iso looks to have never been copied to anywhere else, so if it's something you want to hold onto, you'd need to transfer it to some other repository before dropping it.

modifying annexed files

Normally, the content of files in the annex is prevented from being modified. (Unless your repository is using direct mode.)

That's a good thing, because it might be the only copy, and you wouldn't want to lose it in a fumblefingered mistake.

# echo oops > my_cool_big_file
bash: my_cool_big_file: Permission denied

In order to modify a file, it should first be unlocked.

# git annex unlock my_cool_big_file
unlock my_cool_big_file (copying...) ok

That replaces the symlink that normally points at its content with a copy of the content. You can then modify the file like any regular file. Because it is a regular file.
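The effect of swapping a symlink for a copy can be reproduced by hand. This is a rough sketch using a hypothetical store/ directory; it imitates the outcome described above and is not how git-annex implements unlock.

```shell
# Illustration only: replace a symlink with a writable copy of its target.
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir store
echo original > store/content
ln -s store/content my_cool_big_file

cp my_cool_big_file my_cool_big_file.tmp   # cp follows the link, copies data
mv my_cool_big_file.tmp my_cool_big_file   # rename replaces the link itself

# The work-tree file is now a regular file and can be edited freely.
echo "now smaller" > my_cool_big_file

edited=$(cat my_cool_big_file)
stored=$(cat store/content)
echo "work tree: $edited / store: $stored"
```

Note that the stored content is untouched by the edit, which is also why old versions can linger in the annex after a file is modified.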

(If you decide you don't need to modify the file after all, or want to discard modifications, just use git annex lock.)

When you git commit it will notice that you are committing an unlocked file, add its new content to the annex, and a pointer to that content is what gets committed to git.

# echo "now smaller, but even cooler" > my_cool_big_file
# git commit my_cool_big_file -m "changed an annexed file"
add my_cool_big_file ok
[master 64cda67] changed an annexed file
 1 files changed, 1 insertions(+), 1 deletions(-)

For more details on working with unlocked files vs the regular locked files, see unlocked files.

using ssh remotes

So far in this walkthrough, git-annex has been used with a remote repository on a USB drive. But it can also be used with a git remote that is truly remote, a host accessed by ssh.

Say you have a desktop on the same network as your laptop and want to clone the laptop's annex to it:

desktop# git clone ssh://mylaptop/home/me/annex ~/annex
desktop# cd ~/annex
desktop# git annex init "my desktop"

Now you can get files and they will be transferred (using rsync via ssh):

desktop# git annex get my_cool_big_file
get my_cool_big_file (getting UUID for origin...) (from origin...)
SHA256-s86050597--6ae2688bc533437766a48aa19f2c06be14d1bab9c70b468af445d4f07b65f41e  100% 2159     2.1KB/s   00:00
ok

When you drop files, git-annex will ssh over to the remote and make sure the file's content is still there before removing it locally:

desktop# git annex drop my_cool_big_file
drop my_cool_big_file (checking origin..) ok

Note that normally git-annex prefers to use non-ssh remotes, like a USB drive, before ssh remotes. They are assumed to be faster/cheaper to access, if available. There is an annex-cost setting you can configure in .git/config to adjust which repositories it prefers. See the man page for details.
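As a sketch of where that setting lives, the following writes a per-remote cost with plain git config in a scratch repository. The remote name and path mirror the earlier examples, and the value 50 is an arbitrary choice for illustration; consult the man page for the values git-annex actually assigns by default.

```shell
# Illustration only: set a per-remote cost in a throwaway repository.
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q .
git remote add usbdrive /media/usb/annex

# Stored in .git/config under [remote "usbdrive"]; lower cost is preferred.
git config remote.usbdrive.annex-cost 50

cost=$(git config --get remote.usbdrive.annex-cost)
echo "usbdrive cost: $cost"
```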

Also, note that you need full shell access for this to work -- git-annex needs to be able to ssh in and run commands. Or at least, your shell needs to be able to run the git-annex-shell command.

For details on setting up ssh remotes, see the centralized git repository tutorial.

using special remotes

We've seen above that git-annex can be used to store files in regular git remotes, accessed either via ssh, or on a removable drive. But git-annex can also store files in Amazon S3, Glacier, on an rsync server, in WebDAV, or even pull files down from the web and bittorrent. This and much more is made possible by special remotes.

These are not normal git repositories; indeed the git repository is not stored on a special remote. But git-annex can store the contents of files in special remotes, and operate on them much as it would on any other remote. Bonus: Files stored on special remotes can easily be encrypted!

All you need to get started using a special remote is to initialize it. This is done using the git annex initremote command, which needs to be passed different parameters depending on the type of special remote.

Some special remotes also need things like passwords to be set in environment variables. Don't worry -- it will prompt if you leave anything off. So feel free to make any kind of special remote instead of the S3 remote used in this example.

# export AWS_ACCESS_KEY_ID="somethingotherthanthis"
# export AWS_SECRET_ACCESS_KEY="s3kr1t"
# git annex initremote mys3 type=S3 chunk=1MiB encryption=shared
initremote mys3 (shared encryption) (checking bucket) (creating bucket in US) ok

Now you can store files on the newly initialized special remote.

# git annex copy my_cool_big_file --to mys3
copy my_cool_big_file (to mys3...) ok

Once you've initialized a special remote in one repository, you can enable use of the same special remote in other clones of the repository. If the mys3 remote above was initialized on your laptop, you'll also want to enable it on your desktop.

To do so, first get git-annex in sync (so it knows about the special remote that was added in the other repository), and then use git annex enableremote.

desktop# git annex sync
desktop# export AWS_ACCESS_KEY_ID="somethingotherthanthis"
desktop# export AWS_SECRET_ACCESS_KEY="s3kr1t"
desktop# git annex enableremote mys3
enableremote mys3 (checking bucket) ok

And now you can download files from the special remote:

desktop# git annex get my_cool_big_file --from mys3
get my_cool_big_file (from mys3...) ok

This has only scratched the surface of what can be done with special remotes.

moving file content between repositories

Often you will want to move some file contents from a repository to some other one. For example, your laptop's disk is getting full; time to move some files to an external disk before moving another file from a file server to your laptop. Doing that by hand (by using git annex get and git annex drop) is possible, but a bit of a pain. git annex move makes it very easy.

# git annex move my_cool_big_file --to usbdrive
move my_cool_big_file (to usbdrive...) ok
# git annex move video/hackity_hack_and_kaxxt.mov --from fileserver
move video/hackity_hack_and_kaxxt.mov (from fileserver...)
SHA256-s86050597--6ae2688bc533437766a48aa19f2c06be14d1bab9c70b468af445d4f07b65f41e   100%   82MB 199.1KB/s   07:02
ok

quiet please: When git-annex seems to skip files

One behavior of git-annex is sometimes confusing at first, but it turns out to be useful once you get to know it.

# git annex drop *
#

Why didn't git-annex seem to do anything despite being asked to drop all the files? Because it checked them all, and none of them are present.

Most git-annex commands will behave this way when they're able to quickly check that nothing needs to be done about a file.

Running a git-annex command without specifying any file name will make git-annex look for files in the current directory and its subdirectories. So, we can add all new files to the annex easily:

# echo hi > subdir/subsubdir/newfile
# git annex add
add subdir/subsubdir/newfile ok

When doing this kind of thing, having nothing shown for files that it doesn't need to act on is useful because it prevents swamping you with output. You only see the files it finds it does need to act on.

So remember: If git-annex seems to not do anything when you tell it to, it's not being lazy -- It's checked that nothing needs to be done to get to the state you asked for!

using tags and branches

Like git, git-annex hangs on to every old version of a file (by default), so you can make tags and branches, and can check them out later to look at the old files.

# git tag 1.0
# rm -f my_cool_big_file
# git commit -m deleted
# git checkout 1.0
# cat my_cool_big_file
yay! old version still here

Of course, when you git checkout an old branch, some old versions of files may not be locally available, and may be stored in some other repository. You can use git annex get to get them as usual.

unused data

It's possible for data to accumulate in the annex that no files in any branch point to anymore. One way it can happen is if you git rm a file without first calling git annex drop. And, when you modify an annexed file, the old content of the file remains in the annex. Another way is when migrating between key-value backends.

This might be historical data you want to preserve, so git-annex defaults to preserving it. So from time to time, you may want to check for such data:

# git annex unused
unused . (checking for unused data...)
  Some annexed data is no longer used by any files in the repository.
    NUMBER  KEY
    1       SHA256-s86050597--6ae2688bc533437766a48aa19f2c06be14d1bab9c70b468af445d4f07b65f41e
    2       SHA1-s14--f1358ec1873d57350e3dc62054dc232bc93c2bd1
  (To see where data was previously used, try: git log --stat -S'KEY')
  (To remove unwanted data: git-annex dropunused NUMBER)
ok

After running git annex unused, you can follow the instructions to examine the history of files that used the data, and if you decide you don't need that data anymore, you can easily remove it from your local repository.

# git annex dropunused 1
dropunused 1 ok

Hint: To drop a lot of unused data, use a command like this:

# git annex dropunused 1-1000

Rather than removing the data, you can instead send it to other repositories:

# git annex copy --unused --to backup
# git annex move --unused --to archive
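The notion of unused data in this section can be pictured as "stored content that no symlink points to". The sketch below applies that idea to a hypothetical store/ directory and a single work tree; git-annex itself considers every branch and tag, which this toy version does not.

```shell
# Illustration only: find stored content that no work-tree link references.
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir store
echo old > store/key1
echo new > store/key2
ln -s store/key2 my_cool_big_file   # the work tree only references key2

unused=""
for key in store/*; do
    used=no
    for f in *; do
        if [ -L "$f" ] && [ "$(readlink "$f")" = "$key" ]; then
            used=yes
        fi
    done
    if [ "$used" = no ]; then
        unused="$unused $key"
    fi
done
echo "unused:$unused"
```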

fsck: verifying your data

You can use the fsck subcommand to check for problems in your data. What can be checked depends on the key-value backend you've used for the data. For example, when you use the SHA1 backend, fsck will verify that the checksums of your files are good. Fsck also checks that the numcopies setting is satisfied for all files.

# git annex fsck
fsck some_file (checksum...) ok
fsck my_cool_big_file (checksum...) ok
...
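The checksum part of such a check amounts to recomputing a hash and comparing it with the one recorded earlier. A minimal sketch, assuming sha256sum is available; the verify helper and file names are hypothetical and this is not fsck's actual implementation.

```shell
# Illustration only: detect content that no longer matches its recorded hash.
demo=$(mktemp -d)
cd "$demo" || exit 1
printf 'precious data\n' > content
key=$(sha256sum content | cut -d' ' -f1)   # recorded when the file was added

# verify FILE KEY: succeeds only if FILE still hashes to KEY.
verify() {
    [ "$(sha256sum "$1" | cut -d' ' -f1)" = "$2" ]
}

if verify content "$key"; then first=ok; else first=bad; fi

printf 'bit rot\n' >> content              # simulate corruption
if verify content "$key"; then second=ok; else second=bad; fi

echo "before corruption: $first, after corruption: $second"
```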

You can also specify the files to check. This is particularly useful if you're using sha1 and don't want to spend a long time checksumming everything.

# git annex fsck my_cool_big_file
fsck my_cool_big_file (checksum...) ok

If you have a large repo, you may want to check it in smaller steps. You may start and continue an aborted or time-limited check.

# git annex fsck -S <optional-directory> --time-limit=1m
fsck some_file (checksum...) ok
fsck my_cool_big_file (checksum...) ok
  Time limit (1m) reached!
# git annex fsck -m <optional-directory>
fsck my_other_big_file (checksum...) ok
...

Use -S or --incremental to start the incremental check. Use -m or --more to continue the started check where it left off. Note that the progress of fsck is saved after every 1000 files or 5 minutes, or when --time-limit is reached. If git-annex exits abnormally (e.g. Ctrl+C) and the check is restarted, some files may be checked again.

fsck: when things go wrong

Fsck never deletes possibly bad data; instead it will be moved to .git/annex/bad/ for you to recover. Here is a sample of what fsck might say about a badly messed up annex:

# git annex fsck
fsck my_cool_big_file (checksum...)
git-annex: Bad file content; moved to .git/annex/bad/SHA1:7da006579dd64330eb2456001fd01948430572f2
git-annex: ** No known copies exist of my_cool_big_file
failed
fsck important_file
git-annex: Only 1 of 2 copies exist. Run git annex get somewhere else to back it up.
failed
git-annex: 2 failed

backups

git-annex can be configured to require that more than one copy of a file exists, as a simple backup for your data. This is controlled by the numcopies setting, which defaults to 1 copy. Let's change that to require 2 copies, and send a copy of every file to a USB drive.

# git annex numcopies 2
# git annex copy . --to usbdrive

Now when we try to git annex drop a file, it will verify that it knows of 2 other repositories that have a copy before removing its content from the current repository.

The numcopies setting used above is the global default. You can also vary the number of copies needed, depending on the file name. So, if you want 3 copies of all your flac files, but only 1 copy of oggs:

# echo "*.ogg annex.numcopies=1" >> .gitattributes
# echo "*.flac annex.numcopies=3" >> .gitattributes

Or, you might want to make a directory for important stuff, and configure it so anything put in there is backed up more thoroughly:

# mkdir important_stuff
# echo "* annex.numcopies=3" > important_stuff/.gitattributes

For more details about the numcopies setting, see copies.
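Since these are ordinary git attributes, plain git can report which value applies to a given path. The sketch below recreates the .gitattributes lines from above in a scratch repository; the file names song.flac, song.ogg, notes.txt and readme.txt are hypothetical, and the paths do not need to exist for git check-attr to answer.

```shell
# Illustration only: ask git which annex.numcopies value a path would get.
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q .
echo "*.ogg annex.numcopies=1" >> .gitattributes
echo "*.flac annex.numcopies=3" >> .gitattributes
mkdir important_stuff
echo "* annex.numcopies=3" > important_stuff/.gitattributes

flac=$(git check-attr annex.numcopies -- song.flac)
ogg=$(git check-attr annex.numcopies -- song.ogg)
notes=$(git check-attr annex.numcopies -- important_stuff/notes.txt)
other=$(git check-attr annex.numcopies -- readme.txt)
printf '%s\n' "$flac" "$ogg" "$notes" "$other"
```

A path that matches no pattern is reported as unspecified, in which case the global numcopies default is what counts.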

automatically managing content

Once you have multiple repositories, and have perhaps configured numcopies, any given file can have many more copies than is needed, or perhaps fewer than you would like. How to manage this?

The whereis subcommand can be used to see how many copies of a file are known, but then you have to decide what to get or drop. In this example, there are perhaps not enough copies of the first file, and too many of the second file.

# cd /media/usbdrive
# git annex whereis
whereis my_cool_big_file (1 copy)
    0c443de8-e644-11df-acbf-f7cd7ca6210d  -- laptop
whereis other_file (3 copies)
    0c443de8-e644-11df-acbf-f7cd7ca6210d  -- laptop
    62b39bbe-4149-11e0-af01-bb89245a1e61  -- usb drive [here]
    7570b02e-15e9-11e0-adf0-9f3f94cb2eaa  -- backup drive

What would be handy is automated versions of get and drop that only get a file if there are not yet enough copies of it, or only drop a file if there are too many copies. Well, these exist; just use the --auto option.

# git annex get --auto --numcopies=2
get my_cool_big_file (from laptop...) ok
# git annex drop --auto --numcopies=2
drop other_file ok

With two quick commands, git-annex was able to decide for you how to work toward having two copies of your files.

# git annex whereis
whereis my_cool_big_file (2 copies)
    0c443de8-e644-11df-acbf-f7cd7ca6210d  -- laptop
    62b39bbe-4149-11e0-af01-bb89245a1e61  -- usb drive [here]
whereis other_file (2 copies)
    0c443de8-e644-11df-acbf-f7cd7ca6210d  -- laptop
    7570b02e-15e9-11e0-adf0-9f3f94cb2eaa  -- backup drive
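The decision made in this example can be written down as a small rule. This is a deliberately simplified model with a hypothetical auto_action helper; it looks only at copy counts and ignores preferred content, trust levels and anything else git-annex may weigh.

```shell
# Illustration only: a simplified model of the --auto decision.
# auto_action COPIES WANTED PRESENT_HERE(yes|no)
auto_action() {
    copies=$1
    wanted=$2
    present_here=$3
    if [ "$copies" -lt "$wanted" ] && [ "$present_here" = no ]; then
        echo get
    elif [ "$copies" -gt "$wanted" ] && [ "$present_here" = yes ]; then
        echo drop
    else
        echo nothing
    fi
}

first=$(auto_action 1 2 no)     # my_cool_big_file: 1 copy, not here
second=$(auto_action 3 2 yes)   # other_file: 3 copies, one of them here
third=$(auto_action 2 2 yes)    # already at the target
echo "$first $second $third"
```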

The --auto option can also be used with the copy command; again this lets git-annex decide whether to actually copy content.

The above shows how to use --auto to manage content based on the number of copies. It's also possible to configure, on a per-repository basis, which content is desired. Then --auto also takes that into account; see preferred content for details.

more

So ends the walkthrough. By now you should be able to use git-annex.

Want more? See tips for lots more features and advice.
