I work a lot with DFSr because we use it to keep some of our web farms, and some of our customers' private farms, replicated. I can tell you it sucks: it always breaks, and it's very hard to maintain. I'll caveat that by saying we probably shouldn't be using it for web farms with millions of little files. It seems to work fine for AD. Anyway, this is the most common issue you will run into with DFSr: the unexpected crash or shutdown. Neither of the nodes this happened on actually crashed; in fact, they didn't even reboot or shut down. That doesn't matter, DFSr still crashed. Below is one example and the fix for it. It's obvious from the event what you need to do, but let's review anyway.
The one thing you HAVE to remember is to leave it alone. Do not touch it after you resume replication. That's the #1 mistake I see people make when troubleshooting DFSr: restarting the service or rebooting the server. DFSr keeps a journal (database) of all the changes to the replicated folders. You can't just restart the service or reboot the server to fix this. That's like trying to restart SQL to recover a corrupted database. Instead, you need to recover that journal, which, fortunately, Microsoft tells you exactly how to do in the event log.
To get to the event log, go to Control Panel –> Administrative Tools –> Event Viewer –> Applications and Services Logs –> DFS Replication.
Event ID 2213
The DFS Replication service stopped replication on volume C:. This occurs when a DFSR JET database is not shut down cleanly and Auto Recovery is disabled. To resolve this issue, back up the files in the affected replicated folders, and then use the ResumeReplication WMI method to resume replication.
Recovery Steps
1. Back up the files in all replicated folders on the volume. Failure to do so may result in data loss due to unexpected conflict resolution during the recovery of the replicated folders.
2. To resume the replication for this volume, use the WMI method ResumeReplication of the DfsrVolumeConfig class. For example, from an elevated command prompt, type the following command:
wmic /namespace:\\root\microsoftdfs path dfsrVolumeConfig where volumeGuid="32A74A78-0B49-11E2-93EE-806E6F6E6963" call ResumeReplication
To resume replication, run the command given in step 2 of the event from a command prompt opened as administrator. Remember that each node in the DFSr replication group has a different GUID, so get the command from Event Viewer on each node and run it on that node. Example below.
wmic /namespace:\\root\microsoftdfs path dfsrVolumeConfig where volumeGuid="32A74A78-0B49-11E2-93EE-806E6F6E6963" call ResumeReplication
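Since every node has its own GUID, and copying the command out of a web page or rich-text editor can turn the straight double quotes into curly ones (which cmd.exe won't treat as quoting characters), here's a minimal Python sketch that pulls the volume GUID out of the event 2213 text you paste in and rebuilds the command with plain quotes. The function name `resume_command` is hypothetical and not part of any Microsoft tooling; it only builds the string and does not run anything.

```python
import re

# A GUID in 8-4-4-4-12 hex form following "volumeGuid=". \W? tolerates one
# optional quote character of any kind (straight or curly) before the GUID.
GUID_RE = re.compile(
    r"volumeGuid\s*=\s*\W?"
    r"([0-9A-Fa-f]{8}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{12})"
)


def resume_command(event_text: str) -> str:
    """Build the ResumeReplication command for the GUID quoted in event 2213."""
    match = GUID_RE.search(event_text)
    if match is None:
        raise ValueError("no volumeGuid found in the event text")
    guid = match.group(1).upper()
    # Straight double quotes only, so cmd.exe parses the where clause correctly.
    return (
        r"wmic /namespace:\\root\microsoftdfs path dfsrVolumeConfig "
        f'where volumeGuid="{guid}" call ResumeReplication'
    )
```

Paste the event text from each node separately and you get that node's command; the GUID in the output is always the one from the event you fed in.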
After you run the ResumeReplication command, you will see Event ID 2212 in the log.
The DFS Replication service has detected an unexpected shutdown on volume C:. This can occur if the service terminated abnormally (due to a power loss, for example) or an error occurred on the volume. The service has automatically initiated a recovery process. The service will rebuild the database if it determines it cannot reliably recover. No user action is required.
You may also see Event ID 2218.
The DFS Replication service is in the second step of replication database consistency checks after an unexpected shutdown. The database will be rebuilt if it cannot be recovered. No user action is required.
Now you just need to wait for the database to recover. Depending on the number of files and how long replication has been down, it can take a few minutes, several hours, or even days. You MUST leave it alone. Do not reboot the server or restart the DFSr service; that will simply start the process all over again.
Once it has fully recovered, you will see Event ID 2214.
The DFS Replication service successfully recovered from an unexpected shutdown on volume C:. This can occur if the service terminated abnormally (due to a power loss, for example) or an error occurred on the volume. No user action is required.
Once you see event 2214, you are good to go. More info in this MS KB.
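To make the "wait for 2214 and don't touch anything before that" rule concrete, here's a minimal Python sketch that takes the DFSr event IDs you've seen for one volume, oldest first, and tells you where recovery stands. `recovery_state` and `safe_to_touch` are hypothetical helpers, not Microsoft tooling, and the sketch assumes you already have the IDs (from Event Viewer, `wevtutil`, or PowerShell's `Get-WinEvent`); reading the log itself is left out.

```python
from typing import Iterable

# DFSr event IDs covered in this post, mapped to what each one means for the
# volume. This is only the IDs discussed here, not a complete list.
EVENT_MEANING = {
    2213: "stopped: back up the replicated folders, then run ResumeReplication",
    2212: "recovering: unexpected shutdown detected, recovery started",
    2218: "recovering: second step of database consistency checks",
    2214: "recovered: service recovered from the unexpected shutdown",
}


def recovery_state(event_ids: Iterable[int]) -> str:
    """Describe one volume's state from DFSr event IDs, oldest first.

    The most recent ID found in EVENT_MEANING wins; other IDs are ignored.
    """
    state = "unknown: no recovery events seen"
    for event_id in event_ids:
        if event_id in EVENT_MEANING:
            state = EVENT_MEANING[event_id]
    return state


def safe_to_touch(event_ids: Iterable[int]) -> bool:
    """True only when the latest recovery-related event is 2214."""
    return recovery_state(event_ids).startswith("recovered")
```

The design choice is deliberately conservative: anything other than a trailing 2214, including a 2213 that shows up after an earlier recovery, means keep your hands off the service.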
You may also want to see this list of hotfixes for DFSr on Windows Server 2008 and 2008 R2.