Inconsistent finite-difference and adjoint gradients in EigenmodeCoefficient #2087
What happens if you place PML all around?
There's certainly a bug here, but there are also some issues with your test:
Actually, your original example is using metallic boundaries (you don't specify a …).
Your sources and monitor should stretch into the PML in this case (and you probably want to set …). Can you force all your simulations (forward, adjoint, and finite-difference checks) to run for at least 1500 Meep time units? If this works, it will tell us which part of #1855 is acting flaky. Note that you can use this parameter in the …
Ok, good to know. So something is up with the recombination step, not the DFT convergence.
Do you have the same issue if you use …?
I replaced …
With the old version of Meep, the simulation does not stop during … In a previous example for …
In the previous example with …
The discrepancy becomes more obvious when …
I am also seeing a large mismatch in the adjoint and finite-difference gradients, computed via the directional derivative in a random direction, for a 3D test problem. (This test is planned to be used as a unit test for the adjoint solver after #2127 is merged.) The test involves an EigenmodeCoefficient objective (see the script below). Following a suggestion from @smartalecH, I made sure to choose a small perturbation (deps = 1e-5 in the script).
```python
import meep as mp
import meep.adjoint as mpa
import math
import numpy as np
from autograd import numpy as npa
from autograd import tensor_jacobian_product

resolution = 80  # pixels/μm

ng = 2.5
glass = mp.Medium(index=ng)

theta_d = math.radians(40.0)

wvl = 0.5  # wavelength
fcen = 1/wvl

sx = wvl/math.sin(theta_d)  # period in x
sy = 0.5*wvl                # period in y

m = 1  # diffraction order in x direction

# wavevector of transmitted order in substrate in xz plane
kdiff = mp.Vector3(m/sx, 0, ((fcen*ng)**2-(m/sx)**2)**0.5)

dpml = 1.0  # PML thickness
dsub = 3.0  # substrate thickness
dair = 3.0  # air padding
hcyl = 0.2  # height of cylinder
rcyl = 0.1  # radius of cylinder

sz = dpml+dair+hcyl+dsub+dpml

cell_size = mp.Vector3(sx, sy, sz)

design_region_size = mp.Vector3(sx, sy, 0)
design_region_resolution = int(2*resolution)
Nx = int(design_region_size.x*design_region_resolution)
Ny = int(design_region_size.y*design_region_resolution)

boundary_layers = [mp.PML(thickness=dpml, direction=mp.Z)]

# periodic boundary conditions
k_point = mp.Vector3()

# planewave at normal incidence in +z direction
# plane of incidence: XZ
# polarization: (1) S/TE: Ey or (2) P/TM: Ex
src_cmpt = mp.Ex
src_pt = mp.Vector3(0, 0, -0.5*sz+dpml)
sources = [mp.Source(src=mp.GaussianSource(fcen, fwidth=0.2*fcen),
                     size=mp.Vector3(sx, sy, 0),
                     center=src_pt,
                     component=src_cmpt)]

substrate = [mp.Block(size=mp.Vector3(mp.inf, mp.inf, dpml+dsub),
                      center=mp.Vector3(0, 0, 0.5*sz-0.5*(dpml+dsub)),
                      material=glass)]

if src_cmpt == mp.Ey:
    mode_parity = mp.ODD_Y
    symmetries = [mp.Mirror(direction=mp.Y, phase=-1)]
elif src_cmpt == mp.Ex:
    mode_parity = mp.EVEN_Y
    symmetries = [mp.Mirror(direction=mp.Y)]
else:
    mode_parity = mp.NO_PARITY
    symmetries = []


def adjoint_solver(design_params, need_gradient=True):
    matgrid = mp.MaterialGrid(mp.Vector3(Nx, Ny),
                              mp.air,
                              glass,
                              weights=np.ones((Nx, Ny)))

    matgrid_region = mpa.DesignRegion(
        matgrid,
        volume=mp.Volume(center=mp.Vector3(0, 0, 0.5*sz-dpml-dsub-0.5*hcyl),
                         size=mp.Vector3(design_region_size.x,
                                         design_region_size.y,
                                         hcyl)))

    matgrid_geometry = [mp.Block(center=matgrid_region.center,
                                 size=matgrid_region.size,
                                 material=matgrid)]

    geometry = substrate + matgrid_geometry

    sim = mp.Simulation(resolution=resolution,
                        cell_size=cell_size,
                        sources=sources,
                        geometry=geometry,
                        boundary_layers=boundary_layers,
                        k_point=k_point,
                        symmetries=symmetries,
                        eps_averaging=False)

    obj_list = [mpa.EigenmodeCoefficient(sim,
                                         mp.Volume(center=mp.Vector3(0, 0, 0.5*sz-dpml),
                                                   size=mp.Vector3(sx, sy, 0)),
                                         1,
                                         eig_parity=mode_parity,
                                         kpoint_func=lambda *not_used: kdiff)]

    # objective function
    def J(mode_coeff):
        return npa.power(npa.abs(mode_coeff), 2)

    opt = mpa.OptimizationProblem(simulation=sim,
                                  objective_functions=J,
                                  objective_arguments=obj_list,
                                  design_regions=[matgrid_region],
                                  frequencies=[fcen],
                                  decay_by=1e-12)

    if need_gradient:
        f, dJ_du = opt([design_params])
        return f, dJ_du
    else:
        f = opt([design_params], need_gradient=False)
        return f


if __name__ == '__main__':
    xcoord = np.linspace(-0.5*design_region_size.x, 0.5*design_region_size.x, Nx)
    ycoord = np.linspace(-0.5*design_region_size.y, 0.5*design_region_size.y, Ny)
    xv, yv = np.meshgrid(xcoord, ycoord)

    # cylindrical grating
    p = np.where(np.sqrt(np.square(xv) + np.square(yv)) < rcyl, 0.9, 0.)
    p = p.flatten(order='F')

    # ensure reproducible results
    rng = np.random.RandomState(9861548)

    # random epsilon perturbation for design region
    deps = 1e-5
    dp = deps*rng.rand(Nx*Ny)

    # compute objective value and gradient for unperturbed design
    unperturbed_val, unperturbed_grad = adjoint_solver(p)

    # compute objective value for perturbed design
    perturbed_val = adjoint_solver(p+dp, False)

    # compare directional derivatives
    if unperturbed_grad.ndim < 2:
        unperturbed_grad = np.expand_dims(unperturbed_grad, axis=1)
    adj_dd = (dp[None, :] @ unperturbed_grad).flatten()
    fd_dd = perturbed_val - unperturbed_val
    print("directional derivative: {} (adjoint solver), {} (finite difference)".format(adj_dd[0], fd_dd[0]))
```
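For reference, the directional-derivative check used in the script above can be isolated on a toy problem with a known analytic gradient. This is a Meep-independent sketch; the quadratic objective and matrix A here are made up purely for illustration:

```python
import numpy as np

def directional_derivative_check(f, grad_f, p, dp):
    """Compare the adjoint-style directional derivative dp . grad f(p)
    against the first-order finite difference f(p + dp) - f(p)."""
    adj_dd = dp @ grad_f(p)
    fd_dd = f(p + dp) - f(p)
    return adj_dd, fd_dd

# toy quadratic objective f(p) = |A p|^2 with known gradient 2 A^T A p
rng = np.random.RandomState(9861548)
A = rng.rand(20, 50)
f = lambda p: np.sum((A @ p)**2)
grad_f = lambda p: 2 * A.T @ (A @ p)

p = rng.rand(50)
dp = 1e-5 * rng.rand(50)  # small random perturbation, as in the test above
adj_dd, fd_dd = directional_derivative_check(f, grad_f, p, dp)
rel_err = abs(adj_dd - fd_dd) / abs(fd_dd)
print(rel_err)  # small when the gradient is correct
```

When the gradient routine is correct, the relative error is dominated by the O(|dp|^2) truncation term of the finite difference, so it shrinks as the perturbation shrinks.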
TLDR: I do not think there is much we can do here until #1951 is merged. I think the main reason these errors in the adjoint gradient have thus far gone undetected (regardless of the type and dimensionality of the test problem) is that all of the unit tests in … Recall that, to get a gradient test of a binarized design in #1854 to show agreement between the adjoint solver and the finite difference, we had to do two things: (1) apply a conic filter to the original ring-resonator design, and (2) use #1951 (which has yet to be merged) in order to turn on subpixel smoothing of the MaterialGrid.
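As an aside, the conic filter mentioned above is just a convolution with a radially tapered kernel. Here is a pure-NumPy sketch of the idea (not Meep's actual mpa.conic_filter implementation; the FFT-based circular convolution is an assumption chosen to match a periodic design region):

```python
import numpy as np

def conic_kernel(radius, dx):
    """Radially tapered (conic) kernel: weight 1 - r/R for r < R,
    normalized to unit sum so the filter preserves the mean."""
    n = int(np.ceil(radius / dx))
    x = np.arange(-n, n + 1) * dx
    xv, yv = np.meshgrid(x, x)
    r = np.sqrt(xv**2 + yv**2)
    k = np.where(r < radius, 1 - r / radius, 0.0)
    return k / k.sum()

def conic_filter(rho, radius, dx):
    # circular (periodic) convolution via the FFT
    k = conic_kernel(radius, dx)
    kp = np.zeros_like(rho)
    m = k.shape[0]
    kp[:m, :m] = k
    kp = np.roll(kp, (-(m // 2), -(m // 2)), axis=(0, 1))  # center kernel at origin
    return np.real(np.fft.ifft2(np.fft.fft2(rho) * np.fft.fft2(kp)))

rho = np.zeros((32, 32))
rho[16, 16] = 1.0                              # a single-pixel "design"
smoothed = conic_filter(rho, radius=0.2, dx=0.05)
assert np.isclose(smoothed.sum(), 1.0)         # mean-preserving
assert smoothed.max() < 1.0                    # the spike has been smeared out
```

The point of the filter in the gradient tests is that it removes sub-pixel features, so the finite-difference reference is not dominated by staircasing of a binarized design.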
Two types of problems are presented in this issue. However, in @oskooi's example, when I remove the …
Actually, it looks like I misspoke. If the random design region in …
That test is for the …
That's correct: the current unit test involving … However, I just modified the test in meep/python/tests/test_mode_decomposition.py (lines 323 to 327 at commit 0f88943) …
Can you rerun your original test while doubling the resolution? Check whether the difference in the directional derivatives is reduced compared to the lower-resolution run. I suspect there is probably no bug here after all.
I doubled the resolution in my previous test. It did not help but made things worse. I just increased the resolution further to …
The mismatch is not reduced compared with the case of …
There is way too much going on here. Unfortunately, we keep posting test cases that don't clearly highlight a particular piece of the adjoint code that could have a bug. Instead, all of the test cases above involve multiple features/aspects of our increasingly complex adjoint solver, and don't really reveal anything new. I'm going to do my best to parse through everything and outline a plan of action.

Potential bugs introduced by #1855

This was a big PR that made two important changes: (1) it localized the computation of the adjoint gradient to each processor owning the respective DFT chunks; (2) it expanded the default size of the adjoint DFT chunks to include an additional halo needed for the localized gradient computation. There are many ways this could have introduced a bug. For example, we had to modify the DFT convergence criteria to only check "non-halo" points within each chunk (see here), so some of the convergence issues described above could be due to this (although we somewhat investigated this already at the beginning of this issue).

Similarly, the "recombination step" requires us to restrict and interpolate over the Yee grid throughout each chunk. This means we sometimes need to pull points from outside the chunk, which should always be points within the halo. This halo is not well defined for chunks on the edge of a cell with periodic boundary conditions. More importantly, Meep lacks good data structures for checking whether we are accessing invalid elements of the array, and we have no good error checking within this piece of the code (see here). We should resolve any potential issues here first, before fixing any other bugs that also existed before we merged #1855.
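To make the halo idea concrete, here is a toy NumPy sketch (hypothetical chunking, not Meep's actual DFT-chunk data structures): a periodic 1D array is split into chunks, each chunk is padded with one ghost point from its neighbors (wrapping around at the cell edges), and a centered-difference stencil applied chunk-by-chunk must reproduce the global result. This is exactly the property that breaks if the halo is mishandled for a chunk at a periodic boundary:

```python
import numpy as np

def centered_diff_global(u, h=1.0):
    # centered difference on the full periodic array
    return (np.roll(u, -1) - np.roll(u, 1)) / (2 * h)

def centered_diff_chunked(u, nchunks, h=1.0):
    """Split u into chunks, attach a one-point halo from each neighbor
    (wrapping periodically at the ends), and apply the stencil only to
    the non-halo interior of each chunk."""
    n = len(u)
    chunks = np.split(u, nchunks)
    out = []
    offset = 0
    for c in chunks:
        lo = u[(offset - 1) % n]           # left halo point (may wrap)
        hi = u[(offset + len(c)) % n]      # right halo point (may wrap)
        padded = np.concatenate(([lo], c, [hi]))
        out.append((padded[2:] - padded[:-2]) / (2 * h))
        offset += len(c)
    return np.concatenate(out)

u = np.sin(np.linspace(0, 2 * np.pi, 12, endpoint=False))
assert np.allclose(centered_diff_global(u), centered_diff_chunked(u, 3))
```

If the wraparound indices for the edge chunks were wrong (or read uninitialized memory, as in the garbage-value scenario described below), the per-chunk result would silently diverge from the global one near the chunk boundaries.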
Potential problems due to periodic boundary conditions

Properly implementing the adjoint derivative with Bloch boundary conditions is rather hairy for two reasons: (1) we want to use the same solver for both runs, which means we need to somehow fake Lorentz reciprocity by modifying the setup of the adjoint run; (2) the Maxwell operator is modified to incorporate the Bloch boundary conditions, and this modified operator needs to be factored into the adjoint run.

Let me expand a bit. The traditional (continuous-variable) Maxwell operator is of the form … To use the same Maxwell solver for the forward and adjoint runs, we have to show that … When we use Bloch-periodic boundary conditions, we assume the fields are of a particular form, and the operator now becomes … (see ch. 3 in Molding the Flow of Light). This extra … (note that with cylindrical symmetries, we have a similar factor …).

Furthermore, we need to be careful with how we recombine the … We should look closely at these issues after we debug the potential issues introduced by #1855. My suggestion would be to first check that the halo is what we want it to be, and that the restriction and interpolation steps never try to reach points they shouldn't.
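The transpose relation at issue can be checked numerically on a toy discretization (a sketch, not Meep's actual Maxwell operator): a 1D second-difference matrix whose wraparound couplings carry Bloch phases exp(+/- i k L) is not symmetric for nonzero k, but its plain (unconjugated) transpose equals the operator at -k. That identity is what lets the adjoint run reuse the forward solver with the sign of the Bloch wavevector flipped:

```python
import numpy as np

def bloch_laplacian(n, k, L=1.0):
    """1D second-difference operator with Bloch-periodic boundary
    conditions; the wraparound entries carry phases exp(+/- i k L)."""
    h = L / n
    A = np.zeros((n, n), dtype=complex)
    for j in range(n):
        A[j, j] = -2.0 / h**2
        A[j, (j + 1) % n] = 1.0 / h**2
        A[j, (j - 1) % n] = 1.0 / h**2
    # overwrite the wraparound couplings with the Bloch phases
    A[n - 1, 0] = np.exp(1j * k * L) / h**2
    A[0, n - 1] = np.exp(-1j * k * L) / h**2
    return A

n, k = 8, 0.7
Ak = bloch_laplacian(n, k)
assert not np.allclose(Ak, Ak.T)                  # not symmetric for k != 0
assert np.allclose(Ak.T, bloch_laplacian(n, -k))  # but A_k^T = A_{-k}
```

So "faking reciprocity" for this toy operator amounts to running the adjoint problem at -k; the analogous bookkeeping for Meep's full vector Maxwell operator is what this comment is worried about.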
@oskooi, I would start with Wenchao's test and make sure you can reproduce it. Then you can modify it in various ways and see if the problem goes away (e.g. shrink the design region so that it doesn't touch the edge of the cell).
Apart from the potential problems due to #1855 and periodic boundary conditions as summarized by @smartalecH, another problem may exist.
@mawc2019 Can you run the same test as above, but limit the design region's width such that it does not touch the boundaries?
In the example just above, the monitor width is decreased from …
Sorry -- can you keep the monitors the same size, but reduce the size of the design region?
Sorry, the tests for shrinking design regions are here. In both examples, after …, the finite-difference and adjoint gradients nearly diverge. In this modified original example, the comparison is as follows.
In this modified 90°-rotated example, the comparison is as follows.
Unlike the cases without divergence, where the adjoint gradients change noticeably when the structure is rotated by 90°, in the cases where divergence happens the relative change in the adjoint gradients is negligible under a 90° rotation.
Another feature of the incorrect … For the …
Can I ask whether you have a plan for when you are going to look into this issue? Thank you :)
@mochen4 what if we tackle this issue the same way we did in #2193? Specifically, can you run @mawc2019's example above before and after #1855 and ensure that both the forward and adjoint fields remain the same? |
I ran that code after #1855 and before #1855 (using #1959). Both the forward fields and the adjoint fields actually agreed before and after #1855. On the other hand, both the adjoint gradients and the finite-difference gradients changed, with the fd_gradient after #1855 not making sense. Meanwhile, #1855 actually fixed a potential issue: before #1855, the gradient loop ran over points with y from -0.5 to 0.5, but the field values actually corresponded to points from y = -0.45 to y = 0.5. Thus, throughout the loop, the point locations were "out of sync" with the field values, and eventually the loop exhausted all the fields and started reading garbage values for the remaining locations. After #1855, the point locations start at y = -0.45, and the loop goes through fields and locations in sync.
Great! We can confirm that even after #1855, the adjoint sources themselves are still fine (as expected) and that the DFT convergence is probably fine.
Woohoo! This makes sense as we are now using the same looping macros that the DFT fields use to update.
Can you elaborate on this point? Is it possible that the adjoint gradients themselves are fine, but the "ground truth" we've been using (i.e., the finite differences) is somehow flawed (have we recently tried different step sizes or a Richardson extrapolation)? Even if there is still a bug in the adjoint routine, at least we've narrowed it down to the following two things: (1) we may not be handling the boundaries of the design region properly (particularly when it sits next to a metallic or periodic boundary condition); (2) there may be a bug in the …
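On the step-size question: a centered difference has an O(h^2) error, and Richardson extrapolation combines the estimates at h and h/2 as (4*D(h/2) - D(h))/3 to cancel the leading error term. A minimal sketch on a scalar function (illustrating the technique, not any Meep API):

```python
import numpy as np

def central_diff(f, x, h):
    # centered difference: f'(x) + O(h^2)
    return (f(x + h) - f(x - h)) / (2 * h)

def richardson(f, x, h):
    # combine step sizes h and h/2 to cancel the O(h^2) error term
    return (4 * central_diff(f, x, h / 2) - central_diff(f, x, h)) / 3

f, x, h = np.sin, 0.5, 0.1
exact = np.cos(x)
err_plain = abs(central_diff(f, x, h) - exact)
err_rich = abs(richardson(f, x, h) - exact)
assert err_rich < err_plain  # the extrapolated estimate is more accurate
```

The same trick applies to the directional-derivative "ground truth" here: evaluating the objective at p +/- dp and p +/- dp/2 gives an extrapolated finite difference that is far less sensitive to the choice of deps.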
It's probably easier just to show the values: Before #1855:
After #1855:
The finite-difference gradients are definitely wrong after #1855. The adjoint gradients are also incorrect, but within the same order of magnitude (in fact, roughly a factor of 2 of the fd_gradient before #1855). By the way, after #1855 the simulations took significantly longer than before, so I changed the stopping criteria; I saw no significant changes compared to the values reported above.
EigenmodeCoefficient adjoint gradients are consistent with finite-difference gradients when I use a version before PR #1855, but sometimes inconsistent when I use a later version, such as the version at PR #1855 or the current version. This problem does not seem the same as that in Issue #2060, although they might be related. An example is as follows, which has periodic boundary conditions on the left and right sides but has no waveguides. Before PR #1855, the comparison is
With the current version, the comparison is
The code is as follows.