We construct an improved shear wave velocity (Vs) model of the southern Californian crust and uppermost mantle by performing an adjoint tomographic inversion using Rayleigh wave empirical Green’s functions (EGFs) at 5–50 s periods from ambient noise cross correlations. Our initial model is the isotropic Vs model M16 from Tape et al., which was derived from three-component seismograms at 2–30 s periods from local earthquake data. Synthetic Green’s functions (SGFs) computed from M16 agree well with the EGFs in the 5–10 and 10–20 s period bands, but arrive on average 2.1 s early in the 20–50 s band. By minimizing the traveltime differences between the EGFs and SGFs using a gradient-based algorithm, we successively refine the Vs model, and the total misfit is reduced by ∼76.6 per cent, from 1.75 to 0.41, after five iterations. Relative to M16, our new Vs model reveals: (1) a mean Vs about 6 per cent lower in the lower crust (20–30 km depth); (2) higher Vs in the middle and lower crust at depths greater than 10 km beneath the Los Angeles Basin and Central Transverse Range; (3) higher Vs in the lower crust beneath the westernmost Peninsular Range Batholith (PRB); and (4) an enhanced high-velocity zone in the middle crust beneath the Salton Trough Basin. Our updated model also reveals refined lateral velocity gradients across the PRB, the Sierra Nevada Batholith and the San Andreas Fault. Our study demonstrates the improvement in lateral coverage and depth sensitivity gained by using ambient noise rather than earthquake data alone. The numerical spectral-element solver used in adjoint tomography provides accurate structural sensitivity kernels, and hence generates more robust images than those from traditional ambient noise tomography based on ray theory.
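
The traveltime comparison between EGFs and SGFs described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `traveltime_shift` and `percent_reduction`, the synthetic wavelet, and the sampling interval are all assumptions chosen for illustration, and the abstract does not specify how the misfit is defined. The sketch measures the time shift by the peak of a cross-correlation and checks the reported misfit reduction arithmetically.

```python
import numpy as np

def traveltime_shift(egf, sgf, dt):
    """Lag (in seconds) at which the cross-correlation of egf with sgf peaks.

    A positive value means the EGF arrives later than the SGF,
    i.e. the synthetic is advanced relative to the data.
    """
    cc = np.correlate(egf, sgf, mode="full")
    # Index 0 of the 'full' output corresponds to lag -(len(sgf) - 1) samples.
    lag_samples = np.argmax(cc) - (len(sgf) - 1)
    return lag_samples * dt

def percent_reduction(before, after):
    """Relative misfit reduction in per cent."""
    return 100.0 * (before - after) / before

# Hypothetical test signals: a tapered 30 s wavelet, with the "synthetic"
# arriving 2.1 s earlier than the "data" (values chosen for illustration).
dt = 0.1                      # sampling interval in seconds (assumed)
t = np.arange(0.0, 100.0, dt)

def wavelet(t0):
    return np.exp(-((t - t0) / 8.0) ** 2) * np.cos(2.0 * np.pi * (t - t0) / 30.0)

sgf = wavelet(50.0)
egf = wavelet(52.1)

shift = traveltime_shift(egf, sgf, dt)
reduction = percent_reduction(1.75, 0.41)  # misfit values quoted in the text
```

Here `shift` recovers the imposed delay of the data relative to the synthetic, and `reduction` reproduces the quoted figure of about 76.6 per cent from the misfit values 1.75 and 0.41.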