Abstract: Image-based 3D reconstruction has made considerable progress on small-scale, simple scenes, yet it remains a substantial challenge for complex, large-scale building structures. Targeting the early design stages in Architecture, Engineering, and Construction (AEC), we introduce ArchiDiff, a platform for image-to-point-cloud 3D reconstruction and editing in intricate architectural scenes. First, we curate a comprehensive 3D reconstruction dataset, ArchiCloudNet, tailored specifically to complex architectural scenes. Second, we propose a 3D reconstruction method based on a conditional denoising diffusion model, augmented with an arbitrary-object segmentation model to improve segmentation and recognition in complex scenes. Finally, our framework incorporates an interactive feature that enables instantaneous editing of 2D images through a simple drag-and-drop operation, with simultaneous updates to the 3D building point clouds. We evaluate the 3D reconstruction accuracy of ArchiDiff and compare it with cutting-edge baselines on ArchiCloudNet. Experimental results demonstrate that our model can generate high-quality 3D point clouds while providing rapid-response editing and effective processing of complex backgrounds.