Classifying natural language requirements (NLRs) plays a crucial role in software engineering, particularly in distinguishing functional from non-functional requirements. While large language models (LLMs) offer potential for automating this task, concerns remain about their consistency, i.e., their ability to produce the same results across repeated runs. In this work, we report on experiments evaluating how well GPT-4o and LLAMA3.3-70B classify NLRs using a zero-shot learning approach. We also examine how the temperature parameter influences the classification performance and consistency of these models. Our results show that LLMs such as GPT-4o and LLAMA3.3-70B can support automated NLR classification. GPT-4o performs well at identifying functional requirements, with the highest consistency at a temperature of one. Non-functional requirement classification, by contrast, improves at higher temperatures, indicating a trade-off between determinism and adaptability. LLAMA3.3-70B is more consistent than GPT-4o, and its classification accuracy is less sensitive to temperature adjustments.
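The zero-shot setup described above can be sketched as follows. The prompt wording, the label-parsing rule, and the `query_model` stub are illustrative assumptions for exposition, not the exact prompt or API protocol used in the study; in practice `query_model` would wrap a call to GPT-4o or LLAMA3.3-70B with the chosen temperature.

```python
def build_prompt(requirement: str) -> str:
    # Zero-shot: the prompt states the task only, with no labeled examples.
    return (
        "Classify the following software requirement as 'functional' or "
        "'non-functional'. Answer with a single word.\n\n"
        f"Requirement: {requirement}"
    )

def parse_label(response: str) -> str:
    # Normalize the model's free-text answer to one of the two labels.
    text = response.strip().lower()
    return "non-functional" if "non" in text else "functional"

def classify(requirement: str, query_model, temperature: float = 1.0) -> str:
    # query_model is a hypothetical stand-in for an LLM API call;
    # temperature is forwarded so its effect on consistency can be studied
    # by repeating the same call and comparing the returned labels.
    return parse_label(query_model(build_prompt(requirement), temperature))

# Example with a stubbed model response (no API call is made here):
stub = lambda prompt, temperature: "Non-functional"
print(classify("The system shall encrypt all data at rest.", stub))  # non-functional
```

Measuring consistency then amounts to issuing the same classification request several times at a fixed temperature and comparing how often the parsed labels agree.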