Job Description
You will serve as a subject‑matter expert (SME) providing Level‑3 technical support across Google Cloud’s AI/ML portfolio, with emphasis on Vertex AI, GenAI, Conversational AI, and Other AI services. The role centers on rapid, high‑quality incident response, root‑cause diagnosis, and resolution for complex customer cases—while maintaining SLOs, CSAT targets, and rigorous documentation standards across phone, email, and chat channels.
Key Responsibilities
- Own complex incidents end‑to‑end: triage, reproduce, diagnose, and resolve issues for AI/ML products; maintain transparent customer communication and accurate case records.
- Respond to, diagnose, resolve, and track customer support queries by phone, email, and chat.
- Maintain response and resolution speed as defined by SLOs.
- Maintain high customer satisfaction scores and meet quality standards in 90% of cases.
- Assist and respond to consults from other technical support representatives through existing systems and tools.
- Use existing troubleshooting tools and techniques to establish the root cause of queries and provide a customer-facing root cause assessment.
- Understand the business impact of customer issue reports, follow internal issue prioritization guidelines, and justify the priority assigned to any single customer report.
- Perform internal classification of queries, documenting classes of problems and preventative actions for later retrospective analysis.
- Reactively (e.g., as a result of a query) file issue reports with Google engineers and collaborate with them to diagnose customer issues; build documentation and procedures; document desired behavior and/or steps to reproduce; suggest code-level resolutions for complex product bugs; and help engineers drive bugs to resolution.
- Perform community management tasks as needed by the business.
- Promptly and independently resolve technical incidents and escalations, communicating effectively with all internal and external stakeholders, so that no monitoring by Google engineers is needed.
- Take cases involving customer-specific architectural design requirements and provide solutions scoped to a particular product (or a subset of its features).
- Contribute to the community with solution posts, FAQs, and guidance on best practices for AI/ML deployments and responsible AI usage.
Product Scope & Typical Case Patterns
Vertex AI
- Introduction/AutoML: dataset ingestion, labeling, AutoML training failures, metric drift, imbalance handling.
- Notebooks: environment provisioning, dependency/runtime conflicts, GPU/TPU access, kernel issues.
- AI Vector Search: index build latency, recall/precision tuning, ANN configuration, embedding mismatches.
- Pipelines: DAG orchestration failures, component contract issues, artifact lineage, caching.
- Prediction (Online/Batch): endpoint scaling, model versioning, cold‑start latency, batch job retries.
- Training: hyperparameter tuning, distributed training, accelerator utilization, checkpointing.
- Model Registry: version promotion policies, metadata integrity, rollback flows.
- Managed Datasets: schema evolution, governance, access control.
- Explainable AI: feature attributions, baselines, compliance requests.
- Feature Store: ingestion latency, online/offline store consistency, backfills.
GenAI
- LLMs & GenAI Introduction: prompt engineering pitfalls, safety filters, quota/latency.
- Vertex AI Gemini: model selection, context window sizing, tool‑use function calling, grounding.
- Vertex AI Search & Conversation: data connectors, retrieval quality, schema/FAQ ingestion.
- Discovery AI Retail Search: relevance tuning, synonym/attribute mapping, cold‑start catalogue issues.
- Vertex Gen AI Studio: prototype-to-production handoff, evaluation harnesses.
- Vertex Model Garden: model availability, versioning, licenses, tuning envelopes.
Conversational AI
- Dialogflow ES/CX: intent/flow design, session state, webhook reliability, NLU regression.
- CCAI Platform / CCaaS: telephony integration, routing, agent desktop, compliance.
- CCAI Insights: transcript accuracy, sentiment, redaction, analytics pipelines.
- Contact Center AI (General): deployment patterns, multichannel orchestration.
- Speech‑to‑Text / Text‑to‑Speech: language/acoustic models, latency, accuracy, voice settings.
- Agent Assist: suggestion quality, knowledge base integration, real‑time performance.
Other AI
- Healthcare Data Engine (HDE): FHIR mapping, interoperability, privacy controls.
- Document AI: processor selection, field extraction accuracy, batch throughput.
- Vision API: model outputs, rate limits, edge cases, dataset curation.