Skip to content

feat(operator): Add Case Sensitivity to Keyword Search Operator#5600

Open
SarahAsad23 wants to merge 3495 commits into
apache:mainfrom
SarahAsad23:sarah-keyword-search-case-sensitive
Open

feat(operator): Add Case Sensitivity to Keyword Search Operator#5600
SarahAsad23 wants to merge 3495 commits into
apache:mainfrom
SarahAsad23:sarah-keyword-search-case-sensitive

Conversation

@SarahAsad23

@SarahAsad23 SarahAsad23 commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

What changes were proposed in this PR?

Supersedes #3510. Since the PR is old and the repository structure has changed, this PR reapplies the relevant changes on top of the current master branch.

This PR adds an option for case sensitivity to the keyword search operator. Users can now use a checkbox to specify whether their search should be case sensitive or case insensitive. This functionality is enabled through the addition of a CaseSensitiveAnalyzer that extends the base Lucene Analyzer for case sensitive searches, while the original StandardAnalyzer is used for case insensitive searches.

image image image

Any related issues, documentation, discussions?

Closes #3045.

How was this PR tested?

Tested Manually.

Was this PR authored or co-authored using generative AI tooling?

Co-authored using: Claude Code

GspikeHalo and others added 30 commits December 17, 2024 21:49
### Purpose:
Since "Hub" has been chosen as the more formal name for the community,
the previous name needs to be replaced with "Hub."

### Change:
The 'Community' in the left sidebar has been replaced with 'Hub'.

### Demo:
Before:

![image](https://github.com/user-attachments/assets/29604593-7453-43d7-ab80-1f37637752f7)

After:

![image](https://github.com/user-attachments/assets/ac41b971-9f10-4e35-b0a0-8ba37bd200a9)
…bles. (apache#3168)

### Purpose:
Although the column indicating publication already exists in both the
dataset and workflow tables, they have different names and types.

### Change:
Modified the dataset and workflow tables to unify the format of
is_public across different tables. **To complete the database update,
please execute `18.sql` located in the update folder.**
This PR unifies the design of the output mode, and make it a property on
an output port.

Previously we had two modes:
1. SET_SNAPSHOT (return a snapshot of a table)
2. SET_DELTA (return a delta of tuples) 

And different chart types (e.g., HTML, bar chart, line chart). However,
we are only using HTML type after switching to pyplot.

Additionally, the output mode and chart type are associated with a
logical operator, and passed along to the downstream sink operator, this
does not support multiple output ports operators.

### New design
1. Move OutputMode onto an output port's property.
2. Unify to three modes:
    a. SET_SNAPSHOT (return a snapshot of a table)
    b. SET_DELTA (return a delta of tuples)
    c. SINGLE_SNAPSHOT (only used for visualizations to return a html)
The SINGLE_SNAPSHOT is needed now as we need a way to differenciate a
HTML output vs a normal data table output. This is due to the storage
with mongo is limited by 16 mb, and HTMLs are usually larger than 16 mb.
After we remove this limitation on the storage, we will remove the
SINGLE_SNAPSHOT and fall back to SET_SNAPSHOT.
This PR refactors the handling of sink operators in Texera by removing
the sink descriptor and introducing a streamlined approach to creating
physical sink operators during compilation and scheduling. Additionally,
it shifts the storage assignment logic from the logical layer to the
physical layer.

1. **Sink Descriptor Removal:** Removed the sink descriptor, physical
sink operators are no longer created through descriptors. In the future,
we will remove physical sink operators.
2. **Sink Operator Creation:**
- Introduced a temporary factory for creating physical sink operators
without relying on a descriptor.
- Physical sink operators are now considered part of the sub-plan of
their upstream logical operator. For example: If the HashJoin logical
operator requires a sink, its physical sub-plan includes the building
physicalOp, probing physicalOp, and the sink physicalOp.
3. **Storage Assignment Refactor:**
- Merged the storage assignment logic into the physical layer, removing
it from the logical layer.
- When a physical sink operator is created (either during compilation or
scheduling), its associated storage is also created at the same moment.
This PR fixes the issue that, when MongoDB is used as the result
storage, the execution keeps failing.

The root cause is:

when using MongoDB as the result storage, the schema has to be extracted
from the physical op
the code implementation uses the physical ops in subPlan to extract the
schema out, which is incorrect as subPlan's physical ops are not
propagated for output schemas
How the fix is done is: using physicalPlan.getOperator, instead of
subPlan.getOperator. In this way, the operator that is flowing to the
downstream is from `physicalPlan`, not `subPlan`
This PR fixes an issue where operators marked for viewing results did
not display the expected results due to a mismatch caused by
string-based comparison instead of using operator identities. By
updating the matching logic to rely on operator identities, this change
ensures accurate identification of marked operators and correct display
of their results.
Schema propagation is now handled in the physical plan to ensure all
ports, both external and internal, have an associated schema. As a
result, schema propagation in the logical plan is no longer necessary.

In addition, workflow context was used to give user information so that
schema propagation can resolve file names. This is also no longer needed
as we are resolving files through an explicit call as a step of
compiling. WorkflowContext is no longer needed to be set to Logical
Operator.
The cache logic will be rewritten at the physical plan layer, using
ports as the caching unit. This version is being removed temporarily.
Previously, all results were tied to logical operators. This PR modifies
the engine to associate results with output ports, enabling better
granularity and support for operators with multiple outputs.

### Key Change: StorageKey with PortIdentity

The most significant update in this PR is the adjustment of the storage
key format to include both the logical operator ID and the port ID. This
ensures that logical operators with multiple output ports (e.g., Split)
can have distinct storages created for each output port.

For now, the frontend retrieves results from the default output port
(port 0). In future updates, the frontend will be enhanced to support
retrieving results from additional output ports, providing more
flexibility in how results are accessed and displayed.
There are a few operator descriptors written in Java, which makes them
difficult to use and maintain. This PR converts all such descriptors to
Scala to streamline the migration process to new APIs and facilitate
future work, such as operator offloading.

Changed Operators:
- PythonUDFSourceOpDescV2
- RUDFSourceOpDesc
- SentimentAnalysisOpDesc
- SpecializedFilterOpDesc
- TypeCastingOpDesc
…de-based IDEs (apache#3180)

Add Scala([Metals](https://github.com/scalameta/metals)) generated folders to gitignore to support VSCode and Cursor IDE. Metals is a popular plugin for VSCode-based IDE to develop scala applications.
This PR moves all definitions of protobuf under workflow-core to be
within the core package name.
The scalapb proto definition both present in workflow-core and amber.
This PR removes the second copy.
The Python protobuf-generated code was outdated. This PR updates the
generation script to include all protobuf definitions from the
workflow-core and amber sub-projects, ensuring that the latest Python
code is generated and aligned with the current protobuf definitions.
Previously, creating a physical operator during compilation also
required the creation of its corresponding executor instances. To delay
this process so that executor instances were created within workers, we
used a lambda function (in `OpExecInitInfo`). However, the lambda
approach had a critical limitation: it was not serializable and it is
language dependent.

This PR addresses this issue by replacing the lambda functions in
`OpExecInitInfo` with fully serializable Protobuf entities. The
serialized information now ensures compatibility with distributed
environments and is language-independent. Two primary types of
`OpExecInitInfo` are introduced:

1. **`OpExecWithClassName`**:
   - **Fields**: `className: String`, `descString: String`.
- **Behavior**: The language compiler dynamically loads the class
specified by `className` and uses `descString` as its initialization
argument.

2. **`OpExecWithCode`**:
   - **Fields**: `code: String`, `language: String`.
- **Behavior**: The language compiler compiles the provided `code` based
on the specified `language`. The arguments are already pre-populated
into the code string.

### Special Cases

The `ProgressiveSink` and `CacheSource` executors are treated as special
cases. These executors require additional unique information (e.g.,
`storageKey`, `workflowIdentity`, `outputMode`) to initialize their
executor instances. While this PR preserves the handling of these
special cases, these executors will eventually be refactored or removed
as part of the plan to move storage management to the port layer.
This PR improves exception handling in AsyncRPCServer to unwrap the
actual exception from InvocationTargetException.

Old:
<img width="889" alt="截屏2024-12-31 上午2 38 34"
src="https://github.com/user-attachments/assets/8ec40cce-1b7a-4ecc-8518-a67b7e79888b"
/>


New:
<img width="1055" alt="截屏2024-12-31 上午2 33 18"
src="https://github.com/user-attachments/assets/de1c3e42-0dbb-4dbe-8b76-dee486d5bbb9"
/>
This PR removes all schema propagation functions from the logical plan.
Developers are now required to implement `SchemaPropagationFunc`
directly within the PhysicalPlan. This ensures that each PhysicalOp has
its own distinct schema propagation logic, aligning schema handling more
closely with the execution layer.

To accommodate the need for schema propagation in the logical plan
(primarily for testing purposes), a new method,
`getExternalOutputSchemas`, has been introduced. This method facilitates
the propagation of schemas across all PhysicalOps within a logical
operator, ensuring compatibility with existing testing workflows.
Each port must have exactly one schema. If multiple links are connected
to the same port, they are required to share the same schema. This PR
introduces a validation step during schema propagation to ensure this
constraint is enforced as part of the compilation process.

For example, consider a Union operator with a single input port that
supports multiple links. If upstream operators produce differing output
schemas, the validation will fail with an appropriate error message:
![CleanShot 2024-12-31 at 14 56
36](https://github.com/user-attachments/assets/077594b4-26fb-4983-9ce4-d7c67365645e)
…he#3156)

#### This PR introduces the `CostEstimator` trait which estimates the
cost of a region, given some resource units.
- The cost estimator is used by `CostBasedScheduleGenerator` to
calculate the cost of a schedule during search.
- Currently we only consider one type of schedule for each region plan,
which is a total order of the regions. The cost of the schedule (and
also the cost of the region plan) is thus the summation of the cost of
each region.
- The resource units are currently passed as placeholders because we
assume a region will have all the resources when doing the estimation.
The units may be used in the future if we consider different methods of
schedule-generation. For example, if we allow two regions to run
concurrently, the units will be split in half for each region.

#### A `DefaultCostEstimator` implementation is also added, which uses
past execution statistics to estimate the wall-clock runtime of a
region:
- The runtime of each region is represented by the runtime of its
longest-running operator.
- The runtime of operators are estimated using the statistics from the
**latest successful execution** of the workflow.
- If such statistics do not exist (e.g., if it is the first execution,
or if past executions all failed), we fall back to using number of
materialized edges as the cost.
- Added test cases using mock mysql data.
To simplify schema creation, this PR removes the Schema.builder()
pattern and makes Schema immutable. All modifications now result in the
creation of a new Schema instance.
PhysicalOp relies on the input port number to determine if an operator
is a source operator. For Python UDF, from the changes in apache#3183, the
input ports are not correctly associated with the PhysicalOp, causing
all the Python UDFs to be recognized as source operators. This PR fixes
the issue.
)

The ubuntu-latest image has been updated to 24.04 from 22.04 in recent
days. However, the new image is incompatible with libncurses5, requiring
an upgrade to libncurses6. Unfortunately, after upgrading, sbt no longer
functions as expected, an issue also documented here:
[actions/setup-java#712](actions/setup-java#712).
It appears that the 24.04 image does not include sbt by default.

This PR addresses the issue by pinning the image to ubuntu-22.04. We can
revisit and update the version when the 24.04 image becomes more stable
and resolves these compatibility problems.
In this PR, we add the user avatar to the execution history panel.
<img width="462" alt="Screenshot 2025-01-06 at 3 53 23 PM"
src="https://github.com/user-attachments/assets/e4e662af-c1a9-4686-90f4-b6bef155a36b"
/>
…e#3147)

# Implement Apache Iceberg for Result Storage

<img width="556" alt="Screenshot 2025-01-06 at 3 18 19 PM"
src="https://github.com/user-attachments/assets/4edadb64-ee28-48ee-8d3c-1d1891d69d6a"
/>


## How to Enable Iceberg Result Storage
1. Update `storage-config.yaml`:
   - Set `result-storage-mode` to `iceberg`.

## Major Changes
- **Introduced `IcebergDocument`**: A thread-safe `VirtualDocument`
implementation for storing and reading results in Iceberg tables.
- **Introduced `IcebergTableWriter`**: Append-only writer for Iceberg
tables with configurable buffer size.
- **Catalog and Data storage for Iceberg**: Uses a local file system
(`file:/`) via `HadoopCatalog` and `HadoopFileIO`. This ensures Iceberg
operates without relying on external storage services.
- `ProgressiveSinkOpExec` with a new parameter `workerId` is added. Each
writer of the result storage will take this `workerId` as one new
parameter.

## Dependencies
- Added Apache Iceberg-related libraries.
- Introduced Hadoop-related libraries to support Iceberg's
`HadoopCatalog` and `HadoopFileIO`. These libraries are used for
placeholder configuration but do not enforce runtime dependency on HDFS.

## Overview of Iceberg Components
### `IcebergDocument`
- Manages reading and organizing data in Iceberg tables.
- Supports iterator-based incremental reads with thread-safe operations
for reading and clearing data.
- Initializes or overrides the Iceberg table during construction.

### `IcebergTableWriter`
- Writes data as immutable Parquet files in an append-only manner.
- Each writer uniquely prefixes its files to avoid conflicts
(`workerIndex_fileIndex` format).
- Not thread-safe—single-thread access is recommended.

## Data Storage via Iceberg Tables
- **Write**: 
  - Tables are created per `storage key`.
  - Writers append Parquet files to the table, ensuring immutability.
- **Read**:
  - Readers use `IcebergDocument.get` to fetch data via an iterator.
- The iterator reads data incrementally while ensuring data order
matches the commit sequence of the data files.

## Data Reading Using File Metadata
- Data files are read using `getUsingFileSequenceOrder`, which:
- Retrieves and sorts metadata files (`FileScanTask`) by sequence
numbers.
  - Reads records sequentially, skipping files or records as needed.
- Supports range-based reading (`from`, `until`) and incremental reads.
- Sorting ensures data consistency and order preservation.

## Hadoop Usage Without HDFS
- The `HadoopCatalog` uses an empty Hadoop configuration, defaulting to
the local file system (`file:/`).
- This enables efficient management of Iceberg tables in local or
network file systems without requiring HDFS infrastructure.

---------

Co-authored-by: Shengquan Ni <13672781+shengquan-ni@users.noreply.github.com>
This PR removes the `MemoryDocument` class and its usage. Additionally,
it updates the fallback mechanism for `MongoDocument`, changing it from
`Memory` to `Iceberg`.
…t functioning properly (apache#3200)

### Purpose:
Currently, a 400 error occurs when calling `persistWorkflow`. The reason
is that the parameter sent from the frontend to the backend is
`isPublished` instead of `isPublic`.

### Change:
Change the name of the parameter sent to the backend.

### Demos:
Before:

![image](https://github.com/user-attachments/assets/157ec547-5dfc-4cd2-b805-6ae919e7ed68)

After:

![image](https://github.com/user-attachments/assets/403ebab3-2f29-4993-af01-af7309ada02c)
Remove the redundant Flarum user registration service from the Texera
backend, as it is unnecessary. User registration for Flarum can be
handled directly by calling the Flarum user registration API from the
Texera frontend.

This PR does not affect the functionality or lifecycle of either Texera
or Flarum.
This PR addresses schema normalization and logic improvements for
tracking operator runtime statistics in a workflow execution system. It
introduces changes to the database schema, migration scripts, and Scala
code responsible for inserting and managing runtime statistics. The goal
is to reduce redundancy, improve maintainability, and ensure data
consistency between `operator_executions` and
`operator_runtime_statistics`.

### Schema Design
1. New Table Design:
- `operator_executions`: Tracks execution metadata for each operator in
a workflow execution. Each row contains `operator_execution_id`,
`workflow_execution_id`, and `operator_id`. This table ensures that
operator executions are uniquely identifiable.
- `operator_runtime_statistics`: Tracks runtime statistics for each
operator execution at specific timestamps. It includes
`operator_execution_id` as a foreign key, ensuring a direct reference to
`operator_executions`.

2. Normalization Improvements:
- Replaced repeated `execution_id` and `operator_id` in
`workflow_runtime_statistics` with a single foreign key
`operator_execution_id`, pointing to `operator_executions`.
- Split the previous large `workflow_runtime_statistics` table into
smaller, more manageable tables, eliminating redundancy and improving
data integrity.

3. Indexes and Keys:
- Added a composite index on `operator_execution_id` and `time` in
`operator_runtime_statistics` to speed up joins and queries ordered by
time.

### Testing
The `core/scripts/sql/update/19.sql` will create the two new tables,
`operator_executions` and `operator_runtime_statistics`, and migrate the
data from `workflow_runtime_statistics` to those two tables.

---------

Co-authored-by: Kunwoo Park <kunwoopark@dhcp-10-8-059-073.mobile.reshsg.uci.edu>
Co-authored-by: Kunwoo Park <kunwoopark@Kunwoos-MacBook-Pro.local>
Co-authored-by: Kunwoo Park <kunwoopark@dhcp-10-8-034-248.mobile.reshsg.uci.edu>
Co-authored-by: Kunwoo Park <kunwoopark@vcv070211.vpn.uci.edu>
Co-authored-by: Kunwoo Park <kunwoopark@dhcp-10-8-012-059.mobile.reshsg.uci.edu>
Co-authored-by: Kunwoo Park <kunwoopark@dhcp-172-31-252-248.mobile.uci.edu>
…3208)

### Purpose
It looks to restricted/inactive users as they can clone a workflow, when
they actually cannot. To reduce the confusion, the clone button is
disabled in the GUI.
fix apache#3066 

### Changes

Added variables isAdminOrRegularUser in
hub-workflow-detail-component.ts. Check the user role. If the user is
not an admin or regular user, disable the clone button.



https://github.com/user-attachments/assets/c426c50d-fb31-4377-a2b7-3de2a24da512

---------

Co-authored-by: Texera <texera@MacBook-Air.local>
aicam and others added 22 commits June 19, 2025 17:06
…nel (apache#3479)

## Change
Previously, we had operator descriptions but we did not use them in
front end. This PR include the operator description defined in
`OperatorInfo` in backend in operator editing panel.
![Screenshot from 2025-06-19
13-19-59](https://github.com/user-attachments/assets/eeffc491-9f46-445e-bd7c-4291b5cb9bb4)
![Screenshot from 2025-06-19
17-04-47](https://github.com/user-attachments/assets/c21b065b-7fd1-4e9d-8981-5f739591bac9)
![Screenshot from 2025-06-19
17-05-13](https://github.com/user-attachments/assets/b2a6f7f1-de0e-4ad1-aee0-2fbbe5fb5ff8)
This PR fix the issue that README.md displays the logo in the incorrect
ratio due to the update of the logo source file.

| Before       | After        |
|--------------|--------------|

|![image](https://github.com/user-attachments/assets/86970927-38bc-49c8-9550-48776ad76e2e)|![image](https://github.com/user-attachments/assets/22579a4a-ffa7-4aa0-8a21-e3405131b272)|
…ing Behavior (apache#3489)

### **Purpose:**
- Introduce three independent site‐wide settings, logo, mini logo, and
favicon, admin can customize the expanded sidebar logo, the collapsed
sidebar icon, and the browser tab icon.
- Provide a default fallback for each asset and support individual
resets.
- Eliminate the brief “flash” of default assets on page reload by
applying both logo and mini logo before the main dashboard renders.

### **Changes**
- In admin-settings.service.ts: moved all logo-path and favicon-path
retrieval into the service, with default fallbacks.
- In admin-settings.component.ts/html: Split out three file inputs and
previews for logo, mini logo, and favicon, update changes and reset
methods.
- In dashboard.component.ts/html: Simply subscribe to loadLogos()),
ensuring the sidebar only displays once the final URLs are available.

### **Demonstration**
Default logo, mini logo, and favicon.
<img width="1667" alt="logo1"
src="https://github.com/user-attachments/assets/57c0e47e-d58d-46f9-a298-a6d84db314fb"
/>
<img width="43" alt="mini1"
src="https://github.com/user-attachments/assets/a80adef0-6344-4fde-898c-5c7386a0c2c5"
/>
<img width="110" alt="favicon1"
src="https://github.com/user-attachments/assets/5c7964d0-dace-44c4-9054-cbaf42bf619a"
/>

Admin update in Settings
<img width="882" alt="change"
src="https://github.com/user-attachments/assets/f0422fef-cfbb-423d-923f-82ffe4c500cb"
/>

Dashboard after Save
<img width="1293" alt="logo2"
src="https://github.com/user-attachments/assets/1b3fee1b-8d23-4ce2-94eb-365a7028bc80"
/>
<img width="44" alt="mini2"
src="https://github.com/user-attachments/assets/47aaa3d3-13cb-46c7-802d-e22b471afc80"
/>
<img width="113" alt="favicon2"
src="https://github.com/user-attachments/assets/8e392a33-94b3-40b2-9f26-4186e3fae7e6"
/>

Hard Refresh

https://github.com/user-attachments/assets/f0959dca-7ae9-4f7d-bd0a-12e09786a9ac

---------

Co-authored-by: Xinyuan Lin <xinyual3@uci.edu>
### Implementation Issue

There is an issue with the current Output Port Materialization Writer
Thread's `closeOutputStorageWriterIfNeeded` method where we terminate
all the writer threads for all the ports when calling
`closeOutputStorageWriterIfNeeded` for only 1 port.

### Bug

The consequence of this incorrect logic is that an operator with
multiple output ports will not terminate all its output ports if there
are multiple output ports having materialization reader threads. Before
apache#3460 this issue was not exposed because we have no such use cases.
After apache#3460, A split operator connecting to a training operator will
have both its output ports materialized, causing Split operator to
misbehave.

### Fix
This PR fixes that issue by letting each output port only terminate its
own thread.
## Fix
When the property is being checked to see if it is valid or not, null is
mapped to "" using `c.value ?? ""` to make sure when an optional
property is deselected, it will cause no unintended error.
This PR introduces an alpha parameter (between 0 and 1) to the Scatter
Plot operator, allowing users to control the opacity of points in the
scatter plot visualization.

Example:

<img width="1284" alt="scatterPlot"
src="https://github.com/user-attachments/assets/6310e033-3f34-45f2-b02f-c1d4efd09f94"
/>
## New feature
Users may need to export result of one or multiple operators as Parquet
format. This includes exporting to local or dataset.

## How implemented
In `IcebergDocument` we implemented `asInputStream` as the underlying
function to zip all Parquet files stored in Iceberg.
By implementing this function in `IcebergDocument`, we are now able to
directly access the files stored by storage service. In our case, since
we need Parquet format, we can directly stream back the files saved by
Iceberg without need to convert and consume extra computation.

## Feature behavior
As shown in the picture, now users can select Parquet as export type. 

![Screenshot from 2025-06-18
10-50-54](https://github.com/user-attachments/assets/4db1f6b2-42f5-4ae9-86c9-3170ea03e793)
)

In this PR, we will rename ControlPayload to Direct Control Message
(DCM) to align with the terminology used in our discussions.
…f input schemas (apache#3501)

## Overview

This PR refines the workflow compilation during editing time by:
- `/api/compile` now returns the mapping of operator ID to its output
schemas
- Frontend only sends "valid" (json schema wise) operators and links to
the `WorkflowCompilingService`
- When calculating the input schema of some operators, i.e. invalid
operators but still need schema information, use the output schema
mapping to do the inference.

## User Experience Differences

### Before
When users just drop the operator, without configuring the required
properties, the compiler is already complaining about the error:
![2025-06-23 16 43
06](https://github.com/user-attachments/assets/5ceef18a-ede2-4e66-a196-cf94c7067adb)

### After
When the operator is not fully configured, it will NOT be sent to the
CompilingService to do the compilation. Meanwhile, users can still get
the schema auto-completion of this operator.
![2025-06-23 16 44
51](https://github.com/user-attachments/assets/828c1a9c-4fdd-4b0f-840b-ed0673d3d8cc)

Also, frontend can use the output schemas of the upstream operators to
detect if users connect two output ports with different schemas to the
same input port:
![2025-06-24 23 23
13](https://github.com/user-attachments/assets/3b63efc9-9c10-42a7-95ac-ffc491bbc685)
…. Case-sensitive keyword search feature to be re-applied in follow-up commit at the new common/workflow-operator paths.
…com/SarahAsad23/texera into sarah-keyword-search-case-sensitive

# Conflicts:
#	common/workflow-operator/src/main/scala/org/apache/texera/amber/operator/keywordSearch/CaseSensitiveAnalyzer.scala
@SarahAsad23 SarahAsad23 changed the title Sarah keyword search case sensitive feat(operator): Add case sensitivity to keyword search Operator Jun 10, 2026
@SarahAsad23 SarahAsad23 changed the title feat(operator): Add case sensitivity to keyword search Operator feat(operator): Add Case Sensitivity to Keyword Search Operator Jun 10, 2026
@codecov-commenter

codecov-commenter commented Jun 10, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 14.28571% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 52.34%. Comparing base (07ca5d4) to head (305670f).

Files with missing lines Patch % Lines
...operator/keywordSearch/CaseSensitiveAnalyzer.scala 0.00% 4 Missing ⚠️
...r/operator/keywordSearch/KeywordSearchOpExec.scala 0.00% 0 Missing and 2 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #5600      +/-   ##
============================================
- Coverage     52.38%   52.34%   -0.05%     
+ Complexity     2484     2481       -3     
============================================
  Files          1070     1071       +1     
  Lines         41359    41365       +6     
  Branches       4441     4442       +1     
============================================
- Hits          21666    21651      -15     
- Misses        18427    18439      +12     
- Partials       1266     1275       +9     
Flag Coverage Δ *Carryforward flag
access-control-service 64.61% <ø> (ø)
agent-service 33.76% <ø> (ø) Carriedforward from 5b02684
amber 53.26% <14.28%> (-0.06%) ⬇️
computing-unit-managing-service 1.65% <ø> (ø)
config-service 56.06% <ø> (ø)
file-service 38.21% <ø> (ø)
frontend 46.86% <ø> (-0.06%) ⬇️ Carriedforward from 5b02684
pyamber 90.72% <ø> (ø) Carriedforward from 5b02684
python 90.75% <ø> (ø) Carriedforward from 5b02684
workflow-compiling-service 58.69% <ø> (ø)

*This pull request uses carry forward flags. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@Yicong-Huang

Copy link
Copy Markdown
Contributor

Hi @SarahAsad23 could you please add test cases so that we know this feature is working as expected? Thanks

@Yicong-Huang

Copy link
Copy Markdown
Contributor

@SarahAsad23 it also might be better to squash your commit history. This PR currently contains 3000+ commits that are unrelated to the PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Keyword search does not support case sensitive use case